Title: Metagenomic Hi-C Contact Map Network Analysis and Prediction of Recovered Metagenome Assembled Genome Quality
Original title: 總體基因組 Hi-C 接觸圖網絡分析及其重組總體基因組品質預測
Author: Serrato, Armando (王神鐸)
Contributors: Chang, Jia-Ming (張家銘); Armando Serrato (王神鐸)
Keywords: Hi-C Contact Maps; Metagenomics; Metagenome-Assembled Genomes; Network Theory; Machine Learning; Bioinformatics
Date: 2024
Upload time: 4-Oct-2024 10:47:11 (UTC+8)
Abstract:
Recent advances in metagenomics leverage Hi-C sequencing data to recover metagenome-assembled genomes (MAGs) from complex microbial communities. This research advances MAG quality prediction by building on the earlier hypothesis that network-based metrics can predict MAG quality. A deeper analysis of metagenomic Hi-C contact maps extracts additional network properties and integrates biological information from the clustered genomes, improving predictive performance. Used together as features in machine learning models, these network and biological properties enhance MAG quality prediction and offer insights into microbial community dynamics.
References:
Sait M, Hugenholtz P, Janssen PH. Cultivation of globally distributed soil bacteria from phylogenetic lineages previously only detected in cultivation-independent surveys. Environ Microbiol. 2002; 4(11):654–66.
Hugenholtz et al. (2008) Metagenomics. Nature, 455, 481–483.
Burton et al. (2014) Species-Level Deconvolution of Metagenome Assemblies with Hi-C–Based Contact Probability Maps. G3: GENES, GENOMES, GENETICS, 4, 7.
Lieberman-Aiden et al. (2009) Comprehensive mapping of long range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293.
DeMaere et al. (2019) bin3C: exploiting Hi-C sequencing data to accurately resolve metagenome-assembled genomes. Genome Biology, 20, 46.
Cheng et al. (2020) Bin3C_SLM: Deconvoluting metagenomic assemblies via Hi-C connect networks.
Stalder, T., Press, M.O., Sullivan, S. et al. Linking the resistome and plasmidome to the microbiome. ISME J 13, 2437–2446 (2019). https://doi.org/10.1038/s41396-019-0446-4
Du, Y., Sun, F. HiCBin: binning metagenomic contigs and recovering metagenome-assembled genomes using Hi-C contact maps. Genome Biol 23, 63 (2022). https://doi.org/10.1186/s13059-022-02626-w
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2014. CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043-1055.
Yuting Hsu (2022) The Network Analysis of the metagenomic Hi-C contact map and its downstream metagenome assembly.
Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–2.
Waltman L, Eck NJ van. A smart local moving algorithm for large-scale modularity-based community detection. European Phys J B. 2013;86:471.
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Statistical Mech Theory Exp. 2008;2008:P10008.
Rosvall M, Bergstrom CT. Maps of random walks on complex networks reveal community structure. Proc National Acad Sci. 2008;105:1118–23.
Ke Zhang, Chenxi Wang, Liping Sun, Jie Zheng, Prediction of gene co-expression from chromatin contacts with graph attention network, Bioinformatics, Volume 38, Issue 19, October 2022, Pages 4457–4465, https://doi.org/10.1093/bioinformatics/btac535
Ernest YB, Daniel AA. A Review of the Logistic Regression Model with Emphasis on Medical Research. J Data Analysis Information Process. 2019;07:190–207.
Hyatt, D., Chen, GL., LoCascio, P.F. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119
Gao, W., Lin, W., Li, Q. et al. Identification and validation of microbial biomarkers from cross-cohort datasets using xMarkerFinder. Nat Protoc 19, 2803–2830 (2024). https://doi.org/10.1038/s41596-024-00999-9
Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 4768–4777.
Manchanda, N., Portwood, J.L., Woodhouse, M.R. et al. GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations. BMC Genomics 21, 193 (2020). https://doi.org/10.1186/s12864-020-6568-2
Hunt, M., Kikuchi, T., Sanders, M. et al. REAPR: a universal tool for genome assembly evaluation. Genome Biol 14, R47 (2013). https://doi.org/10.1186/gb-2013-14-5-r47
Description: Master's
National Chengchi University
Department of Computer Science
107753048
Source: http://thesis.lib.nccu.edu.tw/record/#G0107753048
Type: thesis
Advisor: Chang, Jia-Ming (張家銘)
Identifier: G0107753048
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/153914
Format: application/pdf, 7591578 bytes
Table of contents:
1. Introduction
1.1. Metagenomic Hi-C
1.2. Metagenome Assembled Genome Quality Assessment
1.3. Previous work
1.4. Experiment Design
2. Methods
2.1. Dataset and Genome Binning
2.2. Network Analysis
2.3. Statistical Significance Testing
2.4. Quality assessment prediction
3. Results
3.1. Dataset Variations
3.2. Small-World Properties Analysis
3.3. Degree Properties Analysis
4. Influence and Connectivity Analysis
4.1. CheckM Features
4.2. Statistical Significance Analysis of Network and Biological Properties
4.3. Feature Generation and Quality Prediction
4.4. Feature Importance
4.5. Prediction Across Datasets
5. Discussion
6. Future Work
7. References
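The table of contents lists small-world and degree properties among the network analyses that feed quality prediction. As a rough illustration only, and not the thesis's actual pipeline, the Python sketch below computes a few such per-bin features (mean degree, edge density, mean local clustering coefficient) from a toy Hi-C contact list. The contact counts, bin names, and function names are all hypothetical.

```python
from itertools import combinations


def build_graph(contacts):
    """Build an undirected adjacency map from (contig_a, contig_b, count) triples."""
    adj = {}
    for a, b, count in contacts:
        if a == b or count <= 0:
            continue  # skip self-contacts and empty entries
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def bin_features(contacts, bin_contigs):
    """Degree and clustering features for the subgraph induced by one bin."""
    members = set(bin_contigs)
    # Keep only contacts whose two contigs both belong to this bin.
    inside = [(a, b, c) for a, b, c in contacts if a in members and b in members]
    adj = build_graph(inside)
    n = len(members)
    degrees = [len(adj.get(c, ())) for c in members]
    n_edges = sum(degrees) // 2
    return {
        "n_contigs": n,
        "mean_degree": sum(degrees) / n if n else 0.0,
        "density": 2.0 * n_edges / (n * (n - 1)) if n > 1 else 0.0,
        "mean_clustering": (
            sum(local_clustering(adj, c) for c in members) / n if n else 0.0
        ),
    }


# Toy contact list: (contig, contig, Hi-C read-pair count). Values are made up.
contacts = [
    ("c1", "c2", 12), ("c1", "c3", 7), ("c2", "c3", 9),
    ("c3", "c4", 1), ("c4", "c5", 15),
]
# Hypothetical bins (candidate MAGs) and their member contigs.
bins = {"bin_A": ["c1", "c2", "c3"], "bin_B": ["c4", "c5", "c6"]}
features = {name: bin_features(contacts, contigs) for name, contigs in bins.items()}
```

Feature dictionaries like these could serve as one row per MAG in a feature table for a classifier. Which features and models the thesis actually uses is not stated in this record.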