Academic Output - Theses and Dissertations
Title | 總體基因組 Hi-C 接觸圖網絡分析及其重組總體基因組品質預測 Metagenomic Hi-C Contact Map Network Analysis and Prediction of Recovered Metagenome Assembled Genome Quality |
Author | 王神鐸 Serrato, Armando |
Contributors | 張家銘 Chang, Jia-Ming (advisor); 王神鐸 Serrato, Armando (author) |
Keywords | Hi-C Contact Maps; Metagenomics; Metagenome-Assembled Genomes; Network Theory; Machine Learning; Bioinformatics |
Date | 2024 |
Upload time | 4-Oct-2024 10:47:11 (UTC+8) |
Abstract | Recent advances in metagenomics use Hi-C sequencing data to recover metagenome-assembled genomes (MAGs) from complex microbial communities. This research advances MAG quality prediction by building on the earlier hypothesis that network-based metrics can predict MAG quality. A deeper analysis of metagenomic Hi-C contact maps extracts additional network properties and integrates biological information from the clustered genomes to improve predictive performance. Used together as features in machine learning models, these network and biological properties improve MAG quality prediction and offer insights into microbial community dynamics. |
Description | Master's; National Chengchi University (國立政治大學); Department of Computer Science; 107753048 |
Source | http://thesis.lib.nccu.edu.tw/record/#G0107753048 |
Type | thesis |
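The abstract describes using network properties of a Hi-C contact map as features for predicting MAG quality. The following is only a minimal sketch of that idea, not the thesis's actual pipeline: it treats contigs as nodes and Hi-C contacts as edges, then averages two common network properties (degree and local clustering coefficient) over the contigs of one bin. The function names, the `min_count` threshold, the feature names, and the contig IDs are all hypothetical.

```python
from itertools import combinations


def build_graph(contacts, min_count=1):
    """Build an undirected adjacency map from (contig_a, contig_b, count) triples."""
    graph = {}
    for a, b, count in contacts:
        if a == b or count < min_count:
            continue  # skip self-contacts and links below the (assumed) threshold
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def local_clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(sorted(neighbours), 2) if v in graph[u])
    return 2.0 * linked / (k * (k - 1))


def bin_features(graph, bin_contigs):
    """Mean degree and mean clustering coefficient over the contigs of one bin."""
    members = [c for c in bin_contigs if c in graph]
    if not members:
        return {"mean_degree": 0.0, "mean_clustering": 0.0}
    n = len(members)
    return {
        "mean_degree": sum(len(graph[c]) for c in members) / n,
        "mean_clustering": sum(local_clustering(graph, c) for c in members) / n,
    }


# Toy contact list: (contig, contig, Hi-C read-pair count); values are made up.
contacts = [
    ("c1", "c2", 12),
    ("c1", "c3", 7),
    ("c2", "c3", 5),
    ("c3", "c4", 2),
    ("c4", "c4", 9),  # self-contact, ignored
]
graph = build_graph(contacts)
features = bin_features(graph, ["c1", "c2", "c3"])
print(features)
```

A feature dictionary like this, computed per bin and possibly combined with biological properties, could then be passed to any standard classifier; which properties and which model work best is what the thesis itself investigates.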
dc.contributor.advisor | 張家銘 | zh_TW |
dc.contributor.advisor | Chang, Jia-Ming | en_US |
dc.contributor.author (Author) | 王神鐸 | zh_TW |
dc.contributor.author (Author) | Armando Serrato | en_US |
dc.creator (Author) | 王神鐸 | zh_TW |
dc.creator (Author) | Serrato, Armando | en_US |
dc.date (Date) | 2024 | en_US |
dc.date.accessioned | 4-Oct-2024 10:47:11 (UTC+8) | - |
dc.date.available | 4-Oct-2024 10:47:11 (UTC+8) | - |
dc.date.issued (Upload time) | 4-Oct-2024 10:47:11 (UTC+8) | - |
dc.identifier (Other identifier) | G0107753048 | en_US |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/153914 | - |
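dc.description (Description) | 碩士 (Master's) | zh_TW |
dc.description (Description) | 國立政治大學 (National Chengchi University) | zh_TW |
dc.description (Description) | 資訊科學系 (Department of Computer Science) | zh_TW |
dc.description (Description) | 107753048 | zh_TW |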
dc.description.tableofcontents | 1. Introduction; 1.1. Metagenomic Hi-C; 1.2. Metagenome Assembled Genome Quality Assessment; 1.3. Previous work; 1.4. Experiment Design; 2. Methods; 2.1. Dataset and Genome Binning; 2.2. Network Analysis; 2.3. Statistical Significance Testing; 2.4. Quality assessment prediction; 3. Results; 3.1. Dataset Variations; 3.2. Small-World Properties Analysis; 3.3. Degree Properties Analysis; 4. Influence and Connectivity Analysis; 4.1. CheckM Features; 4.2. Statistical Significance Analysis of Network and Biological Properties; 4.3. Feature Generation and Quality Prediction; 4.4. Feature Importance; 4.5. Prediction Across Datasets; 5. Discussion; 6. Future Work; 7. References | zh_TW |
dc.format.extent | 7591578 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.source.uri (Source) | http://thesis.lib.nccu.edu.tw/record/#G0107753048 | en_US |
dc.subject (Keywords) | Hi-C 接觸圖 (Hi-C contact maps) | zh_TW |
dc.subject (Keywords) | 總體基因體學 (metagenomics) | zh_TW |
dc.subject (Keywords) | 總體基因體組裝基因體 (metagenome-assembled genomes) | zh_TW |
dc.subject (Keywords) | 網路相關指標 (network-related metrics) | zh_TW |
dc.subject (Keywords) | 機器學習 (machine learning) | zh_TW |
dc.subject (Keywords) | 生物資料科學 (biological data science) | zh_TW |
dc.subject (Keywords) | Hi-C Contact Maps | en_US |
dc.subject (Keywords) | Metagenomics | en_US |
dc.subject (Keywords) | Metagenome-Assembled Genomes | en_US |
dc.subject (Keywords) | Network Theory | en_US |
dc.subject (Keywords) | Machine Learning | en_US |
dc.subject (Keywords) | Bioinformatics | en_US |
dc.title (Title) | 總體基因組 Hi-C 接觸圖網絡分析及其重組總體基因組品質預測 | zh_TW |
dc.title (Title) | Metagenomic Hi-C Contact Map Network Analysis and Prediction of Recovered Metagenome Assembled Genome Quality | en_US |
dc.type (Type) | thesis | en_US |