Title 總體基因組 Hi-C 接觸圖網絡分析及其重組總體基因組品質預測
Metagenomic Hi-C Contact Map Network Analysis and Prediction of Recovered Metagenome Assembled Genome Quality
Author 王神鐸
Serrato, Armando
Contributor 張家銘
Chang, Jia-Ming
王神鐸
Armando Serrato
Keywords Hi-C Contact Maps
Metagenomics
Metagenome-Assembled Genomes
Network Theory
Machine Learning
Bioinformatics
Date 2024
Upload time 4-Oct-2024 10:47:11 (UTC+8)
Abstract Recent advances in metagenomics leverage Hi-C sequencing data to recover metagenome-assembled genomes (MAGs) from complex microbial communities. This research advances MAG quality prediction by building on a previously proposed hypothesis that network-based metrics can be used to predict MAG quality. A deeper analysis of metagenomic Hi-C contact maps extracts additional network properties and integrates biological information from the clustered genomes, improving predictive performance. Used together as features in machine learning models, these network and biological properties enhance MAG quality prediction and offer insights into microbial community dynamics.
Description Master's thesis
National Chengchi University
Department of Computer Science
107753048
Source http://thesis.lib.nccu.edu.tw/record/#G0107753048
Type thesis
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/153914
dc.description.tableofcontents
1. Introduction
1.1. Metagenomic Hi-C
1.2. Metagenome Assembled Genome Quality Assessment
1.3. Previous work
1.4. Experiment Design
2. Methods
2.1. Dataset and Genome Binning
2.2. Network Analysis
2.3. Statistical Significance Testing
2.4. Quality assessment prediction
3. Results
3.1. Dataset Variations
3.2. Small-World Properties Analysis
3.3. Degree Properties Analysis
4. Influence and Connectivity Analysis
4.1. CheckM Features
4.2. Statistical Significance Analysis of Network and Biological Properties
4.3. Feature Generation and Quality Prediction
4.4. Feature Importance
4.5. Prediction Across Datasets
5. Discussion
6. Future Work
7. References
dc.format.extent 7591578 bytes
dc.format.mimetype application/pdf