Citation Information

Title: The network analysis of the metagenomic Hi-C contact map and its downstream metagenome assemble binning (Chinese title: 總體基因Hi-C交互作用圖之網路分析與其組裝)
Author: Hsu, Yu-Ting (許育庭)
Contributors: Chang, Jia-Ming (張家銘), advisor; Hsu, Yu-Ting (許育庭)
Keywords: Hi-C; Metagenome-assembled genomes; Network models; Community detection
Date: 2022
Uploaded: 5-Oct-2022 09:14:07 (UTC+8)
Abstract:
Background: Metagenomics is the genomic analysis of microbial communities. Recent approaches to recovering metagenome-assembled genomes (MAGs) draw on chromosome conformation capture techniques such as Hi-C and have been shown to outperform traditional genome binning methods. In a previous master's thesis, "HiCBin: Deconvoluting metagenomic assemblies via Hi-C connect networks", Wei-Wen Cheng built on the bin3C pipeline and described a Hi-C-based metagenomic deconvolution method that uses the smart local moving (SLM) algorithm for genome binning.
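SLM, like the Louvain method it builds on, searches for a partition of the network that maximizes modularity, and the abstract notes that SLM performed better once its resolution parameter was adjusted. As an illustration only, the sketch below computes the resolution-parameterized modularity Q = (1/2m) Σ_ij [A_ij − γ k_i k_j / 2m] δ(c_i, c_j) for a small weighted graph, where k_i is the weighted degree of node i and γ is the resolution. It is not SLM itself; the function name, the toy graph, and the γ values are hypothetical.

```python
from collections import defaultdict


def modularity(edges, partition, gamma=1.0):
    """Resolution-parameterized modularity of a weighted, undirected graph.

    edges: {(u, v): weight} with each undirected edge listed once and u != v
    partition: {node: community label}, covering every node in edges
    gamma: resolution parameter; larger values favour smaller communities
    """
    strength = defaultdict(float)       # weighted degree k_i of each node
    internal = defaultdict(float)       # total edge weight inside each community
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    two_m = sum(strength.values())      # 2m: every edge counted from both ends
    community_strength = defaultdict(float)  # K_c: summed degree per community
    for node, k in strength.items():
        community_strength[partition[node]] += k
    # Q = sum over communities c of [ 2 * internal_c / 2m - gamma * (K_c / 2m)^2 ]
    return sum(2 * internal[c] / two_m - gamma * (k_c / two_m) ** 2
               for c, k_c in community_strength.items())


# Hypothetical toy graph: two triangles joined by a single edge (2, 3).
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
         (2, 3): 1.0}
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
whole = {node: "a" for node in range(6)}

for gamma in (0.1, 1.0):
    print(gamma, round(modularity(edges, split, gamma), 3),
          round(modularity(edges, whole, gamma), 3))
```

With γ = 0.1 the single community scores higher (0.9 versus about 0.807), while with γ = 1.0 the two-triangle split wins (about 0.357 versus 0.0). Raising γ strengthens the penalty term γ (K_c / 2m)², which is why larger resolutions tend to yield more, smaller clusters.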
Results: In addition to using Hi-C data for genome binning, we analyze the metagenomic Hi-C contact networks and find that higher-quality networks exhibit more small-world characteristics. We therefore use these properties to predict the quality of the clusters. We also follow the bin3C pipeline but replace its clustering step with different community detection algorithms to test whether this improves the outcome. After its resolution parameter is adjusted, SLM performs better on two of the datasets.
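One common way to quantify small-world character is the coefficient σ = (C / C_rand) / (L / L_rand), where C is the mean clustering coefficient, L the mean shortest-path length, and the "rand" values refer to an Erdős–Rényi graph with the same size and mean degree; σ > 1 is usually read as small-world. Which small-world measure the thesis uses is not stated in this record, so the sketch below is an assumption: it treats the network as unweighted and connected, and uses the analytic random-graph approximations C_rand ≈ ⟨k⟩/n and L_rand ≈ ln n / ln⟨k⟩ instead of sampling random graphs. The ring lattice and its two shortcuts are a hypothetical Watts–Strogatz-style example.

```python
import math
from collections import deque


def avg_clustering(adj):
    """Mean local clustering coefficient of an undirected, unweighted graph."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # clustering is taken as 0 for nodes with degree < 2
        nbr_list = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbr_list[j] in adj[nbr_list[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def avg_shortest_path(adj):
    """Mean BFS hop distance over all ordered node pairs; needs a connected graph."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_sigma(adj):
    """sigma = (C / C_rand) / (L / L_rand) using analytic Erdos-Renyi references."""
    n = len(adj)
    mean_k = sum(len(nbrs) for nbrs in adj.values()) / n
    c_rand = mean_k / n                       # expected clustering of a random graph
    l_rand = math.log(n) / math.log(mean_k)   # approximate random-graph path length
    return (avg_clustering(adj) / c_rand) / (avg_shortest_path(adj) / l_rand)


def ring_lattice(n, k):
    """Regular ring where each node links to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            adj[i].add((i + step) % n)
            adj[(i + step) % n].add(i)
    return adj


lattice = ring_lattice(20, 4)
shortcuts = ring_lattice(20, 4)
for u, v in [(0, 10), (5, 15)]:  # two hypothetical long-range contacts
    shortcuts[u].add(v)
    shortcuts[v].add(u)
print(round(small_world_sigma(lattice), 3), round(small_world_sigma(shortcuts), 3))
```

The two shortcuts shorten the mean path length much more than they lower clustering, so σ rises from about 1.87 to about 1.95. This is the Watts–Strogatz intuition behind the measure.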
Conclusion: This work mainly follows Cheng's thesis, but analyzes the metagenomic Hi-C networks in greater depth and tests three additional datasets. Although it is hard to conclude from this work which clustering algorithm is better, the findings on network properties may offer a new perspective for future work. The source code for the experiments is publicly available at https://github.com/changlabtw/Bin3C_SLM.
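The abstract says network properties are used to predict cluster quality, but this record does not describe the model. The following is only a sketch of one simple option: a logistic regression, fitted by batch gradient descent, that maps per-cluster network features to a high/low quality label. The two features (σ and mean clustering coefficient), their values, and the labels are invented for illustration; real labels would come from a genome-quality assessment, and features would normally be standardized first.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a cluster with features x is high quality."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented per-cluster features: [small-world sigma, mean clustering coefficient]
X = [[3.0, 0.60], [2.6, 0.55], [2.4, 0.50],
     [1.0, 0.20], [0.8, 0.15], [1.2, 0.25]]
y = [1, 1, 1, 0, 0, 0]  # 1 = high-quality bin, 0 = low-quality bin (invented)

w, b = fit_logistic(X, y)
print(round(predict_proba(w, b, [2.0, 0.45]), 2))
```

Logistic regression is chosen here only because it is easy to inspect: each weight shows how strongly a network property pushes a cluster toward the "high quality" label.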
Description: Master's thesis; National Chengchi University, Department of Computer Science; 108753127
Source: http://thesis.lib.nccu.edu.tw/record/#G0108753127
Type: thesis
Handle: http://nccur.lib.nccu.edu.tw/handle/140.119/142120
Table of contents
1. Introduction
1.1. Metagenomics
1.2. Traditional genome binning
1.3. High-throughput Chromatin Conformation Capture (Hi-C)
1.4. Metagenome deconvolution using Hi-C
2. Methods
2.1. Datasets
2.2. Read cleanup and shotgun assembly
2.3. Hi-C map generation
2.4. Metagenomic Hi-C connect network
2.5. Degree distribution of the networks
2.5.1. Erdős–Rényi model
2.5.2. Barabási–Albert model
2.5.3. Watts–Strogatz model
2.6. Genome binning
2.7. Performance metrics of genome binning
2.7.1. Assessed by biological information
2.7.2. Assessed by small-world network properties
2.8. Platforms
3. Results
3.1. Metagenome assembly
3.2. Hi-C connect network analysis
3.3. Small-world properties of the clusters in Hi-C connect networks
3.4. Prediction of the quality rank
3.5. Hi-C connect network deconvolution
3.6. Comparison with other work
3.7. Independent data
3.8. Comparison of different clustering algorithms
4. Conclusion
5. References
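Sections 2.3 and 2.4 turn Hi-C read pairs into a contig contact map and then into a network. The exact normalization and filtering used in this thesis and in bin3C are not given in this record, so the sketch below makes its own assumptions: it counts read pairs between contigs, balances the symmetric count matrix with the classic Sinkhorn–Knopp iteration so that every row and column sums to 1 (Knight–Ruiz is a faster algorithm for the same balancing problem), and keeps inter-contig edges whose balanced weight reaches a hypothetical threshold. Contig names, read pairs, and the threshold are invented.

```python
def contact_matrix(pairs, contigs):
    """Symmetric contig-by-contig matrix of Hi-C read-pair counts."""
    index = {name: i for i, name in enumerate(contigs)}
    n = len(contigs)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in pairs:
        i, j = index[a], index[b]
        counts[i][j] += 1
        if i != j:
            counts[j][i] += 1
    return counts


def sinkhorn_balance(m, tol=1e-10, max_iter=10000):
    """Scale rows and columns until every row and column sums to 1 (Sinkhorn-Knopp).

    Assumes no contig has an all-zero row; such contigs should be dropped first.
    """
    n = len(m)
    r, c = [1.0] * n, [1.0] * n
    for _ in range(max_iter):
        r = [1.0 / sum(m[i][j] * c[j] for j in range(n)) for i in range(n)]
        c = [1.0 / sum(r[i] * m[i][j] for i in range(n)) for j in range(n)]
        # columns now sum to 1 (up to rounding); stop once the rows do too
        row_error = max(abs(sum(r[i] * m[i][j] * c[j] for j in range(n)) - 1.0)
                        for i in range(n))
        if row_error < tol:
            break
    return [[r[i] * m[i][j] * c[j] for j in range(n)] for i in range(n)]


def contact_network(balanced, contigs, min_weight):
    """Weighted edges between distinct contigs whose balanced contact reaches min_weight."""
    edges = {}
    for i in range(len(contigs)):
        for j in range(i + 1, len(contigs)):
            w = (balanced[i][j] + balanced[j][i]) / 2  # average away residual asymmetry
            if w >= min_weight:
                edges[(contigs[i], contigs[j])] = w
    return edges


# Invented toy data: contigs A/B and C/D come from two different genomes,
# linked by a single B-C read pair.
contigs = ["A", "B", "C", "D"]
pairs = ([("A", "A")] * 5 + [("A", "B")] * 4 + [("B", "B")] * 5 +
         [("C", "C")] * 5 + [("C", "D")] * 4 + [("D", "D")] * 5 +
         [("B", "C")])
balanced = sinkhorn_balance(contact_matrix(pairs, contigs))
network = contact_network(balanced, contigs, min_weight=0.2)  # hypothetical threshold
print(sorted(network))  # [('A', 'B'), ('C', 'D')]
```

After balancing, the A-B and C-D contacts carry a weight of about 0.42 while the lone B-C contact drops to about 0.10, so the threshold keeps the two within-genome edges and discards the cross-genome one.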
Format: application/pdf, 2,230,883 bytes
DOI: 10.6814/NCCU202201535