HiCBin: Deconvoluting metagenomic assemblies by Hi-C connect network (HiCBin：利用 Hi-C 交互網路對總體基因組裝進行反捲積)

鄭惟文; Cheng, Wei-Wen

Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/132067

DC Field	Value	Language
dc.contributor.advisor	張家銘	zh_TW
dc.contributor.advisor	Chang, Jia-Ming	en_US
dc.contributor.author	鄭惟文	zh_TW
dc.contributor.author	Cheng, Wei-Wen	en_US
dc.creator	鄭惟文	zh_TW
dc.creator	Cheng, Wei-Wen	en_US
dc.date	2020	en_US
dc.date.accessioned	2020-10-05T07:16:42Z	-
dc.date.available	2020-10-05T07:16:42Z	-
dc.date.issued	2020-10-05T07:16:42Z	-
dc.identifier	G0106753031	en_US
dc.identifier.uri	http://nccur.lib.nccu.edu.tw/handle/140.119/132067	-
dc.description	Master's degree (碩士)	zh_TW
dc.description	National Chengchi University (國立政治大學)	zh_TW
dc.description	Department of Computer Science (資訊科學系)	zh_TW
dc.description	106753031	zh_TW
dc.description.abstract	背景:總體基因組學是一項從環境樣本中還原微生物群落基因組的研究。由於大部分微生物都無法獨立進行培養，因此從總體基因組中對個別物種的基因組(即由總體基因組組裝而成的基因組，簡稱 MAGs)進行反捲積，是一件困難的任務。先前有些研究描述如何應用 Hi-C 資料復原 MAG 的方法，例如 MetaPhase、ProxiMeta 和 bin3C。\n結果:在本研究中除了應用 Hi-C 資料來進行基因組分箱之外，我們更進一步分析 Hi-C 連結網路的特性。結果顯示 Hi-C 連結網路遵循「截斷的冪次定律分佈」，這是一種冪次定律分佈的變型。在先前的研究中，智慧局部移動法(簡稱 SLM)在分群遵循冪次定律分佈的網路時具有出色的表現，因此我們採用 SLM 演算法來進行基因組分箱。我們將此方法命名為 HiCBin，並與另外兩個相關的工具——bin3C 與 ProxiMeta，比較基因組分箱的結果。相較另外兩種工具，HiCBin 不只復原較多 Near 等級的 MAGs，也復原更多 Moderate 等級以上的 MAGs。\n結論:HiCBin 雖有許多部分的步驟是遵循 bin3C 的方法，但我們在基因組分箱的表現更為優異。這表示針對 Hi-C 連結網路的屬性分析，以及使用合適的叢集演算法，可以獲得更好的分箱結果。於此，HiCBin 提供了一個新的觀點，在未來可能改進基於 Hi-C 的總體基因組反捲積方法。實驗的原始碼可在以下連結公開取得: https://github.com/changlabtw/HiCBin	zh_TW
dc.description.abstract	Background: Metagenomics is the study of microbial genomes recovered collectively from an environmental sample. Because most microorganisms cannot be cultured independently of their native community, it is challenging to resolve the genomes of individual species, known as metagenome-assembled genomes (MAGs), from a metagenome. Previous tools such as MetaPhase, ProxiMeta, and bin3C have described methods that apply Hi-C data to recover MAGs. Results: In this work, in addition to using Hi-C data for genome binning, we analyze the properties of the Hi-C connect networks. The results show that the Hi-C connect networks follow a truncated power-law distribution, a variant of the power-law distribution. We therefore use the smart local moving (SLM) algorithm for genome binning, because it has performed very well in previous work on clustering networks that follow a power-law distribution. We then compare our method, HiCBin, against two related tools, bin3C and ProxiMeta, on a real biological dataset. HiCBin outperforms both tools in the number of near-complete MAGs retrieved and also recovers more MAGs at or above the “Moderate” level. Conclusions: Although HiCBin follows most of the steps of bin3C, it achieves better genome binning performance. This indicates that the properties of the network and the choice of a suitable clustering algorithm should both be considered to obtain better binning results. HiCBin offers a new perspective from which Hi-C-based metagenomic deconvolution methods may be improved in the future. The source code for the whole experiment is publicly available at https://github.com/changlabtw/HiCBin.	en_US
dc.description.tableofcontents	1. Introduction; 1.1. Metagenomics; 1.2. Traditional genome binning; 1.3. High-throughput Chromatin Conformation Capture (Hi-C); 1.4. Deconvolute metagenomes using Hi-C; 2. Methods; 2.1. Dataset; 2.2. Read cleanup and Shotgun assembly; 2.3. Hi-C read mapping; 2.4. Contact map generation; 2.5. Hi-C connect network; 2.6. Network model; 2.7. Degree distribution of the network; 2.8. Genome binning; 2.9. Performance metrics; 2.10. Platforms; 3. Result; 3.1. Metagenome assembly; 3.2. Hi-C connect network analysis; 3.3. Hi-C connect network deconvolution; 3.4. Comparison with other works; 4. Discussion; 5. Conclusion; 6. References	zh_TW
dc.format.extent	7550961 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri	http://thesis.lib.nccu.edu.tw/record/#G0106753031	en_US
dc.subject	Hi-C	zh_TW
dc.subject	總體基因組學	zh_TW
dc.subject	總體基因組組裝基因組	zh_TW
dc.subject	連結網路	zh_TW
dc.subject	基因組分箱	zh_TW
dc.subject	智慧局部移動法	zh_TW
dc.subject	Hi-C	en_US
dc.subject	Metagenomics	en_US
dc.subject	Metagenome-Assembled genomes	en_US
dc.subject	Connect network	en_US
dc.subject	Genome binning	en_US
dc.subject	SLM	en_US
dc.title	HiCBin：利用 Hi-C 交互網路對總體基因組裝進行反捲積	zh_TW
dc.title	HiCBin: Deconvoluting metagenomic assemblies by Hi-C connect network	en_US
dc.type	thesis	en_US
dc.identifier.doi	10.6814/NCCU202001729	en_US
item.grantfulltext	restricted	-
item.openairecristype	http://purl.org/coar/resource_type/c_46ec	-
item.fulltext	With Fulltext	-
item.cerifentitytype	Publications	-
item.openairetype	thesis	-
Appears in Collections:	Theses (學位論文)
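
The abstract describes the Hi-C connect networks as following a truncated power-law distribution, a variant of the power law. As a minimal illustration (not the thesis's own code), the sketch below compares the two forms on a toy degree sequence. It assumes the usual parameterization p(k) ∝ k^(-alpha) · exp(-lam · k), where lam = 0 gives the pure power law. The degree values, the parameter values, and the normalization over a finite support are all hypothetical simplifications chosen for this example.

```python
import math


def truncated_power_law_pmf(support, alpha, lam):
    """Return p(k) proportional to k**(-alpha) * exp(-lam * k), normalized over `support`.

    Setting lam = 0 recovers a pure power law on the same support.
    Normalizing over a finite support is a simplification for this sketch.
    """
    weights = [k ** (-alpha) * math.exp(-lam * k) for k in support]
    total = sum(weights)
    return {k: w / total for k, w in zip(support, weights)}


def log_likelihood(degrees, pmf):
    """Sum of log-probabilities of the observed degrees under the given pmf."""
    return sum(math.log(pmf[k]) for k in degrees)


# Hypothetical toy degree sequence (not data from the thesis).
degrees = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 8]
support = list(range(1, max(degrees) + 1))

# Illustrative parameter values; a real analysis would estimate them from data.
pure = truncated_power_law_pmf(support, alpha=2.0, lam=0.0)
truncated = truncated_power_law_pmf(support, alpha=2.0, lam=0.1)

ll_pure = log_likelihood(degrees, pure)
ll_truncated = log_likelihood(degrees, truncated)

print(f"log-likelihood, power law:           {ll_pure:.3f}")
print(f"log-likelihood, truncated power law: {ll_truncated:.3f}")
```

The exponential factor suppresses the probability of high-degree nodes relative to the pure power law, which is the qualitative difference between the two forms. Comparing the two log-likelihoods on real degree data, with fitted parameters, is one common way to judge which form describes a network better.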

Files in This Item:

File	Description	Size	Format
303101.pdf		7.37 MB	Adobe PDF
