Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/132067
Title: HiCBin: Deconvoluting metagenomic assemblies by Hi-C connect network
Author: Cheng, Wei-Wen (鄭惟文)
Contributors: Chang, Jia-Ming (張家銘)
Cheng, Wei-Wen (鄭惟文)
Keywords: Hi-C
Metagenomics
Metagenome-assembled genomes
Connect network
Genome binning
SLM (smart local moving algorithm)
Date: 2020
Upload date: 5-Oct-2020
Abstract: Background: Metagenomics is the study of recovering the collective microbial genomes from an environmental sample. Because most microorganisms cannot be cultured independently of their native community, it is challenging to deconvolute individual species genomes, namely metagenome-assembled genomes (MAGs), from a metagenome. Previous works such as MetaPhase, ProxiMeta, and bin3C have described methods that apply Hi-C data to recover MAGs.
Results: In this work, in addition to using Hi-C data for genome binning, we analyze the properties of Hi-C connect networks. The results show that these networks follow a truncated power-law distribution, a variant of the power-law distribution. Because the smart local moving (SLM) algorithm has performed very well in previous works at clustering networks that follow a power-law distribution, we adopt it for genome binning. We name the method HiCBin and compare it against two related tools, bin3C and ProxiMeta, on a real biological dataset. HiCBin recovers more near-complete MAGs than the other tools and also recovers more MAGs at or above the "Moderate" level.
Conclusions: Although HiCBin follows most of the steps of bin3C, it achieves better genome binning performance. This indicates that analyzing the properties of the Hi-C connect network and choosing a suitable clustering algorithm can yield better binning results. HiCBin thus offers a new perspective from which Hi-C-based metagenomic deconvolution methods may be improved in the future. The source code for the whole experiment is publicly available at https://github.com/changlabtw/HiCBin.
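The degree-distribution analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names, the toy contact list, and the choice of `x_min` are all hypothetical, and it estimates only a plain power-law exponent with the standard continuous maximum-likelihood formula, alpha = 1 + n / sum(ln(x / x_min)). Fitting the truncated power law reported in the abstract would need an additional exponential cutoff term, which is not shown here.

```python
import math
from collections import Counter


def node_degrees(edges):
    """Count how many contacts (edges) touch each contig (node)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def powerlaw_alpha_mle(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Uses alpha = 1 + n / sum(ln(x / x_min)) over all x >= x_min.
    For integer degrees this is only an approximation.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Hypothetical toy contact list: each pair is one Hi-C link between two contigs.
toy_edges = [
    ("c1", "c2"), ("c1", "c3"), ("c1", "c4"),
    ("c1", "c5"), ("c2", "c3"), ("c5", "c6"),
]

degrees = node_degrees(toy_edges)
alpha = powerlaw_alpha_mle(degrees.values(), x_min=1.0)
print(degrees)
print(round(alpha, 4))
```

On a real contact network, the estimated exponent would normally be compared against alternative heavy-tailed models before concluding which distribution fits best. The toy data here are far too small for that and serve only to show the calculation.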
Description: Master's thesis
National Chengchi University (國立政治大學)
Department of Computer Science
106753031
Source: http://thesis.lib.nccu.edu.tw/record/#G0106753031
Type: thesis
Appears in Collections: Theses and Dissertations (學位論文)

Files in This Item:
File: 303101.pdf
Size: 7.37 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.