Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/132067
DC Field Value Language
dc.contributor.advisor 張家銘 zh_TW
dc.contributor.advisor Chang, Jia-Ming en_US
dc.contributor.author 鄭惟文 zh_TW
dc.contributor.author Cheng, Wei-Wen en_US
dc.creator 鄭惟文 zh_TW
dc.creator Cheng, Wei-Wen en_US
dc.date 2020 en_US
dc.date.accessioned 2020-10-05T07:16:42Z -
dc.date.available 2020-10-05T07:16:42Z -
dc.date.issued 2020-10-05T07:16:42Z -
dc.identifier G0106753031 en_US
dc.identifier.uri http://nccur.lib.nccu.edu.tw/handle/140.119/132067 -
dc.description 碩士 (Master's degree) zh_TW
dc.description 國立政治大學 (National Chengchi University) zh_TW
dc.description 資訊科學系 (Department of Computer Science) zh_TW
dc.description 106753031 zh_TW
dc.description.abstract 背景:總體基因組學是一項從環境樣本中還原微生物群落基因組的研究。由於大部分微生物都無法獨立進行培養,因此從總體基因組中對個別物種的基因組(即由總體基因組組裝而成的基因組,簡稱 MAGs)進行反捲積,是一件困難的任務。先前有些研究描述如何應用 Hi-C 資料復原 MAG 的方法,例如 MetaPhase、ProxiMeta 和 bin3C。
結果:在本研究中除了應用 Hi-C 資料來進行基因組分箱之外,我們更進一步分析 Hi-C 連結網路的特性。結果顯示 Hi-C 連結網路遵循「截斷的冪次定律分佈」,這是一種冪次定律分佈的變型。在先前的研究中,智慧局部移動法(簡稱 SLM)在分群遵循冪次定律分佈的網路時具有出色的表現,因此我們採用 SLM 演算法來進行基因組分箱。我們將此方法命名為 HiCBin,並與另外兩個相關的工具 bin3C 與 ProxiMeta 比較基因組分箱的結果。相較另外兩種工具,HiCBin 不只復原較多 Near 等級的 MAGs,也復原更多 Moderate 等級以上的 MAGs。
結論:HiCBin 雖有許多部分的步驟是遵循 bin3C 的方法,但我們在基因組分箱的表現更為優異。這表示針對 Hi-C 連結網路的屬性分析,以及使用合適的叢集演算法,可以獲得更好的分箱結果。於此,HiCBin 提供了一個新的觀點,在未來可能改進基於 Hi-C 的總體基因組反捲積方法。實驗的原始碼可在以下連結公開取得: https://github.com/changlabtw/HiCBin zh_TW
dc.description.abstract Background: Metagenomics is the study of recovering the collective microbial genomes from an environmental sample. Because most microorganisms cannot be cultured independently of their native community, it is challenging to resolve the genomes of individual species, namely metagenome-assembled genomes (MAGs), from a metagenome. Previous tools such as MetaPhase, ProxiMeta, and bin3C have described methods that apply Hi-C data to recover MAGs.
Results: In this work, in addition to using Hi-C data for genome binning, we analyze the properties of Hi-C connect networks. The results show that Hi-C connect networks follow a truncated power-law distribution, a variant of the power-law distribution. We therefore use the smart local moving (SLM) algorithm for genome binning, because previous work has shown it to perform well when clustering networks that follow a power-law distribution. We then compare our method, HiCBin, against two related tools, bin3C and ProxiMeta, on a real biological dataset. HiCBin retrieves more near-complete MAGs than either tool and recovers more MAGs at or above the “Moderate” level.
Conclusions: Although HiCBin follows most of the steps of bin3C, it achieves better genome-binning performance. This indicates that the properties of the network and the choice of a suitable clustering algorithm should be considered to obtain better binning results. HiCBin offers a new perspective from which Hi-C-based metagenomic deconvolution methods could be improved in the future. The source code for the whole experiment is publicly available at https://github.com/changlabtw/HiCBin. en_US
dc.description.tableofcontents 1. Introduction
1.1. Metagenomics
1.2. Traditional genome binning
1.3. High-throughput Chromatin Conformation Capture (Hi-C)
1.4. Deconvolute metagenomes using Hi-C
2. Methods
2.1. Dataset
2.2. Read cleanup and Shotgun assembly
2.3. Hi-C read mapping
2.4. Contact map generation
2.5. Hi-C connect network
2.6. Network model
2.7. Degree distribution of the network
2.8. Genome binning
2.9. Performance metrics
2.10. Platforms
3. Result
3.1. Metagenome assembly
3.2. Hi-C connect network analysis
3.3. Hi-C connect network deconvolution
3.4. Comparison with other works
4. Discussion
5. Conclusion
6. References zh_TW
dc.format.extent 7550961 bytes -
dc.format.mimetype application/pdf -
dc.source.uri http://thesis.lib.nccu.edu.tw/record/#G0106753031 en_US
dc.subject Hi-C zh_TW
dc.subject 總體基因組學 zh_TW
dc.subject 總體基因組組裝基因組 zh_TW
dc.subject 連結網路 zh_TW
dc.subject 基因組分箱 zh_TW
dc.subject 智慧局部移動法 zh_TW
dc.subject Hi-C en_US
dc.subject Metagenomics en_US
dc.subject Metagenome-Assembled genomes en_US
dc.subject Connect network en_US
dc.subject Genome binning en_US
dc.subject SLM en_US
dc.title HiCBin:利用 Hi-C 交互網路對總體基因組裝進行反捲積 zh_TW
dc.title HiCBin: Deconvoluting metagenomic assemblies by Hi-C connect network en_US
dc.type thesis en_US
dc.identifier.doi 10.6814/NCCU202001729 en_US
item.grantfulltext restricted -
item.openairecristype http://purl.org/coar/resource_type/c_46ec -
item.fulltext With Fulltext -
item.cerifentitytype Publications -
item.openairetype thesis -
Appears in Collections: 學位論文 (Theses and Dissertations)
Files in This Item:
File Description Size Format
303101.pdf 7.37 MB Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.