Title HiCSeg: an interactive genome segmentation cross samples and species (HiCSeg:針對不同樣本和物種的互動式基因體分割)
Author Wu, Yin-Han (吳映函)
Contributor Chang, Jia-Ming (張家銘)
Wu, Yin-Han (吳映函)
Keywords Genome segmentation
Hi-C
ChIP-Seq
Date 2021
Uploaded 2-Sep-2021 16:54:31 (UTC+8)
Abstract
Genome-wide chromosomal contacts measured by Hi-C can be used to investigate the higher-order organization of chromosomes, such as compartments and topologically associating domains (TADs). Principal component analysis (PCA) of mammalian Hi-C maps reveals two compartments, A and B. TADs and compartments can each be regarded as a segmentation of the genome. Genome segmentation is generally used to compress data and to sort out the different modifications present in different cell types. We compared PCA results at various resolutions to identify differences, and then introduced ChIP-Seq data for further analysis. We also introduced two other clustering methods, Louvain and Leiden. Their results can be compared with those of PCA and can also be used to compute the correlation between networks. Furthermore, the genome can be segmented from integrated ChIP-Seq and Hi-C information, combined either by adding the two together or by network fusion.
References
Balazs, R. (2014). Epigenetic mechanisms in Alzheimer’s disease. Degenerative neurological and neuromuscular disease, 4, 85.
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008 (10), P10008.
Bo Wang, Aziz M Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains & Anna Goldenberg (2014). Similarity network fusion for aggregating data types on a genomic scale. Nature Methods volume 11, 333–337.
ChromHMM: Chromatin state discovery and characterization. http://compbio.mit.edu/ChromHMM/
Community detection for NetworkX’s documentation (2010). https://Python-louvain.readthedocs.io/en/latest/
Dekker, J. et al. (2002) Capturing chromosome conformation. Science, 295, 1306–11.
Eigenvector, Juicer (2017). https://github.com/aidenlab/juicer/wiki/Eigenvector
ENCODE: Encyclopedia of DNA Elements. https://www.encodeproject.org/
Eugenio Marco, Wouter Meuleman, Jialiang Huang, Kimberly Glass, Luca Pinello, Jianrong Wang, Manolis Kellis & Guo-Cheng Yuan (2017). Multi-scale chromatin state annotation using a hierarchical hidden Markov model. Nature communications. DOI: 10.1038/ncomms15011
Illumina et al. (2007) Pub. No. 770-2007-007 Current as of 26 November 2007. Whole-Genome Chromatin IP Sequencing (ChIP-Seq).
Introduction of dataset preprocessing (2014). File: GSE63525_GM12878_combined_README.rtf. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525
Kloetgen, A., Thandapani, P., Ntziachristos, P., Ghebrechristos, Y., Nomikou, S., Lazaris, C., ... & Tsirigos, A. (2020). Three-dimensional chromatin landscapes in T cell acute lymphoblastic leukemia. Nature genetics, 52(4), 388-400.
Lan, X., Witt, H., Katsumura, K., Ye, Z., Wang, Q., Bresnick, E. H., ... & Jin, V. X. (2012). Integration of Hi-C and ChIP-seq data reveals distinct types of chromatin linkages. Nucleic acids research, 40(16), 7690-7704.
Lieberman-Aiden E, Van Berkum N L, Williams L, et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 326, 289–293 (2009).
Lin Liu, Yiqian Zhang, Jianxing Feng, Ning Zheng, Junfeng Yin, Yong Zhang (2012). GeSICA: genome segmentation from intra-chromosomal associations. BMC Genomics. 2012 May 4;13:164. doi: 10.1186/1471-2164-13-164.
Luo, Z., Wang, X., Jiang, H., Wang, R., Chen, J., Chen, Y., ... & Song, X. (2020). Reorganized 3D genome structures support transcriptional regulation in mouse spermatogenesis. iScience, 23(4), 101034.
Network fusion. https://nbisweden.github.io/workshop_omics_integration/session_nmf/SNF_main.html
Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680.
SIMILARITY NETWORK FUSION(SNF). http://compbio.cs.toronto.edu/SNF/SNF/Software.html
Strahl, B. D., & Allis, C. D. (2000). The language of covalent histone modifications. Nature, 403(6765), 41-45.
Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z
Van Berkum, Nynke L et al. (2010) Hi-C: a method to study the three-dimensional architecture of genomes. Journal of visualized experiments: JoVE, 39, 1869.
Visualization tool: Juicebox. https://www.aidenlab.org/juicebox/
Waltman, L., & Van Eck, N. J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. The European physical journal B, 86(11), 1-14.
Weighted correlation network analysis. https://en.wikipedia.org/wiki/Weighted_correlation_network_analysis
networkanalysis, CWTSLeiden (2020). https://github.com/CWTSLeiden/networkanalysis
Description Master's degree
National Chengchi University
Department of Computer Science
108753102
Source http://thesis.lib.nccu.edu.tw/record/#G0108753102
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/136962
Other identifier G0108753102
DOI 10.6814/NCCU202101389
Format application/pdf, 3237872 bytes
Table of contents
Introduction
  High-throughput Chromatin Conformation Capture (Hi-C)
  Chromatin immunoprecipitation sequence (ChIP-Seq)
  ChromHMM
  Similarity Network Fusion (SNF)
  Integration of different types of data
Methods
  Overview
  Data Sets
  Hi-C
  ChIP-Seq
  Hi-C Contact Matrix preprocessing
  Knight-Ruiz (KR) normalization processing
  ChIP-Seq binning
  Correlation Matrix
  Principal Component Analysis (PCA)
  Network transform
  Hi-C and ChIP-Seq Network fusion
  Network clustering
  Louvain method
  Leiden method
Results
  A/B compartment reproducibility
  KR normalization effect
  The resolution influence the runtime
  The explanation proportion of PCA
  Hi-C network cluster
  Louvain cluster of Hi-C network
  Cluster Hi-C data by Leiden
  The consistency between network cluster and PCA decomposition
  ChIP-Seq data
  Cluster ChIP-Seq data by Louvain
  Cluster ChIP-Seq network by Leiden
  Further analysis of the A/B compartment
  Correlation of Hi-C clusters and ChIP-Seq clusters
  Combination of ChIP-Seq and Hi-C
Discussion and Conclusion
References
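Note on the abstract's PCA step: the abstract describes calling compartments A and B from a principal component analysis of Hi-C maps, and the table of contents lists a correlation matrix step before PCA. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the thesis's implementation: the toy contact matrix, the `planted` labels, and all function names are hypothetical, and the toy data has no distance decay, so the normalization steps a real Hi-C map needs are omitted. The leading eigenvector of the row-wise Pearson correlation matrix stands in for the first principal component.

```python
import math

def pearson_matrix(rows):
    """Pearson correlation between every pair of rows of a square matrix."""
    n = len(rows)
    centered = []
    for row in rows:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    norms = [math.sqrt(sum(x * x for x in row)) for row in centered]
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            corr[i][j] = dot / (norms[i] * norms[j])
    return corr

def leading_eigenvector(matrix, iterations=100):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def compartment_signs(contacts):
    """One '+' or '-' per bin, taken from the sign of the leading eigenvector."""
    pc1 = leading_eigenvector(pearson_matrix(contacts))
    return "".join("+" if x > 0 else "-" for x in pc1)

# Hypothetical toy data: 12 bins with planted compartments; bins in the same
# compartment contact each other three times as often as bins in different ones.
planted = "AAABBBBAABBB"
contacts = [[3.0 if a == b else 1.0 for b in planted] for a in planted]

signs = compartment_signs(contacts)
print(planted)
print(signs)
```

The sign of an eigenvector is arbitrary, so the sketch only recovers the two groups, not which one is A. In practice the orientation is usually fixed with an external track such as gene density or GC content.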