Citation Information

Title HiCSeg:針對不同樣本和物種的互動式基因體分割
HiCSeg: an interactive genome segmentation cross samples and species
Author 吳映函 (Wu, Yin-Han)
Contributors 張家銘 (Chang, Jia-Ming)
吳映函 (Wu, Yin-Han)
Keywords Genome segmentation
Hi-C
ChIP-Seq
Date 2021
Upload time 2-Sep-2021 16:54:31 (UTC+8)
Abstract Genome-wide chromosomal contacts measured by Hi-C can be used to investigate the higher-order organization of chromosomes, such as compartments and topologically associating domains (TADs). In mammals, principal component analysis (PCA) of Hi-C maps reveals two compartments, A and B. TADs and compartments can each be regarded as a segmentation of the genome. Genome segmentation is generally used to compress data and to summarize how modifications differ across cell types. We compared PCA results at several resolutions to identify differences, then introduced ChIP-Seq data for further analysis. We also applied two other clustering methods, Louvain and Leiden. Their results can be compared with those of PCA and can also be used to compute the correlation between networks. Furthermore, the genome can be segmented using combined ChIP-Seq and Hi-C information, either by adding the two together or by network fusion.
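As an illustration of the PCA step mentioned in the abstract, the following is a minimal sketch and not the thesis's own code. It assumes a small, symmetric, already normalized contact matrix, skips the observed/expected correction that real pipelines usually apply, and uses the leading eigenvector of the bin-by-bin Pearson correlation matrix as a simple stand-in for the first principal component. The function name `call_compartments` and the toy matrix are hypothetical.

```python
import numpy as np


def call_compartments(contacts: np.ndarray) -> np.ndarray:
    """Split genomic bins into two groups (+1 / -1).

    Hypothetical helper for illustration only: it takes the sign of the
    leading eigenvector of the Pearson correlation matrix of the contact
    profiles.
    """
    # Correlate the contact profile (row) of every bin with every other bin.
    corr = np.corrcoef(contacts)
    # eigh handles symmetric matrices and returns eigenvalues in ascending
    # order, so the last column is the leading eigenvector.
    _, eigvecs = np.linalg.eigh(corr)
    leading = eigvecs[:, -1]
    # The sign of an eigenvector is arbitrary, so +1 / -1 only separates
    # two groups; it does not say which one is compartment A.
    return np.where(leading >= 0, 1, -1)


# Toy data (assumed, not from the thesis): six bins, where bins 0, 1, 4, 5
# contact each other strongly and bins 2, 3 form a second group.
membership = np.array([0, 0, 1, 1, 0, 0])
toy = np.where(membership[:, None] == membership[None, :], 10.0, 1.0)

labels = call_compartments(toy)
print(labels)
```

Because the eigenvector sign is arbitrary, deciding which of the two groups corresponds to compartment A needs additional evidence, which is one place where the ChIP-Seq signal described in the abstract could come in.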
Description Master's
National Chengchi University
Department of Computer Science
108753102
Source http://thesis.lib.nccu.edu.tw/record/#G0108753102
Type thesis
dc.contributor.advisor 張家銘 (zh_TW)
dc.contributor.advisor Chang, Jia-Ming (en_US)
dc.contributor.author (Authors) 吳映函 (zh_TW)
dc.contributor.author (Authors) Wu, Yin-Han (en_US)
dc.creator (Author) 吳映函 (zh_TW)
dc.creator (Author) Wu, Yin-Han (en_US)
dc.date (Date) 2021 (en_US)
dc.date.accessioned 2-Sep-2021 16:54:31 (UTC+8)
dc.date.available 2-Sep-2021 16:54:31 (UTC+8)
dc.date.issued (Upload time) 2-Sep-2021 16:54:31 (UTC+8)
dc.identifier (Other Identifiers) G0108753102 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/136962
dc.description.tableofcontents Introduction
High-throughput Chromatin Conformation Capture (Hi-C)
Chromatin immunoprecipitation sequence (ChIP-Seq)
ChromHMM
Similarity Network Fusion (SNF)
Integration of different types of data
Methods
Overview
Data Sets
Hi-C
ChIP-Seq
Hi-C Contact Matrix preprocessing
Knight-Ruiz (KR) normalization processing
ChIP-Seq binning
Correlation Matrix
Principal Component Analysis (PCA)
Network transform
Hi-C and ChIP-Seq Network fusion
Network clustering
Louvain method
Leiden method
Results
A/B compartment reproducibility
KR normalization effect
The resolution influence the runtime
The explanation proportion of PCA
Hi-C network cluster
Louvain cluster of Hi-C network
Cluster Hi-C data by Leiden
The consistency between network cluster and PCA decomposition
ChIP-Seq data
Cluster ChIP-Seq data by Louvain
Cluster ChIP-Seq network by Leiden
Further analysis of the A/B compartment
Correlation of Hi-C clusters and ChIP-Seq clusters
Combination of ChIP-Seq and Hi-C
Discussion and Conclusion
References
dc.format.extent 3237872 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0108753102 (en_US)
dc.identifier.doi (DOI) 10.6814/NCCU202101389 (en_US)