Title: 深度學習應用在偵測拓撲結構域
Topology Association Domain Identification using Deep Learning
Author: 楊鎮遠 (Yang, Jhen-Yuan)
Contributors: 張家銘 (Chang, Jia-Ming); 楊鎮遠 (Yang, Jhen-Yuan)
Keywords: Topology Association Domain; TAD; Hi-C; Chromosome organization; Deep learning
Date: 2019
Uploaded: 5-Sep-2019 16:15:31 (UTC+8)
Abstract
● Background: In recent years, increasing evidence has indicated that three-dimensional chromosome structure plays an important role in genomic function. A Topologically Associating Domain (TAD), a self-interacting region, has been shown to be a structural unit of the chromosome. However, identifying TADs in high-throughput chromosome conformation capture (Hi-C) maps is a computational challenge.
● Results: We propose a novel problem, TAD classification, in place of the original TAD identification problem. Specifically, we treat a Hi-C map as an image, so that TAD classification becomes an image classification problem, which we solve with two deep learning models: a convolutional neural network and a residual neural network. In addition, we designed an elegant way to generate non-TAD data for the binary classification problem. The deep learning models perform well, with AUC > 0.80 in validation across species and cell types.
● Conclusions: TADs have been shown to be conserved during evolution. Interestingly, our results confirm that a TAD classification model is practical across species. This indicates that, from the point of view of image classification, human and mouse TADs share a common pattern. Our approach could be a new way to test variation or conservation of TADs among Hi-C maps. For example, the TADs of two Hi-C maps are conserved if the two classification models are exchangeable.
Description: Master's
National Chengchi University
Department of Computer Science
105753033
Advisor: 張家銘 (Chang, Jia-Ming)
Source: http://thesis.lib.nccu.edu.tw/record/#G1057530331
Type: thesis
Identifier: G1057530331
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/125646
DOI: 10.6814/NCCU201901133
Format: application/pdf, 1669209 bytes
Table of Contents:
List of Figures
List of Tables
Abstract
Keywords
1. Introduction
1.1 Overview chromosome conformation capture
1.2 High-throughput chromosome conformation capture
1.3 Topologically Associating Domains
1.4 CTCF
1.5 Deep learning algorithm
1.6 Fully Convolutional Neural Network
1.7 Residual Neural Network
1.8 Squeeze-and-Excitation Net
1.9 Deep learning with Hi-C
2. Methods
2.1 Data preparation
2.2 non-TAD generation
2.3 Deep learning models
2.3.1 Model architectures
2.4 Evaluation
2.4.1 Experimental designs
2.4.2 Metrics
2.5 TAD caller by Dynamic programming
3. Results
3.1 Five-cross validation in species-specific dataset
3.2 Prediction error analysis
3.3 Data preprocessing
3.4 Evaluate model
4. Discussion
5. Conclusion
6. References
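The abstract scores the TAD / non-TAD classifiers by AUC and suggests that the TADs of two Hi-C maps are conserved when the two classification models are exchangeable. The following is a minimal sketch of that idea, not the thesis's implementation: `roc_auc` computes AUC from the rank-sum identity, and `exchangeable` is a hypothetical decision rule that reuses the reported 0.80 figure as a cutoff. Both function names, the cutoff's use as a threshold, and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for TAD windows, 0 for non-TAD windows.
    scores: classifier outputs, higher meaning "more TAD-like".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def exchangeable(auc_a_on_b: float, auc_b_on_a: float,
                 threshold: float = 0.80) -> bool:
    """Hypothetical rule: both cross-applied models must reach the threshold."""
    return auc_a_on_b >= threshold and auc_b_on_a >= threshold


if __name__ == "__main__":
    # Toy scores invented for illustration only (not thesis data).
    labels = [1, 1, 1, 0, 0, 0]
    human_model_on_mouse = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    mouse_model_on_human = [0.7, 0.6, 0.9, 0.3, 0.6, 0.2]
    auc_hm = roc_auc(labels, human_model_on_mouse)
    auc_mh = roc_auc(labels, mouse_model_on_human)
    print(auc_hm, auc_mh, exchangeable(auc_hm, auc_mh))
```

The pairwise loop is quadratic in the number of windows, which is acceptable for a small illustration; a sorted-rank formulation or a library routine would be the usual choice for full datasets.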