Title 深度學習應用在偵測拓撲結構域
Topology Association Domain Identification using Deep Learning
Author 楊鎮遠
Yang, Jhen-Yuan
Contributors 張家銘
Chang, Jia-Ming
楊鎮遠
Yang, Jhen-Yuan
Keywords Topology Association Domain
TAD
Hi-C
Chromosome organization
Deep learning
Date 2019
Upload time 5-Sep-2019 16:15:31 (UTC+8)
Abstract
● Background: In recent years, increasing evidence has indicated that three-dimensional chromosome structure plays an important role in genomic function. A Topologically Associating Domain (TAD), a self-interacting region, has been shown to be a structural unit of the chromosome. However, identifying TADs in high-throughput chromosome conformation capture (Hi-C) maps is a computational challenge.
● Results: We pose a new problem, TAD classification, in place of the original TAD identification problem. Specifically, we treat a Hi-C map as an image, so that TAD classification becomes an image classification problem, which we solve with two deep learning models: a convolutional neural network and a residual neural network. In addition, we designed a method to generate non-TAD data for the binary classification problem. The deep learning models perform well, with AUC > 0.80 in validation across species and cell types.
● Conclusions: TADs have been shown to be conserved during evolution. Our results confirm that a TAD classification model is practical across species, which indicates that, from an image classification point of view, human and mouse TADs share a common pattern. Our approach could be a new way to test variation or conservation of TADs among Hi-C maps. For example, the TADs of two Hi-C maps are conserved if the two classification models are exchangeable.
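The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names (extract_patch, auc, exchangeable), the nearest-neighbour resampling, and the use of 0.80 as a decision threshold are all assumptions made for illustration; the abstract only reports AUC > 0.80 as an observed result and does not describe how patches or non-TAD examples were built.

```python
def extract_patch(hic, start, end, size=8):
    """Cut the square block hic[start:end][start:end] around the diagonal and
    resample it to size x size by nearest-neighbour index selection, so that
    candidate regions of different lengths become equally sized 'images'.
    (Hypothetical helper; requires size >= 2 and end > start.)"""
    n = end - start
    idx = [start + round(k * (n - 1) / (size - 1)) for k in range(size)]
    return [[hic[i][j] for j in idx] for i in idx]


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic,
    using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with item i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in order[i:j + 1]:
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied run
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    u = sum(r for r, y in zip(ranks, labels) if y) - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def exchangeable(auc_a_on_b, auc_b_on_a, threshold=0.80):
    """Assumed decision rule: call two models exchangeable when each one,
    evaluated on the other's Hi-C data, exceeds the AUC threshold."""
    return auc_a_on_b > threshold and auc_b_on_a > threshold


# Toy contact map whose values decay with distance from the diagonal.
toy_map = [[1.0 / (1 + abs(i - j)) for j in range(20)] for i in range(20)]
patch = extract_patch(toy_map, 4, 16, size=8)
```

In this sketch, a model trained on one species would score patches from the other species, auc would summarise each direction, and exchangeable would compare the two values. How the thesis actually defines exchangeability is not stated in the abstract.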
Description Master's
National Chengchi University
Department of Computer Science
105753033
Source http://thesis.lib.nccu.edu.tw/record/#G1057530331
Type thesis
dc.contributor.advisor 張家銘 [zh_TW]
dc.contributor.advisor Chang, Jia-Ming [en_US]
dc.contributor.author (Authors) 楊鎮遠 [zh_TW]
dc.contributor.author (Authors) Yang, Jhen-Yuan [en_US]
dc.creator (Author) 楊鎮遠 [zh_TW]
dc.creator (Author) Yang, Jhen-Yuan [en_US]
dc.date (Date) 2019 [en_US]
dc.date.accessioned 5-Sep-2019 16:15:31 (UTC+8)
dc.date.available 5-Sep-2019 16:15:31 (UTC+8)
dc.date.issued (Upload time) 5-Sep-2019 16:15:31 (UTC+8)
dc.identifier (Other Identifiers) G1057530331 [en_US]
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/125646
dc.description (Description) 碩士 (Master's) [zh_TW]
dc.description (Description) 國立政治大學 (National Chengchi University) [zh_TW]
dc.description (Description) 資訊科學系 (Department of Computer Science) [zh_TW]
dc.description (Description) 105753033 [zh_TW]
dc.description.tableofcontents List of Figures
List of Tables
Abstract
Keywords
1. Introduction
1.1 Overview chromosome conformation capture
1.2 High-throughput chromosome conformation capture
1.3 Topologically Associating Domains
1.4 CTCF
1.5 Deep learning algorithm
1.6 Fully Convolutional Neural Network
1.7 Residual Neural Network
1.8 Squeeze-and-Excitation Net
1.9 Deep learning with Hi-C
2. Methods
2.1 Data preparation
2.2 non-TAD generation
2.3 Deep learning models
2.3.1 Model architectures
2.4 Evaluation
2.4.1 Experimental designs
2.4.2 Metrics
2.5 TAD caller by Dynamic programming
3. Results
3.1 Five-cross validation in species-specific dataset
3.2 Prediction error analysis
3.3 Data preprocessing
3.4 Evaluate model
4. Discussion
5. Conclusion
6. References
[zh_TW]
dc.format.extent 1669209 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G1057530331 [en_US]
dc.subject (Keywords) 拓撲關聯域 (topologically associating domain) [zh_TW]
dc.subject (Keywords) TAD [zh_TW]
dc.subject (Keywords) Hi-C [zh_TW]
dc.subject (Keywords) 染色體組織 (chromosome organization) [zh_TW]
dc.subject (Keywords) 深度學習 (deep learning) [zh_TW]
dc.subject (Keywords) Topology Association Domain [en_US]
dc.subject (Keywords) TAD [en_US]
dc.subject (Keywords) Hi-C [en_US]
dc.subject (Keywords) Chromosome organization [en_US]
dc.subject (Keywords) Deep learning [en_US]
dc.title (Title) 深度學習應用在偵測拓撲結構域 [zh_TW]
dc.title (Title) Topology Association Domain Identification using Deep Learning [en_US]
dc.type (Type) thesis [en_US]
dc.identifier.doi (DOI) 10.6814/NCCU201901133 [en_US]