Title: MCIENet: Multi-scale CNN-based Information Extraction from DNA Sequences for 3D Chromatin Interactions Prediction
Author: Ho, Yen-Nan (何彥南)
Advisor: Chang, Jia-Ming (張家銘)
Keywords:
Chromatin loop prediction
Deep learning
DNA sequence
Inception architecture
3D genomics
Date: 2024
Uploaded: 2-Dec-2024 11:21:52 (UTC+8)
Abstract
The three-dimensional (3D) structure of chromatin plays a crucial role in gene regulation. Chromatin loops, the fundamental units of this structure, differ in structure and function across cell types, so studying 3D chromatin structure helps scientists better understand how cells function. However, obtaining 3D structural information experimentally is costly in equipment, time, and samples. Consequently, many computational methods have been proposed to predict CTCF loops from DNA sequence, protein, or open chromatin information; among these, prediction from DNA sequence alone is the most challenging task. In this study, we propose a novel deep learning model, MCIENet (Multi-scale CNN-based Information Extraction Net), which uses an Inception architecture to extract multi-scale features from DNA sequences. We validated MCIENet on normal cells (GM12878) and cancer cells (Helas3). MCIENet achieved strong prediction performance on both cell types, particularly when longer DNA sequences were used as input. Our findings also show that the most effective model architecture design differs between cell types. Additionally, we fine-tuned DNABERT2-512, a model pre-trained on a large amount of genomic data, and found that it performed poorly on the cancer cells (Helas3), indicating that such pre-trained models cannot be applied to structure prediction in every cell type. Finally, DeepLIFT interpretability analysis showed that MCIENet captures finer detail when given long input sequences.
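The abstract states that MCIENet uses an Inception architecture to extract multi-scale features from DNA sequences. As a rough illustration only, the NumPy sketch below one-hot encodes a sequence and passes it through a GoogLeNet-style Inception block: parallel 1D convolutions with different kernel sizes (each wide kernel preceded by a 1x1 channel reduction) plus a max-pool branch projected by a 1x1 convolution, all concatenated along the channel axis. The one-hot encoding, the kernel sizes (1, 3, 9), the channel counts, and every function name here are assumptions for illustration, not MCIENet's actual configuration.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode DNA as a (4, L) array; ambiguous bases such as N stay all-zero."""
    x = np.zeros((4, len(seq)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            x[j, i] = 1.0
    return x

def relu(x):
    return np.maximum(x, 0.0)

def conv1d_same(x, w, b):
    """Cross-correlation with zero 'same' padding. x: (C_in, L); w: (C_out, C_in, K), K odd."""
    k = w.shape[2]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    cols = [np.tensordot(w, xp[:, t:t + k], axes=([1, 2], [0, 1])) for t in range(x.shape[1])]
    return np.stack(cols, axis=1) + b[:, None]

def maxpool1d_same(x, k=3):
    """Stride-1 max-pooling that keeps the sequence length."""
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)), constant_values=-np.inf)
    return np.stack([xp[:, t:t + k].max(axis=1) for t in range(x.shape[1])], axis=1)

def init_conv(rng, c_out, c_in, k):
    w = rng.normal(0.0, 0.1, size=(c_out, c_in, k)).astype(np.float32)
    return w, np.zeros(c_out, dtype=np.float32)

def make_inception_params(c_in, c_branch=8, c_reduce=4, kernel_sizes=(1, 3, 9), seed=0):
    """Hypothetical branch layout; kernel sizes and channel counts are illustrative only."""
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        if k == 1:
            branches.append([init_conv(rng, c_branch, c_in, 1)])
        else:
            # 1x1 reduction first, then the wider kernel (as in GoogLeNet)
            branches.append([init_conv(rng, c_reduce, c_in, 1),
                             init_conv(rng, c_branch, c_reduce, k)])
    return {"branches": branches, "pool_proj": init_conv(rng, c_branch, c_in, 1)}

def inception_block(x, params):
    """Run every branch on the same input and concatenate along the channel axis."""
    outs = []
    for layers in params["branches"]:
        h = x
        for w, b in layers:
            h = relu(conv1d_same(h, w, b))
        outs.append(h)
    w, b = params["pool_proj"]
    outs.append(relu(conv1d_same(maxpool1d_same(x), w, b)))
    return np.concatenate(outs, axis=0)

x = one_hot("ACGTNACGTTGCA")
y = inception_block(x, make_inception_params(c_in=4))
print(y.shape)  # (32, 13): 3 kernel branches + 1 pooling branch, 8 channels each
```

Because every branch keeps the sequence length, branches with small and large kernels see the same positions at different receptive fields, which is what "multi-scale" refers to in this sketch.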
Interpretability analysis also confirmed a limitation of anchor-based methods: when the anchor center is shifted, their predictions become unstable, which restricts their subsequent use.
Description: Master's thesis
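DeepLIFT attributes the change in a model's output, relative to a reference input, to individual input positions. The NumPy sketch below applies DeepLIFT's Rescale rule to a toy network with one ReLU hidden layer; the toy network, the uniform 0.25 background reference, and the function names are assumptions for illustration and not the thesis's setup (for deep CNNs, DeepLIFT is typically computed with a library such as Captum). Its key property, summation-to-delta, means the per-position scores add up exactly to the output difference, which is what makes attribution maps comparable when, for example, an input anchor is shifted.

```python
import numpy as np

def forward(x, W1, b1, w2, b2):
    """Toy model: one ReLU hidden layer followed by a linear output."""
    return float(w2 @ np.maximum(W1 @ x + b1, 0.0) + b2)

def deeplift_rescale(x, x_ref, W1, b1, w2, b2, eps=1e-7):
    """Rescale-rule contributions of each input to forward(x) - forward(x_ref)."""
    z, z_ref = W1 @ x + b1, W1 @ x_ref + b1
    dz = z - z_ref
    dh = np.maximum(z, 0.0) - np.maximum(z_ref, 0.0)
    big = np.abs(dz) > eps
    # Rescale multiplier dh/dz per hidden unit; fall back to the ReLU gradient when dz ~ 0
    r = np.where(big, dh / np.where(big, dz, 1.0), (z > 0).astype(float))
    m = (w2 * r) @ W1        # chain rule: multiplier from each input to the output
    return m * (x - x_ref)   # contribution = multiplier * difference from reference

# Usage: a random one-hot DNA input (4 x L, flattened) against a uniform background
rng = np.random.default_rng(0)
L, H = 10, 16
x = np.eye(4)[rng.integers(0, 4, L)].T.ravel()
x_ref = np.full(4 * L, 0.25)
W1, b1 = rng.normal(size=(H, 4 * L)), rng.normal(size=H)
w2, b2 = rng.normal(size=H), 0.1
C = deeplift_rescale(x, x_ref, W1, b1, w2, b2)
per_position = C.reshape(4, L).sum(axis=0)   # one importance score per base position
```

Summing the four channel contributions at each position gives a per-base importance track; comparing such tracks for centered and shifted anchors is one way to see where a model's attention moves.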
National Chengchi University
Department of Computer Science
110753202
Source: http://thesis.lib.nccu.edu.tw/record/#G0110753202
Type: thesis
Identifier: G0110753202
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/154569
Table of contents:
Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 3D Genomics
  1.3 Capturing Chromosome Interactions
  1.4 Chromosome Interaction Prediction
  1.5 Encoding Methods for DNA Sequences
  1.6 Deep Learning Methods for Predicting Biological Structure from DNA Sequences
Chapter 2 Methods
  2.1 Overview
  2.2 Datasets
    2.2.1 Generating Positive and Negative CTCF Loop Samples
    2.2.2 Splitting the Training, Validation, and Test Sets
    2.2.3 Encoding DNA Sequences
  2.3 Model Architecture
    2.3.1 BaseCNN
    2.3.2 MCIENet
    2.3.3 Choice of Activation Function
    2.3.4 Feature Aggregation Architecture
    2.3.5 Improving Model Generalization
  2.4 Training Strategy
  2.5 Loss Function and Optimizer
  2.6 Evaluation Metrics
  2.7 Benchmark Selection and Implementation
  2.8 DeepLIFT Interpretability Analysis
  2.9 Implementation Environment
Chapter 3 Results
  3.1 BaseCNN
    3.1.1 Baseline Architecture Tests: Feature Extraction Architecture
    3.1.2 Baseline Architecture Tests: Model Generalization
    3.1.3 Effect of DNA Input Length on Chromatin Interaction Prediction
    3.1.4 Effect of DNA Data Granularity on Chromatin Interaction Prediction
    3.1.5 Performance
  3.2 MCIENet
    3.2.1 Extractors with Different Kernel Sizes in the Information Extraction Branches
    3.2.2 Importance of the 1x1 Conv and the Extractor Max-Pooling
    3.2.3 Channel Ratio Tests for Each Branch of the Inception Block
    3.2.4 Performance
  3.3 Benchmark
  3.4 Interpretability Analysis
    3.4.1 DeepLIFT Interpretability of MCIENet and BaseCNN
    3.4.2 Does the Model Learn Better with Longer Input DNA Sequences?
    3.4.3 The Input Anchor Shift Problem
Chapter 4 Discussion
  5.1 Strengths and Limitations of Transformer-based and CNN-based Methods on DNA Sequences
  5.2 Potential Problems of Language Models Pre-trained on Large Amounts of DNA Sequence Data
  5.3 Shift of the Interaction Region within Input Anchors during Practical Prediction
  5.4 Rethinking the Necessity of Using Diverse Data to Assist Interaction Prediction
Chapter 5 Conclusion
References
Appendix
Format: application/pdf, 25127907 bytes