Title HiCNextflow: a flexible and reproducible Hi-C workflow
Author 鄭博仁 (Cheng, Po-Jen)
Contributors 張家銘 (Chang, Jia-Ming)
鄭博仁 (Cheng, Po-Jen)
Keywords Hi-C
pipeline
mapping
Date 2021
Upload time 2-Mar-2021 14:31:33 (UTC+8)
Abstract We built a Hi-C analysis pipeline on the Nextflow framework. Hi-C analysis is a complex process with multiple steps: mapping, filtering, loop calling, and parallelization. We first selected mapping tools used by existing pipelines, ran them within our workflow, evaluated their mapping efficiency, and examined how each affects the number of loops called downstream, using visualization to assess the differences. From this comparison we identified the most suitable mapping tool for the pipeline. Because each step takes the output of the previous step as its input, fault tolerance is a critical issue in pipeline development: when a fault occurs, a rerun should resume from the failed step instead of starting from the beginning. We designed a novel Hi-C analysis pipeline based on Nextflow, which simplifies the implementation and deployment of complex parallel and reactive workflows and makes our pipeline faster and more efficient.
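The resume-from-failure behaviour described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis code and not Nextflow code (Nextflow offers its own task caching through the `-resume` option). The function `run_pipeline`, the JSON cache files, and the toy `mapping`, `filtering`, and `loop_calling` steps are all hypothetical stand-ins chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, workdir, initial_input):
    """Run steps in order, reusing any cached output found in workdir."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    data = initial_input
    for name, func in steps:
        cache = workdir / f"{name}.json"
        if cache.exists():
            # Step completed in an earlier run: reuse its stored output.
            data = json.loads(cache.read_text())
            continue
        data = func(data)  # may raise; nothing is cached on failure
        cache.write_text(json.dumps(data))
    return data


def demo():
    """Fail once in the second step, then rerun and resume from it."""
    calls = []  # records which steps actually execute
    state = {"filter_should_fail": True}

    # Toy placeholders, not real aligners or loop callers.
    def mapping(reads):
        calls.append("mapping")
        return [r.upper() for r in reads]

    def filtering(aligned):
        calls.append("filtering")
        if state["filter_should_fail"]:
            raise RuntimeError("simulated failure in filtering")
        return [a for a in aligned if len(a) >= 4]

    def loop_calling(pairs):
        calls.append("loop_calling")
        return {"n_loops": len(pairs)}

    steps = [
        ("mapping", mapping),
        ("filtering", filtering),
        ("loop_calling", loop_calling),
    ]
    reads = ["acgt", "ac", "ggcca"]
    with tempfile.TemporaryDirectory() as workdir:
        try:
            run_pipeline(steps, workdir, reads)
        except RuntimeError:
            pass  # first run stops at the failing step
        first_run = list(calls)
        calls.clear()
        state["filter_should_fail"] = False  # "fix" the fault
        result = run_pipeline(steps, workdir, reads)
        second_run = list(calls)
    return first_run, second_run, result


if __name__ == "__main__":
    print(demo())
```

In the second run the mapping step is skipped because its output is already cached, so only the failed step and the steps after it execute. A production workflow engine would typically also key the cache on inputs and parameters so that stale results are not reused; this sketch keys on the step name only.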
Description Master's degree
National Chengchi University
Department of Computer Science
107753017
Source http://thesis.lib.nccu.edu.tw/record/#G0107753017
Type thesis
dc.contributor.advisor 張家銘 (Chang, Jia-Ming)
dc.identifier (Other Identifiers) G0107753017
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/134082
dc.description.tableofcontents 1. Introduction
1.1. High-throughput Chromatin Conformation Capture (Hi-C)
1.2. Hi-C packages
1.3. Read mapping
1.4. Nextflow
1.5. Research motivation
2. Methods
2.1. Dataset
2.2. Mapping comparison
2.3. Filtering comparison
2.4. Loop comparison
2.5. HiCNextflow
2.6. Hardware
3. Results
3.1. Mapping Results
3.2. Filtering Results
3.3. Loop Results
3.4. Loop visualization in contact map
3.5. Loop quality measure
3.6. HiCNextflow Results
3.7. Downsampling
4. Discussion and Conclusion
Reference
Supplementary Materials
1. Supplementary commands
2. Supplementary scripts
dc.format.extent 8453232 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU202100270