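Title: Hi-C data normalization (Hi-C實驗資料正規化)
Author: 魏孝全
Contributors: 薛慧敏; 魏孝全
Keywords: Chromosome conformation capture; Hi-C data; Normalization; Genome feature (基因特徵偏差, genome feature bias)
Date: 2017
Uploaded: 11 July 2017 11:26:01 (UTC+8)
Abstract (translated from the Chinese): This study examines normalization methods for the contact matrix data produced by high-throughput chromosome conformation capture (Hi-C) experiments. Such experiments are mainly used to measure the spatial distances between chromosomes, and the purpose of normalization is to remove systematic biases from the data; this thesis focuses on the biases caused by genome features. Unlike the local genome feature normalization (LGF method) proposed by Hu et al. (2012), our proposed quadratic function normalization (QF method) is built on a more general quadratic log model and a negative binomial distribution assumption. We evaluate the QF method through simulation experiments and human lymphoblastoid cell data (GSE18199), and compare it with other methods. In the simulations, we find that the QF method removes the bias effectively when the model is correct. In the real-data analysis, once the genome feature bias is removed, the relative distances between chromosomes are more consistent across replicate experiments. On the other hand, we find that the restriction enzyme used in the experiment affects the contact matrix, and that applying these normalization methods does not effectively remove the bias caused by restriction enzymes.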
Abstract (English): The high-throughput chromosome conformation capture (Hi-C) experiment was recently developed to explore the three-dimensional structure of the genome. To assess chromosomal interactions, a contact matrix is produced from a Hi-C experiment. Systematic technical biases often appear in the contact matrix and can lead to incorrect conclusions. Data normalization to remove these biases is therefore essential before further inference. In this research, we propose the quadratic function normalization method, which modifies the local genome feature normalization (Hu et al., 2012) by considering a more general model. Simulation studies are conducted to evaluate the proposed method. When the model assumption holds, the proposed method performs adequately. Further, a Hi-C data set from human lymphoblastoid cells (GSE18199) is used to compare our method with two existing methods. Normalization is observed to improve the reproducibility between experimental replicates. However, normalization has little effect in eliminating the bias due to restriction enzymes.
References
Agard DA, Hiraoka Y, Shaw P, Sedat JW (1989). Fluorescence microscopy in three dimensions, Methods Cell Biol., 30, 353-377.
Dekker J, Rippe K, Dekker M, Kleckner N (2002). Capturing chromosome conformation, Science, 295, 1306-1311.
Dostie J, Richmond TA, Arnaout RA, Selzer RR, Lee WL, Honan TA, Rubio ED, Krumm A, Lamb J, Nusbaum C, Green RD, Dekker J (2006). Chromosome Conformation Capture Carbon Copy (5C): A massively parallel solution for mapping interactions between genomic elements, Genome Res., 16, 1299-1309.
Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, Aiden EL (2017). De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds, Science, 356, 92-95.
Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, Min R, Alves P, Abyzov A, Addleman N, Bhardwaj N, Boyle AP, Cayting P, Charos A, Chen DZ, Cheng Y, Clarke D, Eastman C, Euskirchen G, Frietze S, Fu Y, Gertz J, Grubert F, Harmanci A, Jain P, Kasowski M, Lacroute P, Leng J, Lian J, Monahan H, O'Geen H, Ouyang Z, Partridge EC, Patacsil D, Pauli F, Raha D, Ramirez L, Reddy TE, Reed B, Shi M, Slifer T, Wang J, Wu L, Yang X, Yip KY, Zilberman-Schapira G, Batzoglou S, Sidow A, Farnham PJ, Myers RM, Weissman SM, Snyder M (2012). Architecture of the human regulatory network derived from ENCODE data, Nature, 489, 91-100.
Hu M, Deng K, Selvaraj S, Qin Z, Ren B, Liu JS (2012). HiCNorm: removing biases in Hi-C data via Poisson regression, Bioinformatics, 28, 3131-3133.
Imakaev M, Fudenberg G, McCord RP, Naumova N, Goloborodko A, Lajoie BR, Dekker J, Mirny LA (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization, Nature Methods, 9, 999-1003.
Li H, Ruan J, Durbin R (2008). Mapping short DNA sequencing reads and calling variants using mapping quality scores, Genome Res., 18, 1851-1858.
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J (2009). Comprehensive mapping of long range interactions reveals folding principles of the human genome, Science, 326, 289-293.
Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, Horn D, Kayserili H, Opitz JM, Laxova R, Santos-Simarro F, Gilbert-Dussardier B, Wittler L, Borschiwer M, Haas SA, Osterwalder M, Franke M, Timmermann B, Hecht J, Spielmann M, Visel A, Mundlos S (2015). Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions, Cell, 161, 1012-1025.
Simonis M, Klous P, Splinter E, Moshkin Y, Willemsen R, de Wit E, van Steensel B, de Laat W (2006). Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C), Nature Genetics, 38, 1348-1354.
Sexton T, Cavalli G (2015). The role of chromosome domains in shaping the functional genome, Cell, 160, 1049-1059.
Yaffe E, Tanay A (2011). Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture, Nature Genetics, 43, 1059-1065.
Description: Master's
National Chengchi University
Department of Statistics
104354025
Source: http://thesis.lib.nccu.edu.tw/record/#G0104354025
Type: thesis
Advisor: 薛慧敏
Identifier: G0104354025
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/110783
Table of contents:
Chapter 1 Introduction
Chapter 2 Methods
Chapter 3 Simulation
Chapter 4 Real Data Analysis
Chapter 5 Conclusion
References
Format: application/pdf, 1240046 bytes