Title Hi-C實驗資料正規化
Hi-C data normalization
Author 魏孝全
Contributors 薛慧敏
魏孝全
Keywords Chromosome conformation capture
Hi-C data
Normalization
Genome feature
Date 2017
Upload time 11 July 2017 11:26:01 (UTC+8)
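Abstract This study investigates normalization methods for the contact matrices produced by high-throughput chromosome conformation capture (Hi-C) experiments. These experiments are mainly used to measure spatial distances between chromosomes, and the purpose of normalization is to remove systematic biases from the data; this thesis focuses on the biases caused by genome features. Unlike the local genome feature normalization (LGF) method proposed by Hu et al. (2012), our proposed quadratic function normalization (QF) method is built on a more general quadratic log model and a negative binomial distribution assumption. We evaluate the QF method through simulation experiments and human lymphoblastoid cell data (GSE18199), and compare it with other methods. In the simulations, the QF method effectively removes the bias when the model is correct. In the real-data analysis, once the genome feature bias has been removed, the relative distances between chromosomes are more consistent across replicate experiments. On the other hand, we find that the restriction enzyme used in the experiment affects the contact matrix, and that these normalization methods do not effectively remove the bias caused by restriction enzymes.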
The high-throughput chromosome conformation capture (Hi-C) experiment was recently developed to explore the three-dimensional structure of the genome. To assess chromosomal interactions, a contact matrix is produced from a Hi-C experiment. Systematic technical biases often appear in the contact matrix and can lead to misleading conclusions. Consequently, normalizing the data to remove these biases is essential before further inference. In this research, we propose the quadratic function normalization method, which modifies the local genome feature normalization (Hu et al., 2012) by considering a more general model. Simulation studies are conducted to evaluate the proposed method. When the model assumption holds, the proposed method performs adequately. Further, a Hi-C data set from a human lymphoblastoid cell line (GSE18199) is used to compare our method with two existing methods. Normalization is observed to improve reproducibility between experimental replicates. However, normalization has only a limited effect in eliminating the bias due to restriction enzymes.
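The idea of regressing contact counts on a genome feature and then dividing out the fitted expectation can be illustrated with a small sketch. This is not the thesis's implementation: the record does not give the model's exact form, so the code assumes a single log-scale feature x per bin pair and the mean model log(mu) = b0 + b1*x + b2*x^2. It also swaps in a Poisson likelihood for the negative binomial assumption named in the abstract, to keep the example dependency-free. All function names and the toy data are hypothetical, and using observed/expected as the normalized value is likewise an assumption.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def quadratic_design(x):
    """Build rows [1, x, x^2] for the assumed model log(mu) = b0 + b1*x + b2*x^2."""
    return [[1.0, v, v * v] for v in x]


def fit_poisson(design, counts, iterations=50, tol=1e-10):
    """Fit log(mu) = design . beta by Newton-Raphson on the Poisson log-likelihood.

    Poisson is a stand-in here; the abstract names a negative binomial assumption.
    Column 0 of the design is expected to be the intercept.
    """
    p = len(design[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(counts) / len(counts))  # start from the mean count
    for _ in range(iterations):
        mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in design]
        # Score vector X^T (y - mu) and information matrix X^T diag(mu) X.
        grad = [sum(row[j] * (y - m) for row, y, m in zip(design, counts, mu))
                for j in range(p)]
        info = [[sum(row[j] * row[k] * m for row, m in zip(design, mu))
                 for k in range(p)] for j in range(p)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def normalize(design, counts, beta):
    """Divide each observed count by its fitted expectation."""
    return [y / math.exp(sum(b * v for b, v in zip(beta, row)))
            for row, y in zip(design, counts)]


# Hypothetical toy data: x is a log-scale genome feature for nine bin pairs,
# and the counts are set to their exact expectations under the assumed model.
x = [-1.0 + 0.25 * i for i in range(9)]
true_beta = [2.0, 0.5, -0.3]
design = quadratic_design(x)
counts = [math.exp(sum(b * v for b, v in zip(true_beta, row))) for row in design]

beta = fit_poisson(design, counts)
normalized = normalize(design, counts, beta)
print([round(b, 4) for b in beta])
print([round(r, 4) for r in normalized])
```

Because the toy counts carry no noise, the fit should recover true_beta up to numerical tolerance and the normalized values should all be close to 1, which is what "the bias is removed when the model is correct" looks like in this simplified setting. With real counts, a negative binomial fit would additionally estimate a dispersion parameter.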
References
Agard DA, Hiraoka Y, Shaw P, Sedat JW (1989). Fluorescence microscopy in three dimensions, Methods Cell Biol., 30, 353-377.
Dekker J, Rippe K, Dekker M, Kleckner N (2002). Capturing chromosome conformation, Science, 295, 1306-1311.
Dostie J, Richmond TA, Arnaout RA, Selzer RR, Lee WL, Honan TA, Rubio ED, Krumm A, Lamb J, Nusbaum C, Green RD, Dekker J (2006). Chromosome Conformation Capture Carbon Copy (5C): A massively parallel solution for mapping interactions between genomic elements, Genome Res., 16, 1299-1309.
Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, Aiden EL (2017). De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds, Science, 356, 92-95.
Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, Min R, Alves P, Abyzov A, Addleman N, Bhardwaj N, Boyle AP, Cayting P, Charos A, Chen DZ, Cheng Y, Clarke D, Eastman C, Euskirchen G, Frietze S, Fu Y, Gertz J, Grubert F, Harmanci A, Jain P, Kasowski M, Lacroute P, Leng J, Lian J, Monahan H, O'Geen H, Ouyang Z, Partridge EC, Patacsil D, Pauli F, Raha D, Ramirez L, Reddy TE, Reed B, Shi M, Slifer T, Wang J, Wu L, Yang X, Yip KY, Zilberman-Schapira G, Batzoglou S, Sidow A, Farnham PJ, Myers RM, Weissman SM, Snyder M (2012). Architecture of the human regulatory network derived from ENCODE data, Nature, 489, 91-100.
Hu M, Deng K, Selvaraj S, Qin Z, Ren B, Liu JS (2012). HiCNorm: removing biases in Hi-C data via Poisson regression, Bioinformatics, 28, 3131-3133.
Imakaev M, Fudenberg G, McCord RP, Naumova N, Goloborodko A, Lajoie BR, Dekker J, Mirny LA (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization, Nature Methods, 9, 999-1003.
Li H, Ruan J, Durbin R (2008). Mapping short DNA sequencing reads and calling variants using mapping quality scores, Genome Res., 18, 1851-1858.
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome, Science, 326, 289-293.
Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, Horn D, Kayserili H, Opitz JM, Laxova R, Santos-Simarro F, Gilbert-Dussardier B, Wittler L, Borschiwer M, Haas SA, Osterwalder M, Franke M, Timmermann B, Hecht J, Spielmann M, Visel A, Mundlos S (2015). Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions, Cell, 161, 1012-1025.
Sexton T, Cavalli G (2015). The role of chromosome domains in shaping the functional genome, Cell, 160, 1049-1059.
Simonis M, Klous P, Splinter E, Moshkin Y, Willemsen R, de Wit E, van Steensel B, de Laat W (2006). Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture–on-chip (4C), Nature Genetics, 38, 1348-1354.
Yaffe E, Tanay A (2011). Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture, Nature Genetics, 43, 1059-1065.
Description Master's thesis
National Chengchi University
Department of Statistics
104354025
Source http://thesis.lib.nccu.edu.tw/record/#G0104354025
Type thesis
dc.contributor.advisor 薛慧敏
dc.identifier (other identifier) G0104354025
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/110783
dc.description.tableofcontents Chapter 1 Introduction
Chapter 2 Methods
Chapter 3 Simulation
Chapter 4 Real Data Analysis
Chapter 5 Conclusion
References
dc.format.extent 1240046 bytes
dc.format.mimetype application/pdf