Testing for differentially expressed genes with RNA-Seq data


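Title	RNA序列實驗中檢測差異表現基因之統計方法 / Testing for differentially expressed genes with RNA-Seq data
Author	呂泓廷
Contributors	薛慧敏; 呂泓廷
Keywords	negative binomial distribution; overdispersion; maximum pseudo-likelihood estimation; significance testing for differentially expressed genes; RNA-Seq
Date	2013
Upload time	10-Feb-2014 14:47:42 (UTC+8)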
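Abstract	In recent years, RNA-Seq (RNA sequencing) experiments, which grew out of next-generation sequencing technology, have attracted increasing attention as their cost has fallen. These experiments use high-throughput sequencing to study the transcriptome and measure gene expression as count-type sequence data. To account for the overdispersion in such data, this study assumes a negative binomial distribution and estimates each gene's mean expression level by maximum pseudo-likelihood estimation. To identify genes that are differentially expressed between two groups of subjects with different phenotypes, we use a Wald test statistic based on these estimators to test the significance of the association between each gene and the phenotype. We validate the proposed method through simulation studies and then apply it to a real example data set.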
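
The workflow described in the abstract (negative binomial counts, a per-group mean estimate, then a Wald test between two phenotype groups) can be illustrated for a single gene with a minimal sketch. This is not the thesis's estimator: the abstract does not spell out the maximum pseudo-likelihood procedure, so the sketch substitutes a simple moment estimate of the dispersion and assumes equal library sizes. The function name `nb_wald_test` and the example counts are hypothetical.

```python
import math
from statistics import mean, variance


def nb_wald_test(group_a, group_b):
    """Wald test for a difference in log mean counts between two groups.

    Assumes Var(Y) = mu + phi * mu^2 (negative binomial) with one dispersion
    phi shared by both groups. phi is a moment estimate used here as a
    stand-in for the thesis's maximum pseudo-likelihood estimator.
    """
    stats = []
    for counts in (group_a, group_b):
        m = mean(counts)
        if m <= 0:
            raise ValueError("each group needs a positive mean count")
        stats.append((len(counts), m, variance(counts)))

    # Pooled moment estimate of phi, truncated at 0 (the Poisson limit).
    num = sum((n - 1) * (v - m) / m ** 2 for n, m, v in stats)
    den = sum(n - 1 for n, _, _ in stats)
    phi = max(0.0, num / den)

    (n_a, m_a, _), (n_b, m_b, _) = stats
    log_fc = math.log(m_b / m_a)
    # Delta method: Var(log sample mean) is approximately (1/mu + phi) / n.
    se = math.sqrt((1 / m_a + phi) / n_a + (1 / m_b + phi) / n_b)
    z = log_fc / se
    # Two-sided p-value from the standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return log_fc, z, p_value


# Hypothetical counts for one gene in two phenotype groups.
log_fc, z, p_value = nb_wald_test([10, 12, 9, 11], [30, 28, 35, 33])
print(f"log fold change = {log_fc:.3f}, z = {z:.2f}, p = {p_value:.2e}")
```

In a genome-wide analysis this test would be repeated for every gene, and the resulting p-values would then need a multiple-testing adjustment.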
References
1. Auer, P. L. and Doerge, R. W. (2011) A Two-stage Poisson Model for Testing RNA-Seq Data, Statistical Applications in Genetics and Molecular Biology, 10, 1-26.
2. Benjamini, Y. and Hochberg, Y. (1995) Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing, J. Roy. Statist. Soc. Ser. B, 57, 289-300.
3. Hall, J. M., Lee, M. K., Newman, B., Morrow, J. E., Anderson, L. A., Huey, B. and King, M. C. (1990) Linkage of Early-Onset Familial Breast Cancer to Chromosome 17q21, Science, 250, 1684-1689.
4. Li, J., Witten, D. M., Johnstone, I. M. and Tibshirani, R. (2012) Normalization, Testing, and False Discovery Rate Estimation for RNA-sequencing Data, Biostatistics, 13, 523-538.
5. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. and Gilad, Y. (2008) RNA-seq: an Assessment of Technical Reproducibility and Comparison with Gene Expression Arrays, Genome Res., 18, 1509-1517.
6. Nakashima, E. (1997) Some Methods for Estimation in a Negative-Binomial Model, Ann. Inst. Statist. Math., 49, 101-105.
7. Pao, W., Miller, V., Zakowski, M., Doherty, J., Politi, K., Sarkaria, I., Singh, B., Heelan, R., Rusch, V., Fulton, L., Mardis, E., Kupfer, D., Wilson, R., Kris, M. and Varmus, H. (2004) EGF Receptor Gene mutations are Common in Lung Cancers from Never Smokers and are associated with Sensitivity of Tumors to Gefitinib and Erlotinib, Proceedings of the National Academy of Sciences of the United States of America, 101, 13306-13311.
8. Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010) edgeR: a Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data, Bioinformatics, 26, 139-140.
9. Robinson, M. D. and Smyth, G. K. (2007) Moderated Statistical Tests for Assessing Differences in Tag Abundance, Bioinformatics, 23, 2881-2887.
10. Robinson, M. D. and Smyth, G. K. (2008) Small-sample Estimation of Negative Binomial Dispersion, with Applications to SAGE Data, Biostatistics, 9, 321-332.
11. Storey, J. D. (2003) The Positive False Discovery Rate: a Bayesian Interpretation and the q-value, Annals of Statistics, 31, 2013-2035.
12. ’t Hoen, P. A. C., Ariyurek, Y., Thygesen, H. H., Vreugdenhil, E., Vossen, R. H., De Menezes, R. X., Boer, J. M., Van Ommen, G. J. and Den Dunnen, J. T. (2008) Deep Sequencing-Based Expression Analysis Shows Major Advances in Robustness, Resolution and Inter-lab Portability over Five Microarray Platforms, Nucleic Acids Research, 36, e141.
13. Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a Revolutionary Tool for Transcriptomics, Nat. Rev. Genet., 10, 57-63.
Description	Master's; National Chengchi University; Graduate Institute of Statistics; 100354022; 102
Source	http://thesis.lib.nccu.edu.tw/record/#G0100354022
Type	thesis

dc.contributor.advisor	薛慧敏	zh_TW
dc.contributor.author (Author)	呂泓廷	zh_TW
dc.creator (Author)	呂泓廷	zh_TW
dc.date (Date)	2013	en_US
dc.date.accessioned	10-Feb-2014 14:47:42 (UTC+8)	-
dc.date.available	10-Feb-2014 14:47:42 (UTC+8)	-
dc.date.issued (Upload time)	10-Feb-2014 14:47:42 (UTC+8)	-
dc.identifier (Other identifier)	G0100354022	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/63646	-
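dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Graduate Institute of Statistics	zh_TW
dc.description (Description)	100354022	zh_TW
dc.description (Description)	102	zh_TW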
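dc.description.tableofcontents	Chapter 1: Introduction (p. 3); Chapter 2: Methods (p. 5); Section 2.1: Sequence Data (p. 5); Section 2.2: Parameter Estimation (p. 5); Section 2.3: Hypothesis Testing (p. 7); Chapter 3: Simulation Study and Discussion (p. 9); Section 3.1: Simulation Design (p. 9); Section 3.2: Simulation Results (p. 11); Chapter 4: Empirical Analysis (p. 20); Chapter 5: Conclusion (p. 24); References (p. 26)	zh_TW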
dc.format.extent	797803 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0100354022	en_US
dc.subject (Keyword)	negative binomial distribution	zh_TW
dc.subject (Keyword)	overdispersion	zh_TW
dc.subject (Keyword)	maximum pseudo-likelihood estimation	zh_TW
dc.subject (Keyword)	significance testing for differentially expressed genes	zh_TW
dc.subject (Keyword)	RNA-Seq	zh_TW
dc.title (Title)	RNA序列實驗中檢測差異表現基因之統計方法	zh_TW
dc.title (Title)	Testing for differentially expressed genes with RNA-Seq data	en_US
dc.type (Type)	thesis	en
