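Title: RNA序列實驗中檢測差異表現基因之統計方法
Testing for differentially expressed genes with RNA-Seq data
Author: 呂泓廷
Contributors: 薛慧敏 (advisor), 呂泓廷 (author)
Keywords: negative binomial distribution
overdispersion
maximum pseudo-likelihood estimation
significance testing for differentially expressed genes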
RNA-Seq
Date: 2013
Uploaded: 10-Feb-2014 14:47:42 (UTC+8)
Abstract: In recent years, RNA-Seq (RNA sequencing) experiments, which grew out of next-generation sequencing technology, have attracted increasing attention as their cost has fallen. These experiments use high-throughput sequencing to study the transcriptome and measure gene expression as count-valued sequence data. To account for the overdispersion in such data, this study assumes a negative binomial distribution and estimates each gene's mean expression level by maximum pseudo-likelihood estimation. To identify differentially expressed genes between two groups of subjects with different phenotypes, we apply the Wald test statistic based on these estimates to test the significance of the association between each gene and the phenotype. We validate the proposed method through simulation studies and then apply it to a real example data set.
Description: Master's
National Chengchi University
Graduate Institute of Statistics
100354022
102
Source: http://thesis.lib.nccu.edu.tw/record/#G0100354022
Type: thesis
Identifier: G0100354022
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/63646
Format: application/pdf, 797803 bytes
Language (ISO): en_US
Table of contents:
Chapter 1. Introduction
Chapter 2. Methods
Section 1. Sequence Data
Section 2. Parameter Estimation
Section 3. Hypothesis Testing
Chapter 3. Simulation Study and Discussion
Section 1. Simulation Design
Section 2. Simulation Results
Chapter 4. Empirical Analysis
Chapter 5. Conclusion
References
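Illustration: the abstract describes a negative binomial model for per-gene counts and a Wald test for a difference in mean expression between two phenotype groups. The sketch below shows the general shape of such a test for a single gene. It is not the thesis's procedure: it swaps the maximum pseudo-likelihood estimator for a simple method-of-moments dispersion estimate, assumes equal library sizes (no normalization), and uses a delta-method variance for the log mean. The function name `nb_wald_test` is hypothetical.

```python
import math


def nb_wald_test(group1, group2):
    """Wald test of H0: mu1 == mu2 for one gene's read counts.

    Assumes a negative binomial model with Var(Y) = mu + phi * mu**2
    and a dispersion phi shared by both groups. The dispersion is a
    method-of-moments estimate, an assumption made for this sketch
    (the record names maximum pseudo-likelihood estimation instead).
    Returns (z, two-sided p-value, phi).
    """
    def mean_and_var(counts):
        n = len(counts)
        if n < 2:
            raise ValueError("each group needs at least two samples")
        m = sum(counts) / n
        v = sum((y - m) ** 2 for y in counts) / (n - 1)
        return n, m, v

    n1, mu1, var1 = mean_and_var(group1)
    n2, mu2, var2 = mean_and_var(group2)
    if mu1 <= 0 or mu2 <= 0:
        raise ValueError("both group means must be positive")

    # Solve Var = mu + phi * mu^2 for phi in each group, then pool.
    phi1 = (var1 - mu1) / mu1 ** 2
    phi2 = (var2 - mu2) / mu2 ** 2
    # Floor at zero: a negative estimate falls back to the Poisson case.
    phi = max(0.0, (n1 * phi1 + n2 * phi2) / (n1 + n2))

    # Delta method: Var(log(sample mean)) is roughly (1/mu + phi) / n.
    se = math.sqrt((1 / mu1 + phi) / n1 + (1 / mu2 + phi) / n2)
    z = (math.log(mu1) - math.log(mu2)) / se

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, phi


if __name__ == "__main__":
    # Made-up counts for one gene in two groups of four subjects.
    z, p, phi = nb_wald_test([5, 6, 4, 5], [50, 60, 40, 55])
    print(f"z = {z:.3f}, p = {p:.3g}, phi = {phi:.4f}")
```

In a real analysis this test would be run once per gene, so the resulting p-values would need a multiple-testing adjustment before any gene is declared differentially expressed.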