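Title RNA序列實驗中檢測差異表現基因之統計方法
Testing for differentially expressed genes with RNA-Seq data
Author 呂泓廷
Contributors 薛慧敏
呂泓廷
Keywords Negative binomial distribution
Overdispersion
Maximum pseudo-likelihood estimation
Significance testing for differentially expressed genes
RNA Seq
Date 2013
Date uploaded 10-Feb-2014 14:47:42 (UTC+8)
Abstract In recent years, RNA-Seq (RNA sequencing) experiments, which grew out of next-generation sequencing technology, have drawn increasing attention as their cost has fallen. These experiments use high-throughput sequencing to study transcriptomes and measure gene expression as count data. To account for overdispersion in the data, this study assumes a negative binomial distribution and estimates each gene's mean expression by maximum pseudo-likelihood estimation. To identify differentially expressed genes between two groups of subjects with different phenotypes, we use the Wald test statistic built from these estimators to test the significance of the association between each gene and the phenotype. We validate the proposed method by simulation and then apply it to a real example data set.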
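The testing step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's procedure: the dispersion here is estimated by a simple method of moments rather than by maximum pseudo-likelihood, and the parameterization Var = mu + phi * mu^2, the delta-method variance of the log mean, and the function names `nb_moments` and `wald_test` are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def nb_moments(counts):
    """Return (mean, dispersion) for one gene in one group.

    Assumes the negative binomial variance form Var = mu + phi * mu**2 and
    uses a method-of-moments estimate of phi (an illustrative stand-in for
    the maximum pseudo-likelihood estimator named in the abstract).
    """
    m = mean(counts)
    if m <= 0:
        raise ValueError("group mean must be positive")
    s2 = variance(counts)  # unbiased sample variance, needs >= 2 values
    phi = max((s2 - m) / (m * m), 0.0)  # truncate at 0, the Poisson limit
    return m, phi


def wald_test(group1, group2):
    """Two-sided Wald test of equal mean expression between two groups.

    The statistic is the log ratio of the group means divided by its
    delta-method standard error: Var(log mean) ~ (1/mu + phi) / n.
    """
    m1, phi1 = nb_moments(group1)
    m2, phi2 = nb_moments(group2)
    log_fc = math.log(m2) - math.log(m1)
    var = (1.0 / m1 + phi1) / len(group1) + (1.0 / m2 + phi2) / len(group2)
    z = log_fc / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical read counts for one gene in two phenotype groups.
    z, p = wald_test([5, 6, 4, 5], [50, 60, 40, 55])
    print(f"z = {z:.2f}, p = {p:.3g}")
```

In a genome-wide analysis the test would be repeated for every gene, so the resulting p-values would still need a multiple-testing adjustment, for example the false discovery rate procedures cited in the reference list.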
References 1. Auer, P. L. and Doerge, R. W. (2011) A Two-stage Poisson Model for Testing RNA-Seq Data, Statistical Applications in Genetics and Molecular Biology, 10, 1-26.
2. Benjamini, Y. and Hochberg, Y. (1995) Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing, J. Roy. Statist. Soc. Ser. B, 57, 289-300.
3. Hall, J. M., Lee, M. K., Newman, B., Morrow, J. E., Anderson, L. A., Huey, B. and King, M. C. (1990) Linkage of Early-Onset Familial Breast Cancer to Chromosome 17q21, Science, 250, 1684-1689.
4. Li, J., Witten, D. M., Johnstone, I. M. and Tibshirani, R. (2012) Normalization, Testing, and False Discovery Rate Estimation for RNA-sequencing Data, Biostatistics, 13, 523-538.
5. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. and Gilad, Y. (2008) RNA-seq: an Assessment of Technical Reproducibility and Comparison with Gene Expression Arrays, Genome Res., 18, 1509-1517.
6. Nakashima, E. (1997) Some Methods for Estimation in a Negative-Binomial Model, Ann. Inst. Statist. Math., 49, 101-105.
7. Pao, W., Miller, V., Zakowski, M., Doherty, J., Politi, K., Sarkaria, I., Singh, B., Heelan, R., Rusch, V., Fulton, L., Mardis, E., Kupfer, D., Wilson, R., Kris, M. and Varmus, H. (2004) EGF Receptor Gene Mutations are Common in Lung Cancers from Never Smokers and are Associated with Sensitivity of Tumors to Gefitinib and Erlotinib, Proceedings of the National Academy of Sciences of the United States of America, 101, 13306-13311.
8. Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010) edgeR: a Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data, Bioinformatics, 26, 139-140.
9. Robinson, M. D. and Smyth, G. K. (2007) Moderated Statistical Tests for Assessing Differences in Tag Abundance, Bioinformatics, 23, 2881-2887.
10. Robinson, M. D. and Smyth, G. K. (2008) Small-sample Estimation of Negative Binomial Dispersion, with Applications to SAGE Data, Biostatistics, 9, 321-332.
11. Storey, J. D. (2003) The Positive False Discovery Rate: a Bayesian Interpretation and the q-value, Annals of Statistics, 31, 2013-2035.
12. 't Hoen, P. A. C., Ariyurek, Y., Thygesen, H. H., Vreugdenhil, E., Vossen, R. H., De Menezes, R. X., Boer, J. M., Van Ommen, G. J. and Den Dunnen, J. T. (2008) Deep Sequencing-Based Expression Analysis Shows Major Advances in Robustness, Resolution and Inter-lab Portability over Five Microarray Platforms, Nucleic Acids Research, 36, e141.
13. Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a Revolutionary Tool for Transcriptomics, Nat. Rev. Genet., 10, 57-63.
Description Master's thesis
National Chengchi University
Graduate Institute of Statistics
100354022
102
Source http://thesis.lib.nccu.edu.tw/record/#G0100354022
Type thesis
dc.contributor.advisor 薛慧敏 (zh_TW)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/63646
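dc.description.tableofcontents Chapter 1. Introduction 3
Chapter 2. Methods 5
Section 1. Sequence Data 5
Section 2. Parameter Estimation 5
Section 3. Hypothesis Testing 7
Chapter 3. Simulation Study and Discussion 9
Section 1. Simulation Design 9
Section 2. Simulation Results 11
Chapter 4. Empirical Analysis 20
Chapter 5. Conclusion 24
References 26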
dc.format.extent 797803 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US