Title: Gene Expression Profiling with Survival Analysis on Microarray Data (Chinese title: 應用存活分析在微陣列資料的基因表面定型之探討)
Author: Chang, Chung-Kai (張仲凱)
Advisor: Kuo, Hsun-Chih (郭訓志)
Keywords: Gene expression data; Censored survival data; Cox proportional hazards model; Resampling-based Peto-Peto test; Threshold gradient directed regularization
Date: 2005
Uploaded: 2009-09-14
Abstract:
Analyzing censored survival data with high-dimensional covariates arising from microarray experiments has been an important issue. The main goal is to identify, among a large number of genes, the small subset that has a pivotal influence on patients' survival time or other important clinical outcomes. The Threshold Gradient Directed Regularization (TGDR) method has been used for simultaneous variable selection and model building in high-dimensional regression problems. However, TGDR adopts a gradient-projection type of algorithm and can therefore converge slowly. In this thesis, we propose Modified TGDR algorithms that incorporate a Newton-Raphson type of search. The proposed approaches share the characteristics of TGDR but converge faster. A real cancer microarray data set with censored survival times is used for demonstration.
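The TGDR iteration described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the thesis's code: the function names `cox_score_and_info` and `tgdr` are hypothetical, the Cox partial likelihood uses the Breslow convention for ties, and the `newton=True` option models the "Newton-Raphson type" modification as a simple diagonal Newton scaling of the thresholded direction, which may differ from the Modified TGDR algorithms actually proposed.

```python
import math


def cox_score_and_info(X, time, event, beta):
    """Score vector and diagonal of the observed information of the Cox
    log partial likelihood (Breslow convention for tied times)."""
    n, p = len(X), len(beta)
    # Risk scores w_j = exp(x_j' beta)
    w = [math.exp(sum(X[j][k] * beta[k] for k in range(p))) for j in range(n)]
    score = [0.0] * p
    info = [0.0] * p
    for i in range(n):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j in range(n) if time[j] >= time[i]]  # risk set R(t_i)
        s0 = sum(w[j] for j in risk)
        for k in range(p):
            s1 = sum(w[j] * X[j][k] for j in risk)
            s2 = sum(w[j] * X[j][k] ** 2 for j in risk)
            mean = s1 / s0  # risk-weighted mean of covariate k
            score[k] += X[i][k] - mean
            info[k] += s2 / s0 - mean ** 2  # risk-weighted variance
    return score, info


def tgdr(X, time, event, tau=1.0, step=0.01, n_steps=200, newton=False):
    """Threshold gradient directed regularization path for the Cox model.

    tau in [0, 1] is the threshold: only coefficients whose gradient is at
    least tau * (largest absolute gradient) are updated at each step.
    newton=True is a hypothetical diagonal Newton-type scaling, an
    assumption rather than the thesis's Modified TGDR algorithm.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_steps):
        g, h = cox_score_and_info(X, time, event, beta)
        gmax = max(abs(gk) for gk in g)
        if gmax == 0.0:
            break  # stationary point: nothing left to update
        for k in range(p):
            if abs(g[k]) >= tau * gmax:  # threshold step: keep large gradients
                direction = g[k] / h[k] if newton and h[k] > 1e-12 else g[k]
                beta[k] += step * direction
    return beta


# Toy usage: 6 subjects, all deaths observed.
time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 1, 1, 1, 1]
# Gene 0: higher expression -> earlier death; gene 1: constant, uninformative
X = [[6, 1], [5, 1], [4, 1], [3, 1], [2, 1], [1, 1]]
beta_plain = tgdr(X, time, event, tau=1.0)
beta_newton = tgdr(X, time, event, tau=1.0, newton=True)
print(beta_plain, beta_newton)
```

With `tau=1.0` only the gene with the largest gradient moves at each step, which gives sparse, LASSO-like paths; `tau=0.0` updates every coefficient and behaves like plain gradient ascent. The threshold and the number of steps act as tuning parameters.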
The second part of this thesis proposes a resampling-based Peto-Peto test for comparing survival functions on interval-censored data. The proposed test can be used to evaluate the power of survival function estimation methods such as Turnbull's procedure and the Kaplan-Meier estimate. It shows that, on interval-censored data, the power based on the Kaplan-Meier estimate is lower than that based on Turnbull's estimate. The test is demonstrated on simulated data and on real interval-censored data from a breast cancer study.
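The resampling-based comparison described above can be illustrated with a small pure-Python sketch. It is not the thesis's implementation, and several parts are assumptions: the function names are hypothetical; the pooled survival curve comes from a simplified Turnbull-style self-consistency (EM) iteration that places mass on the distinct right endpoints rather than computing innermost intervals explicitly; the scores S(L_i) + S(R_i) - 1 are the usual Peto-Peto-type scores for interval-censored data; and "resampling" is taken to mean permuting group labels, whereas the thesis may use a different statistic or resampling scheme.

```python
import math
import random


def turnbull_masses(intervals, tol=1e-8, max_iter=5000):
    """Self-consistency (EM) estimate of the event-time distribution from
    interval-censored observations (L, R]; R may be math.inf (right-censored).

    Simplification: mass is placed on the distinct right endpoints instead of
    on Turnbull's innermost intervals.
    """
    support = sorted({r for _, r in intervals})
    # alpha[i][j]: support point j is compatible with observation i
    alpha = [[(l < s <= r) or (l == r == s) for s in support] for l, r in intervals]
    n, m = len(intervals), len(support)
    p = [1.0 / m] * m
    for _ in range(max_iter):
        new = [0.0] * m
        for row in alpha:
            denom = sum(pj for pj, a in zip(p, row) if a)
            for j in range(m):
                if row[j]:
                    new[j] += p[j] / denom / n  # expected share of this subject
        converged = max(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return support, p


def survival(support, p, t):
    """S(t) = P(T > t) under the estimated point masses."""
    return sum(pj for s, pj in zip(support, p) if s > t)


def peto_peto_permutation_test(intervals, groups, n_resamples=2000, seed=0):
    """Two-sample Peto-Peto-type test with a permutation p-value.

    Pooled scores c_i = S(L_i) + S(R_i) - 1: early failures score near +1,
    late failures near -1, right-censored observations S(L_i) - 1.
    """
    support, p = turnbull_masses(intervals)
    scores = [survival(support, p, l) + survival(support, p, r) - 1.0
              for l, r in intervals]
    mean = sum(scores) / len(scores)
    centered = [c - mean for c in scores]

    def stat(labels):
        # Positive when group 1 tends to fail earlier than the pooled sample
        return sum(c for c, g in zip(centered, labels) if g == 1)

    observed = stat(groups)
    rng = random.Random(seed)
    labels = list(groups)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(labels)  # pooled scores stay fixed under permutation
        if abs(stat(labels)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_resamples + 1)


# Toy usage: group 0 fails early, group 1 fails late or is right-censored.
inf = math.inf
data = [(0, 1), (0, 1), (1, 2), (1, 2), (0, 2),
        (5, 6), (5, 7), (6, 7), (6, inf), (7, inf)]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
observed, p_value = peto_peto_permutation_test(data, groups)
print(observed, p_value)
```

Because the pooled estimate does not depend on the group labels, the scores are computed once and only the labels are reshuffled, which keeps each resample cheap.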
References:
1. Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L., Marti, G. E., Moore, T., Hudson, J., Jr., Lu, L., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T. C., Weisenburger, D. D., Armitage, J. O., Warnke, R., Levy, R., Wilson, W., Grever, M. R., Byrd, J. C., Botstein, D., Brown, P. O., and Staudt, L. M. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503-511.
2. Beadle, G., Come, S., Henderson, C., Silver, B., and Hellman, S. (1984). The effect of adjuvant chemotherapy on the cosmetic results after primary radiation treatment for early stage breast cancer. International Journal of Radiation Oncology, Biology, Physics, 10, 2131-2137.
3. Bertsekas, D. P. (1982). Projected Newton methods for optimization problems with simple constraints. SIAM Journal on Control and Optimization, 20, 221-246.
4. Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34, 187-220.
5. Craig, B. A., Black, M. A., and Doerge, R. W. (2003). Gene expression data: The technology and statistical analysis. Journal of Agricultural, Biological, and Environmental Statistics, 8, 1-28.
6. Dykstra, R. L. and Kuo, H. C. (2003). Constrained non-parametric estimation under arbitrarily grouped, censored, and truncated data. Ph.D. thesis in Statistics, Graduate College, The University of Iowa.
7. Friedman, J. H. and Popescu, B. E. (2004). Gradient directed regularization for linear regression and classification. Technical report, Department of Statistics, Stanford University. http://www-stat.stanford.edu/~jhf/PathSeeker.html
8. Gui, J. and Li, H. (2005). Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics, in press.
9. Huang, Y. W. (2004). The comparison of parameter estimation with application to Massachusetts health care panel study. Master's thesis in Mathematics, National Sun Yat-sen University.
10. Jolliffe, I. T. (1986). Principal Component Analysis. New York: Springer-Verlag.
11. Ma, S. and Huang, J. (2005). Clustered threshold gradient directed regularization: with applications to survival analysis using microarray data. Technical Report No. 348, Department of Statistics and Actuarial Science, University of Iowa.
12. Pan, W. (1997). Extending the iterative convex minorant algorithm to the Cox model. Report 1997-013, Division of Biostatistics, University of Minnesota.
13. Park, P. J., Tian, L., and Kohane, I. S. (2002). Linking gene expression data with patient survival times using partial least squares. Bioinformatics, 18, S120-S127.
14. Petroni, G. R. and Wolfe, R. A. (1994). A two-sample test for stochastic ordering with interval-censored data. Biometrics, 50, 77-87.
15. Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58, 267-288.
16. Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored, and truncated data. Journal of the Royal Statistical Society, Series B, 38, 290-295.
17. Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In P. R. Krishnaiah (Ed.), Multivariate Analysis (pp. 391-420). New York: Academic Press.
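Description: Master's thesis
National Chengchi University
Graduate Institute of Statistics
Student ID: 93354012
Academic year: 94 (ROC calendar, i.e., 2005-2006)
Source: http://thesis.lib.nccu.edu.tw/record/#G0093354012
Type: thesis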
Identifier: G0093354012
Handle (URI): https://nccur.lib.nccu.edu.tw/handle/140.119/30900
Table of Contents:
Chapter 1 Introduction
Chapter 2 cDNA Microarrays
     2.1 The Central Dogma of Molecular Biology
     2.2 cDNA Microarray Experiment
Chapter 3 Literature Review
Chapter 4 Methodology
     4.1 Cox Proportional Hazards Model (PH Model)
     4.2 TGDR Algorithm
     4.3 Modified TGDR Algorithms
     4.4 Tuning Parameter Selection
Chapter 5 Performance Evaluation via Prediction
Chapter 6 Example Using Diffuse Large B-Cell Lymphoma (DLBCL) Data
Chapter 7 A Resampling-Based Peto-Peto Test for Survival Functions on Interval-Censored Data
     7.1 Kaplan-Meier (Product-Limit) Estimate
     7.2 Turnbull's Procedure
     7.3 Two-Step-Constrained Turnbull's Procedure
          7.3.1 Standard Stochastic Ordering (for Multinomial Probability Vectors)
          7.3.2 Uniform Stochastic Ordering
          7.3.3 Likelihood Ratio Ordering
     7.4 Generalized Log-Rank Test
     7.5 Resampling-Based Peto-Peto Test
     7.6 Example
Chapter 8 Conclusion and Future Work
References
Language: English (en_US)