Title: Application of LASSO regression in estimating B-Spline-Based hazard functions (LASSO迴歸在B-spline基底組成之危險函數上的應用)
Author: Lin, Zi-Yuan (林子元)
Contributors: Huang, Tzee-Ming (黃子銘), advisor; Lin, Zi-Yuan (林子元), author
Keywords: Proportional hazards model; B-splines; Group lasso; Bootstrap
Date: 2017
Uploaded: 5 April 2017 15:35:28 (UTC+8)
Abstract:
A strong assumption of the Cox proportional hazards model is that the log hazard function is linear in the covariates. In practice this assumption may be violated. When it is, the nonlinear effect of a covariate can be modeled by a combination of B-spline basis functions. The basis coefficients are estimated with the group lasso. With a suitable penalty coefficient, the whole group of basis coefficients belonging to a covariate is set to zero simultaneously if that covariate is not predictive, which keeps the model interpretable. Lastly, I develop hypothesis tests for this model. In addition to the ordinary Wald, likelihood ratio, and score statistics, two other types of test statistics are considered: one adjusted for the penalty function and one based on bootstrap samples. Simulation studies are carried out to evaluate the performance of the proposed statistics.
Description: Master's thesis
National Chengchi University
Department of Statistics
103354014
Source: http://thesis.lib.nccu.edu.tw/record/#G1033540143
Type: thesis
Identifier: G1033540143
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/108112
Format: application/pdf, 1686458 bytes
Table of contents:
Chapter 1. Introduction
  Section 1. Research Motivation
  Section 2. Literature Review
  Section 3. Summary of Methods
Chapter 2. Methodology
  Section 1. Model Structure
    1. Proportional hazards model
    2. Extended proportional hazards model
    3. B-spline approximation
    4. Final model
    5. Interactions
  Section 2. Model Estimation
    1. Lasso
    2. Group lasso
    3. Choice of the penalty coefficient
    4. Choice of knots and order
    5. Computation
  Section 3. Parameter Inference
    1. Tests based on the chi-square distribution
    2. Adjusted tests
    3. Bootstrap-based tests
    4. Simultaneous confidence bands
Chapter 3. Simulation and Comparison
  Section 1. Comparison with the standard proportional hazards model
  Section 2. Effect of the penalty coefficient on the test statistics
  Section 3. Representativeness of the bootstrap samples
  Section 4. Type I error and power
Chapter 4. Conclusions and Suggestions
References
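Illustrative sketch (not taken from the thesis): the abstract describes expanding each covariate in a B-spline basis and using the group lasso so that all basis coefficients of an uninformative covariate become zero together. The Python code below shows the two building blocks under stated assumptions. The function names bspline_basis and group_soft_threshold, the example knot vector, and the sqrt(group size) weighting of the penalty are choices made for this illustration only; the thesis may use different knots, weights, and algorithms. The Cox partial-likelihood fitting step itself is omitted.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion.

    knots is a non-decreasing knot vector whose boundary knots are repeated
    degree + 1 times. Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.zeros((x.size, t.size - 1))
    for i in range(t.size - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Assign points equal to the last knot to the last non-empty interval.
    last = np.max(np.nonzero(t[:-1] < t[-1])[0])
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time; terms with a zero denominator are skipped.
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, t.size - d - 1))
        for i in range(t.size - d - 1):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_new
    return B


def group_soft_threshold(beta, groups, threshold):
    """Proximal operator of threshold * sum_g sqrt(p_g) * ||beta_g||_2.

    groups is a list of index lists, one per covariate. A group whose
    Euclidean norm does not exceed its cut-off is set to zero as a whole;
    otherwise the group is shrunk toward zero without changing direction.
    """
    beta = np.asarray(beta, dtype=float)
    out = np.zeros_like(beta)
    for idx in groups:
        block = beta[idx]
        norm = np.linalg.norm(block)
        cut = threshold * np.sqrt(len(idx))  # assumed sqrt(group size) weight
        if norm > cut:
            out[idx] = (1.0 - cut / norm) * block
    return out


# Example: cubic basis with one interior knot on [0, 1] (hypothetical knots).
example_knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]
example_basis = bspline_basis(np.linspace(0, 1, 11), example_knots, 3)

# Example: the first group is shrunk, the weak second group is zeroed entirely.
example_beta = group_soft_threshold(
    [3.0, 4.0, 0.1, -0.2, 0.1], [[0, 1], [2, 3, 4]], 1.0
)
```

In a full fit, one would typically alternate a gradient step on the negative log partial likelihood with group_soft_threshold, where each group holds the basis coefficients of one covariate. A larger threshold removes more covariates from the model.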