Title Application of LASSO regression in estimating B-spline-based hazard functions (original title: LASSO迴歸在B-spline基底組成之危險函數上的應用)
Author Lin, Zi-Yuan (林子元)
Contributors Huang, Tzee-Ming (黃子銘)
Lin, Zi-Yuan (林子元)
Keywords Proportional hazards model
B-splines
Group lasso
Bootstrap
Date 2017
Upload time 5-Apr-2017 15:35:28 (UTC+8)
Abstract The Cox proportional hazards model makes a strong assumption: the log hazard function is linear in the covariates. This assumption may be violated in practice. This thesis shows that, when it is, the nonlinear effect of a covariate can be approximated by a linear combination of B-spline basis functions. The basis coefficients are estimated with the group lasso. Under a suitable penalty parameter, the whole group of basis coefficients belonging to a covariate with no explanatory power is estimated as zero simultaneously, which keeps the model from becoming hard to interpret. The thesis also develops hypothesis tests for the proposed model. In addition to the ordinary Wald, likelihood ratio, and score statistics, two further kinds of test statistic are considered: one adjusted for the penalty term and one based on bootstrap samples. Simulation studies are carried out to compare the performance of these statistics.
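The group-wise zeroing described in the abstract can be illustrated with the group soft-thresholding operator, which is the proximal operator of the group lasso penalty. The sketch below is only an illustration and is not taken from the thesis: the function names, the example coefficients, and the square-root-of-group-size weighting (a common convention following Yuan and Lin, 2006) are all assumptions, and the thesis may compute its estimates in a different way.

```python
import math


def group_soft_threshold(coefs, threshold):
    """Shrink one group of coefficients toward zero as a block.

    If the Euclidean norm of the group is at most `threshold`, the whole
    group becomes exactly zero; otherwise every coefficient is scaled by
    the same factor (1 - threshold / norm).
    """
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= threshold:
        return [0.0] * len(coefs)
    scale = 1.0 - threshold / norm
    return [scale * c for c in coefs]


def shrink_groups(groups, lam):
    """Apply the operator to each covariate's B-spline coefficient group.

    `lam` is a hypothetical penalty parameter; it is multiplied by
    sqrt(group size), an assumed weighting that is not confirmed by the source.
    """
    return {
        name: group_soft_threshold(coefs, lam * math.sqrt(len(coefs)))
        for name, coefs in groups.items()
    }


# Made-up coefficients: x1 has a strong effect, x2 a weak one.
groups = {
    "x1": [3.0, 4.0, 0.0, 0.0],
    "x2": [0.3, -0.4, 0.0, 0.0],
}
shrunk = shrink_groups(groups, lam=1.0)
```

With these made-up numbers, every coefficient of `x2` is set to zero together, while the coefficients of `x1` are shrunk by a common factor and stay nonzero. This is the sense in which a covariate is dropped as a whole rather than one basis function at a time.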
References [1] Bøvelstad, H. M., Nygård, S., Størvold, H. L., Aldrin, M., Borgan, Ø., Frigessi, A., and Lingjærde, O. C. (2007). Predicting survival from microarray data—a comparative study. Bioinformatics 23 (16), 2080-2087.
[2] Brent, R. P. (1973). Algorithms for minimization without derivatives. Prentice Hall.
[3] Breslow, N. E. (1972). Contribution to the discussion of paper by D. R. Cox. Journal of the Royal Statistical Society, Series B 34, 216-217.
[4] Breslow, N. E. and Crowley, J. (1974). A large-sample study of the life table and product limit estimates under random censorship. Annals of Statistics 2, 437-454.
[5] Burr, D. (1994). A comparison of certain bootstrap confidence intervals in the Cox model. Journal of the American Statistical Association 89, 1290-1302.
[6] Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 34, 187-220.
[7] Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269-276.
[8] Curry, H. B. and Schoenberg, I. J. (1966). On Pólya Frequency Functions IV: The fundamental spline functions and their limits. Journal d'Analyse Mathématique 17, 71-107.
[9] de Boor, C. (1978). A Practical Guide to Splines. New York: Springer.
[10] Efron, B. (1977). The efficiency of Cox’s likelihood function for censored data. Journal of the American Statistical Association 72, 557-565.
[11] Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics 7, 1-26.
[12] Gray, R. J. (1992). Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. Journal of the American Statistical Association 87, 942-951.
[13] Gray, R. J. (1994). Spline-based tests in survival analysis. Biometrics 50, 640-652.
[14] Hastie, T. and Tibshirani, R. (1990). Exploring the nature of covariate effects in the proportional hazards model. Biometrics 46, 1005-1016.
[15] Huang, J. Z. and Liu, L. (2006). Polynomial spline estimation and inference of proportional hazards regression models with flexible relative risk form. Biometrics 62, 793-802.
[16] Kalbfleisch, J. D. and Prentice, R. L. (1973). Marginal likelihoods based on Cox’s regression and life model. Biometrika 60, 267-278.
[17] Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457-481.
[18] Keele, L. (2010). Proportionally difficult: testing for nonproportional hazards in Cox models. Political Analysis 18, 189-205.
[19] Kim, J., Sohn, I., Jung, S. H., Kim, S., and Park, C. (2012). Analysis of survival data with group Lasso. Communications in Statistics—Simulation and Computation 41, 1593-1605.
[20] Kooperberg, C., Stone, C. J., and Truong, Y. K. (1995). Hazard regression. Journal of the American Statistical Association 90, 78-94.
[21] LeBlanc, M. and Crowley, J. (1999). Adaptive regression splines in the Cox model. Biometrics 55, 204-213.
[22] Lenhoff, M. W., Santner, T. J., Otis, J. C., Peterson, M. G. E., Williams, B. J., and Backus, S. I. (1999). Bootstrap prediction and confidence bands: a superior statistical method for analysis of gait data. Gait & Posture 9 (1), 10-17.
[23] Li, W., Xu, S., Zhao, G., and Goh, L. P. (2005). Adaptive knot placement in B-spline curve approximation. Computer-Aided Design 37, 791-797.
[24] Liu, D. C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming 45, 503-528.
[25] Lockhart, R., Taylor, J., Tibshirani, R., and Tibshirani, R. J. (2014). A significance test for the lasso (with discussion). Annals of Statistics 42 (2), 413-468.
[26] Meier, L., van de Geer, S., and Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society Series B-Statistical Methodology 70, 53-71.
[27] Moody, J. E., Hanson, S. J., and Lippmann, R. P. (1992). The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems. Advances in Neural Information Processing Systems 4, 847-854.
[28] O'Sullivan, F. (1988). Nonparametric estimation of relative risk using splines and cross-validation. SIAM Journal on Scientific and Statistical Computing 9, 531-542.
[29] Sleeper, L. A. and Harrington, D. P. (1990). Regression splines in the Cox model with application to covariate effects in liver disease. Journal of the American Statistical Association 85, 941-949.
[30] Stone, C. J. (1985). Additive regression and other nonparametric models. Annals of Statistics 13, 689-705.
[31] Therneau, T. M., Grambsch, P. M., and Pankratz, V. S. (2003). Penalized survival models and frailty. Journal of Computational and Graphical Statistics 12 (1), 156-175.
[32] Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society Series B-Methodological 58 (1), 267-288.
[33] Tibshirani, R. (1997). The LASSO method for variable selection in the Cox model. Statistics in Medicine 16 (4), 385-395.
[34] Verweij, P. J. M. and van Houwelingen, H. C. (1993). Cross-validation in survival analysis. Statistics in Medicine 12, 2305-2314.
[35] Wold, S. (1974). Spline functions in data analysis. Technometrics 16, 1-11.
[36] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B-Statistical Methodology 68 (1), 49-67.
Description Master's
National Chengchi University
Department of Statistics
103354014
Source http://thesis.lib.nccu.edu.tw/record/#G1033540143
Type thesis
dc.contributor.advisor Huang, Tzee-Ming (黃子銘)
dc.identifier (Other Identifiers) G1033540143
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/108112
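dc.description.tableofcontents Chapter 1 Introduction 1
 Section 1 Research Motivation 1
 Section 2 Literature Review 2
 Section 3 Summary of Methods 2
Chapter 2 Methodology 4
 Section 1 Model Framework 4
  1. Proportional hazards model 4
  2. Extended proportional hazards model 5
  3. B-spline approximation 6
  4. Final model 7
  5. Interactions 9
 Section 2 Model Estimation 9
  1. Lasso 9
  2. Group lasso 10
  3. Choice of the penalty parameter 11
  4. Choice of knots and order 12
  5. Computation 14
 Section 3 Parameter Inference 16
  1. Tests based on the chi-square distribution 16
  2. Adjusted tests 17
  3. Bootstrap-based tests 18
  4. Simultaneous confidence bands 20
Chapter 3 Simulation and Comparison 23
 Section 1 Comparison with the standard proportional hazards model 24
 Section 2 Effect of the penalty parameter on the test statistics 28
 Section 3 Representativeness of bootstrap samples 31
 Section 4 Type I error and power 35
Chapter 4 Conclusions and Suggestions 41
References 42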
dc.format.extent 1686458 bytes
dc.format.mimetype application/pdf