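Title: 基於應變數測量誤差之下的函數型加速失效模型估計法
Title (English): Estimation of Accelerated Functional Failure Time Models with Error-Prone Response
Author: Huang, Hsiao-Ting (黃筱庭)
Contributors: Chen, Li-Pang (陳立榜); Huang, Hsiao-Ting (黃筱庭)
Keywords: accelerated failure time model; boosting; measurement error; regression calibration; survival analysis; variable selection
Date: 2023
Uploaded: 2-Aug-2023 13:03:09 (UTC+8)
Abstract (translated from Chinese): In survival analysis, parametric accelerated failure time models are commonly used to describe the relationship between covariates and survival time. Under this framework, and assuming that the data are measured precisely, many methods have been proposed to estimate the model parameters. However, the relationship between covariates and survival time may be non-linear, and the data may contain measurement error. In this thesis, we consider the functional accelerated failure time model for data subject to misclassification and measurement error. We handle the measurement error with the insertion correction strategy and regression calibration, and then use boosting to estimate the linear and non-linear relationships between covariates and survival time. Numerical results show that the proposed method improves estimation performance and identifies important variables. The method is further applied to breast cancer data provided by the Netherlands Cancer Institute to analyze the relationship between patient survival time and gene expression.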
Abstract (English): In survival analysis, parametric accelerated failure time (AFT) models are commonly used to describe the relationship between survival time and covariates. Many methods have been proposed to estimate the parameters of this model under the assumption that the data are precisely measured. In applications, however, the relationship between covariates and survival time may be non-linear, and the survival time may be contaminated by measurement error. In this thesis, we consider the accelerated functional failure time model with survival data subject to measurement error. We use the insertion correction strategy and regression calibration to correct for misclassification and error-prone survival time, respectively. Based on the corrected data, we use the boosting algorithm with the cubic spline estimation method to iteratively recover the non-linear relationship between covariates and survival time. Theoretically, we justify the validity of the measurement error correction and the estimation procedure. Numerical studies show that the proposed method improves estimation performance and is able to capture informative covariates. The methodology is applied to the breast cancer data provided by the Netherlands Cancer Institute.
Description: Master's
National Chengchi University
Department of Statistics
110354005
Source: http://thesis.lib.nccu.edu.tw/record/#G0110354005
Type: thesis
Advisor: Chen, Li-Pang (陳立榜)
Other identifier: G0110354005
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/146302
Format: application/pdf, 8960653 bytes
Table of contents:
Chapter 1 Introduction
Chapter 2 Notation and Models
2.1 Survival data
2.2 Functional Accelerated Failure Time Models
2.3 Measurement error model and misclassification
Chapter 3 Methodology
3.1 Correction of Measurement Error Effects
3.2 Boosting Estimation under the Corrected Survival Data
Chapter 4 AFFECT: An R Package Implementation
4.1 data_gen
4.2 ME_correction
4.3 Boosting
Chapter 5 Numerical Studies
5.1 Simulation Setup
5.2 Simulation Results
5.3 Real Data Analysis
Chapter 6 Summary
Reference
Appendix A
Algorithm 1: AFTER
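Illustration: the boosting step described in the abstract can be pictured with a minimal sketch. The code below is not the thesis's AFFECT package or its algorithm. It assumes a log-linear AFT model, log T = b0 + sum_j b_j x_j + error, with fully observed (uncensored) and error-free survival times, and it swaps the cubic-spline base learners for simple linear ones. The function name componentwise_l2_boost and the simulated data are hypothetical. The sketch shows only how componentwise L2 boosting updates one covariate per iteration, which is the mechanism that lets boosting single out informative covariates.

```python
import random


def componentwise_l2_boost(X, y, steps=200, nu=0.1):
    """Componentwise L2 boosting with linear base learners (illustrative only).

    X: list of covariate rows; y: list of responses, e.g. log survival times.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Center each covariate so a no-intercept simple regression is valid.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    ss = [sum(r[j] ** 2 for r in Xc) for j in range(p)]  # column sums of squares
    y_bar = sum(y) / n
    resid = [yi - y_bar for yi in y]
    coef = [0.0] * p
    for _ in range(steps):
        # Fit each covariate to the current residuals; keep the best fit.
        best_j, best_b, best_gain = None, 0.0, -1.0
        for j in range(p):
            if ss[j] == 0.0:
                continue
            b = sum(Xc[i][j] * resid[i] for i in range(n)) / ss[j]
            gain = b * b * ss[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_b, best_gain = j, b, gain
        if best_j is None:
            break
        # Take a small step (learning rate nu) along the selected covariate.
        coef[best_j] += nu * best_b
        resid = [resid[i] - nu * best_b * Xc[i][best_j] for i in range(n)]
    intercept = y_bar - sum(coef[j] * means[j] for j in range(p))
    return intercept, coef


# Hypothetical simulated data: only covariates 0 and 2 affect log T.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
log_t = [1.0 + 2.0 * r[0] - 1.5 * r[2] + random.gauss(0, 0.1) for r in X]
intercept, coef = componentwise_l2_boost(X, log_t)
print(round(intercept, 2), [round(c, 2) for c in coef])
```

In this toy setting the coefficients of the two informative covariates dominate, while the remaining ones stay close to zero. Handling censoring, response measurement error, and non-linear effects, as the thesis does, would require the correction steps and spline base learners that this sketch leaves out.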