Title: Estimation of Accelerated Functional Failure Time Models with Error-Prone Response
Title (Chinese): 基於應變數測量誤差之下的函數型加速失效模型估計法
Author: Huang, Hsiao-Ting (黃筱庭)
Advisor: Chen, Li-Pang (陳立榜)
Keywords: accelerated failure time model; boosting; measurement error; regression calibration; survival analysis; variable selection
Date: 2023
Uploaded: 2-Aug-2023 13:03:09 (UTC+8)
Abstract: In survival analysis, parametric accelerated failure time (AFT) models are commonly used to describe the relationship between survival time and covariates. Many methods have been proposed to estimate their parameters under the assumption that the data are precisely measured. In applications, however, the relationship between covariates and survival time may be non-linear, and the data may be contaminated by measurement error. In this thesis, we consider the accelerated functional failure time model with survival data subject to misclassification and measurement error. We use an insertion correction strategy to correct for misclassification and regression calibration to correct for error-prone survival time. Based on the corrected data, we use the boosting algorithm with cubic spline estimation to iteratively recover linear and non-linear relationships between covariates and survival time. We also justify, theoretically, the validity of the measurement error correction and of the estimation procedure. Numerical studies show that the proposed method improves estimation performance and identifies informative covariates. The method is further applied to breast cancer data provided by the Netherlands Cancer Institute to analyze the relationship between patients' survival times and gene expression.
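The abstract describes using boosting with cubic splines to recover non-linear covariate effects on survival time. The sketch below illustrates that general idea only. It is generic componentwise L2 boosting with cubic regression-spline base learners for an additive model of log survival time, log T = β0 + Σ f_j(X_j) + ε. It is not the thesis's AFTER algorithm or the AFFECT R package.

The sketch makes several assumptions:
- Log times are fully observed and error-free, so there is no censoring handling, misclassification correction or regression calibration.
- The function names, the truncated-power spline basis, the quantile knot placement, the step size `nu` and the toy data are all hypothetical, illustrative choices.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated-power cubic regression spline basis: x, x^2, x^3, (x - k)_+^3."""
    cols = [x, x ** 2, x ** 3] + [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)


def boost_additive_aft(X, log_t, n_iter=200, nu=0.1, n_knots=4):
    """Componentwise L2 boosting of log survival time on cubic-spline learners.

    Hypothetical sketch: assumes log_t is fully observed and error-free
    (no censoring, no misclassification or measurement-error correction).
    """
    _, p = X.shape
    intercept = log_t.mean()
    resid = log_t - intercept
    bases = []
    for j in range(p):
        # Interior knots at evenly spaced sample quantiles of covariate j.
        probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
        B = cubic_spline_basis(X[:, j], np.quantile(X[:, j], probs))
        bases.append(B - B.mean(axis=0))  # centred, so each f_j averages to zero
    coefs = [np.zeros(B.shape[1]) for B in bases]
    selected = np.zeros(p, dtype=int)
    for _ in range(n_iter):
        # Fit every covariate's spline to the current residuals; keep the best one.
        best_j, best_rss, best_beta = -1, np.inf, None
        for j, B in enumerate(bases):
            beta, *_ = np.linalg.lstsq(B, resid, rcond=None)
            rss = np.sum((resid - B @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss, best_beta = j, rss, beta
        # Take a shrunken step (learning rate nu) along the chosen component only.
        coefs[best_j] += nu * best_beta
        resid = resid - nu * (bases[best_j] @ best_beta)
        selected[best_j] += 1
    fitted = log_t - resid
    return intercept, coefs, selected, fitted


# Hypothetical toy data: only X[:, 0] and X[:, 1] affect log survival time.
rng = np.random.default_rng(0)
n, p = 300, 5
X = rng.uniform(-2.0, 2.0, size=(n, p))
signal = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2
log_t = 1.0 + signal + rng.normal(scale=0.3, size=n)

intercept, coefs, selected, fitted = boost_additive_aft(X, log_t)
print("times each covariate was selected:", selected)
```

The sketch tracks how often each covariate is chosen, which is a simple way boosting performs variable selection. Informative covariates tend to be picked early and often. Fewer iterations (`n_iter`) act as early stopping and limit how many uninformative covariates enter the fit.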
Description: Master's thesis; National Chengchi University; Department of Statistics; 110354005
Source: http://thesis.lib.nccu.edu.tw/record/#G0110354005
Type: thesis
Identifier: G0110354005
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/146302
Table of contents:
Chapter 1 Introduction
Chapter 2 Notation and Models
2.1 Survival data
2.2 Functional Accelerated Failure Time Models
2.3 Measurement error model and misclassification
Chapter 3 Methodology
3.1 Correction of Measurement Error Effects
3.2 Boosting Estimation under the Corrected Survival Data
Chapter 4 AFFECT: An R Package Implementation
4.1 data_gen
4.2 ME_correction
4.3 Boosting
Chapter 5 Numerical Studies
5.1 Simulation Setup
5.2 Simulation Results
5.3 Real Data Analysis
Chapter 6 Summary
Reference
Appendix A
Algorithm 1: AFTER
Format: application/pdf, 8960653 bytes