Title 帶有高維度測量誤差之長度偏差與區間設限資料的提升方法
Boosting method for length-biased and interval-censored survival data subject to high-dimensional error-prone covariates
Author 邱邦旭
Qiu, Bang-Xu
Contributors 陳立榜
Chen, Li-Pang
邱邦旭
Qiu, Bang-Xu
Keywords AFT model
biased sampling
incomplete data
measurement error correction
variable selection
SIMEX
Date 2022
Upload time 1-Aug-2022 17:17:00 (UTC+8)
Abstract Analysis of length-biased and interval-censored data is an important topic in survival analysis, and many methods have been developed to handle this complex data structure. However, existing methods focus on low-dimensional data and assume that covariates are measured precisely, whereas applications frequently yield high-dimensional data subject to measurement error. In this thesis, we propose a valid inference method for high-dimensional length-biased and interval-censored survival data with covariate measurement error under the accelerated failure time (AFT) model. We employ the simulation-extrapolation (SIMEX) method to correct for measurement error effects and propose a boosting procedure for variable selection and estimation. The proposed method can handle settings in which the dimension of the covariates exceeds the sample size, and it leaves the distribution of the covariates unspecified.
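The SIMEX idea named in the abstract can be illustrated with a minimal sketch. This is not the thesis's SIMEXBoost procedure: it assumes a simple linear regression rather than an AFT model with length-biased, interval-censored data, a known measurement error standard deviation `sigma_u`, and a quadratic extrapolant. The function names `naive_slope` and `simex_slope` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def naive_slope(w, y):
    """Least-squares slope of y on w, ignoring measurement error in w."""
    w_c = w - w.mean()
    return float(w_c @ (y - y.mean()) / (w_c @ w_c))


def simex_slope(w, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0),
                n_rep=100, seed=0):
    """SIMEX-corrected slope under additive normal error (illustrative)."""
    rng = np.random.default_rng(seed)
    mean_estimates = []
    for lam in lambdas:
        # Simulation step: inflate the error variance to (1 + lam) * sigma_u**2
        # by adding extra noise, refit the naive estimator, and average.
        estimates = [
            naive_slope(
                w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y
            )
            for _ in range(n_rep)
        ]
        mean_estimates.append(np.mean(estimates))
    # Extrapolation step: fit a quadratic in lam and evaluate it at lam = -1,
    # which corresponds to zero measurement error.
    coef = np.polyfit(lambdas, mean_estimates, deg=2)
    return float(np.polyval(coef, -1.0))


# Synthetic data (assumed for illustration): true slope 2, error-prone covariate.
rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)                 # true covariate (unobserved)
y = 2.0 * x + 0.5 * rng.standard_normal(n)
sigma_u = 0.5
w = x + sigma_u * rng.standard_normal(n)   # observed covariate with error

naive = naive_slope(w, y)
corrected = simex_slope(w, y, sigma_u)
print(f"naive slope: {naive:.3f}, SIMEX slope: {corrected:.3f}")
```

In this toy setting the naive slope is attenuated toward zero, and the extrapolated value moves back toward the true slope. A quadratic extrapolant typically removes most of the attenuation but not all of it.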
Description Master's thesis
National Chengchi University
Department of Statistics
109354029
Source http://thesis.lib.nccu.edu.tw/record/#G0109354029
Type thesis
dc.contributor.advisor 陳立榜 [zh_TW]
dc.contributor.advisor Chen, Li-Pang [en_US]
dc.contributor.author (Authors) 邱邦旭 [zh_TW]
dc.contributor.author (Authors) Qiu, Bang-Xu [en_US]
dc.creator (Author) 邱邦旭 [zh_TW]
dc.creator (Author) Qiu, Bang-Xu [en_US]
dc.date (Date) 2022 [en_US]
dc.date.accessioned 1-Aug-2022 17:17:00 (UTC+8)
dc.date.available 1-Aug-2022 17:17:00 (UTC+8)
dc.date.issued (Upload time) 1-Aug-2022 17:17:00 (UTC+8)
dc.identifier (Other Identifiers) G0109354029 [en_US]
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/141013
dc.description (Description) Master's thesis
dc.description (Description) National Chengchi University
dc.description (Description) Department of Statistics
dc.description (Description) 109354029
dc.description.tableofcontents Abstract
Table of Contents
Tables
Figures
Chapter 1 Introduction
Chapter 2 Notation and Models
2.1 Length-Biased and Partly Interval-Censored Data
2.2 Accelerated Failure Time Models
2.3 Measurement Error Models
Chapter 3 Methodology
3.1 SIMEXBoost
3.2 SIMEXBoost with Collinearity in Covariates
Chapter 4 Numerical Studies
4.1 Simulation Setup
4.2 Simulation Results
4.3 Application to the Signal Tandmobiel Study
Chapter 5 Summary
Reference
dc.format.extent 1292826 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0109354029 [en_US]
dc.subject (Keywords) AFT model [en_US]
dc.subject (Keywords) biased sampling [en_US]
dc.subject (Keywords) incomplete data [en_US]
dc.subject (Keywords) measurement error correction [en_US]
dc.subject (Keywords) variable selection [en_US]
dc.subject (Keywords) SIMEX [en_US]
dc.title (Title) 帶有高維度測量誤差之長度偏差與區間設限資料的提升方法 [zh_TW]
dc.title (Title) Boosting method for length-biased and interval-censored survival data subject to high-dimensional error-prone covariates [en_US]
dc.type (Type) thesis [en_US]
dc.identifier.doi (DOI) 10.6814/NCCU202200857 [en_US]