Boosting method for length-biased and interval-censored survival data subject to high-dimensional error-prone covariates

Title	Boosting method for length-biased and interval-censored survival data subject to high-dimensional error-prone covariates (帶有高維度測量誤差之長度偏差與區間設限資料的提升方法)
Author	Qiu, Bang-Xu (邱邦旭)
Contributors	Chen, Li-Pang (陳立榜); Qiu, Bang-Xu (邱邦旭)
Keywords	AFT model; biased sampling; incomplete data; measurement error correction; variable selection; SIMEX
Date	2022
Upload time	1-Aug-2022 17:17:00 (UTC+8)
Abstract	Analysis of length-biased and interval-censored data is an important topic in survival analysis, and many methods have been developed to address this complex data structure. However, existing methods focus on low-dimensional data and assume that covariates are precisely measured, whereas high-dimensional data subject to measurement error are frequently collected in applications. In this thesis, we propose a valid inference method for high-dimensional length-biased and interval-censored survival data with measurement error in covariates under the accelerated failure time model. We employ the SIMEX method to correct for measurement error effects and propose a boosting procedure for variable selection and estimation. The proposed method handles the case where the dimension of the covariates exceeds the sample size, and it leaves the distributions of the covariates unspecified.
Description	Master's; National Chengchi University; Department of Statistics; 109354029
Source	http://thesis.lib.nccu.edu.tw/record/#G0109354029
Type	thesis

dc.contributor.advisor	陳立榜	zh_TW
dc.contributor.advisor	Chen, Li-Pang	en_US
dc.contributor.author (Authors)	邱邦旭	zh_TW
dc.contributor.author (Authors)	Qiu, Bang-Xu	en_US
dc.creator (Author)	邱邦旭	zh_TW
dc.creator (Author)	Qiu, Bang-Xu	en_US
dc.date (Date)	2022	en_US
dc.date.accessioned	1-Aug-2022 17:17:00 (UTC+8)	-
dc.date.available	1-Aug-2022 17:17:00 (UTC+8)	-
dc.date.issued (Upload time)	1-Aug-2022 17:17:00 (UTC+8)	-
dc.identifier (Other Identifiers)	G0109354029	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/141013	-
dc.description (Description)	Master's	en_US
dc.description (Description)	National Chengchi University	en_US
dc.description (Description)	Department of Statistics	en_US
dc.description (Description)	109354029	zh_TW
dc.description.abstract (Abstract)	Analysis of length-biased and interval-censored data is an important topic in survival analysis, and many methods have been developed to address this complex data structure. However, existing methods focus on low-dimensional data and assume that covariates are precisely measured, whereas high-dimensional data subject to measurement error are frequently collected in applications. In this thesis, we propose a valid inference method for high-dimensional length-biased and interval-censored survival data with measurement error in covariates under the accelerated failure time model. We employ the SIMEX method to correct for measurement error effects and propose a boosting procedure for variable selection and estimation. The proposed method handles the case where the dimension of the covariates exceeds the sample size, and it leaves the distributions of the covariates unspecified.	en_US
dc.description.tableofcontents	Abstract; Table of Contents; Tables; Figures; Chapter 1 Introduction; Chapter 2 Notation and Models (2.1 Length-Biased and Partly Interval-Censored Data; 2.2 Accelerated Failure Time Models; 2.3 Measurement Error Models); Chapter 3 Methodology (3.1 SIMEXBoost; 3.2 SIMEXBoost with Collinearity in Covariates); Chapter 4 Numerical Studies (4.1 Simulation Setup; 4.2 Simulation Results; 4.3 Application to the Signal Tandmobiel Study); Chapter 5 Summary; References	zh_TW
dc.format.extent	1292826 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0109354029	en_US
dc.subject (Keywords)	AFT model	en_US
dc.subject (Keywords)	biased sampling	en_US
dc.subject (Keywords)	incomplete data	en_US
dc.subject (Keywords)	measurement error correction	en_US
dc.subject (Keywords)	variable selection	en_US
dc.subject (Keywords)	SIMEX	en_US
dc.title (Title)	帶有高維度測量誤差之長度偏差與區間設限資料的提升方法	zh_TW
dc.title (Title)	Boosting method for length-biased and interval-censored survival data subject to high-dimensional error-prone covariates	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU202200857	en_US