Variable selection and estimation for misclassified responses and high-dimensional error-prone predictors


Title	Variable selection and estimation for misclassified responses and high-dimensional error-prone predictors (帶有錯誤分類與測量誤差數據的高維度變數選取與估計)
Author	歐陽沁縈 Ou Yang, Qin-Ying
Contributors	陳立榜 Chen, Li-Pang; 歐陽沁縈 Ou Yang, Qin-Ying
Keywords	binary data; boosting; error elimination; measurement error; regression calibration
Date	2022
Upload time	1 August 2022 17:15:25 (UTC+8)
Abstract	Binary classification is a long-standing topic in statistical analysis and supervised learning. Logistic and probit regression models are commonly used to relate a binary response to predictors. However, as the dimension of the data grows rapidly and measurement error in responses and/or predictors becomes non-ignorable, conventional methods are no longer valid and data analysis becomes challenging. To address these concerns, we propose a valid inferential method that handles measurement error and variable selection simultaneously. Specifically, we consider logistic or probit regression models and propose corrected estimating functions that incorporate error-eliminated responses and predictors. We then develop a boosting procedure based on the corrected estimating functions to perform variable selection and estimation. Numerical studies show that the proposed method accurately retains informative predictors and gives precise estimators, and that its performance is generally better than that without measurement error correction.
Description	Master's degree; National Chengchi University; Department of Statistics; 109354014
Source	http://thesis.lib.nccu.edu.tw/record/#G0109354014
Type	thesis
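
The abstract describes a two-stage idea: correct the misclassified response and the error-prone predictors, then run boosting on the resulting estimating function. The following is a minimal Python sketch of that idea for the logistic case, not the thesis's BOOME implementation. All function names and parameters are hypothetical. It assumes known misclassification probabilities, a classical additive error model X* = X + e with known error covariance, more observations than predictors for the calibration step, and a coordinate-wise boosting update in the style of estimating-equation boosting.

```python
import numpy as np


def correct_response(y_star, pi01, pi10):
    """Unbiased surrogate for Y built from the misclassified Y*.

    Assumes pi01 = P(Y*=1 | Y=0) and pi10 = P(Y*=0 | Y=1) are known.
    Since E[Y* | Y] = pi01 + Y * (1 - pi01 - pi10), the value returned
    here has conditional mean Y.
    """
    return (y_star - pi01) / (1.0 - pi01 - pi10)


def calibrate_predictors(x_star, sigma_e):
    """Regression calibration under X* = X + e with Cov(e) = sigma_e.

    Estimates E[X | X*] = mu + Sigma_X Sigma_X*^{-1} (X* - mu), where
    Sigma_X = Sigma_X* - sigma_e. Assumes n > p so that the sample
    covariance of X* is invertible.
    """
    mu = x_star.mean(axis=0)
    sigma_star = np.cov(x_star, rowvar=False)
    # Rows are observations, so the weight multiplies on the right.
    weight = np.linalg.solve(sigma_star, sigma_star - sigma_e)
    return mu + (x_star - mu) @ weight


def boost_logistic(x, y, step=0.05, n_iter=300):
    """Coordinate-wise boosting on the logistic estimating function.

    At each iteration, the coordinate with the largest absolute score
    is moved by a small step in the direction of its sign. Predictors
    whose coefficient stays at zero are treated as not selected.
    """
    n, p = x.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(x @ beta)))
        score = x.T @ (y - mu) / n
        j = np.argmax(np.abs(score))
        beta[j] += step * np.sign(score[j])
    return beta
```

In use, one would pass the observed data through `correct_response` and `calibrate_predictors` first and feed the results to `boost_logistic`. The step size and number of iterations act as tuning parameters, with early stopping controlling how many predictors enter the model.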

dc.contributor.advisor	陳立榜	zh_TW
dc.contributor.advisor	Chen, Li-Pang	en_US
dc.contributor.author (Author)	歐陽沁縈	zh_TW
dc.contributor.author (Author)	Ou Yang, Qin-Ying	en_US
dc.creator (Author)	歐陽沁縈	zh_TW
dc.creator (Author)	Ou Yang, Qin-Ying	en_US
dc.date (Date)	2022	en_US
dc.date.accessioned	1 August 2022 17:15:25 (UTC+8)	-
dc.date.available	1 August 2022 17:15:25 (UTC+8)	-
dc.date.issued (Upload time)	1 August 2022 17:15:25 (UTC+8)	-
dc.identifier (Other identifier)	G0109354014	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/141006	-
dc.description (Description)	Master's degree (碩士)	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Statistics (統計學系)	zh_TW
dc.description (Description)	109354014	zh_TW
dc.description.tableofcontents	Chapter 1 Introduction. Chapter 2 Notation and Models: 2.1 Data Structure; 2.2 Measurement Error Models. Chapter 3 Methodology: 3.1 Correction of Measurement Error Effects; 3.2 Variable Selection via Boosting; 3.3 Extension: Collinearity. Chapter 4 Estimation with Validation Data: 4.1 BOOME via External Validation; 4.2 BOOME via Internal Validation. Chapter 5 Python Package: BOOME: 5.1 ME_Generate; 5.2 LR_Boost; 5.3 PM_Boost. Chapter 6 Numerical Studies: 6.1 Simulation Setup; 6.2 Simulation Results; 6.3 Simulation Results based on Validation Data; 6.4 Analysis of Bankruptcy Data. Chapter 7 Summary. References. Appendix: A.1 Proof of Theorem 3.1; A.2 Proof of Theorem 3.2; A.3 Proof of Theorem 3.3.	zh_TW
dc.format.extent	1511038 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0109354014	en_US
dc.subject (Keyword)	binary data	en_US
dc.subject (Keyword)	boosting	en_US
dc.subject (Keyword)	error elimination	en_US
dc.subject (Keyword)	measurement error	en_US
dc.subject (Keyword)	regression calibration	en_US
dc.title (Title)	帶有錯誤分類與測量誤差數據的高維度變數選取與估計	zh_TW
dc.title (Title)	Variable selection and estimation for misclassified responses and high-dimensional error-prone predictors	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU202200889	en_US
