Title: Causal Inference with High-Dimensional Error-Prone Covariates and Misclassified Treatment (高維度測量誤差變數與錯誤分類處理的因果推論)
Author: Hsu, Wei-Hsin (徐偉鑫)
Contributors: Chen, Li-Pang (陳立榜); Hsu, Wei-Hsin (徐偉鑫)
Keywords: Average treatment effect (ATE); Causal inference; Collinearity; Feature screening; Measurement error correction; Ultrahigh-dimensional covariates
Date: 2023
Uploaded: 2 August 2023 13:05:44 (UTC+8)
Abstract:
In causal inference, the average treatment effect (ATE) is commonly used to measure the causal effect of a treatment on an outcome of interest. Inverse probability weighting based on the propensity score is a widely used strategy for estimating the ATE. In applications, however, ultrahigh-dimensional covariates and measurement error are ubiquitous in datasets, and ignoring these features may yield an unreliable estimator of the ATE. In this thesis, we primarily consider a dataset in which covariates and treatments are possibly subject to measurement error, and potential outcomes follow exponential distributions and have a nonlinear relationship with the covariates. To tackle these challenges and derive a precise estimator of the ATE, we develop the FATE method, which stands for Feature screening, Adaptive lasso, Treatment adjustment, and Error correction for covariates. Our feature screening procedure is based on error-eliminated data and remains valid for exponentially distributed outcomes. In addition, once the misclassified treatment and the measurement error in covariates are corrected, we derive a reliable estimator of the propensity score that takes collinearity into account, and from it an estimator of the ATE with measurement error correction. Through numerical studies, we find that the proposed FATE method performs satisfactorily and outperforms competing methods.
Description: Master's thesis
National Chengchi University
Department of Statistics
110354032
Source: http://thesis.lib.nccu.edu.tw/record/#G0110354032
Type: thesis
Identifier: G0110354032
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/146313
Format: application/pdf, 644846 bytes
Table of Contents:
Abstract
Table of Contents
Tables
Chapter 1 Introduction
Chapter 2 Notation and Models
2.1 Data and Causal Inference
2.2 Measurement Error Models
Chapter 3 Methodology
3.1 Feature Screening for Ultrahigh Dimensional Covariates
3.2 De-Noised Estimation of Propensity Score
3.3 De-Noised Estimation of ATE
Algorithm 1: FATE
Chapter 4 Chemist: An R package Implication
4.1 Data_Gen
4.2 FATE
Chapter 5 Numerical Studies
5.1 Simulation Setup
5.2 Simulation Results
5.3 Real Data Analysis
Chapter 6 Summary
Reference
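The abstract names inverse probability weighting based on the propensity score as the usual route to the ATE. The sketch below illustrates only that baseline idea on simulated, error-free data; it is not the FATE method and includes no feature screening, adaptive lasso, or measurement error correction. It is written in Python for illustration, whereas the thesis's own package (Chemist) is in R. All function names (`expit`, `simulate`, `fit_logistic`, `ipw_ate`, `naive_difference`), the data-generating model, and the choice of normalized weights are assumptions made for this example.

```python
import math
import random


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed=0):
    """Hypothetical toy data: confounder x, binary treatment t, outcome y.

    Treatment probability is expit(0.5 * x) and y = 2 * t + x + noise,
    so the true ATE is 2 by construction.
    """
    rng = random.Random(seed)
    x, t, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        ti = 1 if rng.random() < expit(0.5 * xi) else 0
        yi = 2.0 * ti + xi + rng.gauss(0.0, 1.0)
        x.append(xi)
        t.append(ti)
        y.append(yi)
    return x, t, y


def fit_logistic(x, t, iters=15):
    """Fit P(T = 1 | x) = expit(a + b * x) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = expit(a + b * xi)
            w = p * (1.0 - p)
            g0 += ti - p            # score for the intercept
            g1 += (ti - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def ipw_ate(y, t, ps):
    """Normalized inverse-probability-weighted estimate of the ATE."""
    num1 = sum(ti * yi / pi for yi, ti, pi in zip(y, t, ps))
    den1 = sum(ti / pi for ti, pi in zip(t, ps))
    num0 = sum((1 - ti) * yi / (1.0 - pi) for yi, ti, pi in zip(y, t, ps))
    den0 = sum((1 - ti) / (1.0 - pi) for ti, pi in zip(t, ps))
    return num1 / den1 - num0 / den0


def naive_difference(y, t):
    """Unadjusted difference in mean outcomes between treated and control."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    x, t, y = simulate(20000, seed=0)
    a, b = fit_logistic(x, t)
    ps = [expit(a + b * xi) for xi in x]
    print("naive difference in means:", naive_difference(y, t))
    print("IPW estimate of the ATE:  ", ipw_ate(y, t, ps))
```

In this toy setting the unadjusted difference in means is biased because the confounder drives both treatment and outcome, while weighting by the estimated propensity score brings the estimate close to the true value of 2. With error-prone covariates or a misclassified treatment, the fitted propensity scores themselves would be distorted, which is the problem the thesis sets out to address.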