Title Causal Inference with High-Dimensional Error-Prone Covariates and Misclassified Treatment (高維度測量誤差變數與錯誤分類處理的因果推論)
Author Hsu, Wei-Hsin (徐偉鑫)
Contributors Chen, Li-Pang (陳立榜)
Hsu, Wei-Hsin (徐偉鑫)
Keywords Average treatment effect (ATE)
Causal Inference
Collinearity
Feature Screening
Measurement Error Correction
Ultrahigh-Dimensional Covariates
Date 2023
Upload time 2-Aug-2023 13:05:44 (UTC+8)
Abstract In causal inference, the average treatment effect (ATE) is commonly used to measure the causal effect of a treatment on an outcome of interest, and inverse probability weighting based on the propensity score is a standard strategy for estimating it. In applications, however, ultrahigh-dimensional covariates and measurement error are ubiquitous in datasets, and ignoring them may yield an unreliable estimator of the ATE. In this thesis, we consider datasets in which both covariates and treatments are possibly subject to measurement error, and in which potential outcomes follow exponential distributions and have nonlinear relationships with the covariates. To tackle these challenges and derive a precise estimator of the ATE, we develop the FATE method, which stands for Feature screening, Adaptive lasso, Treatment adjustment, and Error correction for covariates. The feature screening procedure is applied to error-eliminated data and is valid for exponentially distributed outcomes. In addition, once the misclassified treatment and the measurement error in covariates are corrected, we derive a reliable estimator of the propensity score that takes collinearity into account, and from it an estimator of the ATE with measurement error correction. In numerical studies, the proposed FATE method shows satisfactory performance and outperforms competing methods.
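The inverse probability weighting idea mentioned in the abstract can be illustrated with a minimal sketch. This is the textbook IPW estimator only, not the FATE method: it assumes error-free covariates, correctly recorded treatment, and a known propensity score. The names `ipw_ate` and `simulate`, and the toy data-generating model, are assumptions made for illustration and do not come from the thesis.

```python
import numpy as np


def ipw_ate(y, t, e):
    """Inverse-probability-weighted (Horvitz-Thompson) estimate of the ATE.

    y: outcomes, t: binary treatment indicators (0/1),
    e: propensity scores P(T = 1 | X), assumed known or already estimated.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    if np.any((e <= 0.0) | (e >= 1.0)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    # Treated units are weighted by 1/e, control units by 1/(1 - e).
    return float(np.mean(t * y / e - (1.0 - t) * y / (1.0 - e)))


def simulate(n=20000, seed=0):
    """Toy data with one confounder x and a true ATE of 2 (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-x))          # true propensity score
    t = rng.binomial(1, e)                # treatment assignment depends on x
    y = 2.0 * t + x + rng.normal(size=n)  # outcome depends on t and on x
    return y, t, e


y, t, e = simulate()
naive = y[t == 1].mean() - y[t == 0].mean()  # confounded difference in means
print("naive:", naive, "IPW:", ipw_ate(y, t, e))
```

In this toy setup the naive difference in means is biased upward because x drives both treatment and outcome, while the weighted estimate should land close to the true value of 2. According to the abstract, the thesis additionally corrects covariate measurement error and treatment misclassification before the propensity score is estimated, which this sketch leaves out.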
References Acerenza, S., Ban, K., and Kédagni, D. (2022). Marginal treatment effects with misclassified treatment. arXiv:2105.00358

Baldé, I., Yang, Y.A., and Lefebvre, G. (2022). Reader reaction to “Outcome-adaptive lasso: Variable selection for causal inference” by Shortreed and Ertefaie (2017). Biometrics, 00, 1-7. https://doi.org/10.1111/biom.13683

Bang, H. and Robins, J.M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61, 962-973.

Battistin, E. and Sianesi, B. (2011). Misclassified treatment status and treatment effects: an application to returns to education in the United Kingdom. The Review of Economics and Statistics, 93, 495-509.

Braun, D., Gorfine, M., Parmigiani, G., Arvold, N.D., Dominici, F., and Zigler, C. (2017). Propensity scores with misclassified treatment assignment: a likelihood-based adjustment. Biostatistics, 18, 695-710.

Chatterjee, S. (2021). A new coefficient of correlation. Journal of the American Statistical Association, 116, 2009-2022.

Chen, L.-P. (2020). Causal inference for left-truncated and right-censored data with covariates measurement error. Computational & Applied Mathematics, 39:126. DOI: 10.1007/s40314-020-01152-4.

Chen, L.-P. and Yi, G.Y. (2021a). Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error. Annals of the Institute of Statistical Mathematics, 73, 481-517.

Chen, L.-P. and Yi, G.Y. (2021b). Analysis of noisy survival data with graphical proportional hazards measurement error models. Biometrics, 77, 956-969.

Chen, L.-P. (2023). A note of feature screening via rank-based coefficient of correlation. Biometrical Journal, To appear. DOI: 10.1002/bimj.202100373.

Ertefaie, A., Asgharian, M., and Stephens, D.A. (2018). Variable selection in causal inference using a simultaneous penalization method. Journal of Causal Inference, 6: 20170010, 1-16.

Ghosh, D., Zhu, Y., and Coffman, D.L. (2015). Penalized regression procedures for variable selection in the potential outcomes framework. Statistics in Medicine, 34, 1645-1658.

Hernán, M.A. and Robins, J.M. (2020). Causal Inference: What If. Chapman & Hall/CRC, Boca Raton. Link: https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/

Koch, B., Vock, D.M., and Wolfson, J. (2020). Variable selection and estimation in causal inference using Bayesian spike and slab priors. Statistical Methods in Medical Research, 29, 2445-2469.

Kyle, R.P., Moodie, E.E.M., and Klein, M.B. (2016). Correcting for measurement error in time-varying covariates in marginal structural models. American Journal of Epidemiology, 184, 249-258.

Lewbel, A. (2007). Endogenous selection or treatment model estimation. Journal of Econometrics, 141, 777-806.

Lunceford, J.K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23, 2937-2960.

McCaffrey, D.F., Lockwood, J.R., and Setodji, C.M. (2013). Inverse probability weighting with error-prone covariates. Biometrika, 100, 671-680.

Negi, A. and Negi, D.S. (2022). Difference-in-differences with a misclassified treatment. arXiv:2208.02412

Pearl, J. (2000). Causality. Cambridge University Press, Cambridge.

Plackett, R.L. (1953). The truncated Poisson distribution. Biometrics, 9, 485-488.

Rosenbaum, P. (2012). Design of Observational Studies. Springer, New York.

Rosenbaum, P. and Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

Ross, R. K., Su, I.-H., Webster-Clark, M., and Funk, M. J. (2022). Nondifferential treatment misclassification biases toward the null? Not a safe bet for active comparator studies. American Journal of Epidemiology, 191, 1917-1925.

Saldana, D.F. and Feng, Y. (2018). SIS: An R Package for Sure Independence Screening in Ultrahigh-Dimensional Statistical Models. Journal of Statistical Software, 83(2), 1–25.

Shortreed, S.M. and Ertefaie, A. (2017). Outcome-adaptive lasso: variable selection for causal inference. Biometrics, 73(4), 1111-1122.

Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D’Amico, A.V., Richie, J.P., Lander, E.S., Loda, M., Kantoff, P.W., Golub, T.R., and Sellers, W.R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), 203-209.

Shu, D. and Yi, G.Y. (2019a). Causal inference with measurement error in outcomes: Bias analysis and estimation methods. Statistical Methods in Medical Research, 28, 2049-2068.

Shu, D. and Yi, G.Y. (2019b). Inverse-probability-of-treatment weighted estimation of causal parameters in the presence of error-contaminated and time-dependent confounders. Biometrical Journal, 61, 1507-1525.

Tang, D., Kong, D., Pan, W., and Wang, L. (2022). Ultra-high dimensional variable selection for doubly robust causal inference. Biometrics, 1-12. https://doi.org/10.1111/biom.13625

Van Der Laan, M.J. and Robins, J.M. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer-Verlag, New York.

Vansteelandt, S., Bekaert, M., and Claeskens, G. (2010). On model selection and model misspecification in causal inference. Statistical Methods in Medical Research, 21, 7-30.

Yanagi, T. (2019). Inference on local average treatment effects for misclassified treatment. Econometric Reviews, 38, 938-960.

Yi, G.Y. (2017). Statistical Analysis with Measurement Error and Misclassification: Strategy, Method and Application. New York: Springer.

Yi, G.Y. and Chen, L.-P. (2023). Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders. Statistical Methods in Medical Research, 32, 691-711.

Zou, H., and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 301-320.
Description Master's degree
National Chengchi University
Department of Statistics
110354032
Source http://thesis.lib.nccu.edu.tw/record/#G0110354032
Type thesis
dc.contributor.advisor 陳立榜 (zh_TW)
dc.contributor.advisor Chen, Li-Pang (en_US)
dc.identifier (Other Identifiers) G0110354032
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/146313
dc.description.tableofcontents Table of Contents
Abstract
Table of Contents
Tables
Chapter 1 Introduction
Chapter 2 Notation and Models
2.1 Data and Causal Inference
2.2 Measurement Error Models
Chapter 3 Methodology
3.1 Feature Screening for Ultrahigh Dimensional Covariates
3.2 De-Noised Estimation of Propensity Score
3.3 De-Noised Estimation of ATE
Algorithm 1: FATE
Chapter 4 Chemist: An R package Implication
4.1 Data_Gen
4.2 FATE
Chapter 5 Numerical Studies
5.1 Simulation Setup
5.2 Simulation Results
5.3 Real Data Analysis
Chapter 6 Summary
Reference
dc.format.extent 644846 bytes
dc.format.mimetype application/pdf