Causal Inference with High-Dimensional Error-Prone Covariates and Misclassified Treatment

Title	Causal Inference with High-Dimensional Error-Prone Covariates and Misclassified Treatment (高維度測量誤差變數與錯誤分類處理的因果推論)
Author	Hsu, Wei-Hsin (徐偉鑫)
Contributors	Chen, Li-Pang (陳立榜), advisor; Hsu, Wei-Hsin (徐偉鑫)
Keywords	Average treatment effect (ATE); Causal inference; Collinearity; Feature screening; Measurement error correction; Ultrahigh-dimensional covariates
Date	2023
Upload time	2-Aug-2023 13:05:44 (UTC+8)
Abstract	In causal inference, the average treatment effect (ATE) is commonly used to measure the causal effect of a treatment on an outcome of interest, and inverse probability weighting (IPW) based on the propensity score is a standard strategy for estimating it. In applications, however, ultrahigh-dimensional covariates and measurement error are ubiquitous in datasets, and ignoring these features can make the ATE estimator unreliable. This thesis considers data in which both the covariates and the treatment may be subject to measurement error (for the treatment, misclassification), and in which the potential outcomes may follow exponential distributions and depend nonlinearly on the covariates. To address these challenges and obtain a precise estimator of the ATE, we develop the FATE method, named for its components: Feature screening, Adaptive lasso, Treatment adjustment, and Error correction for covariates. Feature screening is performed on error-eliminated data and remains valid for exponentially distributed outcomes. Once the misclassified treatment and the covariate measurement error are corrected, we derive a reliable estimator of the propensity score that accounts for collinearity, and from it an ATE estimator with measurement error correction. Numerical studies show that the proposed FATE method performs satisfactorily and outperforms competing methods.
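The abstract names IPW with a propensity score as the baseline estimator and treatment misclassification as one of the complications. The following is a minimal Python sketch, built on assumptions of our own rather than taken from the thesis, showing why a misclassified treatment label biases a naive IPW estimate and how a simple moment-based correction removes that bias. It assumes nondifferential misclassification with known sensitivity p11 = P(T* = 1 | T = 1) and specificity p00 = P(T* = 0 | T = 0). It relies on the identities P(T* = 1 | X) = p11 π(X) + (1 - p00)(1 - π(X)), E[T*Y | X] = p11 E[TY | X] + (1 - p00) E[(1 - T)Y | X] and E[(1 - T*)Y | X] = (1 - p11) E[TY | X] + p00 E[(1 - T)Y | X], all of which can be inverted when p11 + p00 > 1. The functions `simulate`, `naive_ipw` and `corrected_ipw` and the data-generating design are hypothetical. For brevity the sketch plugs in the true P(T* = 1 | X) instead of estimating it, and it omits the feature screening, adaptive lasso and covariate error correction that make up the rest of FATE. It is not the thesis's treatment-adjustment step.

```python
import math
import random


def simulate(n, p11, p00, seed=0):
    """Toy data with one confounder X, true treatment T, misclassified T*, outcome Y.

    Hypothetical design: pi(x) = 1 / (1 + exp(-0.5 x)), Y = 2 T + x + noise,
    so the true ATE is 2. T* is recorded with sensitivity p11 and specificity
    p00, independently of X and Y given T (nondifferential misclassification).
    Returns tuples (t_star, y, pi_star) with pi_star = P(T* = 1 | X = x).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        pi = 1.0 / (1.0 + math.exp(-0.5 * x))  # true propensity P(T = 1 | X)
        t = 1 if rng.random() < pi else 0
        y = 2.0 * t + x + rng.gauss(0.0, 1.0)
        if t == 1:
            t_star = 1 if rng.random() < p11 else 0
        else:
            t_star = 0 if rng.random() < p00 else 1
        # Propensity of the *observed* label, P(T* = 1 | X = x).
        pi_star = p11 * pi + (1.0 - p00) * (1.0 - pi)
        data.append((t_star, y, pi_star))
    return data


def naive_ipw(data):
    """Horvitz-Thompson IPW estimate that treats T* as if it were T."""
    total = 0.0
    for t_star, y, pi_star in data:
        total += t_star * y / pi_star - (1 - t_star) * y / (1.0 - pi_star)
    return total / len(data)


def corrected_ipw(data, p11, p00):
    """IPW estimate after inverting the misclassification (needs p11 + p00 > 1)."""
    d = p11 + p00 - 1.0
    if d <= 0:
        raise ValueError("labels must be more accurate than chance")
    total = 0.0
    for t_star, y, pi_star in data:
        pi = (pi_star - (1.0 - p00)) / d  # recover P(T = 1 | X)
        # Per-observation terms whose conditional means are E[T Y | X]
        # and E[(1 - T) Y | X], respectively.
        a = (p00 * t_star - (1.0 - p00) * (1 - t_star)) * y / d
        b = (p11 * (1 - t_star) - (1.0 - p11) * t_star) * y / d
        total += a / pi - b / (1.0 - pi)
    return total / len(data)


if __name__ == "__main__":
    p11, p00 = 0.90, 0.85
    data = simulate(200_000, p11, p00, seed=1)
    print(f"naive IPW:     {naive_ipw(data):.3f}")
    print(f"corrected IPW: {corrected_ipw(data, p11, p00):.3f}  (true ATE = 2)")
```

In this toy design the naive estimate is pulled toward zero (its expectation is roughly 1.5), while the corrected estimate centers on the true ATE of 2. In practice P(T* = 1 | X) would itself be estimated, for example by a penalized logistic regression of T* on the selected covariates.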
Description	Master's thesis, Department of Statistics, National Chengchi University (國立政治大學統計學系); 110354032
Source	http://thesis.lib.nccu.edu.tw/record/#G0110354032
Type	thesis

dc.contributor.advisor	陳立榜	zh_TW
dc.contributor.advisor	Chen, Li-Pang	en_US
dc.contributor.author (Author)	徐偉鑫	zh_TW
dc.contributor.author (Author)	Hsu, Wei-Hsin	en_US
dc.creator (Author)	徐偉鑫	zh_TW
dc.creator (Author)	Hsu, Wei-Hsin	en_US
dc.date (Date)	2023	en_US
dc.date.accessioned	2-Aug-2023 13:05:44 (UTC+8)	-
dc.date.available	2-Aug-2023 13:05:44 (UTC+8)	-
dc.date.issued (Upload time)	2-Aug-2023 13:05:44 (UTC+8)	-
dc.identifier (Other identifier)	G0110354032	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/146313	-
dc.description (Description)	碩士 (Master's degree)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	統計學系 (Department of Statistics)	zh_TW
dc.description (Description)	110354032	zh_TW
dc.description.tableofcontents	Abstract; Table of Contents; Tables; Chapter 1 Introduction; Chapter 2 Notation and Models (2.1 Data and Causal Inference; 2.2 Measurement Error Models); Chapter 3 Methodology (3.1 Feature Screening for Ultrahigh Dimensional Covariates; 3.2 De-Noised Estimation of Propensity Score; 3.3 De-Noised Estimation of ATE; Algorithm 1: FATE); Chapter 4 Chemist: An R package Implication (4.1 Data_Gen; 4.2 FATE); Chapter 5 Numerical Studies (5.1 Simulation Setup; 5.2 Simulation Results; 5.3 Real Data Analysis); Chapter 6 Summary; Reference	zh_TW
dc.format.extent	644846 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0110354032	en_US
dc.subject (Keywords)	平均處理效應 (average treatment effect)	zh_TW
dc.subject (Keywords)	因果推論 (causal inference)	zh_TW
dc.subject (Keywords)	共線性 (collinearity)	zh_TW
dc.subject (Keywords)	特徵篩選 (feature screening)	zh_TW
dc.subject (Keywords)	誤差校正 (error correction)	zh_TW
dc.subject (Keywords)	超高維度的協變量 (ultrahigh-dimensional covariates)	zh_TW
dc.subject (Keywords)	ATE	en_US
dc.subject (Keywords)	Causal Inference	en_US
dc.subject (Keywords)	Collinearity	en_US
dc.subject (Keywords)	Feature Screening	en_US
dc.subject (Keywords)	Measurement Error Correction	en_US
dc.subject (Keywords)	Ultrahigh-Dimension	en_US
dc.title (Title)	高維度測量誤差變數與錯誤分類處理的因果推論	zh_TW
dc.title (Title)	Causal Inference with High-Dimensional Error-Prone Covariates and Misclassified Treatment	en_US
dc.type (Type)	thesis	en_US