Missing data imputation using classification and regression trees | NCCU Academic Hub

Title	Missing data imputation using classification and regression trees
Author	張育瑋 Chang, Yu-Wei; Chen, Cheng-Yang
Contributor	Department of Statistics (統計系)
Keywords	Classification and regression trees; Missing data; Missing data imputation; Resampling
Date	2024-06
Upload date	2024-07-17
Abstract	Background: Missing data are common when analyzing real data. One popular solution is to impute the missing values so that a single complete dataset can be obtained for subsequent analysis. In the present study, we focus on missing data imputation using classification and regression trees (CART). Methods: We consider a new perspective on missing data in a CART imputation problem and realize this perspective through several resampling algorithms. Several existing CART-based imputation methods are compared through simulation studies, with the aim of identifying the methods with better imputation accuracy under various conditions. Some systematic findings are presented. The imputation methods are further applied, for illustration, to two real datasets: the Hepatitis data and the Credit approval data. Results: The best-performing method depends strongly on the correlation between variables. For imputing missing ordinal categorical variables, the rpart package with surrogate variables is recommended when correlations are larger than 0 under missing completely at random (MCAR) and missing at random (MAR) conditions. Under missing not at random (MNAR), chi-squared test methods and the rpart package with surrogate variables are suggested. For imputing missing quantitative variables, the iterative imputation method is most recommended under moderate correlation conditions.
Relation	PeerJ Computer Science, 10, e2119
Type	article
DOI	https://doi.org/10.7717/peerj-cs.2119
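
As a rough illustration of the general idea behind tree-based imputation described in the abstract, the sketch below fits a depth-one regression tree (a single split) on the complete cases and fills each missing value with the mean of the leaf it falls into. This is a simplified, assumed example in plain Python, not the authors' method, not the rpart surrogate-variable mechanism, and not any of the resampling algorithms studied in the paper. The function names `fit_stump` and `impute` and the toy data are hypothetical.

```python
from statistics import mean

def fit_stump(xs, ys):
    """Fit a depth-one regression tree on (x, y) pairs.

    Returns (threshold, left_mean, right_mean), choosing the threshold
    that minimizes the summed squared error around the two leaf means.
    """
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, lm, rm)
    if best is None:
        # All x values are equal: fall back to the overall mean.
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]

def impute(xs, ys):
    """Replace None entries in ys with the leaf mean predicted from xs.

    Assumes xs is fully observed and at least one y is observed.
    """
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    threshold, left_mean, right_mean = fit_stump(*zip(*observed))
    return [
        y if y is not None else (left_mean if x <= threshold else right_mean)
        for x, y in zip(xs, ys)
    ]

# Toy data (hypothetical): y is missing for two rows.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, None, 3.0, 20.0, 22.0, None]
completed = impute(xs, ys)
print(completed)
```

A full CART imputer would grow deeper trees, handle categorical targets with classification splits, and deal with missing predictors, which is where surrogate variables come in.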

dc.contributor	Department of Statistics (統計系)
dc.creator (Author)	張育瑋
dc.creator (Author)	Chang, Yu-Wei; Chen, Cheng-Yang
dc.date (Date)	2024-06
dc.date.accessioned	2024-07-17
dc.date.available	2024-07-17
dc.date.issued (Upload date)	2024-07-17
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/152336
dc.format.extent	101 bytes
dc.format.mimetype	text/html
dc.relation (Relation)	PeerJ Computer Science, 10, e2119
dc.subject (Keywords)	Classification and regression trees; Missing data; Missing data imputation; Resampling
dc.title (Title)	Missing data imputation using classification and regression trees
dc.type (Type)	article
dc.identifier.doi (DOI)	10.7717/peerj-cs.2119
dc.doi.uri (DOI)	https://doi.org/10.7717/peerj-cs.2119