Title Supervised learning for binary classification on US adult income
Author Chen, Li-Pang (陳立榜)
Contributor Department of Statistics
Keywords Boosting; Categorical data; Income; Discriminant analysis; Logistic regression; Prediction; Random forest; Support Vector Machine; Unbalanced binary classification
Date 2021-12
Upload time 21-Sep-2022 11:46:06 (UTC+8)
Abstract In this project, we use various binary classification methods to predict US adult income level from social factors including age, gender, education, and marital status. We first explore descriptive statistics for the dataset and deal with missing values. We then examine some widely used classification methods, including logistic regression, discriminant analysis, support vector machine, random forest, and boosting, and provide suitable R functions to demonstrate their application. Various metrics such as ROC curves, accuracy, recall, and F-measure are calculated to compare the performance of these models. We find that boosting is the best method in our data analysis because it attains the highest AUC value and the highest prediction accuracy. In addition, among all predictor variables, we identify three variables that have the largest impact on the US adult income level.
Relation Journal of Modeling and Optimization, Vol.13, No.2, pp.80-91
Type article
DOI https://doi.org/10.32732/jmo.2021.13.2.80
dc.contributor Department of Statistics
dc.creator (Author) 陳立榜
dc.creator (Author) Chen, Li-Pang
dc.date (Date) 2021-12
dc.date.accessioned 21-Sep-2022 11:46:06 (UTC+8)
dc.date.available 21-Sep-2022 11:46:06 (UTC+8)
dc.date.issued (Upload time) 21-Sep-2022 11:46:06 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/142026
dc.format.extent 105 bytes
dc.format.mimetype text/html
dc.relation (Relation) Journal of Modeling and Optimization, Vol.13, No.2, pp.80-91
dc.subject (Keywords) Boosting; Categorical data; Income; Discriminant analysis; Logistic regression; Prediction; Random forest; Support Vector Machine; Unbalanced binary classification
dc.title (Title) Supervised learning for binary classification on US adult income
dc.type (Type) article
dc.identifier.doi (DOI) 10.32732/jmo.2021.13.2.80
dc.doi.uri (DOI) https://doi.org/10.32732/jmo.2021.13.2.80