Supervised learning for binary classification on US adult income | NCCU Academic Hub

Title	Supervised learning for binary classification on US adult income
Author	陳立榜 Chen, Li-Pang
Contributor	Department of Statistics
Keywords	Boosting; Categorical data; Income; Discriminant analysis; Logistic regression; Prediction; Random forest; Support Vector Machine; Unbalanced binary classification
Date	2021-12
Upload time	21-Sep-2022 11:46:06 (UTC+8)
Abstract	In this project, various binary classification methods are used to predict US adult income level from social factors including age, gender, education, and marital status. We first explore descriptive statistics for the dataset and deal with missing values. We then examine some widely used classification methods, including logistic regression, discriminant analysis, support vector machine, random forest, and boosting, and provide suitable R functions to demonstrate their application. Various metrics such as ROC curves, accuracy, recall, and F-measure are calculated to compare the performance of these models. We find that boosting is the best method in our data analysis because it attains both the highest AUC value and the highest prediction accuracy. In addition, among all predictor variables, we identify the three variables that have the largest impact on US adult income level.
Relation	Journal of Modeling and Optimization, Vol.13, No.2, pp.80-91
Type	article
DOI	https://doi.org/10.32732/jmo.2021.13.2.80
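
The abstract compares models by accuracy, recall, F-measure, and AUC. As a rough illustration of how those quantities are defined, here is a minimal Python sketch. It is not the article's code (the article provides R functions); the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision, and F-measure (harmonic mean of the last two)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison: the fraction of
    (positive, negative) pairs in which the positive case gets the higher
    score, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 marks the higher-income class, 0 the lower.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.8, 0.7]  # made-up model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy, recall, and F-measure depend on the chosen threshold, whereas AUC is computed from the scores alone. On unbalanced classes, which the keywords mention, accuracy can look high even when recall on the minority class is poor, which is one reason to report several metrics side by side.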

dc.contributor	Department of Statistics
dc.creator (Author)	陳立榜
dc.creator (Author)	Chen, Li-Pang
dc.date (Date)	2021-12
dc.date.accessioned	21-Sep-2022 11:46:06 (UTC+8)
dc.date.available	21-Sep-2022 11:46:06 (UTC+8)
dc.date.issued (Upload time)	21-Sep-2022 11:46:06 (UTC+8)
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/142026
dc.format.extent	105 bytes
dc.format.mimetype	text/html
dc.relation (Relation)	Journal of Modeling and Optimization, Vol.13, No.2, pp.80-91
dc.subject (Keywords)	Boosting; Categorical data; Income; Discriminant analysis; Logistic regression; Prediction; Random forest; Support Vector Machine; Unbalanced binary classification
dc.title (Title)	Supervised learning for binary classification on US adult income
dc.type (Type)	article
dc.identifier.doi (DOI)	10.32732/jmo.2021.13.2.80
dc.doi.uri (DOI)	https://doi.org/10.32732/jmo.2021.13.2.80