dc.contributor | Department of Statistics | |
dc.creator (Author) | 陳立榜 | |
dc.creator (Author) | Chen, Li-Pang | |
dc.date (Date) | 2021-12 | |
dc.date.accessioned | 21-Sep-2022 11:46:06 (UTC+8) | - |
dc.date.available | 21-Sep-2022 11:46:06 (UTC+8) | - |
dc.date.issued (Upload time) | 21-Sep-2022 11:46:06 (UTC+8) | - |
dc.identifier.uri (URI) | http://nccur.lib.nccu.edu.tw/handle/140.119/142026 | - |
dc.description.abstract (Abstract) | In this project, various binary classification methods are used to predict US adult income level from social factors including age, gender, education, and marital status. We first explore descriptive statistics for the dataset and deal with missing values. We then examine some widely used classification methods, including logistic regression, discriminant analysis, support vector machine, random forest, and boosting, and provide suitable R functions to demonstrate their application. Various metrics, such as ROC curves, accuracy, recall, and F-measure, are calculated to compare the performance of these models. We find that boosting is the best method in our data analysis because it attains the highest AUC value and the highest prediction accuracy. In addition, among all predictor variables, we identify the three variables that have the largest impact on US adult income level. | |
dc.format.extent | 105 bytes | - |
dc.format.mimetype | text/html | - |
dc.relation (Relation) | Journal of Modeling and Optimization, Vol.13, No.2, pp.80-91 | |
dc.subject (Keywords) | Boosting; Categorical data; Income; Discriminant analysis; Logistic regression; Prediction; Random forest; Support Vector Machine; Unbalanced binary classification | |
dc.title (Title) | Supervised learning for binary classification on US adult income | |
dc.type (Type) | article | |
dc.identifier.doi (DOI) | 10.32732/jmo.2021.13.2.80 | |
dc.doi.uri (DOI) | https://doi.org/10.32732/jmo.2021.13.2.80 | |