Title: A virtual multi-label approach to imbalanced data classification
Authors: 周珮婷; Chou, Elizabeth P.; Yang, Shan-Ping
Contributor: Department of Statistics
Keywords: Imbalance; Classification; Virtual multi-label; Equal k-means
Date: 2022-03
Uploaded: 21-Sep-2022 11:44:38 (UTC+8)
Abstract: One of the most challenging issues in machine learning is imbalanced data analysis. Usually, in this type of research, correctly predicting minority labels is more critical than correctly predicting majority labels. However, traditional machine learning techniques easily lead to learning bias. Traditional classifiers tend to place all subjects in the majority group, resulting in biased predictions. Machine learning studies are typically conducted from one of two perspectives: a data-based perspective or a model-based perspective. Oversampling and undersampling are examples of data-based approaches, while the addition of costs, penalties, or weights to optimize the algorithm is typical of a model-based approach. Some ensemble methods have been studied recently. These methods cause various problems, such as overfitting, the omission of some information, and long computation times. In addition, these methods do not apply to all kinds of datasets. Based on this problem, the virtual labels (ViLa) approach for the majority label is proposed to solve the imbalanced problem. A new multiclass classification approach with the equal K-means clustering method is demonstrated in the study. The proposed method is compared with commonly used imbalance problem methods, such as sampling methods (oversampling, undersampling, and SMOTE) and classifier methods (SVM and one-class SVM). The results show that the proposed method performs better when the degree of data imbalance increases and will gradually outperform other methods.
Relation: Communications in Statistics - Simulation and Computation
Type: article
DOI: https://doi.org/10.1080/03610918.2022.2049820
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/142011
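
The abstract describes splitting the majority class into virtual labels with an equal K-means clustering step and then treating the task as multiclass classification. The record gives no algorithmic detail, so the following Python sketch is only one possible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `equal_kmeans`, `fit_virtual_labels`, and `predict`, the greedy capacity-limited assignment used to keep clusters equal in size, the choice of k as the rounded majority-to-minority ratio, and the nearest-centroid classifier standing in for whatever multiclass model the paper uses.

```python
import math
import random


def equal_kmeans(points, k, iters=20, seed=0):
    """Split `points` into k clusters of (near-)equal size.

    Assumption: a greedy capacity-limited variant of k-means; the paper's
    exact equal K-means procedure may differ.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    cap = math.ceil(len(points) / k)  # maximum members per cluster
    assign = [-1] * len(points)
    for _ in range(iters):
        # Visit (point, center) pairs from closest to farthest and give each
        # point the nearest center that still has room.
        pairs = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centers)
        )
        new_assign = [-1] * len(points)
        sizes = [0] * k
        for _, i, j in pairs:
            if new_assign[i] == -1 and sizes[j] < cap:
                new_assign[i] = j
                sizes[j] += 1
        # Move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, a in zip(points, new_assign) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_assign == assign:  # assignments stopped changing
            break
        assign = new_assign
    return assign, centers


def fit_virtual_labels(X, y, minority=1, seed=0):
    """Give majority points virtual labels 0..k-1; the minority class gets k."""
    major = [x for x, t in zip(X, y) if t != minority]
    minor = [x for x, t in zip(X, y) if t == minority]
    # Assumed choice of k: each virtual class is about as large as the minority.
    k = max(1, round(len(major) / len(minor)))
    virtual, centers = equal_kmeans(major, k, seed=seed)
    minor_center = tuple(sum(c) / len(minor) for c in zip(*minor))
    return {"centroids": list(centers) + [minor_center], "k": k, "virtual": virtual}


def predict(model, x):
    """Classify among k+1 classes, then collapse virtual labels to 'majority'."""
    dists = [math.dist(x, c) for c in model["centroids"]]
    label = dists.index(min(dists))
    return "minority" if label == model["k"] else "majority"
```

With 40 majority points and 10 minority points, this sketch would create four virtual majority classes of 10 points each, so the classifier sees five classes of equal size instead of a 4:1 binary problem. Swapping the nearest-centroid step for an SVM or another multiclass model would leave the relabelling logic unchanged.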