Citation Information

Title: A virtual multi-label approach to imbalanced data classification
Authors: Chou, Elizabeth P. (周珮婷); Yang, Shan-Ping
Contributor: Department of Statistics
Keywords: Imbalance; Classification; Virtual multi-label; Equal k-means
Date: 2022-03
Upload time: 21-Sep-2022 11:44:38 (UTC+8)
Abstract: One of the most challenging issues in machine learning is the analysis of imbalanced data. In this type of research, correctly predicting minority labels is usually more critical than correctly predicting majority labels. However, traditional machine learning techniques are prone to learning bias: traditional classifiers tend to assign all subjects to the majority group, which produces biased predictions. Studies of this problem are typically conducted from one of two perspectives, a data-based perspective or a model-based perspective. Oversampling and undersampling are examples of data-based approaches, whereas adding costs, penalties, or weights to optimize the algorithm is typical of a model-based approach. Some ensemble methods have also been studied recently. These methods cause various problems, such as overfitting, loss of some information, and long computation times, and they do not apply to all kinds of datasets. To address these problems, the virtual labels (ViLa) approach, which assigns virtual labels to the majority label, is proposed. The study demonstrates a new multiclass classification approach based on the equal K-means clustering method. The proposed method is compared with commonly used methods for imbalanced problems, including sampling methods (oversampling, undersampling, and SMOTE) and classifier methods (SVM and one-class SVM). The results show that the proposed method performs better as the degree of data imbalance increases and gradually outperforms the other methods.
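The abstract describes ViLa only at a high level. The Python sketch below illustrates one possible reading of it and is not the authors' implementation. Assumptions: the majority class is split into k near-equal-size clusters that act as virtual labels; k defaults to the majority-to-minority size ratio; a greedy capacity-constrained k-means stands in for the paper's equal K-means; and a nearest-centroid rule stands in for the multiclass classifier (the paper's comparisons involve SVMs, and its actual classifier may differ). The function names `equal_size_kmeans`, `fit_vila`, and `predict_vila` are hypothetical.

```python
import numpy as np


def equal_size_kmeans(X, k, n_iter=20, seed=0):
    """Split X into k clusters whose sizes differ by at most one point.

    Hypothetical helper: a simple capacity-constrained k-means heuristic,
    not necessarily the paper's "equal K-means" procedure.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    base, extra = divmod(n, k)
    # The first `extra` clusters may hold one more point than the rest.
    capacity = np.array([base + (1 if j < extra else 0) for j in range(k)])
    centroids = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Point-to-centroid distances, shape (n, k).
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        remaining = capacity.copy()
        new_labels = np.empty(n, dtype=int)
        # Greedy assignment: points closest to some centroid choose first,
        # each taking its nearest cluster that still has room.
        for i in np.argsort(dist.min(axis=1)):
            for j in np.argsort(dist[i]):
                if remaining[j] > 0:
                    new_labels[i] = j
                    remaining[j] -= 1
                    break
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels


def fit_vila(X, y, k=None, seed=0):
    """Fit a ViLa-style model; y holds 0 (majority) and 1 (minority).

    Virtual label 0 is the minority class; virtual labels 1..k are the
    equal-size clusters of the majority class. A nearest-centroid rule
    stands in for the multiclass classifier (assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == 1
    if k is None:
        # Assumption: make each virtual majority group roughly minority-sized.
        k = max(1, int(round((~is_min).sum() / is_min.sum())))
    virtual = np.zeros(len(X), dtype=int)
    virtual[~is_min] = 1 + equal_size_kmeans(X[~is_min], k, seed=seed)
    # One centroid per virtual label, shape (k + 1, n_features).
    return np.array([X[virtual == c].mean(axis=0) for c in range(k + 1)])


def predict_vila(centroids, X):
    """Predict 1 (minority) or 0 (majority) by collapsing virtual labels."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    virtual = dist.argmin(axis=1)
    return np.where(virtual == 0, 1, 0)
```

In this sketch, a prediction of any virtual majority label (1..k) collapses back to the majority class, and only virtual label 0 maps to the minority class. Splitting the majority class this way means no single training label dominates the multiclass problem, which is the intuition the abstract attributes to the virtual-label idea.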
Journal: Communications in Statistics - Simulation and Computation
Type: article
DOI: https://doi.org/10.1080/03610918.2022.2049820
dc.contributor Department of Statistics
dc.creator (Author) 周珮婷
dc.creator (Author) Chou, Elizabeth P.
dc.creator (Author) Yang, Shan-Ping
dc.date (Date) 2022-03
dc.date.accessioned 21-Sep-2022 11:44:38 (UTC+8)
dc.date.available 21-Sep-2022 11:44:38 (UTC+8)
dc.date.issued (Upload time) 21-Sep-2022 11:44:38 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/142011
dc.format.extent 109 bytes
dc.format.mimetype text/html
dc.relation (Relation) Communications in Statistics - Simulation and Computation
dc.subject (Keywords) Imbalance; Classification; Virtual multi-label; Equal k-means
dc.title (Title) A virtual multi-label approach to imbalanced data classification
dc.type (Type) article
dc.identifier.doi (DOI) 10.1080/03610918.2022.2049820
dc.doi.uri (DOI) https://doi.org/10.1080/03610918.2022.2049820