A virtual multi-label approach to imbalanced data classification | NCCU Academic Hub


Title	A virtual multi-label approach to imbalanced data classification
Authors	周珮婷; Chou, Elizabeth P.; Yang, Shan-Ping
Contributor	Department of Statistics
Keywords	Imbalance; Classification; Virtual multi-label; Equal k-means
Date	2022-03
Upload time	21-Sep-2022 11:44:38 (UTC+8)
Abstract	One of the most challenging issues in machine learning is imbalanced data analysis. Usually, in this type of research, correctly predicting minority labels is more critical than correctly predicting majority labels. However, traditional machine learning techniques easily lead to learning bias. Traditional classifiers tend to place all subjects in the majority group, resulting in biased predictions. Machine learning studies are typically conducted from one of two perspectives: a data-based perspective or a model-based perspective. Oversampling and undersampling are examples of data-based approaches, while the addition of costs, penalties, or weights to optimize the algorithm is typical of a model-based approach. Some ensemble methods have been studied recently. These methods cause various problems, such as overfitting, the omission of some information, and long computation times. In addition, these methods do not apply to all kinds of datasets. Based on this problem, the virtual labels (ViLa) approach for the majority label is proposed to solve the imbalanced problem. A new multiclass classification approach with the equal K-means clustering method is demonstrated in the study. The proposed method is compared with commonly used imbalance problem methods, such as sampling methods (oversampling, undersampling, and SMOTE) and classifier methods (SVM and one-class SVM). The results show that the proposed method performs better when the degree of data imbalance increases and will gradually outperform other methods.
Relation	Communications in Statistics - Simulation and Computation
Type	article
DOI	https://doi.org/10.1080/03610918.2022.2049820
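
The abstract describes the idea only at a high level: split the majority label into virtual labels with an equal-size K-means clustering, then treat the task as multiclass classification. The following is a minimal sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the greedy balanced assignment used for "equal K-means", the farthest-point initialisation, the choice of roughly one virtual label per minority-sized group, the nearest-centroid classifier standing in for whatever classifier the paper uses, and all function names and toy data.

```python
from math import dist


def equal_kmeans(points, k, iters=5):
    """Cluster points into k groups of at most ceil(n / k) members each."""
    n = len(points)
    cap = -(-n // k)  # ceil(n / k): maximum cluster size
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * n
    for _ in range(iters):
        # Greedy balanced assignment: closest (point, center) pairs first,
        # skipping points already placed and clusters already full.
        pairs = sorted((dist(p, c), i, j)
                       for i, p in enumerate(points)
                       for j, c in enumerate(centers))
        sizes, placed = [0] * k, [False] * n
        for _, i, j in pairs:
            if not placed[i] and sizes[j] < cap:
                labels[i], placed[i] = j, True
                sizes[j] += 1
        # Move each center to the mean of its members.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def add_virtual_labels(X, y, majority, minority):
    """Replace the majority label with k virtual labels of minority-like size."""
    major = [x for x, lab in zip(X, y) if lab == majority]
    minor = [x for x, lab in zip(X, y) if lab == minority]
    k = max(1, round(len(major) / len(minor)))  # assumed rule for choosing k
    clusters = equal_kmeans(major, k)
    y_virtual = [f"{majority}_v{c}" for c in clusters] + [minority] * len(minor)
    to_original = {f"{majority}_v{c}": majority for c in range(k)}
    to_original[minority] = minority
    return major + minor, y_virtual, to_original


def fit_centroids(X, y):
    """Stand-in multiclass model: one centroid per (virtual) label."""
    groups = {}
    for x, lab in zip(X, y):
        groups.setdefault(lab, []).append(x)
    return {lab: tuple(sum(col) / len(m) for col in zip(*m))
            for lab, m in groups.items()}


def predict(centroids, to_original, x):
    """Predict a virtual label, then map it back to the original label."""
    virtual = min(centroids, key=lambda lab: dist(x, centroids[lab]))
    return to_original[virtual]


def square(cx, cy):
    """Four toy points forming a unit square with lower-left corner (cx, cy)."""
    return [(cx, cy), (cx, cy + 1), (cx + 1, cy), (cx + 1, cy + 1)]


# Toy data: 12 majority points in three groups, 4 minority points.
X = square(0, 0) + square(10, 0) + square(0, 10) + square(5, 5)
y = ["maj"] * 12 + ["min"] * 4

X_virtual, y_virtual, to_original = add_virtual_labels(X, y, "maj", "min")
model = fit_centroids(X_virtual, y_virtual)
print(predict(model, to_original, (10, 0)))
print(predict(model, to_original, (5, 5)))
```

In this toy setting each virtual majority label ends up with as many points as the minority label, so the multiclass problem the classifier sees is balanced; predictions for any virtual label are mapped back to the original majority label.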

dc.contributor	Department of Statistics
dc.creator (Author)	周珮婷
dc.creator (Author)	Chou, Elizabeth P.
dc.creator (Author)	Yang, Shan-Ping
dc.date (Date)	2022-03
dc.date.accessioned	21-Sep-2022 11:44:38 (UTC+8)
dc.date.available	21-Sep-2022 11:44:38 (UTC+8)
dc.date.issued (Upload time)	21-Sep-2022 11:44:38 (UTC+8)
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/142011
dc.format.extent	109 bytes
dc.format.mimetype	text/html
dc.relation (Relation)	Communications in Statistics - Simulation and Computation
dc.subject (Keywords)	Imbalance; Classification; Virtual multi-label; Equal k-means
dc.title (Title)	A virtual multi-label approach to imbalanced data classification
dc.type (Type)	article
dc.identifier.doi (DOI)	10.1080/03610918.2022.2049820
dc.doi.uri (DOI)	https://doi.org/10.1080/03610918.2022.2049820