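Title: 二元分類的同類別異質性
Label Heterogeneity in Binary Classification
Author: 柯百翼 (Ko, Pai-Yi)
Contributors: 周珮婷 (Chou, Pei-Ting), advisor; 柯百翼 (Ko, Pai-Yi), author
Keywords: Binary Classification (二元分類); Multiclass Classification (多元分類); Label Tree (標籤內嵌樹); Pseudo Likelihood Classifier (Pseudo Likelihood分類器); Label Heterogeneity (類別異質性)
Date: 2020
Uploaded: 3-Aug-2020 17:32:22 (UTC+8)
Abstract (translated from the Chinese): In machine learning, binary classification is the most common type of data. Such data may suffer from a latent problem of within-class heterogeneity, which causes classifier models to misclassify. To let the model distinguish differences among observations more finely and to improve predictive accuracy, this study applies hierarchical clustering to each of the two classes separately, using Ward's minimum-variance agglomeration, and redefines the resulting clusters as new sub-classes. Once the original binary dataset has been converted into a multiclass dataset, the study uses a label embedding tree (Label Embedding Tree) together with a Pseudo Likelihood classifier to produce multiclass predictions, and then maps the predicted sub-classes back to the original binary classes. The results show that the predictions obtained under this structure are not inferior to those of other well-known binary classifiers. They also stay within a narrow range, whereas the results of the other binary classifiers change sharply when the feature set changes. The proposed method therefore addresses within-class heterogeneity to some degree, improves classification accuracy, and yields stable accuracy.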
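The pipeline outlined in the abstract (cluster each class with Ward's criterion, treat the clusters as sub-classes, classify among sub-classes, map predictions back to the binary labels) can be illustrated with a minimal sketch. This is not the thesis's implementation: a nearest-centroid rule stands in for the label embedding tree and the Pseudo Likelihood classifier, the function names `ward_clusters`, `fit_sublabel_centroids`, and `predict` are hypothetical, and the data are synthetic, with each class deliberately built from two separate blobs.

```python
import random


def ward_clusters(points, k):
    """Naive agglomerative clustering using Ward's minimum-variance criterion."""
    # Each cluster is (member indices, centroid).
    clusters = [([i], tuple(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ia, ca), (ib, cb) = clusters[a], clusters[b]
                na, nb = len(ia), len(ib)
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        na, nb = len(ia), len(ib)
        centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ia + ib, centroid))
    return clusters


def fit_sublabel_centroids(X, y, k_per_class=2):
    """Split each binary class into sub-classes; keep (centroid, original label)."""
    model = []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        for _, centroid in ward_clusters(members, min(k_per_class, len(members))):
            model.append((centroid, label))
    return model


def predict(model, x):
    """Pick the nearest sub-class, then map it back to its binary label."""
    # Stand-in classifier (assumption): nearest sub-class centroid.
    _, label = min(model, key=lambda m: sum((a - b) ** 2 for a, b in zip(m[0], x)))
    return label


def blob(cx, cy, n=15):
    """Synthetic Gaussian blob around (cx, cy)."""
    return [(cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)) for _ in range(n)]


random.seed(0)
# Class 0 occupies two opposite corners, class 1 the other two.
X = blob(0, 0) + blob(4, 4) + blob(0, 4) + blob(4, 0)
y = [0] * 30 + [1] * 30

model = fit_sublabel_centroids(X, y, k_per_class=2)
preds = [predict(model, x) for x in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(accuracy)
```

In this layout a single centroid per class would place both class centroids near (2, 2), so the two classes could not be told apart; splitting each class into two sub-classes is what makes the nearest-centroid rule workable here.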
English abstract: Binary classification is one of the most common problems in machine learning research. However, noisy labels are a potential difficulty in binary classification. This study aims to address this challenge by using sub-label information derived from the original labels. Hierarchical clustering is first used to build a hierarchy of sub-label clusters, which identifies the heterogeneity within the original labels so that classification accuracy can be improved. A label tree and a Pseudo Likelihood classifier are then used for classification. The findings show that the performance of the label tree and Pseudo Likelihood classifier is not inferior to that of other well-known binary classification models. The classification results also remain stable across different feature subsets, compared with those of the other classifiers. We believe the proposed method solves the heterogeneity problem that exists in the original labels in classification.
References
I. Chinese-language references
[1] 王宗惇, & 陳儒賢. (2016). 結合自組織映射圖網路與支撐向量機於颱風期間水庫入流量預測之研究 [Reservoir Inflow Forecasting During Typhoon Periods by Combining Self-Organizing Map with Support Vector Regression]. 農業工程學報, 62(2), 1-16. doi:10.29974/JTAE.201606_62(2).0001
[2] 李亭玫. (2017). 一個用於情緒分類的腦波分群方法 [An EEG clustering method for emotion classification] (Master's thesis). 國立宜蘭大學, Yilan County. Retrieved from https://hdl.handle.net/11296/853kp5
[3] 謝弘一. (2011). 資料探勘於信用卡顧客行為評分模型之建構 [Building credit card customer behavioral scoring models with data mining] (Doctoral dissertation). 輔仁大學, New Taipei City. Retrieved from https://hdl.handle.net/11296/c79yd9
II. English-language references
[4] Charrad, M., Ghazzali, N., Boiteau, V., & Niknafs, A. (2012). NbClust package for determining the number of clusters in a dataset.
[5] Fushing, H., Liu, S.-Y., Hsieh, Y.-C., & McCowan, B. (2018). From patterned response dependency to structured covariate dependency: Entropy based categorical-pattern-matching. PLoS ONE, 13(6), e0198253. doi:10.1371/journal.pone.0198253
[6] Fushing, H., & Wang, X. (2020). Coarse- and fine-scale geometric information content of Multiclass Classification and implied Data-driven Intelligence. Proceedings of Machine Learning and Data Mining in Pattern Recognition, Petra Perner (Ed.), 16th International Conference on Machine Learning and Data Mining, MLDM 2020.
[7] Gopalakrishnan, M., Sridhar, V., & Krishnamurthy, H. (1995). Some applications of clustering in the design of neural networks. Pattern Recognition Letters, 16(1), 59-65. https://doi.org/10.1016/0167-8655(94)00064-A
[8] Hsieh, N.-C. (2005). Hybrid mining approach in the design of credit scoring models. Expert Systems with Applications, 28(4), 655-665. https://doi.org/10.1016/j.eswa.2004.12.022
[9] Kim, Y. S., & Sohn, S. Y. (2004). Managing loan customers using misclassification patterns of credit scoring model. Expert Systems with Applications, 26(4), 567-573. https://doi.org/10.1016/j.eswa.2003.10.013
[10] Kuo, R. J., Ho, L. M., & Hu, C. M. (2002). Integration of self-organizing feature map and K-means algorithm for market segmentation. Computers & Operations Research, 29(11), 1475-1493. https://doi.org/10.1016/S0305-0548(01)00043-0
[11] Sung, A. H. (1998). Ranking importance of input parameters of neural networks. Expert Systems with Applications, 15(3), 405-411. https://doi.org/10.1016/S0957-4174(98)00041-4
Description: Master's
國立政治大學 (National Chengchi University)
Department of Statistics
107354020
Source: http://thesis.lib.nccu.edu.tw/record/#G0107354020
Type: thesis
Other identifier: G0107354020
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/130961
DOI: 10.6814/NCCU202000962
Format: application/pdf, 2528212 bytes
Table of contents:
Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Objectives
Chapter 2 Literature Review
Chapter 3 Methodology
Section 1 Classification Models
Section 2 Variable Selection
Chapter 4 Research Process and Results
Section 1 Data Description
Section 2 Research Process and Results
Chapter 5 Conclusions and Recommendations
Section 1 Conclusions
Section 2 Future Research Directions and Recommendations
Chapter 6 References