Citation Information

Title Label Heterogeneity in Binary Classification (二元分類的同類別異質性)
Author Ko, Pai-Yi (柯百翼)
Contributors Chou, Pei-Ting (周珮婷)
Ko, Pai-Yi (柯百翼)
Keywords Binary Classification
Multiclass Classification
Label Tree
Pseudo Likelihood Classifier
Label Heterogeneity
Date 2020
Upload time 3-Aug-2020 17:32:22 (UTC+8)
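Abstract In machine learning, binary classification is the most common type of data. Such data may suffer from within-class heterogeneity, which causes classifier models to misclassify. To let the model discern differences among observations more finely and raise prediction accuracy, this study applies hierarchical clustering based on Ward's minimum-variance method separately to each of the two classes and redefines the resulting clusters as new sub-classes. Once the original binary dataset has been converted into a multiclass dataset, the study classifies it with a Label Embedding Tree and the Pseudo Likelihood classifier to obtain multiclass predictions, then maps the predicted sub-classes back to the original binary classes. The results show that predictions obtained under this framework are not inferior to those of other well-known binary classifiers and, unlike them, stay within a narrow range, whereas the other binary classifiers' results change drastically when the feature set changes. The proposed method therefore not only addresses within-class heterogeneity to some extent and improves prediction accuracy, but also yields stable prediction accuracy.
Binary classification is one of the most common problems in machine learning research. However, noisy labels are a potential difficulty in binary classification. This study aims to address this challenge by using sub-label information derived from the original labels. Hierarchical clustering is first used to build a hierarchy of sub-label clusters. The heterogeneity that exists within the original labels is identified in order to improve classification accuracy. A label tree and the Pseudo Likelihood classifier are then used for classification. The findings show that the performance of the label tree with the Pseudo Likelihood classifier is not inferior to that of other well-known binary classification models. Its classification results also remain stable across different feature subsets, in contrast to those of the other classifiers. We believe the proposed method addresses the heterogeneity problem that exists within the original labels.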
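The sub-label pipeline described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the thesis's implementation: a nearest-centroid rule stands in for the Label Embedding Tree and Pseudo Likelihood classifier, the number of sub-classes per class (`k_per_class`) is fixed by hand, and every function name and the toy data are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's minimum-variance criterion.

    Repeatedly merges the pair of clusters whose merge increases the
    within-cluster sum of squares the least, until k clusters remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward merge cost: |a||b| / (|a| + |b|) * ||mean_a - mean_b||^2
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def fit_sublabel_model(X, y, k_per_class=2):
    """Split each binary class into sub-classes (assumed count: k_per_class).

    Returns a list of ((parent_label, sub_index), centroid) pairs.
    """
    model = []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        k = min(k_per_class, len(members))
        for idx, cluster in enumerate(ward_clusters(members, k)):
            model.append(((label, idx), centroid(cluster)))
    return model


def predict(model, x):
    """Pick the nearest sub-class, then map it back to its binary label."""
    (label, _sub_index), _center = min(model, key=lambda m: sq_dist(m[1], x))
    return label


# Toy data (hypothetical): each binary class is made of two distant groups.
X = [(0, 0), (0.2, 0.1), (0.1, 0.3), (4, 4), (4.1, 3.9), (3.8, 4.2),
     (0, 4), (0.2, 4.1), (0.1, 3.8), (4, 0), (4.2, 0.1), (3.9, 0.3)]
y = [0] * 6 + [1] * 6

model = fit_sublabel_model(X, y, k_per_class=2)
queries = [(0.3, 0.2), (3.9, 4.0), (0.1, 4.2), (4.1, 0.2)]
predictions = [predict(model, q) for q in queries]
```

In this toy data the two classes have nearly the same overall mean, so one centroid per class could not separate them; splitting each class into sub-classes first is what lets the simple stand-in classifier recover the binary labels.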
Description Master's thesis
National Chengchi University
Department of Statistics
107354020
Source http://thesis.lib.nccu.edu.tw/record/#G0107354020
Type thesis
dc.contributor.advisor Chou, Pei-Ting (周珮婷)
dc.contributor.author (Authors) Ko, Pai-Yi (柯百翼)
dc.creator (Author) Ko, Pai-Yi (柯百翼)
dc.date (Date) 2020
dc.date.accessioned 3-Aug-2020 17:32:22 (UTC+8)
dc.date.available 3-Aug-2020 17:32:22 (UTC+8)
dc.date.issued (Upload time) 3-Aug-2020 17:32:22 (UTC+8)
dc.identifier (Other Identifiers) G0107354020
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/130961
dc.description (Description) Master's thesis
dc.description (Description) National Chengchi University
dc.description (Description) Department of Statistics
dc.description (Description) 107354020
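dc.description.tableofcontents Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Objectives
Chapter 2 Literature Review
Chapter 3 Research Methods
Section 1 Classification Models
Section 2 Variable Selection
Chapter 4 Research Process and Results
Section 1 Data Description
Section 2 Research Process and Results
Chapter 5 Conclusions and Suggestions
Section 1 Conclusions
Section 2 Future Research Directions and Suggestions
Chapter 6 References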
dc.format.extent 2528212 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0107354020
dc.subject (Keywords) Binary Classification
dc.subject (Keywords) Multiclass Classification
dc.subject (Keywords) Label Tree
dc.subject (Keywords) Pseudo Likelihood Classifier
dc.subject (Keywords) Label Heterogeneity
dc.title (Title) Label Heterogeneity in Binary Classification
dc.type (Type) thesis
dc.identifier.doi (DOI) 10.6814/NCCU202000962