Label Heterogeneity in Binary Classification (二元分類的同類別異質性) | Publication

Title	Label Heterogeneity in Binary Classification (二元分類的同類別異質性)
Author	Ko, Pai-Yi (柯百翼)
Contributors	Chou, Pei-Ting (周珮婷); Ko, Pai-Yi (柯百翼)
Keywords	Binary Classification; Multiclass Classification; Label Tree; Pseudo Likelihood Classifier; Label Heterogeneity
Date	2020
Upload time	3-Aug-2020 17:32:22 (UTC+8)
Abstract	Binary classification is among the most common problem settings in machine learning. Such data can hide heterogeneity within a single class (label heterogeneity), which leads classifiers to misclassify. To let the model distinguish differences among observations more finely and improve predictive accuracy, this study applies hierarchical clustering with Ward's minimum-variance criterion separately to each of the two classes and redefines the resulting clusters as new sub-classes. After the original binary dataset has been converted into a multiclass dataset in this way, a Label Embedding Tree and a Pseudo Likelihood classifier are used to produce multiclass predictions, and the predicted sub-classes are then mapped back to the original binary classes. The results show that the predictive performance of this framework is not inferior to that of other well-known binary classifiers. Moreover, its results stay within a narrow range, whereas the results of the other binary classifiers vary sharply when the feature set changes. The proposed method therefore addresses within-class heterogeneity to a certain extent, improves classification accuracy, and yields stable predictive performance.
Description	Master's thesis; National Chengchi University (國立政治大學); Department of Statistics; 107354020
Source	http://thesis.lib.nccu.edu.tw/record/#G0107354020
Type	thesis
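
The pipeline described in the abstract (cluster each class with Ward's criterion, treat the clusters as sub-classes, classify, then map sub-classes back to the binary label) can be illustrated with a minimal sketch. This is not the thesis implementation: a nearest-centroid rule stands in for the Label Embedding Tree and Pseudo Likelihood classifier, which are not reproduced here; the choice of two sub-classes per class, the synthetic data, and all function names are assumptions made for illustration.

```python
import random
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq

def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Merge the pair whose union increases the within-cluster variance least.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return clusters

def fit_sublabel_model(X, y, k=2):
    """Split each binary class into k sub-classes.

    Returns a list of (sub-class centroid, parent binary label) pairs.
    """
    model = []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        for cluster in ward_clusters(members, k):
            model.append((centroid(cluster), label))
    return model

def predict(model, x):
    """Pick the nearest sub-class, then map it back to its binary label."""
    def sq_dist(c):
        return sum((u - v) ** 2 for u, v in zip(c, x))
    return min(model, key=lambda m: sq_dist(m[0]))[1]

# Synthetic XOR-like data: each binary class is made of two distant blobs,
# so each class is heterogeneous (assumed data, not from the thesis).
random.seed(0)

def blob(cx, cy, n=10):
    return [[random.gauss(cx, 0.3), random.gauss(cy, 0.3)] for _ in range(n)]

X = blob(0, 0) + blob(4, 4) + blob(0, 4) + blob(4, 0)
y = [0] * 20 + [1] * 20

model = fit_sublabel_model(X, y, k=2)
print([predict(model, p) for p in ([0.1, 0.2], [3.9, 4.1], [0.0, 4.0], [4.0, 0.0])])
```

In this toy layout both classes share the same overall mean, so a classifier that summarizes each class by a single centroid cannot separate them, while the four sub-class centroids can. The thesis determines sub-classes and classifies them with different tools, so the sketch only conveys the relabel-then-map-back idea.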

dc.contributor.advisor	周珮婷	zh_TW
dc.contributor.advisor	Chou, Pei-Ting	en_US
dc.contributor.author (Authors)	柯百翼	zh_TW
dc.contributor.author (Authors)	Ko, Pai-Yi	en_US
dc.creator (Author)	柯百翼	zh_TW
dc.creator (Author)	Ko, Pai-Yi	en_US
dc.date (Date)	2020	en_US
dc.date.accessioned	3-Aug-2020 17:32:22 (UTC+8)	-
dc.date.available	3-Aug-2020 17:32:22 (UTC+8)	-
dc.date.issued (Upload time)	3-Aug-2020 17:32:22 (UTC+8)	-
dc.identifier (Other Identifiers)	G0107354020	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/130961	-
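dc.description (Description)	碩士 (Master's)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	統計學系 (Department of Statistics)	zh_TW
dc.description (Description)	107354020	zh_TW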
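dc.description.tableofcontents	Chapter 1 Introduction; Section 1 Research Background and Motivation; Section 2 Research Objectives; Chapter 2 Literature Review; Chapter 3 Research Methods; Section 1 Classification Prediction Models; Section 2 Variable Selection; Chapter 4 Research Process and Results; Section 1 Data Description; Section 2 Research Process and Results; Chapter 5 Conclusions and Suggestions; Section 1 Conclusions; Section 2 Future Research Directions and Suggestions; Chapter 6 References	zh_TW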
dc.format.extent	2528212 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0107354020	en_US
dc.subject (Keywords)	二元分類	zh_TW
dc.subject (Keywords)	多元分類	zh_TW
dc.subject (Keywords)	標籤內嵌樹	zh_TW
dc.subject (Keywords)	Pseudo Likelihood分類器	zh_TW
dc.subject (Keywords)	類別異質性	zh_TW
dc.subject (Keywords)	Binary Classification	en_US
dc.subject (Keywords)	Multiclass Classification	en_US
dc.subject (Keywords)	Label Tree	en_US
dc.subject (Keywords)	Pseudo Likelihood Classifier	en_US
dc.subject (Keywords)	Label Heterogeneity	en_US
dc.title (Title)	二元分類的同類別異質性	zh_TW
dc.title (Title)	Label Heterogeneity in Binary Classification	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU202000962	en_US