Title: A Hybrid Support Vector Machine Model Combining Data Partitioning and Staged Classification
Author: Liu, Sung-Hsien (劉松憲)
Advisor: 張志浩
Keywords: Support Vector Machine; Hybrid Model; Data Partitioning; Classification
Date: 2025
Uploaded: 1-Sep-2025 14:50:26 (UTC+8)
Abstract: In the field of machine learning, constructing either linear or nonlinear classification structures for categorical data that simultaneously ensure model interpretability and predictive performance has long been recognized as a critical and challenging task. Although the traditional Support Vector Machine (SVM) demonstrates strong classification capability, its performance depends heavily on the boundary structure of the data and the choice of kernel function; when the data exhibit heterogeneous boundary characteristics, single-kernel models may fall short in capturing such complexity. This study proposes a Hybrid Support Vector Machine (Hybrid SVM) model that integrates data partitioning with a staged classification strategy. Specifically, a linear-kernel SVM with a fixed cost parameter C is first employed to identify key samples located within the margin area, enabling sample filtering and data reduction. A nonlinear classifier is then constructed for those margin-area samples to enhance model performance and generalization ability. In addition, this study introduces a selection criterion that balances classification accuracy against the proportion of non-margin samples to guide the choice of the cost parameter, thereby optimizing the extraction of margin-area samples along the linear decision boundary. We also apply cross-validation to tune the cost parameter and compare its effectiveness with that of the proposed criterion-based method. To validate the effectiveness and adaptability of the proposed model, a series of simulation experiments was designed under various boundary structures and sample-size conditions to evaluate the classification performance of the Hybrid SVM. The overall results indicate that the Hybrid SVM performs well across most scenarios, with particularly notable advantages on datasets featuring mixed boundary types, demonstrating its flexibility and effectiveness as a classification method.
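The two-stage idea described in the abstract — a fixed-C linear SVM that isolates margin-area samples, followed by a nonlinear classifier trained only on those samples — can be sketched as follows. This is a minimal illustration assuming scikit-learn, an RBF kernel for the second stage, and a routing rule based on the linear decision value; the thesis's exact selection and prediction rules may differ.

```python
# Minimal sketch of a partition-then-refine (Hybrid) SVM pipeline.
# Illustrative only: the RBF second stage and the |f(x)| <= 1 margin-band
# rule are assumptions, not the thesis's exact specification.
import numpy as np
from sklearn.svm import SVC

class HybridSVMSketch:
    def __init__(self, C=1.0):
        self.linear = SVC(kernel="linear", C=C)   # stage 1: fixed-C linear SVM
        self.nonlinear = SVC(kernel="rbf", C=C)   # stage 2: refines margin-area samples
        self.has_stage2 = False

    def fit(self, X, y):
        self.linear.fit(X, y)
        # Samples with |f(x)| <= 1 lie inside the linear margin band; they are
        # the "hard" cases kept for the nonlinear stage (data reduction step).
        f = self.linear.decision_function(X)
        inside = np.abs(f) <= 1.0
        if inside.sum() >= 2 and len(np.unique(y[inside])) == 2:
            self.nonlinear.fit(X[inside], y[inside])
            self.has_stage2 = True
        return self

    def predict(self, X):
        pred = self.linear.predict(X)
        if self.has_stage2:
            # Route margin-band points to the nonlinear classifier.
            inside = np.abs(self.linear.decision_function(X)) <= 1.0
            if inside.any():
                pred[inside] = self.nonlinear.predict(X[inside])
        return pred
```

Points far from the linear boundary are classified cheaply by stage 1; only the ambiguous margin-band region pays the cost of the nonlinear model.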
References:
- Arlot, S. and Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79.
- Bi, J. and Bennett, K. P. (2003). Dimensionality reduction via sparse support vector machines. Journal of Machine Learning Research, 3:1229–1243.
- Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
- Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth International Group.
- Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.
- Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological), 20(2):215–232.
- Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232.
- Gönen, M. and Alpaydın, E. (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research, 12:2211–2268.
- Lanckriet, G. R., Cristianini, N., Bartlett, P., Ghaoui, L. E., and Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72.
- Schölkopf, B., Burges, C. J. C., and Vapnik, V. N. (1996). Incorporating invariances in support vector learning machines. In International Conference on Artificial Neural Networks, pages 47–52.
- Wang, W., Arora, R., Livescu, K., and Bilmes, J. A. (2014). On deep multi-view representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2160–2166.
Description: Master's thesis, National Chengchi University, Department of Statistics (student ID 112354028)
Source: http://thesis.lib.nccu.edu.tw/record/#G0112354028
Type: thesis
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/159043
Table of Contents:
- Abstract (Chinese)
- Abstract (English)
- Chapter 1: Introduction
- Chapter 2: Literature Review
  - 2.1 Support Vector Machine (SVM)
  - 2.2 Multiple Kernel Learning (MKL)
  - 2.3 Cross-Validation (CV)
- Chapter 3: Methodology
  - 3.1 Algorithm Overview
  - 3.2 Selection Strategy for the Cost Parameter C
- Chapter 4: Simulation Experiments
  - 4.1 Gaussian Mixture Data
  - 4.2 Logistic Data
  - 4.3 Two-Moons Data
  - 4.4 Nonlinear Logistic Data
  - 4.5 Mixed-Classification Data
- Chapter 5: Conclusion
- References
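The cost-parameter selection strategy (Section 3.2) trades classification accuracy against the proportion of non-margin samples, as summarized in the abstract. The sketch below illustrates one hypothetical form of such a criterion over a grid of C values; the additive trade-off and the weight `lam` are assumptions for illustration, not the thesis's actual formula.

```python
# Hypothetical illustration of a C-selection criterion that balances training
# accuracy against the proportion of samples outside the linear margin band
# (i.e., samples filtered away before the nonlinear stage).
# The additive form and the weight `lam` are assumptions, not the thesis's formula.
import numpy as np
from sklearn.svm import SVC

def criterion_score(X, y, C, lam=0.5):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    acc = clf.score(X, y)                        # classification accuracy
    f = clf.decision_function(X)
    outside = float(np.mean(np.abs(f) > 1.0))    # proportion of non-margin samples
    return acc + lam * outside                   # reward accuracy and data reduction

def select_C(X, y, grid=(0.01, 0.1, 1.0, 10.0, 100.0), lam=0.5):
    # Pick the C on the grid that maximizes the trade-off score.
    scores = {C: criterion_score(X, y, C, lam) for C in grid}
    return max(scores, key=scores.get)
```

A larger `lam` favors boundaries that push more samples out of the margin band (more aggressive data reduction), while `lam = 0` reduces the rule to plain training accuracy.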
Format: application/pdf, 4624884 bytes