Title 數據幾何特徵的機器學習
A study of Data Geometry-based Learning
Author 劉憲忠
Liu, Hsien Chung
Contributors 周珮婷
Chou, Pei Ting
劉憲忠
Liu, Hsien Chung
Keywords 機器學習 (machine learning)
幾何模式 (geometric patterns)
machine learning
data-geometry
Date 2016
Upload time 11-Jul-2016 16:54:50 (UTC+8)
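Abstract This study focuses on the geometric patterns of data in order to understand the relationships among variables, and asks whether weighting the distance matrix by the coefficients of a fitted statistical model can effectively improve classification accuracy. The study mainly uses the data cloud geometry tree and cosine similarity, together with a sampling-based majority voting method, to predict the class of each observation. It then compares the classification results with those of hierarchical clustering, support vector machines, and a Hybrid method on three datasets. Two of them, the biobehavioral assessment project data and the Wisconsin diagnostic breast cancer data, are analyzed with supervised learning to validate the classification results. The third, a simulated moon-shaped dataset, is analyzed with semi-supervised learning to predict the classes of new data. Finally, the strengths and weaknesses of each method, and the reasons for them, are discussed and summarized. The results indicate that data with different geometries do require trying different formulas and algorithms to achieve good machine learning results.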
This study focuses on computed data-geometry based learning to discover the inter-dependence patterns among covariate vectors. To uncover these patterns and improve classification accuracy, the distance functions are modified to better capture the geometric patterns and to measure the association between variables. I compare the performance of my proposed learning rule with that of other machine learning techniques on three datasets. Finally, I demonstrate why the concept of geometric patterns is essential.
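The abstract names three ingredients: a distance weighted by model coefficients, cosine similarity, and sampling-based majority voting. The following is only a minimal sketch of how such pieces might fit together, not the thesis's algorithm. It substitutes a plain nearest-neighbor rule for the data cloud geometry tree, and the function names, the toy data, and the weights (standing in for fitted coefficients) are all hypothetical.

```python
import math
import random
from collections import Counter

def weighted_distance(x, y, weights):
    """Euclidean distance with each squared difference scaled by |weight|."""
    return math.sqrt(sum(abs(w) * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def vote_predict(train_x, train_y, query, weights,
                 n_rounds=25, sample_frac=0.7, seed=0):
    """Majority vote over nearest-neighbor labels found in random subsamples.

    Assumption: a nearest-neighbor rule replaces the tree-based geometry
    used in the thesis, purely for illustration.
    """
    rng = random.Random(seed)
    size = max(1, int(sample_frac * len(train_x)))
    votes = Counter()
    for _ in range(n_rounds):
        idx = rng.sample(range(len(train_x)), size)  # subsample without replacement
        nearest = min(idx, key=lambda i: weighted_distance(train_x[i], query, weights))
        votes[train_y[nearest]] += 1
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): the first feature separates the classes,
# the second feature is noise.
train_x = [(0.0, 5.0), (0.2, -4.0), (0.1, 3.0), (2.0, -5.0), (2.2, 4.0), (1.9, -3.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
weights = [1.0, 0.0]  # stand-in for coefficients from a fitted model

print(vote_predict(train_x, train_y, (0.15, -4.5), weights))
```

Setting the weight of the noisy feature to zero makes the distance depend only on the informative feature, which is the intuition behind weighting the distance matrix by model coefficients.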
Description Master's
National Chengchi University
Department of Statistics
103354025
Source http://thesis.lib.nccu.edu.tw/record/#G0103354025
Type thesis
dc.contributor.advisor 周珮婷 [zh_TW]
dc.contributor.advisor Chou, Pei Ting [en_US]
dc.contributor.author (Authors) 劉憲忠 [zh_TW]
dc.contributor.author (Authors) Liu, Hsien Chung [en_US]
dc.creator (Author) 劉憲忠 [zh_TW]
dc.creator (Author) Liu, Hsien Chung [en_US]
dc.date (Date) 2016 [en_US]
dc.date.accessioned 11-Jul-2016 16:54:50 (UTC+8)
dc.date.available 11-Jul-2016 16:54:50 (UTC+8)
dc.date.issued (Upload time) 11-Jul-2016 16:54:50 (UTC+8)
dc.identifier (Other Identifiers) G0103354025 [en_US]
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/98846
dc.description (Description) 碩士 (Master's) [zh_TW]
dc.description (Description) 國立政治大學 (National Chengchi University) [zh_TW]
dc.description (Description) 統計學系 (Department of Statistics) [zh_TW]
dc.description (Description) 103354025 [zh_TW]
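dc.description.tableofcontents Chapter 1 Introduction 1
Section 1 Research Motivation and Purpose 1
Section 2 Data Description 3
Chapter 2 Literature Review 6
Chapter 3 Research Methods 8
Section 1 Introduction to the Algorithms 8
1. Data Cloud Geometry Tree 8
2. Support Vector Machine 11
3. Hybrid Method 11
4. Hierarchical Clustering 12
5. Sampling Majority Voting (Voting) 12
Section 2 Research Process and Methods 13
Chapter 4 Results and Discussion 15
Section 1 Results 15
Section 2 Discussion and Recommendations 17
References 21
[zh_TW]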
dc.format.extent 778690 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0103354025 [en_US]
dc.subject (Keywords) 機器學習 (machine learning) [zh_TW]
dc.subject (Keywords) 幾何模式 (geometric patterns) [zh_TW]
dc.subject (Keywords) machine learning [en_US]
dc.subject (Keywords) data-geometry [en_US]
dc.title (Title) 數據幾何特徵的機器學習 [zh_TW]
dc.title (Title) A study of Data Geometry-based Learning [en_US]
dc.type (Type) thesis [en_US]