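Title: 數據幾何特徵的機器學習
A study of Data Geometry-based Learning
Author: 劉憲忠
Liu, Hsien Chung
Contributors: 周珮婷
Chou, Pei Ting
劉憲忠
Liu, Hsien Chung
Keywords: machine learning (機器學習); geometric patterns (幾何模式); data-geometry
Date: 2016
Upload time: 11 July 2016 16:54:50 (UTC+8)
Abstract: This study focuses on the geometric patterns of data in order to understand the relationships among variables, and examines whether weighting the distance matrix by coefficients obtained from a fitted statistical model effectively improves classification accuracy. It mainly uses the Data Cloud Geometry Tree, cosine similarity, and a sampling-based majority voting method to predict class labels, and compares the classification results with those of hierarchical clustering, support vector machines, and a Hybrid method on three different datasets. Two of the datasets, a biobehavioral assessment project dataset and the Wisconsin (USA) diagnostic breast cancer dataset, are used with supervised learning to validate the classification results; the third, a simulated moon-shaped dataset, is used with semi-supervised learning to predict the classes of new data. Finally, the strengths and weaknesses of each method and the reasons for them are discussed and summarized. The results indicate that data with different geometry do require trying different formulas and algorithms to achieve good machine learning results.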
This study focuses on computed data-geometry based learning to discover the inter-dependence patterns among covariate vectors. To discover these patterns and improve classification accuracy, the distance functions are modified to better capture the geometric patterns and to measure the association between variables. The performance of my proposed learning rule is compared with that of other machine learning techniques on three datasets. Finally, I demonstrate why the concept of geometric patterns is essential.
References:
Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2), 105-139.
Baldi, P., & Brunak, S. (2001). Bioinformatics: the machine learning approach. MIT Press.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273. doi:10.1007/BF00994018.
Chou, E. P. (2015, July). Data Driven Geometry for Learning. In International Workshop on Machine Learning and Data Mining in Pattern Recognition (pp. 395-402). Springer International Publishing.
Chou, E. P., Hsieh, F., & Capitanio, J. (2013, December). Computed Data-Geometry Based Supervised and Semi-supervised Learning in High Dimensional Data. In Machine Learning and Applications (ICMLA), 2013 12th International Conference on (Vol. 1, pp. 277-282).
Chang, Y. C. I. (2003). Boosting SVM classifiers with logistic regression. See www.stat.sinica.edu.tw/library/c_tec_rep/2003-03.pdf.
Culp, M. (2011). spa: A Semi-Supervised R Package for Semi-Parametric Graph-Based Estimation. Journal of Statistical Software, 40(10), 1-29.
Fushing, H., Wang, H., VanderWaal, K., McCowan, B., & Koehl, P. (2013). Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PloS one, 8(2), e56259.
Grozavu, N., Bennani, Y., & Lebbah, M. (2009, June). From variable weighting to cluster characterization in topographic unsupervised learning. In Neural Networks, 2009. IJCNN 2009. International Joint Conference on (pp. 1005-1010). IEEE.
Hastie, T., Tibshirani, R., Friedman, J., & Franklin, J. (2005). The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2).
Tan, A. C., & Gilbert, D. (2003, January). An empirical comparison of supervised machine learning techniques in bioinformatics. In Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003-Volume 19 (pp. 219-222). Australian Computer Society, Inc.
Description: Master's
National Chengchi University
Department of Statistics
103354025
Source: http://thesis.lib.nccu.edu.tw/record/#G0103354025
Type: thesis
Advisor: 周珮婷 (Chou, Pei Ting)
Identifier: G0103354025
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/98846
Format: application/pdf, 778690 bytes
Table of contents:
Chapter 1 Introduction
Section 1 Research Motivation and Objectives
Section 2 Data Description
Chapter 2 Literature Review
Chapter 3 Research Methods
Section 1 Introduction to the Algorithms
1. Data Cloud Geometry Tree
2. Support vector machine
3. Hybrid method
4. Hierarchical clustering
5. Sampling-based majority voting (Voting)
Section 2 Research Process and Methods
Chapter 4 Results and Discussion
Section 1 Results
Section 2 Discussion and Suggestions
References