Title | 數據幾何特徵的機器學習 / A Study of Data Geometry-based Learning |
Creator | 劉憲忠 Liu, Hsien Chung |
Contributor | 周珮婷 Chou, Pei Ting 劉憲忠 Liu, Hsien Chung |
Key Words | machine learning; geometric patterns; data-geometry |
Date | 2016 |
Date Issued | 11-Jul-2016 16:54:50 (UTC+8) |
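Summary | This study focuses on computed data-geometry based learning: it uses the geometric patterns of the data to uncover the inter-dependence among covariate vectors, and asks whether weighting the distance matrix by coefficients obtained from fitted statistical models improves classification accuracy. The distance functions are modified in this way to better capture the geometric patterns and to measure the association between variables. The study mainly uses the Data Cloud Geometry tree and cosine similarity, together with subsampled majority voting, to predict class labels, and compares the results with hierarchical clustering, support vector machines, and the Hybrid method on three datasets. Two of them, the biobehavioral assessment project data and the Wisconsin diagnostic breast cancer data, are used to validate classification results under supervised learning; the third, simulated moon data, is used to predict the classes of new observations under semi-supervised learning. Finally, the strengths and weaknesses of each method, and the reasons for them, are discussed and summarized, showing why data geometry is essential: data with different geometry call for different formulas and algorithms to achieve good machine learning results. |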
References | Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2), 105-139. Baldi, P., & Brunak, S. (2001). Bioinformatics: the machine learning approach. MIT Press. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273. doi:10.1007/BF00994018. Chou, E. P. (2015, July). Data Driven Geometry for Learning. In International Workshop on Machine Learning and Data Mining in Pattern Recognition (pp. 395-402). Springer International Publishing. Chou, E. P., Hsieh, F., & Capitanio, J. (2013, December). Computed Data-Geometry Based Supervised and Semi-supervised Learning in High Dimensional Data. In Machine Learning and Applications (ICMLA), 2013 12th International Conference on (Vol. 1, pp. 277-282). Chang, Y. C. I. (2003). Boosting SVM classifiers with logistic regression. See www.stat.sinica.edu.tw/library/c_tec_rep/2003-03.pdf. Culp, M. (2011). spa: A Semi-Supervised R Package for Semi-Parametric Graph-Based Estimation. Journal of Statistical Software, 40(10), 1-29. Fushing, H., Wang, H., VanderWaal, K., McCowan, B., & Koehl, P. (2013). Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PloS one, 8(2), e56259. Grozavu, N., Bennani, Y., & Lebbah, M. (2009, June). From variable weighting to cluster characterization in topographic unsupervised learning. In Neural Networks, 2009. IJCNN 2009. International Joint Conference on (pp. 1005-1010). IEEE. Hastie, T., Tibshirani, R., Friedman, J., & Franklin, J. (2005). The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2). Tan, A. C., & Gilbert, D. (2003, January). An empirical comparison of supervised machine learning techniques in bioinformatics. In Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003-Volume 19 (pp. 219-222). Australian Computer Society, Inc. |
Description | Master's; National Chengchi University; Department of Statistics; 103354025 |
Source | http://thesis.lib.nccu.edu.tw/record/#G0103354025 |
Type | thesis |
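
The Summary describes weighting a distance matrix by fitted model coefficients, using cosine similarity, and classifying by majority vote over subsamples. This record does not give the thesis's actual formulas, so the Python below is only a minimal sketch under assumptions: the function names, the nearest-neighbour voting rule, and the `weights` argument (standing in for coefficient magnitudes from some fitted model) are hypothetical and are not taken from the thesis.

```python
import math
import random
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared feature difference scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def weighted_distance_matrix(rows, weights):
    """Full pairwise matrix of weighted distances between observations."""
    return [[weighted_distance(a, b, weights) for b in rows] for a in rows]


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vote_predict(train_rows, train_labels, new_row, weights,
                 n_rounds=25, frac=0.7, seed=0):
    """Majority vote over random subsamples of the training data.

    Assumed rule (not from the thesis): in each round, the subsampled
    training point nearest to new_row under the weighted distance casts
    one vote; the most frequent label across rounds is returned.
    """
    rng = random.Random(seed)
    n = len(train_rows)
    m = max(1, int(frac * n))  # subsample size per round
    votes = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), m)  # sample without replacement
        nearest = min(
            idx, key=lambda i: weighted_distance(train_rows[i], new_row, weights)
        )
        votes.append(train_labels[nearest])
    return Counter(votes).most_common(1)[0][0]
```

With `weights` set to all ones the distance reduces to the ordinary Euclidean distance, so the effect of coefficient weighting can be checked by comparing accuracy with and without it.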
dc.contributor.advisor | 周珮婷 | zh_TW |
dc.contributor.advisor | Chou, Pei Ting | en_US |
dc.contributor.author (Authors) | 劉憲忠 | zh_TW |
dc.contributor.author (Authors) | Liu, Hsien Chung | en_US |
dc.creator (Author) | 劉憲忠 | zh_TW |
dc.creator (Author) | Liu, Hsien Chung | en_US |
dc.date (Date) | 2016 | en_US |
dc.date.accessioned | 11-Jul-2016 16:54:50 (UTC+8) | - |
dc.date.available | 11-Jul-2016 16:54:50 (UTC+8) | - |
dc.date.issued (Upload time) | 11-Jul-2016 16:54:50 (UTC+8) | - |
dc.identifier (Other Identifiers) | G0103354025 | en_US |
dc.identifier.uri (URI) | http://nccur.lib.nccu.edu.tw/handle/140.119/98846 | - |
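dc.description (Description) | Master's (碩士) | zh_TW |
dc.description (Description) | National Chengchi University (國立政治大學) | zh_TW |
dc.description (Description) | Department of Statistics (統計學系) | zh_TW |
dc.description (Description) | 103354025 | zh_TW |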
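dc.description.tableofcontents | Chapter 1 Introduction: Section 1, Research Motivation and Purpose; Section 2, Data Description. Chapter 2 Literature Review. Chapter 3 Research Methods: Section 1, Introduction to the Algorithms (1. Data Cloud Geometry Tree; 2. Support vector machine; 3. Hybrid method; 4. Hierarchical clustering; 5. Voting); Section 2, Research Process and Methods. Chapter 4 Results and Discussion: Section 1, Research Results; Section 2, Discussion and Recommendations. References. | zh_TW |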
dc.format.extent | 778690 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.source.uri (Source) | http://thesis.lib.nccu.edu.tw/record/#G0103354025 | en_US |
dc.subject (Keywords) | machine learning (機器學習) | zh_TW |
dc.subject (Keywords) | geometric patterns (幾何模式) | zh_TW |
dc.subject (Keywords) | machine learning | en_US |
dc.subject (Keywords) | data-geometry | en_US |
dc.title (Title) | 數據幾何特徵的機器學習 | zh_TW |
dc.title (Title) | A study of Data Geometry-based Learning | en_US |
dc.type (Type) | thesis | en_US |