A study of Data Geometry-based Learning (數據幾何特徵的機器學習) | Publication

Title	數據幾何特徵的機器學習 A study of Data Geometry-based Learning
Creator	劉憲忠 Liu, Hsien Chung
Contributor	周珮婷 Chou, Pei Ting; 劉憲忠 Liu, Hsien Chung
Key Words	machine learning (機器學習); geometric patterns (幾何模式); data-geometry
Date	2016
Date Issued	11-Jul-2016 16:54:50 (UTC+8)
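Summary	This study focuses on computed data-geometry based learning: it examines the geometric patterns of data to discover the inter-dependence patterns among covariate vectors, and asks whether weighting the distance matrix by coefficients obtained from a fitted statistical model effectively improves classification accuracy. The distance functions are modified in this way to better capture the geometric patterns and to measure the association between variables. The proposed learning rule mainly uses the Data Cloud Geometry Tree and cosine similarity, together with sampling-based majority voting, to predict class labels, and its classification results are compared with those of hierarchical clustering, support vector machines, and the Hybrid method on three datasets. Two of them, the biobehavioral assessment project data and the Wisconsin diagnostic breast cancer data, are used under supervised learning to validate the classification results; the third, a simulated moon-shaped dataset, is used under semi-supervised learning to predict the classes of new data. Finally, the strengths and weaknesses of each method, and the reasons for them, are discussed and summarized; the results indicate that data with different geometry do call for different formulas and algorithms to achieve good machine learning results, which is why the concept of geometric patterns is essential.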
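The ideas named in the summary above (a coefficient-weighted distance matrix, cosine similarity, and majority voting over random subsamples) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the nearest-neighbour voting rule, the subsample fraction, and the use of least-squares coefficients as weights are all assumptions made for illustration, and the Data Cloud Geometry Tree itself is not reproduced here.

```python
import numpy as np

def weighted_distance_matrix(X, weights):
    """Pairwise Euclidean distances after scaling each feature by |weight|.

    `weights` stands in for coefficients from some fitted model (an assumption).
    """
    Xw = X * np.abs(weights)                      # scale columns
    diff = Xw[:, None, :] - Xw[None, :, :]        # all pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))

def cosine_similarity(a, B):
    """Cosine similarity between vector `a` and each row of `B`."""
    num = B @ a
    den = np.linalg.norm(B, axis=1) * np.linalg.norm(a)
    return num / np.where(den == 0, 1.0, den)     # avoid division by zero

def vote_predict(X_train, y_train, x_new, weights,
                 n_rounds=25, frac=0.7, seed=0):
    """Majority vote over random subsamples (hypothetical voting rule).

    In each round, a subsample of the training set is drawn without
    replacement and the label of the most cosine-similar sampled point
    is recorded as one vote.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    rng = np.random.default_rng(seed)
    n = len(X_train)
    size = max(1, int(frac * n))
    w = np.abs(weights)
    votes = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=size, replace=False)
        sims = cosine_similarity(x_new * w, X_train[idx] * w)
        votes.append(y_train[idx][np.argmax(sims)])
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

if __name__ == "__main__":
    # Toy data, invented for this example only.
    X = np.array([[1.0, 0.1], [2.0, 0.1], [0.1, 1.0], [0.2, 2.0]])
    y = np.array([0, 0, 1, 1])
    # One possible source of weights: least-squares coefficients (assumption).
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    print(weighted_distance_matrix(X, coef).round(3))
    print(vote_predict(X, y, np.array([3.0, 0.2]), np.ones(2), frac=0.75))
```

Taking the absolute value of the coefficients keeps the rescaling from flipping feature signs; whether that matches the weighting used in the thesis cannot be determined from this record.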
Description	Master's thesis; National Chengchi University (國立政治大學); Department of Statistics (統計學系); 103354025
Source	http://thesis.lib.nccu.edu.tw/record/#G0103354025
Type	thesis

dc.contributor.advisor	周珮婷	zh_TW
dc.contributor.advisor	Chou, Pei Ting	en_US
dc.contributor.author (Authors)	劉憲忠	zh_TW
dc.contributor.author (Authors)	Liu, Hsien Chung	en_US
dc.creator (Author)	劉憲忠	zh_TW
dc.creator (Author)	Liu, Hsien Chung	en_US
dc.date (Date)	2016	en_US
dc.date.accessioned	11-Jul-2016 16:54:50 (UTC+8)	-
dc.date.available	11-Jul-2016 16:54:50 (UTC+8)	-
dc.date.issued (Upload time)	11-Jul-2016 16:54:50 (UTC+8)	-
dc.identifier (Other Identifiers)	G0103354025	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/98846	-
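dc.description (Description)	Master's degree (碩士)	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Statistics (統計學系)	zh_TW
dc.description (Description)	103354025	zh_TW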
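dc.description.tableofcontents	Chapter 1 Introduction: Section 1, Research Motivation and Purpose; Section 2, Data Description. Chapter 2 Literature Review. Chapter 3 Research Methods: Section 1, Introduction to the Algorithms ((1) Data Cloud Geometry Tree; (2) Support vector machine; (3) Hybrid method; (4) Hierarchical clustering; (5) Sampling-based majority voting (Voting)); Section 2, Research Process and Methods. Chapter 4 Results and Discussion: Section 1, Research Results; Section 2, Discussion and Suggestions. References.	zh_TW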
dc.format.extent	778690 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0103354025	en_US
dc.subject (Keywords)	機器學習 (machine learning)	zh_TW
dc.subject (Keywords)	幾何模式 (geometric patterns)	zh_TW
dc.subject (Keywords)	machine learning	en_US
dc.subject (Keywords)	data-geometry	en_US
dc.title (Title)	數據幾何特徵的機器學習	zh_TW
dc.title (Title)	A study of Data Geometry-based Learning	en_US
dc.type (Type)	thesis	en_US
