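Title 數據驅動的幾何學習 (Data-Driven Geometric Learning)
Author 周珮婷
Contributor Department of Statistics
Keywords Distance; Data Cloud Geometry; Machine Learning
Distance; DCG tree; Machine Learning
Date 2014
Uploaded 25-Dec-2017 15:18:14 (UTC+8)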
Abstract High-dimensional covariates give a detailed description of each individual in a machine learning or classification problem, yet the dependence patterns among those covariates are usually unknown to the researcher. Classical and modern machine learning literature gives this little attention: most popular model-based algorithms rely on some form of dimension reduction or impose a built-in complexity penalty. That is a defensive attitude toward high dimensionality. An accommodating attitude instead exploits the dependence patterns embedded in the high-dimensional data rather than reducing dimension arbitrarily. This project takes the accommodating approach throughout. We first compute the similarity between data nodes, then extract pattern information, in the form of ultrametric tree geometry, from nearly all of the covariate dimensions involved. We then use these patterns to build supervised and semi-supervised learning algorithms. The pattern discovery is based primarily on a new clustering technique, Data Cloud Geometry (DCG), which is an unsupervised learning algorithm. The central issue of our data-driven approach is how to adaptively evolve a simple empirical distance into an effective one, so that a global feature matrix for learning can be found efficiently.
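The pipeline in the abstract (pairwise distances, then an ultrametric tree, then learning on the tree geometry) can be illustrated with a minimal sketch. This is not the DCG algorithm, which the record does not specify. As a stand-in it uses the subdominant ultrametric, i.e. the merge heights of single-linkage clustering, and a nearest-labeled-point rule for the semi-supervised step. The function names and the toy data are assumptions made for illustration only.

```python
import math


def euclidean_matrix(points):
    """Pairwise Euclidean distances between all points (the 'simple empirical distance')."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def subdominant_ultrametric(dist):
    """Minimax-path distance: the largest single step on the best path between two points.

    This equals the single-linkage merge height, so it defines an ultrametric
    tree over the data. It is a stand-in for DCG, not DCG itself.
    """
    n = len(dist)
    u = [row[:] for row in dist]
    # Floyd-Warshall variant: relax each pair through every intermediate point k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_ultrametric(u, tol=1e-12):
    """Check the strong triangle inequality u(i, j) <= max(u(i, k), u(k, j))."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j]) + tol
        for i in range(n) for j in range(n) for k in range(n)
    )


def propagate_labels(u, labels):
    """Semi-supervised step: each unlabeled point (None) takes the label of
    the labeled point closest to it in tree distance."""
    labeled = [i for i, y in enumerate(labels) if y is not None]
    return [
        y if y is not None else labels[min(labeled, key=lambda j: u[i][j])]
        for i, y in enumerate(labels)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups, one labeled point each.
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels = ["a", None, None, "b", None, None]
    u = subdominant_ultrametric(euclidean_matrix(points))
    print(is_ultrametric(u))
    print(propagate_labels(u, labels))
```

In this sketch the tree distance between two points in the same group collapses to the largest gap along the chain connecting them, so group structure dominates raw coordinate distance. That is the sense in which an "effective" distance can differ from the empirical one.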
Relation Project period: 2014/10/01~2015/07/31
103-2118-M-004-006
Type report
dc.contributor Department of Statistics [zh_TW]
dc.creator (Author) 周珮婷 [zh_TW]
dc.date (Date) 2014 [en_US]
dc.date.accessioned 25-Dec-2017 15:18:14 (UTC+8)
dc.date.available 25-Dec-2017 15:18:14 (UTC+8)
dc.date.issued (Uploaded) 25-Dec-2017 15:18:14 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/115382
dc.format.extent 1290251 bytes
dc.format.mimetype application/pdf
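dc.relation (Relation) Project period: 2014/10/01~2015/07/31 [zh_TW]
dc.relation (Relation) 103-2118-M-004-006 [zh_TW]
dc.subject (Keywords) Distance; Data Cloud Geometry; Machine Learning [zh_TW]
dc.subject (Keywords) Distance; DCG tree; Machine Learning [en_US]
dc.title (Title) 數據驅動的幾何學習 (Data-Driven Geometric Learning) [zh_TW]
dc.type (Type) report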