數據驅動的幾何學習 (Data-Driven Geometric Learning) | NCCU Academic Hub

Title	數據驅動的幾何學習 (Data-Driven Geometric Learning)
Author	周珮婷
Contributor	統計學系 (Department of Statistics)
Keywords	距離; 數據雲幾何; 機器學習 (Distance; DCG tree; Machine Learning)
Date	2014
Upload time	25-Dec-2017 15:18:14 (UTC+8)
Abstract	High-dimensional covariate information provides a detailed description of the individuals involved in a machine learning or classification problem. The inter-dependence patterns among these covariate vectors may be unknown to researchers. This fact is not well recognized in the classic and modern machine learning literature: most popular model-based algorithms rely on some version of dimension reduction, or even impose a built-in complexity penalty. This is a defensive attitude toward high dimensionality. In contrast, an accommodating attitude can exploit the potential inter-dependence patterns embedded within the high dimensionality. In this research project, we adopt the latter attitude throughout: we first compute the similarity between data nodes and then discover pattern information, in the form of ultrametric tree geometry, among almost all the covariate dimensions involved. We then use these patterns to build supervised and semi-supervised learning algorithms. The computations for this discovery are based primarily on a new clustering technique, Data Cloud Geometry (DCG), an unsupervised learning algorithm. Our data-driven learning approach focuses on the central issue of how to adaptively evolve a simple empirical distance into an effective one, in order to facilitate an efficient global feature matrix for learning purposes.
Relation	執行起迄 (project period): 2014/10/01~2015/07/31 103-2118-M-004-006
Type	report
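
The first step described in the abstract, computing pairwise distances between data nodes and reading off an ultrametric tree geometry, can be illustrated with a small sketch. This is not the DCG algorithm, whose details the record does not give. As a stand-in it uses the subdominant ultrametric (the minimax-path distance, which is the ultrametric induced by single-linkage clustering). The function names, the Euclidean base distance, and the toy points are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length tuples (assumed base distance)."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def subdominant_ultrametric(d):
    """Minimax-path distance: for each pair, the smallest achievable largest hop on any path.

    Computed with a Floyd-Warshall style pass over the (min, max) semiring.
    This is a stand-in for illustration, not the DCG tree from the report.
    """
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_ultrametric(u, tol=1e-12):
    """Check the strong triangle inequality u(i, j) <= max(u(i, k), u(k, j))."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j]) + tol
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


# Hypothetical toy data: two well-separated groups on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
u = subdominant_ultrametric(pairwise_distances(points))
```

In this toy example, points in the same group end up at ultrametric distance 1.0 and points in different groups at 8.0, so the tree structure separates the two groups even though the raw Euclidean distances within the first group differ.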

dc.contributor	統計學系 (Department of Statistics)	zh_TW
dc.creator (Author)	周珮婷	zh_TW
dc.date (Date)	2014	en_US
dc.date.accessioned	25-Dec-2017 15:18:14 (UTC+8)	-
dc.date.available	25-Dec-2017 15:18:14 (UTC+8)	-
dc.date.issued (Upload time)	25-Dec-2017 15:18:14 (UTC+8)	-
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/115382	-
dc.format.extent	1290251 bytes	-
dc.format.mimetype	application/pdf	-
dc.relation (Relation)	執行起迄 (project period): 2014/10/01~2015/07/31	zh_TW
dc.relation (Relation)	103-2118-M-004-006	zh_TW
dc.subject (Keywords)	距離; 數據雲幾何; 機器學習	zh_TW
dc.subject (Keywords)	Distance; DCG tree; Machine Learning	en_US
dc.title (Title)	數據驅動的幾何學習 (Data-Driven Geometric Learning)	zh_TW
dc.type (Type)	report