Multidimensional scaling for large genomic data sets | Publication | NCCU Academic Hub

Publications-Periodical Articles

Article View/Open

html(1547)

Publication Export

Google Scholar^TM

NCCU Library

Discovery System

Citation Infomation

No doi shows Citation Infomation

Related Publications in TAIR

Simple Record
Full Record

題名	Multidimensional scaling for large genomic data sets
作者	曾正男 Tzeng,Jengnan 盧鴻興 Lu,Henry Horng-Shing 李文雄 Li,Wen-Hsiung
貢獻者	應數系
日期	2008.04
上傳時間	7-Aug-2014 11:35:39 (UTC+8)
摘要	Background: Multi-dimensional scaling (MDS) is aimed to represent high dimensional data in a low dimensional space with preservation of the similarities between data points. This reduction in dimensionality is crucial for analyzing and revealing the genuine structure hidden in the data. For noisy data, dimension reduction can effectively reduce the effect of noise on the embedded structure. For large data set, dimension reduction can effectively reduce information retrieval complexity. Thus, MDS techniques are used in many applications of data mining and gene network research. However, although there have been a number of studies that applied MDS techniques to genomics research, the number of analyzed data points was restricted by the high computational complexity of MDS. In general, a non-metric MDS method is faster than a metric MDS, but it does not preserve the true relationships. The computational complexity of most metric MDS methods is over O(N2), so that it is difficult to process a data set of a large number of genes N, such as in the case of whole genome microarray data. Results:We developed a new rapid metric MDS method with a low computational complexity, making metric MDS applicable for large data sets. Computer simulation showed that the new method of split-and-combine MDS (SC-MDS) is fast, accurate and efficient. Our empirical studies using microarray data on the yeast cell cycle showed that the performance of K-means in the reduced dimensional space is similar to or slightly better than that of K-means in the original space, but about three times faster to obtain the clustering results. Our clustering results using SC-MDS are more stable than those in the original space. Hence, the proposed SC-MDS is useful for analyzing whole genome data. Conclusion:Our new method reduces the computational complexity from O(N3) to O(N) when the dimension of the feature space is far less than the number of genes N, and it successfully reconstructs the low dimensional representation as does the classical MDS. Its performance depends on the grouping method and the minimal number of the intersection points between groups. Feasible methods for grouping methods are suggested; each group must contain both neighboring and far apart data points. Our method can represent high dimensional large data set in a low dimensional space not only efficiently but also effectively.
關聯	BMC Bioinformatics,9(179)
資料類型	article

dc.contributor	應數系	en_US
dc.creator (作者)	曾正男	zh_TW
dc.creator (作者)	Tzeng,Jengnan	en_US
dc.creator (作者)	盧鴻興	zh_TW
dc.creator (作者)	Lu,Henry Horng-Shing	en_US
dc.creator (作者)	李文雄	zh_TW
dc.creator (作者)	Li,Wen-Hsiung	en_US
dc.date (日期)	2008.04	en_US
dc.date.accessioned	7-Aug-2014 11:35:39 (UTC+8)	-
dc.date.available	7-Aug-2014 11:35:39 (UTC+8)	-
dc.date.issued (上傳時間)	7-Aug-2014 11:35:39 (UTC+8)	-
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/68419	-
dc.description.abstract (摘要)	Background: Multi-dimensional scaling (MDS) is aimed to represent high dimensional data in a low dimensional space with preservation of the similarities between data points. This reduction in dimensionality is crucial for analyzing and revealing the genuine structure hidden in the data. For noisy data, dimension reduction can effectively reduce the effect of noise on the embedded structure. For large data set, dimension reduction can effectively reduce information retrieval complexity. Thus, MDS techniques are used in many applications of data mining and gene network research. However, although there have been a number of studies that applied MDS techniques to genomics research, the number of analyzed data points was restricted by the high computational complexity of MDS. In general, a non-metric MDS method is faster than a metric MDS, but it does not preserve the true relationships. The computational complexity of most metric MDS methods is over O(N2), so that it is difficult to process a data set of a large number of genes N, such as in the case of whole genome microarray data. Results:We developed a new rapid metric MDS method with a low computational complexity, making metric MDS applicable for large data sets. Computer simulation showed that the new method of split-and-combine MDS (SC-MDS) is fast, accurate and efficient. Our empirical studies using microarray data on the yeast cell cycle showed that the performance of K-means in the reduced dimensional space is similar to or slightly better than that of K-means in the original space, but about three times faster to obtain the clustering results. Our clustering results using SC-MDS are more stable than those in the original space. Hence, the proposed SC-MDS is useful for analyzing whole genome data. Conclusion:Our new method reduces the computational complexity from O(N3) to O(N) when the dimension of the feature space is far less than the number of genes N, and it successfully reconstructs the low dimensional representation as does the classical MDS. Its performance depends on the grouping method and the minimal number of the intersection points between groups. Feasible methods for grouping methods are suggested; each group must contain both neighboring and far apart data points. Our method can represent high dimensional large data set in a low dimensional space not only efficiently but also effectively.	en_US
dc.format.extent	108 bytes	-
dc.format.mimetype	text/html	-
dc.language.iso	en_US	-
dc.relation (關聯)	BMC Bioinformatics,9(179)	en_US
dc.title (題名)	Multidimensional scaling for large genomic data sets	en_US
dc.type (資料類型)	article	en