Title A parsimonious threshold-independent protein feature selection method through the area under receiver operating characteristic curve
Authors Wang, Zhanfeng; Chang, Yuan-chin; Ying, Zhiliang; Liang, Zhu; Yang, Yaning
Contributor Department of Statistics
Date 2007-09
Upload time 23-Dec-2014 15:20:05 (UTC+8)
Abstract Motivation: Protein expression profiling for differences indicative of early cancer holds promise for improving diagnostics. Due to their high dimensionality, statistical analysis of proteomic data from mass spectrometers is challenging in many aspects such as dimension reduction, feature subset selection as well as construction of classification rules. Search of an optimal feature subset, commonly known as the feature subset selection (FSS) problem, is an important step towards disease classification/diagnostics with biomarkers. Methods: We develop a parsimonious threshold-independent feature selection (PTIFS) method based on the concept of area under the curve (AUC) of the receiver operating characteristic (ROC). To reduce computational complexity to a manageable level, we use a sigmoid approximation to the empirical AUC as the criterion function. Starting from an anchor feature, the PTIFS method selects a feature subset through an iterative updating algorithm. Highly correlated features that have similar discriminating power are precluded from being selected simultaneously. The classification rule is then determined from the resulting feature subset. Results: The performance of the proposed approach is investigated by extensive simulation studies, and by applying the method to two mass spectrometry data sets of prostate cancer and of liver cancer. We compare the new approach with the threshold gradient descent regularization (TGDR) method. The results show that our method can achieve comparable performance to that of the TGDR method in terms of disease classification, but with fewer features selected.
Relation Bioinformatics, 23(20), 2788-2794
Type article
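
The abstract mentions replacing the empirical AUC with a sigmoid approximation to obtain a smoother criterion function. The record does not give the exact formula, so the following is only a minimal sketch of the general idea under stated assumptions: the function names, the `scale` parameter, and the toy scores are hypothetical and are not taken from the paper.

```python
import math

def empirical_auc(pos, neg):
    """Empirical (Mann-Whitney) AUC: share of (case, control) pairs in which
    the case scores higher than the control; ties count as one half."""
    total = 0.0
    for x in pos:
        for y in neg:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(pos) * len(neg))

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def smoothed_auc(pos, neg, scale=0.1):
    """Smooth surrogate for the empirical AUC: the indicator 1[x > y] is
    replaced by sigmoid((x - y) / scale). `scale` is an assumed tuning
    parameter; smaller values track the empirical AUC more closely."""
    total = sum(sigmoid((x - y) / scale) for x in pos for y in neg)
    return total / (len(pos) * len(neg))

# Hypothetical classifier scores for cases and controls.
cases = [2.1, 3.4, 1.9, 2.8]
controls = [1.0, 2.0, 1.5, 2.5]
print(empirical_auc(cases, controls))
print(smoothed_auc(cases, controls, scale=0.1))
```

Because the surrogate is differentiable in the scores, it can be optimized with gradient-based updates, which is one plausible reason for preferring it over the step-shaped empirical AUC.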
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/72230
dc.format.extent 128 bytes
dc.format.mimetype text/html
dc.language.iso en_US