Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/66311
Title: 高維度資料特徵選取之探討–應用於分類蛋白質質譜儀資料
Alternative title: On Feature Selection of High Dimensional Data - Application on Classifying Proteomic Spectra Data
Authors: 郭訓志; 黃仁澤; 薛慧敏
Kuo, Hsun-Chih; Huang, Jen-Tse; Hsueh, Huey-Miin
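Contributor: Department of Statistics
Keywords: feature selection, proteomic mass spectra data, support vector machine, cross-validation
Date: 2011
Upload date: 27-May-2014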
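Abstract: The tumor markers used in routine health examinations have low sensitivity and specificity and cannot detect small tumors, so tumors usually cannot be diagnosed early. The data in this study are serum protein mass spectra obtained with protein chips and surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI); the serum samples come from healthy individuals and from three groups of prostate cancer patients at different stages. The aim is to select protein features that help distinguish the stages of prostate cancer. Using repeated random subsampling cross-validation and a support vector machine (SVM), all protein features are first ranked by the mean p-value of the t test, the mean p-value of the Kruskal-Wallis test, or the mean misclassification rate; forward selection is then used to find the feature set of the model with the smallest misclassification rate. To simplify the model, the study also considers the classification results obtained from features further extracted using the correlation coefficient and the coefficient of determination. Among the methods compared, the minimum-p-value feature selection method based on the Kruskal-Wallis test classifies better, and among the auxiliary extraction methods the maximum-correlation method reduces the number of features most effectively while preserving classification performance.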
The tumor markers used in routine health examinations often have low sensitivity and specificity, so they cannot detect small tumors in time. This research aims to develop a classification tool for the early diagnosis of tumors by studying proteomic mass spectra of prostate cancer at different stages. The data are Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS) spectra generated from 327 serum samples: 81 from unaffected healthy men (HM), 78 from patients diagnosed with benign prostatic hyperplasia (BPH), 84 from patients with organ-confined prostate cancer (PCA, T1/T2), and 84 from patients with non-organ-confined PCA (T3/T4). The goal is to select features (peaks) of the mass spectra that are useful for classifying the stages of prostate cancer, using repeated random subsampling cross-validation. Two approaches are proposed, both used with a support vector machine (SVM): the forward minimum-p-value method (with p-values from the t test or the Kruskal-Wallis test) and the forward minimum-classification-error method. In addition, the maximum-correlation method and the maximum-R² method are considered for further feature selection. In comparison, the forward minimum-p-value method based on the Kruskal-Wallis test often outperforms the other methods in terms of classification rate. Moreover, the maximum-correlation method effectively reduces the number of features while preserving the classification rate.
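The ranking-then-forward-selection procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation: the function name `rank_and_forward_select`, the linear kernel, the number of splits, the test fraction, and the synthetic data are all assumptions made for illustration, and SciPy and scikit-learn stand in for whatever software the study actually used.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import SVC


def rank_and_forward_select(X, y, n_splits=20, test_size=0.3,
                            max_features=10, seed=0):
    """Rank features by mean Kruskal-Wallis p-value, then add them in rank order.

    Hypothetical helper: name, defaults, and kernel are illustrative assumptions.
    """
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                      random_state=seed)
    splits = list(splitter.split(X, y))
    classes = np.unique(y)

    # Step 1: p-value of every feature on the training part of each random split,
    # then average over splits and sort (smallest mean p-value first).
    pvals = np.zeros((n_splits, X.shape[1]))
    for s, (train, _) in enumerate(splits):
        X_tr, y_tr = X[train], y[train]
        for j in range(X.shape[1]):
            groups = [X_tr[y_tr == c, j] for c in classes]
            pvals[s, j] = kruskal(*groups).pvalue
    order = np.argsort(pvals.mean(axis=0))

    # Step 2: grow the feature set along the ranking and record the mean
    # misclassification rate of an SVM over the same random splits.
    errors = []
    for k in range(1, max_features + 1):
        cols = order[:k]
        split_errors = []
        for train, test in splits:
            clf = SVC(kernel="linear").fit(X[train][:, cols], y[train])
            split_errors.append(1.0 - clf.score(X[test][:, cols], y[test]))
        errors.append(float(np.mean(split_errors)))

    # Keep the smallest feature set that attains the minimum mean error.
    best_k = int(np.argmin(errors)) + 1
    return order[:best_k], errors


# Synthetic stand-in for spectra: 4 classes, 30 "peaks", 3 of them informative.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 30)
X = rng.normal(size=(120, 30))
X[:, :3] += y[:, None] * 1.5
selected, errors = rank_and_forward_select(X, y)
print("selected features:", selected)
print("minimum mean error:", min(errors))
```

Swapping `kruskal` for a t test (two classes) or ranking by per-feature SVM error would give the other two rankings the abstract mentions. Because the same splits are used both to choose the feature count and to report the error, the minimum error in this sketch is optimistic.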
Relation: Journal of Data Analysis, 6(3), 67-80
Type: article
Appears in Collections: Journal Articles

Files in This Item:
File: 72-83.pdf
Size: 1.08 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.