dc.contributor.advisor | 郭訓志 | zh_TW |
dc.contributor.author (Author) | 陳詩佳 | zh_TW |
dc.creator (Author) | 陳詩佳 | zh_TW |
dc.date (Date) | 2006 | en_US |
dc.date.accessioned | 2009-09-14 | - |
dc.date.available | 2009-09-14 | - |
dc.date.issued (Upload date) | 2009-09-14 | - |
dc.identifier (Other identifier) | G0094354014 | en_US |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/30917 | - |
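dc.description (Description) | Master's | zh_TW |
dc.description (Description) | National Chengchi University | zh_TW |
dc.description (Description) | Graduate Institute of Statistics | zh_TW |
dc.description (Description) | 94354014 | zh_TW |
dc.description (Description) | 95 | zh_TW |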
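dc.description.abstract (Abstract) | Cancer ranks first among the ten leading causes of death in Taiwan. Because patients who receive timely treatment in the early stages of cancer have higher survival rates, "early detection, early diagnosis, and early treatment" can reduce mortality. This study analyzes preprocessed prostate cancer protein mass spectrum data collected with Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS). The aim is to combine classifiers through meta-learning together with stepwise feature selection, so that the data can be classified with fewer, representative feature variables while achieving higher accuracy. Classification accuracy determines the order in which variables enter during stepwise feature selection; in addition, the Elastic Net and the coefficient of determination are used to rank the feature variables, in order to mitigate the high collinearity among variables. Both voting (majority voting and weighted voting) and cascading (cascades of multiple classifiers and cascades of a single classifier) are considered. The results show that ordering the entry of feature variables by the coefficient of determination and cascading support vector machines (SVM) performs well in every classification setting, making it the preferred feature selection approach. Keywords: feature selection, cascading, protein mass spectrum, meta-learning, support vector machine | zh_TW |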
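The abstract compares voting (majority and weighted) with cascading as ways of combining classifiers. The following Python sketch is a hedged illustration of those three combiners only, not the thesis's implementation: the classifier interface (a function returning a score per class), the choice of weights, and the per-stage confidence thresholds are assumptions, and the stepwise feature selection and SVM training described in the abstract are not shown. The cascade follows the idea behind Alpaydin and Kaynak's cascading classifiers, in which a later stage handles only the inputs on which earlier stages are not confident enough.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a base classifier maps a feature vector to
# a dict of class label -> score (e.g. an estimated posterior probability).
Scores = Dict[str, float]
Classifier = Callable[[Sequence[float]], Scores]


def top_label(scores: Scores) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> str:
    """Majority voting: each classifier casts one vote; the most frequent label wins."""
    votes = Counter(top_label(clf(x)) for clf in classifiers)
    return votes.most_common(1)[0][0]


def weighted_vote(classifiers: List[Classifier], weights: Sequence[float],
                  x: Sequence[float]) -> str:
    """Weighted voting: each vote counts with its classifier's weight
    (assumed here to be something like validation accuracy)."""
    totals: Dict[str, float] = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        totals[top_label(clf(x))] += w
    return max(totals, key=totals.get)


def cascade(classifiers: List[Classifier], thresholds: Sequence[float],
            x: Sequence[float]) -> str:
    """Cascading: stage j answers only if its top score reaches thresholds[j];
    otherwise x is passed to the next stage. The last stage always answers."""
    if len(thresholds) != len(classifiers) - 1:
        raise ValueError("need one threshold per stage except the last")
    for clf, theta in zip(classifiers[:-1], thresholds):
        scores = clf(x)
        if max(scores.values()) >= theta:
            return top_label(scores)
    return top_label(classifiers[-1](x))


if __name__ == "__main__":
    # Toy classifiers with fixed outputs, for illustration only.
    c1 = lambda x: {"cancer": 0.55, "normal": 0.45}
    c2 = lambda x: {"cancer": 0.30, "normal": 0.70}
    c3 = lambda x: {"cancer": 0.40, "normal": 0.60}
    stages = [c1, c2, c3]
    print(majority_vote(stages, [0.0]))                   # normal (2 of 3 votes)
    print(weighted_vote(stages, [0.9, 0.4, 0.3], [0.0]))  # cancer (0.9 vs 0.7)
    print(cascade(stages, [0.8, 0.65], [0.0]))            # normal (stage 2 is confident)
```

Raising a stage's threshold sends more inputs on to later stages; if every threshold is above 1 and the scores are probabilities, the cascade reduces to the last classifier alone.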
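dc.description.tableofcontents | Chapter 1 Introduction: Section 1 Research Background; Section 2 Research Motivation and Objectives; Section 3 Research Framework. Chapter 2 Protein Mass Spectrum Data: Section 1 Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry; Section 2 Prostate Cancer Protein Mass Spectrum Data; Section 3 Discussion of Protein Mass Spectrum Data. Chapter 3 Literature Review. Chapter 4 Research Methods: Section 1 Introduction to Classifiers (4.1.1 LDA; 4.1.2 KNN; 4.1.3 SVM); Section 2 Feature Selection Combining Multiple Classifiers (4.2.1 Stacking; 4.2.2 Cascading); Section 3 Feature Selection. Chapter 5 Empirical Analysis: Section 1 Voting (5.1.1 Majority Voting; 5.1.2 Weighted Voting); Section 2 Cascading (5.2.1 Cascading Multiple Classifiers; 5.2.2 Cascading a Single Classifier); Section 3 Improving Feature Selection (5.3.1 Elastic Net + Single-Classifier Cascading; 5.3.3 Coefficient-of-Determination Extraction Method). Chapter 6 Conclusions and Recommendations. References. Appendix. | zh_TW |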
dc.language.iso | en_US | - |
dc.source.uri (Source) | http://thesis.lib.nccu.edu.tw/record/#G0094354014 | en_US |
dc.subject (Keywords) | feature selection | zh_TW |
dc.subject (Keywords) | cascading | zh_TW |
dc.subject (Keywords) | protein mass spectrum | zh_TW |
dc.subject (Keywords) | support vector machine | zh_TW |
dc.title (Title) | A Study of Feature Selection on Protein Mass Spectrum Data Using Meta-Learning | zh_TW |
dc.title (Title) | Feature Selection via Meta-Learning on Proteomic Mass Spectrum Data | en_US |
dc.type (Type) | thesis | en |
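dc.relation.reference (References) | Newton Magazine Editorial Department, “Tireless Experimentation Also Leads to New Discoveries: An Interview with Koichi Tanaka, Researcher at Shimadzu Corporation, Japan.” Newton Magazine (International Chinese Edition), No. 235, March 2003. | zh_TW |
dc.relation.reference (References) | Newton Magazine Editorial Department, “My New Challenge! A Visit to the Koichi Tanaka Memorial Mass Spectrometry Research Laboratory, Shimadzu Corporation, Japan.” Newton Magazine (International Chinese Edition), No. 242, October 2003. | zh_TW |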
dc.relation.reference (References) | Department of Health, Executive Yuan, “Summary of Cause-of-Death Statistics for the Taiwan Area, 2005 (ROC Year 94).” URL: http://www.doh.gov.tw/statistic/data/死因摘要/94年/94.htm | zh_TW |
dc.relation.reference (References) | Department of Health, Executive Yuan, Bureau of Health Promotion, “Main Theme of Health Education Promotion for 2005 (ROC Year 94): Cancer Prevention.” URL: http://www.bhp.doh.gov.tw/BHP/index.jsp | zh_TW |
dc.relation.reference (References) | Department of Health, Executive Yuan, “Leading Causes of Cancer Death in the Taiwan Area, 2005 (ROC Year 94).” URL: http://www.doh.gov.tw/statistic/data/死因摘要/94年/表8.xls | zh_TW |
dc.relation.reference (References) | National Cancer Patient Service Center, “Prostate Cancer (90.02.01), Health Education Booklet No. 18.” URL: http://www2.cch.org.tw/OURHOME/booklet/booklet18.htm | zh_TW |
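dc.relation.reference (References) | 徐竣建, “Application of the Overlapping Method to Protein Mass Spectrum Data.” Master's thesis, Graduate Institute of Statistics, National Chengchi University, 2006. Advisor: Dr. 余清祥. | zh_TW |
dc.relation.reference (References) | Cathay General Hospital, Cancer Information Network, “Introduction to Prostate Cancer.” URL: http://www1.cgh.org.tw/content/healthy/cancerx/newpage19.htm | zh_TW |
dc.relation.reference (References) | 黃仁澤, “Feature Selection for High-Dimensional Data: An Application to Classifying Protein Mass Spectrometer Data.” Master's thesis, Graduate Institute of Statistics, National Chengchi University, 2005. Advisors: Dr. 郭訓志 and Dr. 薛慧敏. | zh_TW |
dc.relation.reference (References) | 葉勝宗, “Application of AUC-Based Feature Selection to the Analysis of Protein Mass Spectrum Data.” Master's thesis, Graduate Institute of Statistics, National Chengchi University, 2006. Advisors: Dr. 張源俊 and Dr. 郭訓志. | zh_TW |
dc.relation.reference (References) | 陳敏鋑, “Understanding Cancer.” Cancer Care Quarterly, 德桃基金會. URL: http://med.mc.ntu.edu.tw/~onc/Lecture/cancer1.html | zh_TW |
dc.relation.reference (References) | 賴基銘, “Future Prospects for Cancer Screening: Applications of SELDI Serum Protein Fingerprinting.” National Health Research Institutes e-Newsletter, No. 52, June 25, 2004. | zh_TW |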
dc.relation.reference (References) | Adam, B. L., Qu, Y., Davis, J. W., Ward, M. D., Clements, M. A., Cazares, L. H., Semmes, O. J., Schellhammer, P. F., Yasui, Y., Feng, Z. and Wright, G. L. Jr. (2002), “Serum Protein Fingerprinting Coupled with a Pattern-matching Algorithm Distinguishes Prostate Cancer from Benign Prostate Hyperplasia and Healthy Men.” Cancer Research, Vol. 62, No. 13, pp. 3609-3614. | zh_TW |
dc.relation.reference (References) | Alpaydin, E. and Kaynak, C. (1998), “Cascading Classifiers.” Kybernetika, Vol. 34, No. 4, pp. 369-374. | zh_TW |
dc.relation.reference (References) | Alpaydin, E. and Kaynak, C. (2000), “MultiStage Cascading of Multiple Classifiers: One Man’s Noise is Another Man’s Data.” In Proceedings of the Seventeenth International Conference on Machine Learning, ed. P. Langley, pp. 455-462. San Francisco: Morgan Kaufmann. | zh_TW |
dc.relation.reference (References) | Alpaydin, E. (2004), Introduction to Machine Learning, MIT Press. | zh_TW |
dc.relation.reference (References) | Breiman, L. (1996), “Bagging Predictors.” Machine Learning, Vol. 24, No. 2, pp. 123-140. | zh_TW |
dc.relation.reference (References) | Bryan, J. G. (1951), “The Generalized Discriminant Function: Mathematical Foundations and Computational Routine.” Harvard Educational Review, Vol. 21, pp. 90-95. | zh_TW |
dc.relation.reference (References) | Burbidge, R., Trotter, M., Buxton, B. F. and Holden, S. B. (2001), “Drug Design by Machine Learning: Support Vector Machines for Pharmaceutical Data Analysis.” Computers and Chemistry, Vol. 26, pp. 5-14. | zh_TW |
dc.relation.reference (References) | Chang, Y. C. and Lin, S. C. (2004), “Synergy of Logistic Regression and Support Vector Machine in Multiple-Class Classification.” LNCS, Vol. 3177, pp. 132-141. | zh_TW |
dc.relation.reference (References) | Chen, G., Gharib, T. G., Huang, C. C., Thomas, D. G., Shedden, K. A., Taylor, J. M. G., Kardia, S. L. R., Misek, D. E., Giordano, T. J., Iannettoni, M. D., Orringer, M. B., Hanash, S. M. and Beer, D. G. (2002), “Proteomic Analysis of Lung Adenocarcinoma: Identification of a Highly Expressed Set of Proteins in Tumors.” Clinical Cancer Research, Vol. 8, pp. 2298-2305. | zh_TW |
dc.relation.reference (References) | Draper, N. R. and Smith, H. (1981), Applied Regression Analysis, 2nd edn., Wiley, New York. | zh_TW |
dc.relation.reference (References) | Dudani, S. A. (1976), “The Distance-Weighted k-Nearest-Neighbor Rule.” IEEE Transactions on Systems, Man, and Cybernetics, Vol. 6, No. 4, pp. 325-327. | zh_TW |
dc.relation.reference (References) | Fisher, R. A. (1936), “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, Vol. 7, pp. 179-188. | zh_TW |
dc.relation.reference (References) | Fix, E. and Hodges, J. L. (1951), “Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties.” Report No. 4, US Air Force School of Aviation Medicine, Randolph Field, Texas. [Published in Agrawala (1997), Silverman and Jones (1989) and Dasarathy (1991).] | zh_TW |
dc.relation.reference (References) | Furey, T., Schummer, M., Duffy, N., Bednarski, D., Haussler, D. and Cristianini, N. (2000), “Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data.” Bioinformatics, Vol. 16, pp. 906-914. | zh_TW |
dc.relation.reference (References) | Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002), “Gene Selection for Cancer Classification Using Support Vector Machines.” Machine Learning, Vol. 46, pp. 389-422. | zh_TW |
dc.relation.reference (References) | Hastie, T., Tibshirani, R. and Friedman, J. (2001), The Elements of Statistical Learning. Springer. | zh_TW |
dc.relation.reference (References) | Holland, J. H. (1994), Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 3rd edn. Cambridge, MA: MIT Press. | zh_TW |
dc.relation.reference (References) | Johnson, R. A. and Wichern, D. W. (2002), Applied Multivariate Statistical Analysis, Prentice-Hall, Inc. Upper Saddle River, NJ, USA. | zh_TW |
dc.relation.reference (References) | Kohonen, T. (1982), “Self-Organized Formation of Topologically Correct Feature Maps.” Biological Cybernetics, Vol. 43, pp. 59-69. | zh_TW |
dc.relation.reference (References) | Kohonen, T. (1990), “The Self-Organizing Map.” Proceedings of the IEEE, Vol. 78, pp. 1464-1480. | zh_TW |
dc.relation.reference (References) | Lilien, R. H., Farid, H. and Donald, B. R. (2003), “Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum.” Journal of Computational Biology, Vol. 10, No. 6, pp. 925-946. | zh_TW |
dc.relation.reference (References) | Osuna, E., Freund, R. and Girosi, F. (1997), “Training Support Vector Machines: An Application to Face Detection.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 130-136. | zh_TW |
dc.relation.reference (References) | Petricoin, E. F., Ardekani, A. M., Hitt, B. A., Levine, P. J., Fusaro, V. A., Steinberg, S. M., Mills, G. B., Simone, C., Fishman, D. A., Kohn, E. C. and Liotta, L. A. (2002), “Use of Proteomic Patterns in Serum to Identify Ovarian Cancer.” Lancet, Vol. 359, No. 9306, pp. 572-577. | zh_TW |
dc.relation.reference (References) | Qu, Y., Adam, B. L., Thornquist, M., Potter, J. D., Thompson, M. L., Yasui, Y., Davis, J., Schellhammer, P. F., Cazares, L., Clements, M. A., Wright, G. L. Jr. and Feng, Z. (2003), “Data Reduction Using a Discrete Wavelet Transform in Discriminant Analysis of Very High Dimensionality Data.” Biometrics, Vol. 59, pp. 143-151. | zh_TW |
dc.relation.reference (References) | Rao, C. R. (1948), “The Utilization of Multiple Measurements in Problems of Biological Classification.” Journal of the Royal Statistical Society, Series B, Vol. 10, pp. 159-203. | zh_TW |
dc.relation.reference (References) | Ripley, B. D. (1996), Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press. | zh_TW |
dc.relation.reference (References) | Sauve, A. C. and Speed, T. P. (2004), “Normalization, Baseline Correction and Alignment of High-Throughput Mass Spectrometry Data.” Proceedings of GENSIPS 2004. | zh_TW |
dc.relation.reference (References) | Schölkopf, B., Herbrich, R. and Smola, A. J. (2001), “A Generalized Representer Theorem.” LNAI, Vol. 2111, pp. 416-426. | zh_TW |
dc.relation.reference (References) | Tong, S. and Koller, D. (2002), “Support Vector Machine Active Learning with Applications to Text Classification.” The Journal of Machine Learning Research, Vol. 2, pp. 45-66. | zh_TW |
dc.relation.reference (References) | Trafalis, T. B. and Ince, H. (2000), “Support Vector Machine for Regression and Application to Financial Forecasting.” Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, Vol. 6, pp. 6348-6353. | zh_TW |
dc.relation.reference (References) | Vapnik, V. N. (1995), The Nature of Statistical Learning Theory, Springer, New York. | zh_TW |
dc.relation.reference (References) | Wolpert, D. H. (1992), “Stacked Generalization.” Neural Networks, Vol. 5, pp. 241-259. | zh_TW |
dc.relation.reference (References) | Wu, B., Abbott, T., Fishman, D., McMurray, W., Mor, G., Stone, K., Ward, D., Williams, K. and Zhao, H. (2003), “Comparison of Statistical Methods for Classification of Ovarian Cancer Using Mass Spectrometry Data.” Bioinformatics, Vol. 19, No. 13, pp. 1636-1643. | zh_TW |
dc.relation.reference (References) | Zhang, X., Mesirov, J. P. and Waltz, D. L. (1992), “Hybrid System for Protein Secondary Structure Prediction.” Journal of Molecular Biology, Vol. 225, No. 4, pp. 1049-1063. | zh_TW |
dc.relation.reference (References) | Zou, H. and Hastie, T. (2005), “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society, Series B, Vol. 67, pp. 301-320. | zh_TW |