dc.contributor.advisor | 余清祥 | zh_TW |
dc.contributor.advisor | Yue, Ching-Syang | en_US |
dc.contributor.author (作者) | 黃靜文 | zh_TW |
dc.contributor.author (作者) | Huang, Ching-Wen | en_US |
dc.creator (作者) | 黃靜文 | zh_TW |
dc.creator (作者) | Huang, Ching-Wen | en_US |
dc.date (日期) | 2004 | en_US |
dc.date.accessioned | 2009-09-14 | - |
dc.date.available | 2009-09-14 | - |
dc.date.issued (上傳時間) | 2009-09-14 | - |
dc.identifier (其他 識別碼) | G0923540121 | en_US |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/30944 | - |
dc.description (描述) | Master's degree | zh_TW |
dc.description (描述) | National Chengchi University | zh_TW |
dc.description (描述) | Graduate Institute of Statistics | zh_TW |
dc.description (描述) | 92354012 | zh_TW |
dc.description (描述) | 93 | zh_TW |
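dc.description.abstract (摘要) | This study uses a prostate cancer protein database of serum protein intensity data, obtained by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry, to determine whether a subject has cancer. The subjects fall into four classes: normal, benign tumor, early-stage cancer, and late-stage cancer. The database contains two data sets: the raw data, with about 48,000 interval variables, and the manually processed data, in which manual variable selection leaves only 779 interval variables. Both are high-dimensional and each has about 650 observations. Because high-dimensional data have so many variables, they are hard to analyze and take longer to compute. The aim of this study is therefore to find, under effective dimension reduction, the approach that minimizes the misclassification rate. We first compare three classification methods (support vector machine, artificial neural network, and classification and regression tree), then apply the two better methods, support vector machine and artificial neural network, to classify the dimension-reduced data. The dimension reduction methods used are discrete wavelet analysis, principal component analysis, and principal component analysis networks. According to the results, discrete wavelet analysis and principal component analysis perform better, while principal component analysis networks are only barely satisfactory. Besides evaluating these dimension reduction methods on this case database, we also combine a linear method (principal component analysis) with a nonlinear method (principal component analysis networks), hoping that this overlap method lowers the screening misclassification rate below that of any single dimension reduction method. According to the results, the overlap method gives no clear improvement on the raw data but a clear improvement on the manually processed data. | zh_TW |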
dc.description.abstract (摘要) | In this paper, we study a serum protein data set for prostate cancer, acquired by the Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS) technique. The data set covers four classes of subjects (normal, benign tumor, early-stage cancer, and late-stage cancer) and includes both raw data and preprocessed data. There are around 48,000 variables in the raw data and 779 variables in the preprocessed data, and each has a sample size of around 650. Because of this high dimensionality, the data are difficult to analyze and slow to compute. The goal of this study is therefore to find efficient dimension reduction methods that minimize the misclassification rate. We first compare three classification methods: support vector machine, artificial neural network, and classification and regression tree. We then use the discrete wavelet transform, principal component analysis, and principal component analysis networks to reduce the dimension of the data. Finally, we compare the dimension reduction methods and propose an overlap method, which combines a linear dimension reduction method (principal component analysis) with a nonlinear one (principal component analysis networks) to improve the classification result. The overlap method clearly improves classification on the preprocessed data, but not on the raw data. | en_US |
dc.description.tableofcontents | Chapter 1 Introduction: 1.1 Research Motivation and Objectives; 1.2 Data Source and Overview (1.2.1 Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry; 1.2.2 Data Overview); 1.3 Research Tools and Settings. Chapter 2 Classification Methods: 2.1 Support Vector Machine (2.1.1 Method Overview; 2.1.2 Parameter Settings); 2.2 Artificial Neural Network (2.2.1 Method Overview; 2.2.2 Parameter Settings); 2.3 Classification and Regression Tree; 2.4 Empirical Results. Chapter 3 Dimension Reduction Methods: 3.1 Discrete Wavelet Transform (3.1.1 Method Overview; 3.1.2 Parameter Settings; 3.1.3 Selecting the Number of Wavelet Coefficients); 3.2 Principal Component Analysis (3.2.1 Method Overview; 3.2.2 Selecting the Number of Principal Components; 3.2.3 Performance of Principal Component Analysis); 3.3 Principal Component Analysis Networks (3.3.1 Method Overview; 3.3.2 Parameter Settings; 3.3.3 Selecting the Number of Hidden-Layer Nodes); 3.4 Comparison of Methods. Chapter 4 Overlap Method: 4.1 Method Overview; 4.2 Empirical Results. Chapter 5 Conclusions and Suggestions: 5.1 Conclusions; 5.2 Suggestions. References. Appendix 1: Mean Misclassification Rates and Standard Deviations of the Classification Methods. Appendix 2: Mean Misclassification Rates and Standard Deviations for Principal Component Analysis. Appendix 3: Mean Misclassification Rates and Standard Deviations for Principal Component Analysis Networks. Appendix 4: Mean Misclassification Rates and Standard Deviations for the Overlap Method. Appendix 5: Histograms of Neural Network Classification Outputs after Dimension Reduction. Appendix 6: Misclassification Proportions by Output-Value Interval after Dimension Reduction. | zh_TW |
dc.language.iso | en_US | - |
dc.source.uri (資料來源) | http://thesis.lib.nccu.edu.tw/record/#G0923540121 | en_US |
dc.subject (關鍵詞) | 分類 | zh_TW |
dc.subject (關鍵詞) | 維度縮減 | zh_TW |
dc.subject (關鍵詞) | 疾病診斷 | zh_TW |
dc.subject (關鍵詞) | 電腦模擬 | zh_TW |
dc.subject (關鍵詞) | Classification | en_US |
dc.subject (關鍵詞) | Dimension reduction | en_US |
dc.subject (關鍵詞) | Disease diagnosis | en_US |
dc.subject (關鍵詞) | Computer simulation | en_US |
dc.title (題名) | 維度縮減應用於蛋白質質譜儀資料 | zh_TW |
dc.title (題名) | Dimension Reduction on Protein Mass Spectrometry Data | en_US |
dc.type (資料類型) | thesis | en |
dc.relation.reference (參考文獻) | [Chinese-language references] | zh_TW |
dc.relation.reference (參考文獻) | [01] 行政院衛生署,「中華民國九十三年臺灣地區死因統計結果摘要」。 | zh_TW |
dc.relation.reference (參考文獻) | URL: http://www.doh.gov.tw/statistic/data/死因摘要/93年/93.htm | zh_TW |
dc.relation.reference (參考文獻) | [02] 彭文正譯,Michael J.A. Berry與Gordon S. Linoff著,資料採礦-顧客關係管理暨電子行銷之應用,數博網資訊股份有限公司,2001年。 | zh_TW |
dc.relation.reference (參考文獻) | [03] 葉怡成,應用類神經網路,儒林圖書公司,1997年。 | zh_TW |
dc.relation.reference (參考文獻) | [04] 潘荔錞、蔡志彥和簡志青,「蛋白質體學在臨床醫學之應用」,化工資訊與商情月刊第3期,2003年9月號。 | zh_TW |
dc.relation.reference (參考文獻) | [05] 賴基銘,「癌症篩檢未來的展望:SELDI血清蛋白指紋圖譜的應用」,國家衛生研究院電子報,第52期,2004年6月25日。 | zh_TW |
dc.relation.reference (參考文獻) | URL: http://sars.nhri.org.tw/enews/enews_list_new3.php?volume_indx=52&enews_dt=2004-06-25 | zh_TW |
dc.relation.reference (參考文獻) | [English-language references] | zh_TW |
dc.relation.reference (參考文獻) | [06] Alpaydin, E. (2004), Introduction to Machine Learning. MIT Press. | zh_TW |
dc.relation.reference (參考文獻) | [07] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984), Classification and Regression Trees, Wadsworth. | zh_TW |
dc.relation.reference (參考文獻) | [08] Cottrell, G. W., Munro, P. and Zipser, D. (1987), “Learning Internal Representations from Gray-Scale Images: An Example of Extensional Programming”, In Ninth Annual Conference of the Cognitive Science Society, 462-473. Hillsdale, NJ: Erlbaum. | zh_TW |
dc.relation.reference (參考文獻) | [09] Cybenko, G. (1989), “Approximation by Superpositions of a Sigmoidal Function”, Mathematics of Control, Signals, and Systems, vol.2, 303-314. | zh_TW |
dc.relation.reference (參考文獻) | [10] Donoho, D. L. and Johnstone, I. M. (1994), “Ideal Spatial Adaptation by Wavelet Shrinkage”, Biometrika, vol.81, 425-455. | zh_TW |
dc.relation.reference (參考文獻) | [11] Donoho, D. L. and Johnstone, I. M. (1995), “Adapting to Unknown Smoothness via Wavelet Shrinkage”, Journal of the American Statistical Association, vol.90, 1200-1224. | zh_TW |
dc.relation.reference (參考文獻) | [12] Donoho, D. L. and Johnstone, I. M. (1998), “Minimax Estimation via Wavelet Shrinkage”, Annals of Statistics, vol.26, 879-921. | zh_TW |
dc.relation.reference (參考文獻) | [13] Daubechies, I. (1992), Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM: Philadelphia. | zh_TW |
dc.relation.reference (參考文獻) | [14] Hornik, K., Stinchcombe, M. and White, H. (1989), “Multilayer Feedforward Networks Are Universal Approximators”, Neural Networks, vol.2, 359-366. | zh_TW |
dc.relation.reference (參考文獻) | [15] Hsu, C-W., Chang, C-C. and Lin, C-J. (2003), “A Practical Guide to Support Vector Classification”. | zh_TW |
dc.relation.reference (參考文獻) | Paper available at http://www.csie.ntu.edu.tw/~cjlin/papers.html. | zh_TW |
dc.relation.reference (參考文獻) | [16] Huang, T-K., Weng, R. C. and Lin, C-J. (July 2004), “A Generalized Bradley-Terry Model: From Group Competition to Individual Skill”. A short version appears in NIPS. | zh_TW |
dc.relation.reference (參考文獻) | [17] Johnson, D. E. (1998), Applied Multivariate Methods for Data Analysts, Pacific Grove, Calif.: Duxbury Press. | zh_TW |
dc.relation.reference (參考文獻) | [18] Mallat, S. G. (1989), “A Theory for Multiresolution Signal Decomposition: The Wavelet Representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.11, No.7, 674-693. | zh_TW |
dc.relation.reference (參考文獻) | [19] Qu, Y., Adam, B-L., Thornquist, M., Potter, J. D., Thompson, M. L., Yasui, Y., Davis, J., Schellhammer, P. F., Cazares, L., Clements, M. A., Wright, G. L., Jr. and Feng, Z. (March 2003), “Data Reduction Using a Discrete Wavelet Transform in Discriminant Analysis of Very High Dimensionality Data”, Biometrics, vol.59, 143-151. | zh_TW |
dc.relation.reference (參考文獻) | [20] Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986), “Learning Internal Representations by Error Propagation”, in Parallel Distributed Processing, vol.1, MIT Press, Cambridge, MA, 318-362. | zh_TW |
dc.relation.reference (參考文獻) | [21] Vapnik, V. N. (1995), The Nature of Statistical Learning Theory, Springer, New York. | zh_TW |
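
The abstract names the discrete wavelet transform as one of the dimension reduction methods. As a rough illustration only, the sketch below shows how keeping just the approximation coefficients of a wavelet decomposition shortens a spectrum. It assumes the Haar wavelet for simplicity; this record does not say which wavelet family or how many levels the thesis used. The function names `haar_step` and `haar_reduce` and the toy spectrum are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half as long
    as the input. The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reduce(signal, levels):
    """Apply `levels` Haar steps and keep only the approximation coefficients.

    Each level halves the number of retained variables.
    """
    coeffs = list(signal)
    for _ in range(levels):
        coeffs, _detail = haar_step(coeffs)  # detail coefficients are discarded
    return coeffs


if __name__ == "__main__":
    # Toy "spectrum" of 8 intensities (made-up numbers, not thesis data).
    spectrum = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    reduced = haar_reduce(spectrum, levels=2)
    print(len(spectrum), "->", len(reduced))
```

Under this assumption, each level halves the number of variables, so the reduced vector could then be passed to a classifier such as a support vector machine in place of the full spectrum.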