Overlap Technique on Protein Mass Spectrometry Data (重疊法應用於蛋白質質譜儀資料)

Title	Overlap Technique on Protein Mass Spectrometry Data (重疊法應用於蛋白質質譜儀資料)
Author	Hsu, Chun-Chien (徐竣建)
Contributors	Yue, Ching-Syang Jack (余清祥), advisor; Hsu, Chun-Chien (徐竣建), author
Keywords	Disease Diagnosis; Dimension Reduction; Classification; Principal Component Analysis; Support Vector Machine; Overlap
Date	2005
Upload date	2009-09-14
Abstract	Cancer has been the leading cause of death in Taiwan for the past 24 years. Because patients treated at an early stage have a higher survival rate, early detection, diagnosis, and treatment can reduce mortality. The data used in this study are protein mass spectrometry data sets acquired with surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), comprising two high-dimensional data sets: one for prostate cancer and one for head and neck cancer. Because such data contain a very large number of variables, storing and analyzing the raw data is costly in both space and computation time. The purpose of this thesis is therefore to find an analysis method that minimizes the misclassification rate after dimension reduction, in the hope of improving the accuracy of classifying cancer cases. The study is divided into an experimental group and a control group. The experimental group first reduces the dimension with Principal Component Analysis (PCA), then classifies with a Support Vector Machine (SVM), and finally applies the Overlap method to improve the classification; the control group classifies with the SVM directly. The empirical results indicate that the Overlap method yields a significant improvement for the prostate cancer data but no clear improvement for the head and neck cancer data. The study also examines the mass range of the protein mass spectrometry data to check whether the range suggested by experts agrees with the analysis. For the prostate cancer data, important information still appears to lie outside the expert-suggested mass range; for the head and neck cancer data, the region outside the suggested range offers no substantive help to the analysis.
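The two arms described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the data are synthetic, the helper names (`make_synthetic_spectra`, `pca_fit`, `pca_transform`, `train_linear_svm`, `accuracy`) and all parameter values are hypothetical, and the SVM is a plain linear soft-margin model trained by subgradient descent rather than whatever SVM implementation the thesis used. The Overlap step is omitted because this record does not define it.

```python
import numpy as np


def make_synthetic_spectra(n_samples=120, n_features=500, seed=0):
    """Hypothetical stand-in for spectra: Gaussian noise plus a few informative features."""
    rng = np.random.default_rng(seed)
    y = np.where(rng.random(n_samples) < 0.5, -1.0, 1.0)  # labels in {-1, +1}
    X = rng.normal(size=(n_samples, n_features))
    X[:, :20] += 0.8 * y[:, None]  # shift the first 20 features by class
    return X, y


def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto principal axes learned from the training data only."""
    return (X - mean) @ components.T


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM fitted by batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1.0  # samples with nonzero hinge loss
        grad_w = lam * w - X[active].T @ y[active] / n_samples
        grad_b = -y[active].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def accuracy(X, y, w, b):
    """Fraction of samples whose predicted sign matches the label."""
    return float(np.mean(np.sign(X @ w + b) == y))


X, y = make_synthetic_spectra()
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Control group: SVM applied directly to the raw features.
w_raw, b_raw = train_linear_svm(X_train, y_train)
acc_control = accuracy(X_test, y_test, w_raw, b_raw)

# Experimental group, first two steps only: PCA, then SVM on the scores.
mean, components = pca_fit(X_train, n_components=10)
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)
w_pca, b_pca = train_linear_svm(Z_train, y_train)
acc_experimental = accuracy(Z_test, y_test, w_pca, b_pca)

print(f"control (SVM only): {acc_control:.3f}")
print(f"experimental (PCA + SVM): {acc_experimental:.3f}")
```

The PCA axes are estimated on the training split only and then reused for the test split, so the comparison between the two arms is not contaminated by test data. On real SELDI-TOF-MS data the number of components, the regularization strength, and the split would need to be chosen by validation; the values here are placeholders.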
Description	Master's degree; National Chengchi University; Graduate Institute of Statistics; 93354010; 94
Source	http://thesis.lib.nccu.edu.tw/record/#G0933540101
Type	thesis

dc.contributor.advisor	余清祥	zh_TW
dc.contributor.advisor	Yue, Ching-Syang Jack	en_US
dc.contributor.author (Author)	徐竣建	zh_TW
dc.contributor.author (Author)	Hsu, Chun-Chien	en_US
dc.creator (Author)	徐竣建	zh_TW
dc.creator (Author)	Hsu, Chun-Chien	en_US
dc.date (Date)	2005	en_US
dc.date.accessioned	2009-09-14	-
dc.date.available	2009-09-14	-
dc.date.issued (Upload date)	2009-09-14	-
dc.identifier (Other identifier)	G0933540101	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/30953	-
dc.description (Description)	碩士 (Master's degree)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	統計研究所 (Graduate Institute of Statistics)	zh_TW
dc.description (Description)	93354010	zh_TW
dc.description (Description)	94	zh_TW
dc.description.tableofcontents	Chapter 1, Introduction: Section 1, Research Background; Section 2, Research Motivation and Objectives; Section 3, Research Framework	zh_TW
dc.description.tableofcontents	Chapter 2, Protein Mass Spectrometry Data: Section 1, Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry; Section 2, Prostate Cancer Protein Mass Spectrometry Database; Section 3, Head and Neck Cancer Protein Mass Spectrometry Database; Section 4, Discussion of Protein Mass Spectrometry Data	zh_TW
dc.description.tableofcontents	Chapter 3, Research Methods: Section 1, Principal Component Analysis; Section 2, Support Vector Machine; Section 3, Overlap Method (3.3.1 Main Concept of the Overlap Method; 3.3.2 Classification Criteria of the Overlap Method; 3.3.3 Definition of the Overlap Method)	zh_TW
dc.description.tableofcontents	Chapter 4, Empirical Analysis: Section 1, Empirical Study of the Prostate Cancer Protein Mass Spectrometry Data (4.1.1 Study Settings; 4.1.2 Selection of the Mass Range; 4.1.3 Study Steps and Workflow; 4.1.4 Empirical Results); Section 2, Empirical Study of the Head and Neck Cancer Protein Mass Spectrometry Data (4.2.1 Study Settings; 4.2.2 Selection of the Mass Range; 4.2.3 Study Steps and Workflow; 4.2.4 Empirical Results); Section 3, Summary of Empirical Results	zh_TW
dc.description.tableofcontents	Chapter 5, Discussion of the Mass Range: Section 1, Mass Range of the Prostate Cancer Protein Mass Spectrometry Data; Section 2, Mass Range of the Head and Neck Cancer Protein Mass Spectrometry Data; Section 3, Summary of Mass Range Results	zh_TW
dc.description.tableofcontents	Chapter 6, Conclusions and Future Work: Section 1, Conclusions; Section 2, Future Work	zh_TW
dc.description.tableofcontents	References	zh_TW
dc.description.tableofcontents	Appendix	zh_TW
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0933540101	en_US
dc.subject (Keyword)	疾病診斷 (Disease Diagnosis)	zh_TW
dc.subject (Keyword)	維度縮減 (Dimension Reduction)	zh_TW
dc.subject (Keyword)	分類 (Classification)	zh_TW
dc.subject (Keyword)	主成份分析 (Principal Component Analysis)	zh_TW
dc.subject (Keyword)	支持向量機 (Support Vector Machine)	zh_TW
dc.subject (Keyword)	重疊法 (Overlap)	zh_TW
dc.subject (Keyword)	Disease Diagnosis	en_US
dc.subject (Keyword)	Dimension Reduction	en_US
dc.subject (Keyword)	Classification	en_US
dc.subject (Keyword)	Principal Component Analysis	en_US
dc.subject (Keyword)	Support Vector Machine	en_US
dc.subject (Keyword)	Overlap	en_US
dc.title (Title)	重疊法應用於蛋白質質譜儀資料	zh_TW
dc.title (Title)	Overlap Technique on Protein Mass Spectrometry Data	en_US
dc.type (Type)	thesis	en
