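Title Statistical Machine Learning and Its Applications: Disease Classification and Data Reduction, Cancer Detection Using a Protein Database (2/2)
Alternative title Disease Classification and Data Reduction--- Application to Cancer Detection Based on Proteomic
Author 余清祥
Keywords Data reduction; Classification; Diagnosis; Simulation
Date 2005
Upload time 18-Apr-2007 16:36:53 (UTC+8)
Publisher Taipei: Department of Statistics, National Chengchi University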
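Abstract In a modern society where databases are vast and heterogeneous, timeliness is often the most important consideration: the aim is to obtain an approximate, acceptable answer in the shortest possible time so that timely recommendations can guide what follows. For example, a physician must decide as quickly as possible from a cancer patient's specimen report whether the patient needs immediate surgery or chemotherapy, or needs no treatment at all but must remain under continued observation. Because reducing the volume of data usually means lower analysis time and cost, data reduction is a natural choice when timeliness and approximate answers are the priority. Common methods include the histogram, singular value decomposition (SVD), the index tree, sampling, and wavelets. This project uses a proteomic database of prostate patients, with about 300 cases but nearly 50,000 variables, and compares the strengths and weaknesses of several common data reduction methods, with correct case classification as the goal. The project is planned to run for three years. The first year uses manually screened protein mass spectrometry data (fewer errors, fewer variables) and considers four common classification methods, Support Vector Machine (SVM), neural networks, Classification and Regression Tree (CART), and logistic regression, to find the best method under a binary classification criterion. The second year uses the raw data with about 50,000 variables and, with binary classification as the goal and the better classification methods from the first year, seeks the data reduction method that retains the most information. The third year attempts to combine the two specimen results available for each patient and, with multi-class classification as the goal, to obtain a correct case diagnosis.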
Quick approximate answers from large databases are often needed, because obtaining an answer quickly matters and some loss of accuracy is acceptable in exchange for speed; data reduction serves this need. The reduction process is important in exploratory data analysis, particularly when interactive response times are critical. For example, doctors need to decide from medical exam results whether cancer patients need surgery, chemotherapy, or a thorough physical exam. Popular data reduction methods include the histogram, singular value decomposition (SVD), the index tree, sampling, and wavelets. We will use proteomic data from prostate cancer patients, comprising records of about 300 patients and almost 50,000 variables. Our goal is to apply data reduction methods while minimizing the classification error. The project will be divided into three years. The first year explores the performance of frequently used classification methods: support vector machine (SVM), neural network, classification and regression tree (CART), and logistic regression. We shall use the pre-processed data, which have only 779 variables and in which possible errors have been corrected manually; the goal of the first year is binary classification. Data reduction methods will be considered in the second year, when the raw data (about 48,000 variables, errors not corrected) will also be used. The focus will be on the diagnosis of patients, and we shall consider methods of combining samples from the same patient.
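As an illustration of the histogram-style reduction named in the abstract, the following is a minimal sketch, not the project's actual procedure. It assumes a spectrum is stored as a plain list of intensities and reduces it by averaging consecutive, roughly equal-sized groups of values. The function name `bin_reduce`, the synthetic spectrum, and the choice of 480 bins are all hypothetical; only the figure of about 48,000 raw variables comes from the abstract.

```python
from statistics import mean


def bin_reduce(intensities, n_bins):
    """Reduce a long intensity vector to n_bins values by averaging
    consecutive groups of (nearly) equal size."""
    n = len(intensities)
    if n_bins <= 0 or n_bins > n:
        raise ValueError("n_bins must be between 1 and len(intensities)")
    reduced = []
    for b in range(n_bins):
        # Integer arithmetic keeps every bin non-empty when n_bins <= n.
        start = b * n // n_bins
        stop = (b + 1) * n // n_bins
        reduced.append(mean(intensities[start:stop]))
    return reduced


# Hypothetical stand-in for one raw spectrum with 48,000 variables.
spectrum = [float(i % 10) for i in range(48000)]

# 480 bins is an arbitrary choice made for this illustration only.
features = bin_reduce(spectrum, 480)
print(len(spectrum), "->", len(features))
```

Each patient's reduced vector could then be passed to any of the classifiers listed above. How many bins to keep is exactly the kind of trade-off between speed and classification error that the abstract describes.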
Description Approved amount: NT$323,000
Type report
dc.coverage.temporal Project year: 94; start and end dates: 20050801~20060731 en_US
dc.creator (Author) 余清祥 zh_TW
dc.date (Date) 2005 en_US
dc.date.accessioned 18-Apr-2007 16:36:53 (UTC+8) en_US
dc.date.accessioned 8-Sep-2008 16:07:14 (UTC+8) -
dc.date.available 18-Apr-2007 16:36:53 (UTC+8) en_US
dc.date.available 8-Sep-2008 16:07:14 (UTC+8) -
dc.date.issued (Upload time) 18-Apr-2007 16:36:53 (UTC+8) en_US
dc.identifier (Other Identifiers) 942118M004001.pdf en_US
dc.identifier.uri (URI) http://tair.lib.ntu.edu.tw:8000/123456789/3862 en_US
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/3862 -
dc.description (Description) Approved amount: NT$323,000 en_US
dc.format application/pdf en_US
dc.format.extent bytes en_US
dc.format.extent 473639 bytes en_US
dc.format.extent 473639 bytes -
dc.format.extent 18630 bytes -
dc.format.mimetype application/pdf en_US
dc.format.mimetype application/pdf -
dc.format.mimetype text/plain -
dc.language zh-TW en_US
dc.language.iso zh-TW en_US
dc.publisher (Publisher) Taipei: Department of Statistics, National Chengchi University en_US
dc.rights (Rights) National Science Council, Executive Yuan en_US
dc.subject (Keywords) Data reduction; Classification; Diagnosis; Simulation -
dc.title (Title) Statistical Machine Learning and Its Applications: Disease Classification and Data Reduction, Cancer Detection Using a Protein Database (2/2) zh_TW
dc.title.alternative (Alternative title) Disease Classification and Data Reduction--- Application to Cancer Detection Based on Proteomic -
dc.type (Type) report en