Academic Output - NSC Projects

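Title Nonparametric Regression Analysis for the Area under the ROC Curve (original title: 接受者操作特徵函數線下面積之無母數迴歸分析)
Alternative title Nonparametric Regression Analysis for the Area under the ROC Curve
Authors 薛慧敏; 張源俊
Contributors Department of Statistics, National Chengchi University
National Science Council, Executive Yuan
Keywords statistics; area under the receiver operating characteristic curve; nonparametric regression analysis
Date 2009
Upload time 30-Aug-2012 09:59:37 (UTC+8)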
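Abstract Academic year 98 (2009): Nonparametric regression analysis for the area under the receiver operating characteristic curve (AUC). In recent years, the area under the receiver operating characteristic curve (AUC) has become a common criterion for evaluating the discriminating power of a binary classifier. Computer scientists have developed classifiers that attain the optimal AUC, but these methods require complex and extensive computation. An alternative is to first build a classifier with another, more efficient method, then assess the importance or influence of each feature on that classifier, and finally carry out variable selection to arrive at a better, more parsimonious classifier. The goal of this project is to measure how significantly each feature contributes to the classifier's discriminating power. We consider a regression model of the classifier's AUC on the features (covariates) and assess each covariate's influence through the significance of its regression coefficient. The project adopts the AUC regression model of Cai and Dodd (2008) and will propose a nonparametric estimation method based on kernel fitting. We will further derive the asymptotic properties of the estimator and develop statistical tests for the significance of each regression coefficient. Finally, we will run computer simulations to verify the theoretical results.
Academic year 99 (2010): Nonparametric regression analysis for the partial area under the receiver operating characteristic curve (pAUC). In the second year of the project, we generalize the AUC regression model described above. If the researcher is interested in only part of the ROC curve, for example only the region of low false positive rates, or only classifier thresholds above a certain bound, then the partial AUC (pAUC) is the target criterion and the corresponding model is the pAUC regression model. Moreover, if the researcher specifies only a lower bound on the classifier's threshold, the upper limit of the false positive rate must be estimated as well, which makes the estimation more complex. We will apply the nonparametric estimation method described above, derive the asymptotic properties of the estimator, and develop statistical tests for the significance of each regression coefficient. Finally, we will run computer simulations to verify the theoretical results. We expect the results to be applicable in the future to classifier analysis for high-dimensional data, leading to a feature selection method that is statistically meaningful and theoretically grounded.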
YEAR 2009: Nonparametric regression analysis for the area under the ROC curve. The AUC (area under the receiver operating characteristic curve) criterion, originally proposed for evaluating medical diagnostic tools, has become increasingly popular for assessing the discriminating power of a continuous-scale binary classification rule. Recently, several classifiers that directly optimize the AUC have been developed, but they require a large amount of complex computation on large datasets. An alternative is to first build a classifier from all features, then assess the contribution of each individual covariate or feature and select the important or significant ones. To evaluate the significance of the covariates, we consider a model that regresses the AUC of the classifier on the covariates. In the literature, a semi-parametric location-scale distribution estimation has been employed for AUC regression analysis; however, the estimation is complicated and no theoretical results have been provided. In this project, we will propose a nonparametric estimating procedure for the AUC regression model. We will derive theoretical justification for the limiting behavior of the estimators and obtain consistent estimates of the standard errors of the estimated regression coefficients. We will then construct statistical tests for the significance of each predictor. Intensive numerical studies will also be conducted for empirical verification.
YEAR 2010: Nonparametric regression analysis for the partial area under the ROC curve. In the second year, the regression model is extended to a broader class in which the research interest may be restricted to an interval of the ROC curve. Most often, one is more interested in the lower range of the false positive rate. The corresponding summary measure is the partial AUC (pAUC), and we consider a model that regresses the pAUC on the covariates. Sometimes the investigators have sufficient knowledge to determine the range of false positive rates of interest. At other times, only a lower bound on the critical value is provided; one then needs to estimate the upper limit of the false positive rate as well, which makes the estimation more complicated. We will propose a nonparametric estimation method for the pAUC regression model. Statistical testing procedures will be proposed and justified theoretically and empirically. The procedure is expected to provide a theoretically sound basis for developing a computationally efficient, practical feature selection approach for high-dimensional data analysis in the future.
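As a rough illustration of the two summary measures named in the abstracts, the sketch below computes the standard empirical (Mann-Whitney) AUC and the area under the empirical ROC curve restricted to false positive rates in [0, max_fpr]. It is not the project's kernel-based regression estimator, whose details this record does not give. The function names, the `max_fpr` parameter, and the toy scores are all hypothetical, chosen only for this example.

```python
from typing import Sequence


def _fraction_above(pos: Sequence[float], threshold: float) -> float:
    """Share of positive scores above the threshold; ties count as 1/2."""
    hits = sum(1.0 if p > threshold else 0.5 if p == threshold else 0.0 for p in pos)
    return hits / len(pos)


def empirical_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Mann-Whitney estimate of P(positive score > negative score)."""
    return sum(_fraction_above(pos, n) for n in neg) / len(neg)


def empirical_pauc(pos: Sequence[float], neg: Sequence[float], max_fpr: float) -> float:
    """Area under the empirical ROC curve for false positive rates in [0, max_fpr]."""
    if not 0.0 < max_fpr <= 1.0:
        raise ValueError("max_fpr must lie in (0, 1]")
    n_neg = len(neg)
    area = 0.0
    # The k-th largest negative score spans the FPR interval [k/n_neg, (k+1)/n_neg].
    for k, n in enumerate(sorted(neg, reverse=True)):
        left = k / n_neg
        if left >= max_fpr:
            break
        width = min((k + 1) / n_neg, max_fpr) - left
        area += width * _fraction_above(pos, n)
    return area


# Toy scores (hypothetical): higher scores should indicate the positive class.
pos_scores = [0.9, 0.8, 0.35, 0.6]
neg_scores = [0.7, 0.4, 0.3, 0.1]
auc = empirical_auc(pos_scores, neg_scores)
pauc = empirical_pauc(pos_scores, neg_scores, max_fpr=0.5)
print(auc, pauc)
```

With `max_fpr=1.0` the partial area coincides with the full empirical AUC, which gives a quick consistency check on both functions.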
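Relation Basic research
Academic grant
Research period: 9808 to 9907 (ROC calendar, i.e. August 2009 to July 2010)
Research funding: NT$629 thousand
Type report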
dc.contributor Department of Statistics, National Chengchi University [en_US]
dc.contributor National Science Council, Executive Yuan [en_US]
dc.creator (Authors) 薛慧敏; 張源俊 [zh_TW]
dc.date (Date) 2009 [en_US]
dc.date.accessioned 30-Aug-2012 09:59:37 (UTC+8)
dc.date.available 30-Aug-2012 09:59:37 (UTC+8)
dc.date.issued (Upload time) 30-Aug-2012 09:59:37 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/53414
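dc.language.iso en_US
dc.relation (Relation) Basic research [en_US]
dc.relation (Relation) Academic grant [en_US]
dc.relation (Relation) Research period: 9808 to 9907 (ROC calendar, i.e. August 2009 to July 2010) [en_US]
dc.relation (Relation) Research funding: NT$629 thousand [en_US]
dc.subject (Keywords) statistics; area under the receiver operating characteristic curve; nonparametric regression analysis [en_US]
dc.title (Title) 接受者操作特徵函數線下面積之無母數迴歸分析 [zh_TW]
dc.title.alternative (Alternative title) Nonparametric Regression Analysis for the Area under the ROC Curve [en_US]
dc.type (Type) report [en]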