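The study on the optimal linear combination of markers based on the partial area under the ROC curve (多標記接受者操作特徵曲線下部分面積最佳線性組合之研究) | Publication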

Title	多標記接受者操作特徵曲線下部分面積最佳線性組合之研究 The study on the optimal linear combination of markers based on the partial area under the ROC curve
Author	許嫚荏 Hsu, Man Jen
Contributors	Advisors: 薛慧敏 Hsueh, Huey Miin; 張源俊 Chang, Yuan Chin Ivan. Author: 許嫚荏 Hsu, Man Jen
Keywords	Discriminatory power; Disease detection; Hypothesis testing; Optimal linear combination; Partial area under ROC curve; Stepwise biomarker selection; Receiver operating curve; Specificity; Sensitivity
Date	2012
Upload time	1-May-2013 11:52:23 (UTC+8)
Abstract	This thesis aims to construct a composite diagnostic tool from multiple biomarkers, using as its criterion the partial area under the ROC curve (pAUC) over a predetermined specificity range. Several recent studies have considered the linear combination that maximizes the whole area under the ROC curve (AUC); here we instead seek the optimal linear combination by directly maximizing the pAUC under a normality assumption. We derive the pAUC of a linear combination of markers and its first derivative, which gives a necessary condition for optimality. The resulting expression is complicated enough that verifying the second-order condition through the Hessian matrix is difficult. We also find that the pAUC maximizer may not be unique and that local maximizers can exist. Existing algorithms, whose results depend on the initial point, are therefore inadequate for this problem, and we propose a multiple-initial algorithm that starts from several initial points at once. When the population parameters are unknown and only a random sample is available, we use maximum likelihood estimates to obtain the sample pAUC, and we show that its maximizer is a strongly consistent estimator of the population maximizer. We further propose three statistical tests of discriminatory power: the marginal discriminatory power of a single biomarker, the joint discriminatory power of a set of biomarkers, and the conditional discriminatory power of an individual biomarker. These tests are used in two biomarker selection procedures, forward selection and backward elimination, to identify the biomarkers significantly associated with disease detection and to reduce the number of variables in subsequent analysis. Simulation studies validate the proposed algorithm, tests, and selection procedures, and the methods are also applied to several real data sets.
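The multiple-starting-point idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the function names are hypothetical, the specificity range is assumed to be (1 - t0, 1), the pAUC is computed from the standard binormal ROC formula with a midpoint rule, and a simple random hill-climb stands in for whatever local optimizer the thesis uses. Because the pAUC does not change when the coefficient vector is multiplied by a positive constant, the search is restricted to unit vectors.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def quad_form(a, S):
    """Return a' S a for a vector a and a square matrix S (nested lists)."""
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))

def binormal_pauc(a, mu0, S0, mu1, S1, t0, n_grid=200):
    """pAUC of the score a'x over false-positive rates in (0, t0).

    Controls ~ N(mu0, S0), cases ~ N(mu1, S1) (assumed model).
    ROC(t) = Phi((a'(mu1 - mu0) + s0 * Phi^{-1}(t)) / s1), integrated
    with a midpoint rule on n_grid cells.
    """
    delta = sum(ai * (m1 - m0) for ai, m0, m1 in zip(a, mu0, mu1))
    s0 = math.sqrt(quad_form(a, S0))
    s1 = math.sqrt(quad_form(a, S1))
    h = t0 / n_grid
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) * h
        total += STD_NORMAL.cdf((delta + s0 * STD_NORMAL.inv_cdf(t)) / s1)
    return total * h

def normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def random_unit_vector(p, rng):
    # A normalised standard Gaussian draw is uniform on the unit sphere.
    return normalise([rng.gauss(0.0, 1.0) for _ in range(p)])

def local_search(f, a, rng, step=0.5, iters=200):
    """Stochastic hill-climb on the unit sphere (illustrative stand-in)."""
    best, best_val = a, f(a)
    for _ in range(iters):
        cand = normalise([x + step * rng.gauss(0.0, 1.0) for x in best])
        val = f(cand)
        if val > best_val:
            best, best_val = cand, val
        else:
            step *= 0.97  # shrink the proposal step after a failed move
    return best, best_val

def multi_start_maximise(f, p, n_starts=10, seed=0):
    """Run the local search from several random starts; keep the best run."""
    rng = random.Random(seed)
    runs = [local_search(f, random_unit_vector(p, rng), rng)
            for _ in range(n_starts)]
    return max(runs, key=lambda run: run[1])
```

With population parameters replaced by sample estimates, `multi_start_maximise(lambda a: binormal_pauc(a, mu0, S0, mu1, S1, t0), p)` returns the best unit-norm coefficient vector found and its pAUC. Comparing the end points of the individual runs is one way to notice the non-unique or local maximizers the abstract mentions.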
Description	Doctoral; National Chengchi University (國立政治大學); Graduate Institute of Statistics (統計研究所); 95354503; 101
Source	http://thesis.lib.nccu.edu.tw/record/#G0095354503
Type	thesis

dc.contributor.advisor	薛慧敏; 張源俊	zh_TW
dc.contributor.advisor	Hsueh, Huey Miin; Chang, Yuan Chin Ivan	en_US
dc.contributor.author (Authors)	許嫚荏	zh_TW
dc.contributor.author (Authors)	Hsu, Man Jen	en_US
dc.creator (Author)	許嫚荏	zh_TW
dc.creator (Author)	Hsu, Man Jen	en_US
dc.date (Date)	2012	en_US
dc.date.accessioned	1-May-2013 11:52:23 (UTC+8)	-
dc.date.available	1-May-2013 11:52:23 (UTC+8)	-
dc.date.issued (Upload time)	1-May-2013 11:52:23 (UTC+8)	-
dc.identifier (Other Identifiers)	G0095354503	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/57973	-
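dc.description (Description)	博士 (Doctoral)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	統計研究所 (Graduate Institute of Statistics)	zh_TW
dc.description (Description)	95354503	zh_TW
dc.description (Description)	101	zh_TW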
dc.description.tableofcontents	Contents	zh_TW
1 Introduction
  1.1 Motivation
  1.2 Outline
2 The Linear Combination Achieving the Optimal Partial Area under the ROC Curve
  2.1 Partial Area under the ROC curve (pAUC)
  2.2 Computational Issues
  2.3 Multiple-Initial Algorithm
3 Statistical Inference Related with the pAUC Maximizer
  3.1 Estimating the Linear Combination Maximizing the pAUC
  3.2 Testing the Discriminatory Power
  3.3 Biomarker Selection
4 Simulation Study
  4.1 Multiple-Initial Algorithm
  4.2 Statistical Inference
  4.3 Two-Biomarker Study
5 Real Examples
  5.1 Atherosclerotic Coronary Heart Disease Data
  5.2 Duchenne Muscular Dystrophy (DMD) Data
  5.3 Breast Tissue Data
  5.4 Magic Gamma Telescope Data
6 Conclusions and Future Works
  6.1 Conclusions
  6.2 Future Works
A Proofs
  A.1 Proof of Theorem 1
  A.2 Proof of Corollary 1
  A.3 Lemma 1
  A.4 Lemma 2
  A.5 Proof of Theorem 2
  A.6 Proof of Lemma 1
  A.7 Proof of Lemma 2
dc.format.extent	2388593 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0095354503	en_US
dc.subject (Keywords)	判別能力 (Discriminatory power)	zh_TW
dc.subject (Keywords)	疾病偵測 (Disease detection)	zh_TW
dc.subject (Keywords)	操作者特徵曲線下的部份面積 (Partial area under the ROC curve)	zh_TW
dc.subject (Keywords)	標記選取 (Marker selection)	zh_TW
dc.subject (Keywords)	最佳線性組合 (Optimal linear combination)	zh_TW
dc.subject (Keywords)	操作者特徵曲線 (ROC curve)	zh_TW
dc.subject (Keywords)	特異度 (Specificity)	zh_TW
dc.subject (Keywords)	敏感度 (Sensitivity)	zh_TW
dc.subject (Keywords)	Discriminatory power	en_US
dc.subject (Keywords)	Hypothesis testing	en_US
dc.subject (Keywords)	Optimal linear combination	en_US
dc.subject (Keywords)	Partial area under ROC curve	en_US
dc.subject (Keywords)	Stepwise biomarker selection	en_US
dc.subject (Keywords)	Receiver operating curve	en_US
dc.subject (Keywords)	Specificity	en_US
dc.subject (Keywords)	Sensitivity	en_US
dc.title (Title)	多標記接受者操作特徵曲線下部分面積最佳線性組合之研究	zh_TW
dc.title (Title)	The study on the optimal linear combination of markers based on the partial area under the ROC curve	en_US
dc.type (Type)	thesis	en
