Title 自變數有誤差的邏輯式迴歸模型:估計、實驗設計及序貫分析
Logistic regression models when covariates are measured with errors: Estimation, design and sequential method
Author 簡至毅
Chien, Chih Yi
Contributors 薛慧敏; 張源俊
Hsueh, Huey Mriin; Chang, Yuan Chin
簡至毅
Chien, Chih Yi
Keywords logistic regression
sequential analysis
two-stage sampling
logistic regression model
measurement error
sample size calculation
sequential sampling
two-stage case-control sampling
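Case-control study
Date 2009
Uploaded 9-May-2016 15:14:17 (UTC+8)
Abstract This thesis studies estimation in logistic regression models when the covariates are measured with error, designs an experiment under which the measurement error satisfies a decaying-error assumption, and then applies sequential analysis to construct a confidence region at a given level. When covariates are measured with error, the resulting estimators are usually biased, so decisions based on them can differ from those made without measurement error. We propose a decaying measurement error; under this assumption we prove strong consistency of the estimators and show that they have the same asymptotic distribution as the estimators obtained without measurement error. Compared with earlier assumptions, particularly those used to prove large-sample properties, it is more reasonable to assume that newly added observations have smaller measurement error. We also design an experiment that satisfies the proposed decaying-error condition, and use a sequential design to obtain a procedure that saves both time and cost. In ordinary case-control studies the covariates can also be measured with error; we prove strong consistency and the asymptotic distribution of the slope estimator in that setting as well, and propose a two-stage sampling method to compute the required sample size and construct confidence intervals.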
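As a rough illustration of the bias and of the decaying-error idea described in the abstract above, the following Python sketch fits a one-covariate logistic model three times: on the true covariate, on a covariate with constant-variance additive error, and on a covariate whose error standard deviation shrinks with the observation index. The sample size, the coefficients, the error scale `sigma`, the decay exponent `decay_rate` and the helper names `fit_logistic` and `compare_slopes` are all assumptions made for this example; the record does not state the thesis's actual conditions or rates.

```python
import numpy as np


def fit_logistic(x, y, iters=25):
    """Newton-Raphson MLE for logit P(y = 1) = b0 + b1 * x; returns [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)                      # Bernoulli variance weights
        hessian = X.T @ (X * w[:, None])       # observed information
        score = X.T @ (y - p)                  # gradient of the log-likelihood
        beta = beta + np.linalg.solve(hessian, score)
    return beta


def compare_slopes(n=20000, b0=-0.5, b1=1.0, sigma=1.0, decay_rate=0.5, seed=0):
    """Slope estimates under no error, constant error, and decaying error.

    Hypothetical setup: observation i carries error SD sigma * i**(-decay_rate).
    For Gaussian errors and decay_rate = 0.5 this matches averaging i
    independent replicates, each with error SD sigma.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                                  # true covariate
    prob = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    y = rng.binomial(1, prob).astype(float)                 # binary response
    e = rng.normal(size=n)                                  # standardized errors
    i = np.arange(1, n + 1, dtype=float)
    w_const = x + sigma * e                                 # constant error SD
    w_decay = x + sigma * e * i ** (-decay_rate)            # decaying error SD
    return {
        "error_free": fit_logistic(x, y)[1],
        "constant_error": fit_logistic(w_const, y)[1],
        "decaying_error": fit_logistic(w_decay, y)[1],
    }


if __name__ == "__main__":
    for name, slope in compare_slopes().items():
        print(f"{name}: {slope:.3f}")
```

Under these assumptions one would expect the constant-error slope to be pulled toward zero, while the decaying-error slope stays close to the error-free fit. The sketch only demonstrates the qualitative effect and is not the experimental design or the sequential procedure proposed in the thesis.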
In this thesis, we study the estimation of unknown parameters, experimental designs and sequential methods in both prospective and retrospective logistic regression models when covariates are measured with error. Imprecise measurement of exposure is common in practice, for example in retrospective epidemiological studies, and may be due to either the difficulty or the cost of measurement. Imprecisely measured variables are known to bias the coefficient estimates in a regression model and can therefore lead to incorrect inference. This is an important issue when the effects of those variables are of primary interest. For a prospective logistic regression model, we derive asymptotic results for the estimators of the regression parameters when covariates are mismeasured. If the measurement error satisfies certain assumptions, we show that the estimators are asymptotically unbiased and asymptotically normally distributed. In contrast to the traditional assumption on measurement error, which is mainly used for proving large-sample properties, we assume that the measurement error decays gradually at a certain rate as each new observation is added to the model. This assumption can be fulfilled when the usual replicate-observation method is used to dilute the magnitude of the measurement errors, and it is therefore also more useful from a practical viewpoint. Moreover, our theorems do not require independence between the measurement error and the covariate. We introduce an experimental design whose measurement error satisfies the required decay rate. In addition, this assumption allows us to apply sequential sampling, which is popular in clinical trials, to such a measurement-error logistic regression model.
Clearly, the sequential method cannot be applied under the assumption, made in most of the literature, that the measurement errors decay uniformly as the sample size increases. We therefore propose a sequential estimation procedure based on maximum likelihood estimators (MLEs) and such moment conditions, which can be shown to be asymptotically consistent and efficient. Case-control studies are widely used in clinical trials and epidemiological studies. It can be shown that the odds ratio can be consistently estimated with some exposure variables based on logistic models (see Prentice and Pyke (1979)). A two-stage case-control sampling scheme is employed to construct a confidence region for the slope coefficient beta, and the necessary sample size is calculated for a given pre-determined level. Furthermore, we consider measurement error in the covariates of a case-control (retrospective) logistic regression model and derive asymptotic results for the MLEs of the regression coefficients under some moment conditions on the measurement errors. Under such moment conditions, the MLEs can be shown to be strongly consistent, asymptotically unbiased and asymptotically normally distributed. Simulation results for the proposed two-stage procedures are presented. We also give numerical studies and a real-data example to verify the theoretical results under different measurement error scenarios.
References
[1] Anderson, J. A. (1972). Separate sample logistic discrimination. Biometrika, 59, 19-35.
[2] Begg, M. D. and Lagakos, S. W. (1992). Effects of mismodeling on tests of association based on logistic regression models. The Annals of Statistics, 20, 1929-1952.
[3] Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models (2nd ed.). London: Chapman & Hall/CRC.
[4] Chang, Y-c. I. and Martinsek, A. T. (1992). Fixed size confidence regions for parameters of a logistic regression model. The Annals of Statistics, 20, 1953-1969.
[5] Chang, Y-c. I. (2001). Sequential confidence regions of generalized linear models with adaptive designs. Journal of Statistical Planning and Inference, 93, 277-293.
[6] Chen, K. (2000). Optimal Sequential Designs of Case-Control Studies. The Annals of Statistics, 28, 1452-1471.
[7] Cheng, C-L. and Van Ness, J. W. (1999). Statistical Regression with Measurement Error. London: Oxford University Press.
[8] Chow, Y. S. and Robbins, H. (1965). On the Asymptotic Theory of Fixed-Width Sequential Confidence Intervals for the Mean. The Annals of Mathematical Statistics, 36, 457-462.
[9] Chow, Y. S. and Teicher, H. (1978). Probability Theory: Independence Interchangeability Martingales. New York: Springer-Verlag.
[10] Demark-Wahnefried, W., Clipp, E. C., Lipkus, I. M., Lobach, D., Snyder, D. C., Sloane, R., Peterson, B., Macri, J. M., Rock, C. L., McBride, C. M. and Kraus, W. E. (2007). Main Outcomes of the FRESH START Trial: A Sequentially Tailored, Diet and Exercise Mailed Print Intervention Among Breast and Prostate Cancer Survivors. Journal of Clinical Oncology, 25, 2709-2718.
[11] Etzioni, R., Pepe, M., Longton, G., Hu, C. and Goodman, G. (1999). Incorporating the Time Dimension in Receiver Operating Characteristic Curves: A Case Study of Prostate Cancer. Medical Decision Making, 19, 242-251.
[12] Farewell, V. T. (1979). Some Results on the Estimation of Logistic Models Based on Retrospective Data. Biometrika, 66, 27-32.
[13] Fuller, W. A. (1980). Properties of Some Estimators for the Errors-in-Variables Model. The Annals of Statistics, 8, 407-422.
[14] Fuller, W. A. (1987). Measurement Error Models. New York: John Wiley & Sons, Inc.
[15] Gleser, C. J. (1981). Estimation in a Multivariate "Errors in Variables" Regression Model: Large Sample Results. The Annals of Statistics, 9, 24-44.
[16] Janes, H., Pepe, M., Kooperberg, C. and Newcomb, P. (2005). Identifying Target Populations for Screening or Not Screening Using Logic Regression. Statistics in Medicine, 24, 1321-1338.
[17] Kalohn, J. C. and Spray, J. A. (1999). The Effect of Model Misspecification on Classification Decisions Made Using a Computerized Test. Journal of Educational Measurement, 36, 47-59.
[18] Merle, Y., Aouimer, A. and Tod, M. (2004). Impact of Model Misspecification at Design (and/or) Estimation Step in Population Pharmacokinetic Studies. Journal of Biopharmaceutical Statistics, 14, 213-227.
[19] O'Neill, R. T. and Anello, C. (1978). Case-control Studies: A Sequential Approach. American Journal of Epidemiology, 120, 145-153.
[20] Owen, J. D. and James, M. S. (1998). Estimating Sample Size for Epidemiologic Studies: The Impact of Ignoring Exposure Measurement Uncertainty. Statistics in Medicine, 17, 1375-1389.
[21] Pagano, M. and Gauvreau, K. (2000). Principles of Biostatistics (2nd ed.). Pacific Grove, California: Duxbury.
[22] Paul, G. and Nhu, D. L. (2002). Comparing the Effects of Continuous and Discrete Covariate Mismeasurement, with Emphasis on the Dichotomization of Mismeasured Predictors. Biometrics, 58, 878-887.
[23] Pierce, J. P., Stefanick, M. L., Flatt, S. W., Natarajan, L., Sternfeld, B., Madlensky, L., Al-Delaimy, W. K., Thomson, C. A., Kealey, S., Hajek, R., Parker, B. A., Newman, V. A., Caan, B. and Rock, C. L. (2007). Greater Survival After Breast Cancer in Physically Active Women With High Vegetable-Fruit Intake Regardless of Obesity. Journal of Clinical Oncology, 25, 2345-2351.
[24] Prentice, R. L. and Pyke, R. (1979). Logistic Disease Incidence Models and Case-Control Studies. Biometrika, 66, 403-411.
[25] Smith, P. (1997). Model Misspecification in Data Envelopment Analysis. Annals of Operations Research, 73, 233-252.
[26] Tosteson, T. D., Buzas, J. S., Demidenko, E. and Karagas, M. (2003). Power and Sample Size Calculations for Generalized Regression Models with Covariate Measurement Error. Statistics in Medicine, 22, 1069-1082.
[27] Urmanov, A. M., Gribok, A. V., Hines, J. W. and Uhrig, R. E. (2002). An Information Approach to Regularization Parameter Selection under Model Misspecification. Inverse Problems, 18, 1207-1228.
[28] Wang, C. Y. and Wang, S. (1995). On Information Matrices in Case-control Studies. Statistics and Probability Letters, 22, 269-274.
[29] Woodroofe, M. (1982). Nonlinear renewal theory in sequential analysis. Philadelphia, Pa: Society for Industrial and Applied Mathematics.
Description Doctoral
National Chengchi University
Department of Statistics
92354503
Source http://thesis.lib.nccu.edu.tw/record/#G0923545033
Type thesis
Advisors 薛慧敏; 張源俊 (Hsueh, Huey Mriin; Chang, Yuan Chin)
Identifier G0923545033
URI http://nccur.lib.nccu.edu.tw/handle/140.119/95126
Table of contents
1 Introduction 1
1.1 Motivation 1
1.2 Outline of the Study 6
2 Logistic Regression Models with Mis-measured Covariate 7
2.1 Logistic Regression Model 7
2.2 Mismeasured Covariate 9
2.3 Sample Size Determinations 14
2.4 Sequential Sampling Scheme 17
2.5 Simulation Study 20
2.6 An example: Bronchopulmonary Dysplasia Study 24
3 Retrospective Logistic Regression Models 27
3.1 Basic Concept 27
3.2 Optimal Case-control Ratio and Two-stage Sampling Scheme 31
3.3 Mismeasured Covariate in Case-control Studies 44
3.4 Simulation Study 47
3.5 Real example: Framingham Heart Study 51
4 Conclusions and Future Works 54
4.1 Conclusions 54
4.2 Future Works 56
A Proofs of Theorems 58
A.1 Proof of Theorem 1 58
A.2 Proof of Theorem 2 61
A.3 Proof of Corollary 3 65
A.4 Proof of Corollary 4 65
A.5 Proof of Corollary 5 66
A.6 Proof of Theorem 6 66
A.7 Proof of Theorem 7 67
A.8 Proof of Corollary 8 and Proof of Corollary 9 67
Bibliography 68