Title: 迴歸分析中共線性於Suppression與Collapsibility之效果探討
       (Effects of Collinearity on Suppression and Collapsibility in Multiple Linear Regression)
Author: 許斯淵 (Hsu, Szu-Yuan)
Advisor: 江振東
Keywords: collinearity; correlation coefficient; regression coefficient; R-square (coefficient of determination); t-statistic
Date: 2019
Uploaded: 7-Aug-2019 16:00:35 (UTC+8)

Abstract
Linear regression is a statistical method that allows researchers to summarize and study the relationship between a response and one or more predictor variables. When a predictor is added to a model, we are most interested in its estimated regression coefficient, the corresponding t-statistic, and the resulting increase in R-square. One issue that can affect these results is the collinearity between the added predictor and the predictors already in the model. In this study, we investigate the behavior of the estimated regression coefficient, the corresponding t-statistic, and R-square as the collinearity varies. We show that all of these statistics are functions of three correlation coefficients and the R-square of the original model, and we provide summary tables that can be used to anticipate their behavior. By treating the added predictor as the predictor of interest, and the predictors already in the model as covariates, we are able to apply similar techniques to study the impact of collinearity on collapsibility, that is, whether the relationship between the response and the predictor of interest remains the same when the covariates are dropped from the model. Overall, we find that collinearity in a linear regression model does not necessarily have the ill effects commonly attributed to it, and we urge researchers to think twice before dropping a collinear predictor from further model consideration.
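To make the abstract's central claim concrete, here is a minimal numerical sketch (not taken from the thesis) for the simplest two-predictor case: with standardized variables, the coefficient of an added predictor X, the model R-square, and hence any enhancement or non-collapsibility are determined entirely by the three pairwise correlations r_yx, r_yz, and r_xz. The variable names, the simulated correlation structure, and the use of Python/NumPy are illustrative assumptions, not the thesis's notation, data, or general working formulas.

# Minimal sketch (assumptions noted above): two-predictor, standardized case only.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulate (Y, X, Z); population correlations are chosen so that enhancement
# is expected (R^2 larger than r_yx^2 + r_yz^2) despite collinearity.
cov = np.array([[1.0, 0.5, 0.1],
                [0.5, 1.0, -0.5],   # cov[1, 2] = r_xz = -0.5: collinear predictors
                [0.1, -0.5, 1.0]])
y, x, z = rng.multivariate_normal(np.zeros(3), cov, size=n).T

# Standardize so regression coefficients are on the correlation scale.
std = lambda v: (v - v.mean()) / v.std(ddof=1)
y, x, z = std(y), std(x), std(z)

# The three sample correlations that drive everything below.
r_yx = np.corrcoef(y, x)[0, 1]
r_yz = np.corrcoef(y, z)[0, 1]
r_xz = np.corrcoef(x, z)[0, 1]

# Closed-form results implied by the three correlations alone.
b_x = (r_yx - r_yz * r_xz) / (1 - r_xz**2)                              # coefficient of X given Z
r2_full = (r_yx**2 + r_yz**2 - 2 * r_yx * r_yz * r_xz) / (1 - r_xz**2)  # R^2 of Y on (X, Z)

# Direct least-squares fit of Y on (X, Z) for comparison.
beta, *_ = np.linalg.lstsq(np.column_stack([x, z]), y, rcond=None)

print(f"b_x from correlations: {b_x:.4f}   from lstsq: {beta[0]:.4f}")
print(f"R^2 from correlations: {r2_full:.4f}   r_yx^2 + r_yz^2: {r_yx**2 + r_yz**2:.4f}")
# Collapsibility check: dropping Z changes the coefficient of X from b_x to r_yx.
print(f"marginal coefficient of X (Z dropped): {r_yx:.4f}   d = b_x - r_yx: {b_x - r_yx:.4f}")

With the correlations chosen here, R-square is expected to exceed the sum of the squared simple correlations (enhancement), and the coefficient of X changes when Z is dropped, so the model is not collapsible over Z; the thesis's summary tables characterize, in general, when such patterns arise as the collinearity varies.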
References

Chiang, J. T. and Hsu, S. Y. (2018), "Revisiting the Effects of Collinearity in Multiple Linear Regression: High Collinearity May Not Cause the Serious Problems You Might Think," unpublished manuscript.
Clogg, C. C., Petkova, E., and Shihadeh, E. S. (1992), "Statistical Methods for Analyzing Collapsibility in Regression Models," Journal of Educational Statistics, 17(1), 51-74.
Clogg, C. C., Petkova, E., and Haritou, A. (1995), "Statistical Methods for Comparing Regression Coefficients between Models," American Journal of Sociology, 100(5), 1261-1293.
Cohen, J. and Cohen, P. (1975), Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, New Jersey: Lawrence Erlbaum Associates.
Conger, A. J. (1974), "A Revised Definition for Suppressor Variables: A Guide to Their Identification and Interpretation," Educational and Psychological Measurement, 34, 35-46.
Currie, I. and Korabinski, A. (1984), "Some Comments on Bivariate Regression," The Statistician, 33, 283-292.
Darlington, R. B. (1968), "Multiple Regression in Psychological Research and Practice," Psychological Bulletin, 69, 161-182.
Dua, S., Bhuker, M., Sharma, P., Dhall, M., and Kapoor, S. (2014), "Body Mass Index Relates to Blood Pressure Among Adults," North American Journal of Medical Sciences, 6(2), 89-95.
Friedman, L. and Wall, M. (2005), "Graphical Views of Suppression and Multicollinearity in Multiple Linear Regression," The American Statistician, 59, 127-137.
Greenland, S., Robins, J. M., and Pearl, J. (1999), "Confounding and Collapsibility in Causal Inference," Statistical Science, 14(1), 29-46.
Hamilton, D. (1987), "Sometimes R^2 > r_yx1^2 + r_yx2^2: Correlated Variables Are Not Always Redundant," The American Statistician, 41, 129-132.
Hamilton, D. (1988), "Reply [to Comments by Freund and Mitra]," The American Statistician, 42, 90-91.
Horst, P. (1941), "The Prediction of Personal Adjustment," Social Science Research Council Bulletin, 48, 431-436.
Kleinbaum, D. G., Kupper, L. L., Nizam, A., and Muller, K. E. (2008), Applied Regression Analysis and Other Multivariable Methods (4th ed.), Thomson-Brooks/Cole.
Kutner, M., Nachtsheim, C., and Neter, J. (2004), Applied Linear Regression Models (4th ed.), McGraw-Hill/Irwin.
Ludlow, L. and Klein, K. (2014), "Suppressor Variables: The Difference between 'Is' versus 'Acting As'," Journal of Statistics Education, 22(2), 1-28.
O'Brien, R. M. (2017), "Dropping Highly Collinear Variables from a Model: Why It Typically Is Not a Good Idea," Social Science Quarterly, 98(1), 360-375.
Rencher, A. C. and Schaalje, G. B. (2008), Linear Models in Statistics (2nd ed.), John Wiley & Sons, Inc.
Shieh, G. (2001), "The Inequality between the Coefficient of Determination and the Sum of Squared Simple Correlation Coefficients," The American Statistician, 55, 121-124.
Shieh, G. (2006), "Suppression Situations in Multiple Linear Regression," Educational and Psychological Measurement, 66, 435-447.
Velicer, W. (1978), "Suppressor Variables and the Semipartial Correlation Coefficient," Educational and Psychological Measurement, 38, 953-958.
Waller, N. G. (2011), "The Geometry of Enhancement in Multiple Regression," Psychometrika, 76, 634-649.

Degree: Doctoral dissertation (博士)
Institution: National Chengchi University (國立政治大學)
Department: Department of Statistics (統計學系)
Student ID: 101354501

Table of Contents
1. Introduction
2. Effects of collinearity on suppression and enhancement in the two-predictor case
3. Effects of collinearity on suppression and enhancement in general cases
   3.1 Working formulas of b, se(b), t, and R-square
   3.2 Behavior pattern of b as a function of r_(x,x̂)
   3.3 Behavior pattern of t as a function of r_(x,x̂)
   3.4 Behavior pattern of R²_yU as a function of r_(x,x̂)
4. Effects of collinearity on collapsibility in multiple linear regression
   4.1 Working formulas of d̂ = β̂* − β̂, se(d̂), and t(d̂)
   4.2 Behavior pattern of d̂ as a function of r_(x,x̂)
   4.3 Behavior pattern of t(d̂) as a function of r_(x,x̂)
   4.4 Relationship between suppression and collapsibility
5. Illustrating examples
6. Conclusions and discussions
References
Appendix
   A.1 Derivations of estimated regression coefficients and R-squares
   A.2 Working formulas of R²_yU and b when X = x
   A.3 Working formulas of R²_yU and b when Z₁ = z₁ and X = x
   A.4 Situations where t² > t₀²

Source: http://thesis.lib.nccu.edu.tw/record/#G0101354501
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/124679
DOI: 10.6814/NCCU201900647
Type: thesis