Title 迴歸分析中共線性於Suppression與Collapsibility之效果探討
Effects of Collinearity on Suppression and Collapsibility in Multiple Linear Regression
Author 許斯淵
Hsu, Szu-Yuan
Contributors 江振東
許斯淵
Hsu, Szu-Yuan
Keywords Collinearity
Correlation coefficient
Regression coefficient
R-square
t-statistics
Date 2019
Upload time 7-Aug-2019 16:00:35 (UTC+8)
Abstract Linear regression is a widely used statistical method for studying the relationship between a continuous response and one or more predictor variables. When a predictor is added to a model, researchers usually focus on its estimated regression coefficient, the corresponding t-statistic, and the resulting increase in R-square. These results depend on the collinearity between the added predictor and the predictors already in the model. In this study, we investigate how the estimated regression coefficient, its t-statistic, and R-square behave as the collinearity varies. We argue that all of these statistics are functions of three correlation coefficients and the R-square of the original model, and we provide summary tables that can be used to anticipate the regression results under the new model. In addition, by treating the added predictor as the predictor of interest and the predictors already in the model as covariates, we apply similar techniques to examine the impact of collinearity on collapsibility, that is, whether the relationship between the response and the predictor of interest stays the same whether or not the covariates are in the model. Overall, we find that collinearity in a linear regression model does not necessarily have the ill effects commonly assumed. We therefore urge researchers to think twice before dropping a collinear predictor from a model.
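The dependence on correlation coefficients described in the abstract can be illustrated with the textbook two-predictor special case. The sketch below is not the thesis's working formulas (those cover the general case and also use the R-square of the original model); it assumes standardized variables, a predictor of interest x, and a single covariate z. The function names `two_predictor_summary` and `ols_check` and the simulated data are hypothetical, introduced only for illustration.

```python
import numpy as np


def two_predictor_summary(r_yx, r_yz, r_xz):
    """Standardized two-predictor regression of y on x and z,
    computed only from the three pairwise correlations."""
    if not -1.0 < r_xz < 1.0:
        raise ValueError("r_xz must lie strictly between -1 and 1")
    denom = 1.0 - r_xz ** 2
    beta_x = (r_yx - r_yz * r_xz) / denom  # standardized coefficient of x
    beta_z = (r_yz - r_yx * r_xz) / denom  # standardized coefficient of z
    r2_full = beta_x * r_yx + beta_z * r_yz  # R-square of the full model
    return {
        "beta_x": beta_x,
        "beta_z": beta_z,
        "r2_full": r2_full,
        # Increase in R-square when x is added to a model containing z.
        "r2_gain": r2_full - r_yz ** 2,
        # Change in the coefficient of x when z is dropped
        # (zero means the model is collapsible over z).
        "d": r_yx - beta_x,
    }


def ols_check(n=500, seed=0):
    """Compare the correlation-based formulas with a least-squares fit
    on simulated (hypothetical) data."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = 0.7 * z + rng.normal(size=n)
    y = 1.0 * z - 0.5 * x + rng.normal(size=n)
    data = np.column_stack([y, x, z])
    # Standardize so the fitted coefficients are standardized coefficients.
    std = (data - data.mean(axis=0)) / data.std(axis=0)
    r = np.corrcoef(data, rowvar=False)
    formula = two_predictor_summary(r[0, 1], r[0, 2], r[1, 2])
    coef, *_ = np.linalg.lstsq(std[:, 1:], std[:, 0], rcond=None)
    return formula, coef


if __name__ == "__main__":
    # x is uncorrelated with y but correlated with z: R-square of the
    # full model exceeds r_yx^2 + r_yz^2 = 0.25 (an enhancement case).
    print(two_predictor_summary(r_yx=0.0, r_yz=0.5, r_xz=0.7))
    formula, coef = ols_check()
    print(formula["beta_x"], coef[0])
    print(formula["beta_z"], coef[1])
```

In this special case the coefficient change `d` equals `r_xz * beta_z`, so it vanishes when x and z are uncorrelated or when z has no partial effect on y. This is one simple way to see why collinearity alone does not determine whether dropping a covariate changes the coefficient of interest.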
References Chiang, J. T. and Hsu, S. Y. (2018), “Revisiting the Effects of Collinearity in Multiple Linear Regression: High Collinearity May Not Cause the Serious Problems You Might Think,” (Unpublished manuscript).

Clogg, C. C., Petkova, E., and Shihadeh, E. S. (1992), “Statistical Methods for Analyzing Collapsibility in Regression Models,” Journal of Educational Statistics, 17(1), 51-74.

Clogg, C. C., Petkova, E., and Haritou, A. (1995), “Statistical Methods for Comparing Regression Coefficients between Models,” American Journal of Sociology, 100(5), 1261-1293.

Cohen, J. and Cohen, P. (1975), Applied Multiple Regression/Correlation Analysis for The Behavioral Sciences, New Jersey: Lawrence Erlbaum Associates.

Conger, A. J. (1974), “A Revised Definition for Suppressor Variables: A Guide to Their Identification and Interpretation,” Educational and Psychological Measurement, 34, 35-46.

Currie, I. and Korabinski, A. (1984), “Some Comments on Bivariate Regression,” The Statistician, 33, 283-292.

Darlington, R. B. (1968), “Multiple Regression in Psychological Research and Practice,” Psychological Bulletin, 69, 161-182.

Dua, S., Bhuker, M., Sharma, P., Dhall, M., and Kapoor, S. (2014), “Body Mass Index Relates to Blood Pressure Among Adults,” North American Journal of Medical Sciences, 6(2), 89-95.

Friedman, L., and Wall, M. (2005), “Graphical Views of Suppression and Multicollinearity in Multiple Linear Regression,” The American Statistician, 59, 127-137.

Greenland, S., Robins, J. M., and Pearl, J. (1999), “Confounding and Collapsibility in Causal Inference,” Statistical Science, 14(1), 29-46.

Hamilton, D. (1987), “Sometimes R^2>r_(yx_1)^2+r_(yx_2)^2: Correlated Variables Are Not Always Redundant,” The American Statistician, 41, 129-132.
Hamilton, D. (1988), “Reply to [Comments by Freund and Mitra],” The American Statistician, 42, 90-91.

Horst, P. (1941), “The Prediction of Personal Adjustment,” Social Science Research Council Bulletin, 48, 431-436.

Kleinbaum, D. G., Kupper, L. L., Nizam, A., and Muller, K. E. (2008), Applied Regression Analysis and Other Multivariable Methods (4th ed.), Thomson-Brooks/Cole.

Kutner, M., Nachtsheim, C., and Neter, J. (2004), Applied Linear Regression Models (4th ed.), McGraw-Hill/Irwin.

Ludlow, L., and Klein, K. (2014), “Suppressor Variables: The Difference between ‘Is’ versus ‘Acting As’,” Journal of Statistics Education, 22(2), 1-28.

O’Brien, R. M. (2017), “Dropping Highly Collinear Variables from a Model: Why it Typically is Not a Good Idea,” Social Science Quarterly, 98(1), 360-375.

Rencher, A. C. and Schaalje, G. B. (2008), Linear Models in Statistics (2nd ed.), John Wiley & Sons, Inc.

Shieh, G. (2001), “The Inequality between The Coefficient of Determination and The Sum of Squared Simple Correlation Coefficients,” The American Statistician, 55, 121-124.

Shieh, G. (2006), “Suppression Situations in Multiple Linear Regression,” Educational and Psychological Measurement, 66, 435-447.

Velicer, W. (1978), “Suppressor Variables and The Semipartial Correlation Coefficient,” Educational and Psychological Measurement, 38, 953-958.

Waller, N. G. (2011), “The Geometry of Enhancement in Multiple Regression,” Psychometrika, 76, 634-649.
Description Doctoral thesis
National Chengchi University
Department of Statistics
101354501
Source http://thesis.lib.nccu.edu.tw/record/#G0101354501
Type thesis
Advisor 江振東
URI http://nccur.lib.nccu.edu.tw/handle/140.119/124679
Table of contents

1. Introduction
2. Effects of collinearity on suppression and enhancement in two-predictor case
3. Effects of collinearity on suppression and enhancement in general cases
3.1 Working formulas of $b$, $se(b)$, $t$, and R-square
3.2 Behavior pattern of $b$ as a function of $r_{x,\hat{x}}$
3.3 Behavior pattern of $t$ as a function of $r_{x,\hat{x}}$
3.4 Behavior pattern of $R_{yU}^2$ as a function of $r_{x,\hat{x}}$
4. Effects of collinearity on collapsibility in multiple linear regression
4.1 Working formulas of $\hat{d}=\hat{\beta}^*-\hat{\beta}$, $se(\hat{d})$ and $t(\hat{d})$
4.2 Behavior pattern of $\hat{d}$ as a function of $r_{x,\hat{x}}$
4.3 Behavior pattern of $t(\hat{d})$ as a function of $r_{x,\hat{x}}$
4.4 Relationship between suppression and collapsibility
5. Illustrating examples
6. Conclusions and discussions
References
Appendix
A.1 Derivations of estimated regression coefficients and R-squares
A.2 Working formulas of $R_{yU}^2$ and $b$ when $X=x$
A.3 Working formulas of $R_{yU}^2$ and $b$ when $Z_1=z_1$ and $X=x$
A.4 Situations where $t^2>t_0^2$
File size 1480342 bytes
File format application/pdf
DOI 10.6814/NCCU201900647