Title: 多維度變異係數模型-基於B-Spline 近似之選模
(Variable Selection of High Dimension Varying Coefficient Model Under B-Spline Approximation)
Author: Yang, Po-An (楊博安)
Advisor: Huang, Tzee-Ming (黃子銘)
Keywords: varying coefficient model; B-spline; forward selection; group lasso
Date: 2020
Uploaded: 2-Sep-2020 11:42:02 (UTC+8)

Abstract:
Varying coefficient models are a form of nonlinear regression model with numerous applications in many fields. Their major difference from linear models is that the coefficients are allowed to vary systematically and smoothly with one or more index variables, while the model remains easy to interpret. In the era of big data, when the number of candidate variables is very large and only a few of them truly contribute, selecting the relevant variables is challenging. Recent work on this problem falls mainly into two approaches: stepwise selection methods and regularization methods. In this thesis, we compare groupwise forward selection and the group lasso under various simulation settings and make two recommendations. For forward selection, we suggest running several additional group-selection steps after the BIC stopping criterion is met, in order to avoid stopping too early. We also find that under some conditions the group lasso selects too many irrelevant variables or too few true variables; we therefore apply groupwise backward selection after fitting the group lasso with several penalty levels, and use it to choose the final model.

References:
Bertsekas, D. P. (2016). Nonlinear Programming, 3rd edition. Athena Scientific.
Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37(4):373–384.
Cai, J., Fan, J., Zhou, H., and Zhou, Y. (2007). Hazard models with varying coefficients for multivariate failure time data. The Annals of Statistics, 35(1):324–354.
Cheng, M.-Y., Honda, T., and Zhang, J.-T. (2016). Forward variable selection for sparse ultra-high dimensional varying coefficient models. Journal of the American Statistical Association, 111(515):1209–1221.
de Boor, C. (1978). A Practical Guide to Splines, volume 27. Springer-Verlag, New York.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2):407–499.
Fan, J., Feng, Y., and Song, R. (2011). Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association, 106(494):544–557.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.
Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B, 70(5):849–911.
Fan, J., Ma, Y., and Dai, W. (2014). Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. Journal of the American Statistical Association, 109(507):1270–1284.
Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B, 55(4):757–796.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, volume 43. CRC Press.
Luo, S. and Chen, Z. (2014). Sequential lasso cum EBIC for feature selection with ultrahigh dimensional feature space. Journal of the American Statistical Association, 109(507):1229–1240.
Meier, L., van de Geer, S., and Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B, 70(1):53–71.
Nicholls, D. and Quinn, B. (1982). Random Coefficient Autoregressive Models: An Introduction. Lecture Notes in Statistics. Springer.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1):267–288.
Wang, H. (2009). Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association, 104(488):1512–1524.
Wei, F., Huang, J., and Li, H. (2011). Variable selection and estimation in high-dimensional varying coefficient models. Statistica Sinica, 21(4):1515–1540.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68:49–67.
Zhang, W., Lee, S.-Y., and Song, X. (2002). Local polynomial fitting in semivarying coefficient model. Journal of Multivariate Analysis, 82(1):166–188.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2):301–320.

Description: Master's thesis
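As context for the methods the abstract compares, the following is a minimal sketch (an assumed illustration, not the thesis code) of the forward-selection side: each coefficient function is approximated by a cubic B-spline basis in the index variable, each candidate variable thus contributes a group of spline-expanded features, and groups are added greedily by BIC.

```python
# Hypothetical sketch, not the thesis code: a varying coefficient model
# y_i = sum_j beta_j(u_i) * x_ij + eps_i, each beta_j approximated by a
# cubic B-spline basis in u, with variable groups chosen by BIC-guided
# groupwise forward selection.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
n, p, K = 200, 10, 6                     # samples, candidate variables, basis size

u = np.sort(rng.uniform(0, 1, n))        # index variable the coefficients vary with
X = rng.normal(size=(n, p))              # only x_0 and x_1 are truly relevant
y = np.sin(2 * np.pi * u) * X[:, 0] + (1 + u) * X[:, 1] + 0.1 * rng.normal(size=n)

# Cubic B-spline design matrix B (n x K) with clamped, equally spaced knots.
deg = 3
t = np.r_[[0.0] * deg, np.linspace(0, 1, K - deg + 1), [1.0] * deg]
B = BSpline.design_matrix(u, t, deg).toarray()

def bic(y, Z):
    """BIC of an OLS fit of y on the design Z (intercept included)."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    rss = np.sum((y - Z1 @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + Z1.shape[1] * np.log(len(y))

# Group j = the K spline-expanded features of candidate variable x_j.
groups = [X[:, [j]] * B for j in range(p)]
selected, remaining, best = [], list(range(p)), np.inf
while remaining:
    scores = {j: bic(y, np.column_stack([groups[g] for g in selected + [j]]))
              for j in remaining}
    j_star = min(scores, key=scores.get)
    if scores[j_star] >= best:           # the thesis suggests a few extra steps here
        break
    best = scores[j_star]
    selected.append(j_star)
    remaining.remove(j_star)

print(sorted(selected))                  # the truly relevant groups
```

The stopping rule above halts as soon as BIC stops improving; the thesis's recommendation is to continue for several extra steps past this point before settling on the model, since a single non-improving step can terminate the search too early.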
National Chengchi University
Department of Statistics
Student ID: 107354003
Source: http://thesis.lib.nccu.edu.tw/record/#G0107354003
Type: thesis
Identifier: G0107354003
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/131472
Table of Contents:
Chapter 1. Introduction
Chapter 2. Literature Review
Chapter 3. Methodology
  3.1 B-spline
  3.2 Basis Expansion
  3.3 Groupwise Forward Selection
  3.4 Group Lasso
Chapter 4. Simulations
  4.1 Experiment 1
  4.2 Experiment 2
Chapter 5. Conclusion
References

Format: application/pdf, 444388 bytes
DOI: 10.6814/NCCU202001217
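For the regularization side of the comparison, a minimal illustration (an assumed sketch, not the thesis code) of the group lasso: proximal gradient descent whose proximal step is blockwise soft-thresholding, which zeroes entire coefficient groups at once.

```python
# Hypothetical illustration, not the thesis code: group lasso via proximal
# gradient descent. The proximal step shrinks each group's coefficient
# vector toward zero and sets the whole group to zero when its norm is small.
import numpy as np

def group_lasso(Z, y, groups, lam, iters=500):
    """Minimize (1/2n)||y - Zb||^2 + lam * sum_g ||b_g||_2."""
    n, d = Z.shape
    lr = n / np.linalg.norm(Z, 2) ** 2       # step size 1/L, L = sigma_max(Z)^2 / n
    b = np.zeros(d)
    for _ in range(iters):
        b = b - lr * (Z.T @ (Z @ b - y) / n) # gradient step on the squared loss
        for g in groups:                     # proximal step: group soft-threshold
            nrm = np.linalg.norm(b[g])
            b[g] = 0.0 if nrm <= lr * lam else b[g] * (1 - lr * lam / nrm)
    return b

rng = np.random.default_rng(1)
Z = rng.normal(size=(150, 6))                # three groups of two columns each
y = Z[:, 0] - Z[:, 1] + 0.1 * rng.normal(size=150)
groups = [[0, 1], [2, 3], [4, 5]]
b = group_lasso(Z, y, groups, lam=0.3)
active = [i for i, g in enumerate(groups) if np.linalg.norm(b[g]) > 1e-8]
print(active)
```

Because the penalty shrinks every group at a common rate, the set of nonzero groups is sensitive to `lam`; this is the behavior behind the abstract's observation that a single penalty level can keep too many irrelevant groups or drop true ones, motivating the follow-up groupwise backward selection across several penalty levels.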