
Title 多維度變異係數模型-基於B-Spline近似之選模
Variable Selection of High Dimension Varying Coefficient Model Under B-Spline Approximation
Author Yang, Po-An (楊博安)
Advisor Huang, Tzee-Ming (黃子銘)
Keywords Varying coefficient model
B-spline
Forward selection
Group lasso
Date 2020
Uploaded 2-Sep-2020 11:42:02 (UTC+8)
Abstract The varying coefficient model is a nonlinear regression model with wide applications in many fields. Compared with the linear model, its distinguishing feature is that the coefficients are allowed to vary with effect-modifying variables, while the model retains the linear model's ease of interpretation. In the era of big data, data collection has become relatively easy; when the number of candidate variables is very large but only a few of them truly contribute, selecting the relevant variables is an important and challenging problem. Existing work falls mainly into two categories: forward selection methods and regularization methods. This thesis uses simulation experiments to compare groupwise forward selection and the group lasso under different settings, and makes two recommendations. First, to keep forward selection from stopping too early, we suggest running several additional group-selection steps after the BIC stops improving. Second, since the group lasso sometimes selects too many irrelevant variables or too few true variables, we suggest applying groupwise backward selection after fitting group lasso models with several different penalty levels, in order to choose the best model.
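The forward-selection recommendation in the abstract can be sketched as follows. This is an illustrative reimplementation, not the thesis's actual code: groups are scored by a least-squares fit on their concatenated columns (in the thesis each group would be the B-spline basis columns of one candidate variable), and selection keeps running for a few extra steps after the BIC stalls, returning the best model seen overall. The names `groupwise_forward` and `extra_steps` are assumptions for this sketch.

```python
import numpy as np

def bic(rss, n, k):
    # Gaussian BIC: n*log(RSS/n) + k*log(n)
    return n * np.log(rss / n) + k * np.log(n)

def rss_of(X, y, cols):
    # Residual sum of squares of least squares on the given columns;
    # with no columns, fall back to the intercept-only fit.
    if not cols:
        return float(np.sum((y - y.mean()) ** 2))
    Xs = X[:, cols]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return float(resid @ resid)

def groupwise_forward(X, y, groups, extra_steps=3):
    """groups: list of lists of column indices (one list per variable group).
    extra_steps: how many selection steps to run after BIC stops improving,
    per the abstract's suggestion to avoid stopping too early."""
    n = len(y)
    selected, cols = [], []
    best_bic = bic(rss_of(X, y, cols), n, 0)
    best_model = list(selected)
    stall = 0
    remaining = list(range(len(groups)))
    while remaining and stall <= extra_steps:
        # greedily add the group whose inclusion gives the smallest RSS
        scores = [rss_of(X, y, cols + groups[g]) for g in remaining]
        g = remaining[int(np.argmin(scores))]
        remaining.remove(g)
        selected.append(g)
        cols = cols + groups[g]
        cur = bic(rss_of(X, y, cols), n, len(cols))
        if cur < best_bic:
            best_bic, best_model, stall = cur, list(selected), 0
        else:
            stall += 1  # keep going a few steps past the stall
    return best_model
```

On synthetic data where only the first two of several groups carry signal, the extra steps past the BIC stall do not hurt: the best model seen is still the true pair of groups.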
References Bertsekas, D. P. (2016). Nonlinear Programming, 3rd edition. Athena Scientific.

Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37(4):373–384.

Cai, J., Fan, J., Zhou, H., and Zhou, Y. (2007). Hazard models with varying coefficients for multivariate failure time data. The Annals of Statistics, 35(1):324–354.

Cheng, M.-Y., Honda, T., and Zhang, J.T. (2016). Forward variable selection for sparse ultrahigh dimensional varying coefficient models. Journal of the American Statistical Association, 111(515):1209–1221.

de Boor, C. (1978). A Practical Guide to Splines, volume 27. Springer-Verlag, New York.

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2):407–499.

Fan, J., Feng, Y., and Song, R. (2011). Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association, 106(494):544–557.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5):849–911.

Fan, J., Ma, Y., and Dai, W. (2014). Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. Journal of the American Statistical Association, 109(507):1270–1284.

Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4):757–796.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, volume 43. CRC Press.

Luo, S. and Chen, Z. (2014). Sequential lasso cum EBIC for feature selection with ultrahigh dimensional feature space. Journal of the American Statistical Association, 109(507):1229–1240.

Meier, L., Van De Geer, S., and Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1):53–71.

Nicholls, D. and Quinn, B. (1982). Random Coefficient Autoregressive Models: An Introduction. Lecture Notes in Statistics. Springer, New York.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288.

Wang, H. (2009). Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association, 104(488):1512–1524.

Wei, F., Huang, J., and Li, H. (2011). Variable selection and estimation in high-dimensional varying coefficient models. Statistica Sinica, 21(4):1515–1540.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68:49–67.

Zhang, W., Lee, S.Y., and Song, X. (2002). Local polynomial fitting in semivarying coefficient model. Journal of Multivariate Analysis, 82(1):166–188.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429.

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320.
Description Master's thesis
National Chengchi University
Department of Statistics
107354003
Source http://thesis.lib.nccu.edu.tw/record/#G0107354003
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131472
Table of Contents
Chapter 1 Introduction

Chapter 2 Literature Review

Chapter 3 Methodology
3.1 B-spline
3.2 Basis expansion
3.3 Groupwise forward selection
3.4 Group Lasso

Chapter 4 Simulations
4.1 Experiment 1
4.2 Experiment 2

Chapter 5 Conclusion

References
Format application/pdf (444388 bytes)
DOI 10.6814/NCCU202001217