
Title 多維度變異係數模型-基於B-Spline近似之選模
Variable Selection of High Dimension Varying Coefficient Model Under B-Spline Approximation
Author Yang, Po-An (楊博安)
Advisor Huang, Tzee-Ming (黃子銘)
Keywords Varying coefficient model
B-spline
Forward selection
Group lasso
Date 2020
Uploaded 2-Sep-2020 11:42:02 (UTC+8)
Abstract The varying coefficient model is a nonlinear regression model with wide applications in many fields. Compared with the linear model, its distinguishing feature is that the coefficients are allowed to vary with effect-modifying variables, while the model retains the linear model's ease of interpretation. In the era of big data, data collection has become relatively easy; when the number of candidate variables is very large but only a few of them truly contribute, selecting the relevant variables is an important and challenging problem. Existing work falls mainly into two categories: forward selection methods and regularization methods. This thesis uses simulation experiments to compare groupwise forward selection and the group lasso under different settings, and makes two recommendations. First, to keep forward selection from stopping too early, we suggest running several additional group-selection steps after the BIC stops improving. Second, since the group lasso sometimes selects too many irrelevant variables or too few true variables, we suggest applying groupwise backward selection after fitting group lasso models with several different penalty levels, in order to choose the best model.
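The forward-selection recommendation in the abstract can be sketched as follows. This is an illustrative reimplementation, not the thesis's actual code: groups are scored by a least-squares fit on their concatenated columns (in the thesis each group would be the B-spline basis columns of one candidate variable), and selection keeps running for a few extra steps after the BIC stalls, returning the best model seen overall. The names `groupwise_forward` and `extra_steps` are assumptions for this sketch.

```python
import numpy as np

def bic(rss, n, k):
    # Gaussian BIC: n*log(RSS/n) + k*log(n)
    return n * np.log(rss / n) + k * np.log(n)

def rss_of(X, y, cols):
    # Residual sum of squares of least squares on the given columns;
    # with no columns, fall back to the intercept-only fit.
    if not cols:
        return float(np.sum((y - y.mean()) ** 2))
    Xs = X[:, cols]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return float(resid @ resid)

def groupwise_forward(X, y, groups, extra_steps=3):
    """groups: list of lists of column indices (one list per variable group).
    extra_steps: how many selection steps to run after BIC stops improving,
    per the abstract's suggestion to avoid stopping too early."""
    n = len(y)
    selected, cols = [], []
    best_bic = bic(rss_of(X, y, cols), n, 0)
    best_model = list(selected)
    stall = 0
    remaining = list(range(len(groups)))
    while remaining and stall <= extra_steps:
        # greedily add the group whose inclusion gives the smallest RSS
        scores = [rss_of(X, y, cols + groups[g]) for g in remaining]
        g = remaining[int(np.argmin(scores))]
        remaining.remove(g)
        selected.append(g)
        cols = cols + groups[g]
        cur = bic(rss_of(X, y, cols), n, len(cols))
        if cur < best_bic:
            best_bic, best_model, stall = cur, list(selected), 0
        else:
            stall += 1  # keep going a few steps past the stall
    return best_model
```

On synthetic data where only the first two of several groups carry signal, the extra steps past the BIC stall do not hurt: the best model seen is still the true pair of groups.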
References Bertsekas, D. P. (2016). Nonlinear Programming, 3rd edition. Athena Scientific.

Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37(4):373–384.

Cai, J., Fan, J., Zhou, H., and Zhou, Y. (2007). Hazard models with varying coefficients for multivariate failure time data. The Annals of Statistics, 35(1):324–354.

Cheng, M.-Y., Honda, T., and Zhang, J.T. (2016). Forward variable selection for sparse ultrahigh dimensional varying coefficient models. Journal of the American Statistical Association, 111(515):1209–1221.

de Boor, C. (1978). A Practical Guide to Splines, volume 27. Springer-Verlag, New York.

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2):407–499.

Fan, J., Feng, Y., and Song, R. (2011). Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association, 106(494):544–557.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5):849–911.

Fan, J., Ma, Y., and Dai, W. (2014). Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. Journal of the American Statistical Association, 109(507):1270–1284.

Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4):757–796.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, volume 43. CRC Press.

Luo, S. and Chen, Z. (2014). Sequential lasso cum EBIC for feature selection with ultrahigh dimensional feature space. Journal of the American Statistical Association, 109(507):1229–1240.

Meier, L., Van De Geer, S., and Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1):53–71.

Nicholls, D. and Quinn, B. (1982). Random Coefficient Autoregressive Models: An Introduction. Lecture Notes in Statistics. Springer, New York.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288.

Wang, H. (2009). Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association, 104(488):1512–1524.

Wei, F., Huang, J., and Li, H. (2011). Variable selection and estimation in high-dimensional varying coefficient models. Statistica Sinica, 21(4):1515–1540.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68:49–67.

Zhang, W., Lee, S.Y., and Song, X. (2002). Local polynomial fitting in semivarying coefficient model. Journal of Multivariate Analysis, 82(1):166–188.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429.

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320.
Description Master's thesis
National Chengchi University
Department of Statistics
107354003
Source http://thesis.lib.nccu.edu.tw/record/#G0107354003
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131472
Table of Contents
Chapter 1 Introduction

Chapter 2 Literature Review

Chapter 3 Methodology
3.1 B-spline
3.2 Basis expansion
3.3 Groupwise forward selection
3.4 Group Lasso

Chapter 4 Simulations
4.1 Experiment 1
4.2 Experiment 2

Chapter 5 Conclusion

References
Format application/pdf (444388 bytes)
DOI 10.6814/NCCU202001217