Title 可加性迴歸模型的變數選擇
Variable selection for additive regression model
Author 謝嘉倫
Hsieh, Jia-Lun
Contributors 黃子銘
Huang, Tzee-Ming
謝嘉倫
Hsieh, Jia-Lun
Keywords B-spline
Variable selection
Function estimation
Nonparametric regression
Date 2021
Upload time 5-Aug-2021 10:21:41 (UTC+8)
Abstract Additive regression models are well suited to nonlinear data. Fitting one requires estimating the unknown functions in the model; this thesis estimates them by spline smoothing with a B-spline basis. This estimation tends to increase the number of model parameters and therefore the computational burden. In applied research, moreover, the interpretability of a model matters as much as its fit, so appropriate variable selection is important. The thesis discusses variable selection methods for additive models and treats two cases separately: data containing only continuous variables, and data containing both continuous and categorical variables. A comparison of two simulation experiments leads to the following conclusions. The selection method for data with only continuous variables is fast and effective. The method for data with both kinds of variables is more computationally demanding and slower, but simplifying it so that it only identifies the relevant categorical variables may reduce the computation while still selecting variables well.
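As an illustration of the B-spline basis mentioned in the abstract, the sketch below evaluates the basis functions with the Cox-de Boor recursion. In an additive model each unknown function is then typically written as a linear combination of such basis functions of a single covariate. This is not the thesis's code: the function name `bspline_basis`, the cubic degree, and the clamped knot vector on [0, 1] are assumptions chosen only for the example.

```python
from typing import List


def bspline_basis(x: float, knots: List[float], degree: int) -> List[float]:
    """Evaluate all B-spline basis functions of the given degree at x.

    Uses the Cox-de Boor recursion. `knots` must be non-decreasing; the
    number of basis functions returned is len(knots) - degree - 1.
    """
    n_intervals = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval [t_i, t_{i+1}).
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(n_intervals)]
    # Close the last non-empty interval on the right so x == knots[-1] is covered.
    if x == knots[-1]:
        for i in reversed(range(n_intervals)):
            if knots[i] < knots[i + 1]:
                basis[i] = 1.0
                break
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        new_basis = []
        for i in range(n_intervals - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero denominator (repeated knots) are taken as 0.
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            new_basis.append(left + right)
        basis = new_basis
    return basis


# Hypothetical setup: cubic basis with three interior knots on [0, 1].
example_knots = [0.0, 0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0]
example_values = bspline_basis(0.3, example_knots, degree=3)
```

With this knot vector there are seven basis functions per covariate, which shows how the parameter count grows with the number of covariates and why dropping irrelevant variables reduces the computation.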
Description Master's degree
National Chengchi University
Department of Statistics
108354024
Source http://thesis.lib.nccu.edu.tw/record/#G0108354024
Type thesis
dc.contributor.advisor 黃子銘 zh_TW
dc.contributor.advisor Huang, Tzee-Ming en_US
dc.contributor.author (Authors) 謝嘉倫 zh_TW
dc.contributor.author (Authors) Hsieh, Jia-Lun en_US
dc.creator (Author) 謝嘉倫 zh_TW
dc.creator (Author) Hsieh, Jia-Lun en_US
dc.date (Date) 2021 en_US
dc.date.accessioned 5-Aug-2021 10:21:41 (UTC+8)
dc.date.available 5-Aug-2021 10:21:41 (UTC+8)
dc.date.issued (Upload time) 5-Aug-2021 10:21:41 (UTC+8)
dc.identifier (Other Identifiers) G0108354024 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/136768
dc.description (Description) Master's degree
dc.description (Description) National Chengchi University
dc.description (Description) Department of Statistics
dc.description (Description) 108354024
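dc.description.tableofcontents 1. Introduction
2. Literature review
3. Methodology
3.1. B-splines and model structure
3.2. Selection of continuous variables
3.3. Selection of categorical variables
4. Simulation experiments
4.1. Experiment 1
4.1.1. Simulated data generation
4.1.2. Knot placement and B-spline estimation
4.1.3. Variable importance computation
4.1.4. Decision method
4.1.5. 500 simulations
4.2. Experiment 2
4.2.1. Simulated data generation
4.2.2. Hyperparameters
4.2.3. Parameter estimation and cross-validation
5. Conclusions and suggestions
A. Appendix
References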
dc.format.extent 1157897 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0108354024 en_US
dc.subject (Keywords) B-spline en_US
dc.subject (Keywords) Variable selection en_US
dc.subject (Keywords) Function estimation en_US
dc.subject (Keywords) Nonparametric regression en_US
dc.title (Title) 可加性迴歸模型的變數選擇 zh_TW
dc.title (Title) Variable selection for additive regression model en_US
dc.type (Type) thesis en_US
dc.identifier.doi (DOI) 10.6814/NCCU202100689 en_US