A Comparison between Lasso Significance Test and Forward Stepwise Selection Method | Publication

Title	Lasso顯著性檢定與向前逐步迴歸變數選取方法之比較 A Comparison between Lasso Significance Test and Forward Stepwise Selection Method
Creator	鄒昀庭 Tsou, Yun Ting
Contributor	黃子銘 Huang, Tzee Ming 鄒昀庭 Tsou, Yun Ting
Key Words	變數選取; 最小絕對壓縮挑選機制; 向前逐步迴歸; 拔靴法; Variable Selection; Least Absolute Shrinkage and Selection Operator; Forward Stepwise Regression; Bootstrap
Date	2013
Date Issued	6-Aug-2014 11:39:39 (UTC+8)
Summary	Variable selection is a central problem in regression modeling. Tibshirani (1996) proposed the Lasso (Least Absolute Shrinkage and Selection Operator), whose main feature is that it performs variable selection automatically as part of parameter estimation. The original Lasso, however, provides no means of statistical inference, so the significance test for the Lasso proposed by Lockhart et al. (2014) is an important breakthrough. Because the test statistic of the Lasso significance test is constructed in much the same way as that of traditional forward stepwise regression, this study continues the comparison of the two methods begun by Lockhart et al. (2014) and proposes a bootstrap-based improvement to forward stepwise regression. We then compare the variable selection performance of five methods: the Lasso, the Lasso significance test, traditional forward stepwise regression, forward stepwise regression with the variable set chosen by AIC, and bootstrap-improved forward stepwise regression. We find that the Lasso significance test rarely commits a Type I error but is too conservative in admitting new variables. Bootstrap-improved forward stepwise regression likewise rarely commits a Type I error, yet it is more willing to admit new variables. Based on our simulation results, the bootstrap-improved forward stepwise regression can therefore be regarded as a well-suited variable selection method.
References	[1] Frank I. and Friedman J. (1993) A Statistical View of Some Chemometrics Regression Tools, Technometrics, 35, p.109-148.
[2] Tibshirani R. J. (1996) Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, Series B, 58, p.267-288.
[3] Osborne M. R., Presnell B., and Turlach B. A. (2000) On the Lasso and Its Dual, Journal of Computational and Graphical Statistics, 9, p.319-337.
[4] Fan J. and Li R. (2001) Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties, Journal of the American Statistical Association, 96, p.1348-1360.
[5] Miller A. (2002) Subset Selection in Regression, Second Edition, Chapman & Hall/CRC.
[6] Zou H. (2006) The Adaptive Lasso and Its Oracle Properties, Journal of the American Statistical Association, 101, p.1418-1429.
[7] 葉世弘 (2009) 運用aGLasso在多變量線性迴歸模型的模型選取, Master's thesis, National Cheng Kung University.
[8] Cortez P., Teixeira J., Cerdeira A., Almeida F., Matos T., and Reis J. (2009) Using Data Mining for Wine Quality Assessment, Proceedings of the 12th International Conference on Discovery Science, p.66-79, October 3-5, 2009, Porto, Portugal.
[9] Kyung M., Gill J., Ghosh M., and Casella G. (2010) Penalized Regression, Standard Errors, and Bayesian Lassos, Bayesian Analysis, 5, p.369-412.
[10] Lockhart R., Taylor J., Tibshirani R. J., and Tibshirani R. (2014) A Significance Test for the Lasso, Annals of Statistics, 42(2), p.413-468.
[11] Kass R. E., Eden U. T., and Brown E. N. (2014) Analysis of Neural Data, Springer.
Description	Master's thesis; National Chengchi University; Graduate Institute of Statistics; 101354002; 102
Source	http://thesis.lib.nccu.edu.tw/record/#G1013540022
Type	thesis

dc.contributor.advisor	黃子銘	zh_TW
dc.contributor.advisor	Huang, Tzee Ming	en_US
dc.contributor.author (Authors)	鄒昀庭	zh_TW
dc.contributor.author (Authors)	Tsou, Yun Ting	en_US
dc.creator (Author)	鄒昀庭	zh_TW
dc.creator (Author)	Tsou, Yun Ting	en_US
dc.date (Date)	2013	en_US
dc.date.accessioned	6-Aug-2014 11:39:39 (UTC+8)	-
dc.date.available	6-Aug-2014 11:39:39 (UTC+8)	-
dc.date.issued (Upload time)	6-Aug-2014 11:39:39 (UTC+8)	-
dc.identifier (Other Identifiers)	G1013540022	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/68228	-
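dc.description (Description)	Master's thesis	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Graduate Institute of Statistics	zh_TW
dc.description (Description)	101354002	zh_TW
dc.description (Description)	102	zh_TW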
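dc.description.tableofcontents	Chapter 1 Introduction; 1.1 Research Background and Motivation; 1.2 Research Objectives; 1.3 Research Process; Chapter 2 Literature Review; 2.1 Lasso; 2.2 The Lasso from the Perspective of Statistical Inference; 2.3 Lasso Significance Test; 2.4 Comparing the Lasso Significance Test and Forward Stepwise Regression in Terms of Distributions; Chapter 3 Improving Forward Stepwise Regression; 3.1 Verifying the Deficiencies of Forward Stepwise Regression; 3.2 Improving Forward Stepwise Regression via the Bootstrap; Chapter 4 Simulation Data Analysis; 4.1 Simulation Design and Procedure; 4.2 Simulation Results; Chapter 5 Empirical Data Analysis; 5.1 Data Background; 5.2 Problem Definition and Methods; 5.3 Variable Selection; Chapter 6 Conclusions and Suggestions; 6.1 Conclusions; 6.2 Limitations and Suggestions; References	zh_TW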
dc.format.extent	878632 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G1013540022	en_US
dc.subject (Keywords)	變數選取	zh_TW
dc.subject (Keywords)	最小絕對壓縮挑選機制	zh_TW
dc.subject (Keywords)	向前逐步迴歸	zh_TW
dc.subject (Keywords)	拔靴法	zh_TW
dc.subject (Keywords)	Variable Selection	en_US
dc.subject (Keywords)	Least Absolute Shrinkage and Selection Operator	en_US
dc.subject (Keywords)	Forward Stepwise Regression	en_US
dc.subject (Keywords)	Bootstrap	en_US
dc.title (Title)	Lasso顯著性檢定與向前逐步迴歸變數選取方法之比較	zh_TW
dc.title (Title)	A Comparison between Lasso Significance Test and Forward Stepwise Selection Method	en_US
dc.type (Type)	thesis	en
