Academic Output - Theses

Title: LASSO與其衍生方法之特性比較 (Property comparison of LASSO and its derivative methods)
Author: Huang, Jau-Shiun (黃昭勳)
Advisors: Tsai, Chen-An (蔡政安); Hsueh, Huey-Miin (薛慧敏)
Keywords: LASSO; Elastic Net; Penalty function; Regression; Variable selection
Date: 2017
Uploaded: 11-Jul-2017 11:25:28 (UTC+8)
Abstract: In this study, we compare several methods for estimating the coefficients of linear models: LASSO, Elastic Net, LAD-LASSO, EBLASSO, and EBENet. Unlike ordinary least squares (OLS), these methods perform coefficient estimation and variable selection simultaneously; that is, they drop unimportant predictors and keep only the important ones in the model. In the era of big data, datasets keep growing, and data with hundreds or even thousands of predictors are common, which makes variable selection all the more essential. The primary goal of this thesis is to evaluate the properties, strengths, and weaknesses of these estimation methods through two simulation studies and two real-data applications. The simulation results show that each method has its own characteristics, and no single method is best for every dataset.
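As a quick orientation (this sketch is not part of the thesis), the example below illustrates the basic idea shared by these penalized estimators: an L1 penalty shrinks some coefficients exactly to zero, which is what enables variable selection, while Elastic Net adds an L2 term that behaves better with correlated predictors. It uses scikit-learn's Lasso and ElasticNet on hypothetical synthetic data; the tuning values are arbitrary, and the thesis itself may use different software and settings.

    # Illustrative only: LASSO vs. Elastic Net variable selection on synthetic data.
    # Penalized objectives in scikit-learn's parameterization:
    #   LASSO:       (1/2n)*||y - Xb||^2 + alpha*||b||_1
    #   Elastic Net: (1/2n)*||y - Xb||^2 + alpha*(l1_ratio*||b||_1 + (1-l1_ratio)/2*||b||_2^2)
    import numpy as np
    from sklearn.linear_model import Lasso, ElasticNet

    rng = np.random.default_rng(0)
    n, p = 100, 50
    X = rng.normal(size=(n, p))
    X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # two strongly correlated predictors
    true_beta = np.zeros(p)
    true_beta[:5] = [3.0, 3.0, 2.0, 1.5, 1.0]       # only the first 5 predictors matter
    y = X @ true_beta + rng.normal(size=n)

    lasso = Lasso(alpha=0.1).fit(X, y)
    enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

    print("nonzero coefficients, LASSO:      ", int(np.sum(lasso.coef_ != 0)))
    print("nonzero coefficients, Elastic Net:", int(np.sum(enet.coef_ != 0)))

With correlated predictors, LASSO tends to keep one variable from a correlated pair, whereas Elastic Net tends to retain both, a well-known contrast between the two penalties (Zou and Hastie, 2005).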
References: 黃書彬. 攝護腺特異抗原(PSA)過高的意義 [The significance of an elevated prostate-specific antigen (PSA) level]. Retrieved May 17, 2017, from http://www.kmuh.org.tw/www/kmcj/data/10306/11.htm
蔡政安 (2009). 微陣列資料分析 [Microarray Data Analysis]. Biostatistics Center, China Medical University.
Cai, X., Huang, A. and Xu, S. (2011). Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics, 12, 211.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32, 407-499.
Gao, X.L. and Huang, J. (2010). Asymptotic analysis of high-dimensional LAD regression with Lasso. Statistica Sinica, 20, 1485-1506.
Gill, P., Murray, W. and Wright, M. (1981). Practical optimization. New York: Academic Press.
Hoerl, A. and Kennard, R. (1988). Ridge regression. Encyclopedia of Statistical Sciences, 8, 129-136.
Huang, A., Xu, S. and Cai, X. (2015). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity, 114, 107-115.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 58, 267-288.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Statist. Soc. B, 67, 301-320.
Description: Master's thesis, Department of Statistics, National Chengchi University (student ID 104354012)
Source: http://thesis.lib.nccu.edu.tw/record/#G0104354012
Type: thesis
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/110781
Table of contents:
Chapter 1  Research Background  1
Chapter 2  Research Methods  4
Chapter 3  Simulation Study  9
  3.1  Introduction  9
  3.2  Simulation Procedure  9
  3.3  Simulation Results and Discussion  13
Chapter 4  Simulation Study with Variable Grouping  19
  4.1  Introduction  19
  4.2  Simulation Procedure  19
  4.3  Simulation Results and Discussion  20
Chapter 5  Real Data Applications  25
  5.1  Prostate Cancer Study Application  25
  5.2  Leukemia Study Application  28
Chapter 6  Conclusion  39
References  40
Format: application/pdf, 1,480,098 bytes