Title 線性迴歸模型的隨機梯度下降估計之漸近分析
Asymptotic Analysis of Stochastic Gradient-Based Estimators for Linear Regression Models
Author 李承志
Li, Cheng-Jhih
Contributors 翁久幸
Weng, Chiu-Hsing
李承志
Li, Cheng-Jhih
Keywords 隨機梯度下降法
平均隨機梯度下降法
線性迴歸模型
漸進分析
學習率
維度
Stochastic Gradient Descent
Averaged Stochastic Gradient Descent
Linear Regression
Asymptotic Analysis
Learning Rate
Dimensionality
Date 2022
Date uploaded 1-Aug-2022 17:14:56 (UTC+8)
Abstract 隨機梯度下降法 (Stochastic Gradient Descent; SGD) 是一個常見的最佳化問題解決方法。因為在計算上只需要使用到損失函數的一階微分,梯度下降法有很好的計算效率。然而,當學習率中的比例常數設置不當時,SGD 將變得不穩定。因此,選擇合適的學習率是梯度下降更新中一個重要的課題。

另一種與 SGD 相關的演算法是平均梯度下降法 (Averaged Stochastic Gradient Descent; ASGD) 。ASGD 是在每次的 SGD 迭代之後增加一個平均步驟。當 ASGD 所使用的學習率遞減速度不低於 t^{−1},此方法被證明是漸進有效的。但事實上 ASGD 遭遇到一些問題,包含: (i) 由於平均的速度緩慢或學習率不當,ASGD 需要大量的樣本 (ii) 在高維度的問題中較不穩定。在這篇論文中,我們從理論和模擬兩個方面研究在線性迴歸模型中 SGD 和 ASGD 的漸近性質。雖然我們發現 ASGD 在某些情況下比 SGD 更敏感,但是藉由謹慎的學習率選擇可以改善結果。此外,增加訓練樣本也提升 ASGD 的表現。
Stochastic Gradient Descent (SGD) is a common method for solving optimization problems. It is computationally efficient because the algorithm relies only on the first derivative of the loss function. However, SGD is known to lack robustness: when the proportionality constant in the learning rate is set improperly, the iterates become unstable. Choosing an appropriate learning rate is therefore an important issue for the SGD update.

Another SGD-related algorithm is Averaged Stochastic Gradient Descent (ASGD), which adds an averaging step after the standard SGD update in every iteration. With a learning rate that decreases no more slowly than t^{−1}, ASGD has been proved to be asymptotically efficient. In practice, however, ASGD encounters some problems: (i) it requires a large number of samples because of slow averaging or an improper learning rate, and (ii) it is less stable in high-dimensional problems. In this thesis, we study the asymptotic properties of SGD and ASGD for linear regression models, both in theory and through simulation. We find that although ASGD is more sensitive than SGD in some situations, a careful choice of learning rate can improve the results. Moreover, increasing the number of training samples also improves the performance of ASGD.
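To make the two update rules concrete, the following sketch contrasts a plain SGD pass with its averaged (ASGD) counterpart on simulated linear-regression data. This is a minimal illustration, not code from the thesis: the squared-error loss, the learning-rate form gamma_t = gamma0 * t^(-alpha), and the names gamma0 and alpha are assumptions chosen for the example.

import numpy as np

def sgd_linear_regression(X, y, gamma0=0.1, alpha=0.5):
    """One pass of SGD over (X, y) for the model y_t = x_t' theta + noise.
    Returns the final SGD iterate and its running average (the ASGD estimator)."""
    n, d = X.shape
    theta = np.zeros(d)       # SGD iterate
    theta_bar = np.zeros(d)   # running (Polyak-Ruppert) average of the iterates
    for t in range(1, n + 1):
        x_t, y_t = X[t - 1], y[t - 1]
        gamma_t = gamma0 * t ** (-alpha)       # decreasing learning rate gamma_t = gamma0 * t^(-alpha)
        grad = (x_t @ theta - y_t) * x_t       # gradient of the squared-error loss at (x_t, y_t)
        theta = theta - gamma_t * grad         # standard SGD update
        theta_bar += (theta - theta_bar) / t   # averaging step: theta_bar_t = mean(theta_1, ..., theta_t)
    return theta, theta_bar

# Toy usage on simulated data.
rng = np.random.default_rng(0)
n, d = 5000, 5
theta_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_star + rng.normal(scale=0.5, size=n)
theta_sgd, theta_asgd = sgd_linear_regression(X, y)
print("SGD  estimation error:", np.linalg.norm(theta_sgd - theta_star))
print("ASGD estimation error:", np.linalg.norm(theta_asgd - theta_star))

Rerunning the sketch with a larger gamma0 or a higher dimension d illustrates the sensitivity discussed above: the raw SGD iterate can become unstable, while a more careful choice of gamma0 and alpha, or a larger sample size, typically restores good behaviour.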
References Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.

Léon Bottou. Stochastic gradient descent tricks. In Neural networks: Tricks of the trade, pages 421–436. Springer, 2012.

Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018.

Alexandre Défossez and Francis Bach. Averaged least-mean-squares: Bias-variance trade-offs and optimal sampling distributions. In Artificial Intelligence and Statistics, pages 205–213. PMLR, 2015.

Jack Kiefer and Jacob Wolfowitz. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, pages 462–466, 1952.

Harold J. Kushner and G. George Yin. Stochastic approximation and recursive algorithms and applications. Applications of Mathematics, 35. Springer, New York, 2003.

Eric Moulines and Francis Bach. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Advances in Neural Information Processing Systems, 24, 2011.

Noboru Murata. A statistical study of on-line learning. In Online Learning and Neural Networks, pages 63–92. Cambridge University Press, Cambridge, UK, 1998.

Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.

Yurii Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2003.

Boris T Polyak. New stochastic approximation type procedures. Automat. i Telemekh., (7):98–107, 1990.

Boris T Polyak and Anatoli B Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855, 1992.

Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.

Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.

David Ruppert. Efficient estimators from a slowly convergent Robbins-Monro procedure. Technical Report 781, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1988.

Panos Toulis and Edoardo M Airoldi. Asymptotic and finite-sample properties of estimators based on stochastic gradients. The Annals of Statistics, 45(4):1694–1727, 2017.

Wei Xu. Towards optimal one pass large scale learning with averaged stochastic gradient descent. arXiv preprint arXiv:1107.2490, 2011.
Description Master's thesis
National Chengchi University
Department of Statistics
109354005
Source http://thesis.lib.nccu.edu.tw/record/#G0109354005
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/141004
Table of Contents
1 Introduction . . . . . . 1
2 Literature Review . . . . . . 3
2.1 Learning algorithms . . . . . . 3
2.2 Number of pass and data setting . . . . . . 4
2.3 Strong convexity and Lipschitz condition . . . . . . 5
2.4 Some bounds . . . . . . 5
3 Asymptotic Analysis for Linear Regression . . . . . . 9
4 Simulations . . . . . . 12
4.1 1-Dimensional Case . . . . . . 12
4.2 5-Dimensional Case . . . . . . 16
4.3 20-Dimensional Case . . . . . . 20
5 Conclusion & Future Work . . . . . . 25

A Proof of Theorem 2.2 Case α = 1 . . . . . . 28
B Additional Experiments . . . . . . 31
Format application/pdf (8825016 bytes)
DOI 10.6814/NCCU202201003