On learning rate and convergence of stochastic gradient descent methods | Academic Outputs

Title	On learning rate and convergence of stochastic gradient descent methods (Chinese title: 隨機梯度下降法的學習率與收斂探討)
Author	陳建佑
Contributors	翁久幸; 林士貴 (advisors); 陳建佑
Keywords	Stochastic Gradient Descent; Averaged Stochastic Gradient Descent; Mini-Batch Stochastic Gradient Descent; Linear model; Ordinal Regression; Matrix Factorization
Date	2021
Uploaded	4-Aug-2021 14:41:46 (UTC+8)
Abstract	Stochastic gradient descent (SGD) is widely used for parameter estimation in big-data and deep-learning models. It is appealing because it requires only the first derivatives of the objective function, which keeps each update simple and fast to compute. Because the performance of SGD depends closely on how the learning rate is set, the learning rate has been studied extensively. In this thesis, we use extensive simulation studies to examine parameter estimation and convergence of SGD under a range of learning-rate settings for linear models and ordinal regression models. The simulations show that poorly chosen learning rates can prevent SGD from converging well, so we propose a new learning rate that yields stable results and performs well in linear models; further simulations confirm that SGD with a steadily decaying learning rate usually performs better. Based on these results, we select appropriate learning rates and apply them to a recommendation system model that combines ordinal regression with matrix factorization. Finally, applying this learning-rate setting to a real dataset gives reasonably good results.
Description	Master's thesis, Department of Statistics, National Chengchi University, 108354011
Source	http://thesis.lib.nccu.edu.tw/record/#G0108354011
Type	thesis
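
The abstract compares learning-rate settings for SGD and averaged SGD (ASGD) on linear models, but this record does not state the learning rate the thesis proposes. The following is a minimal Python sketch under that limitation: it fits y = x'β + noise in a single pass, using an assumed decaying schedule γ_t = γ0 (1 + a γ0 t)^(-c) as a stand-in, and reports both the last SGD iterate and the running (Polyak-Ruppert) average of the iterates, which is what ASGD returns. The function names `simulate_linear` and `sgd_and_asgd`, the schedule, and all default values are hypothetical choices for illustration.

```python
import random


def simulate_linear(n, beta_true, noise_sd=1.0, seed=0):
    """Draw n pairs (x, y): x = (1, z_1, ..., z_{p-1}) with z_j ~ N(0, 1),
    y = x.beta_true + N(0, noise_sd^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(len(beta_true) - 1)]
        y = sum(xi * bi for xi, bi in zip(x, beta_true)) + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    return data


def sgd_and_asgd(data, dim, gamma0=0.05, a=0.01, c=0.75):
    """One pass of SGD on the squared loss 0.5 * (y - x.beta)^2.

    Returns (last SGD iterate, running average of all iterates = ASGD estimate).
    Learning rate (an assumed schedule, not the thesis's proposal):
        gamma_t = gamma0 * (1 + a * gamma0 * t) ** (-c)
    """
    beta = [0.0] * dim
    beta_bar = [0.0] * dim
    for t, (x, y) in enumerate(data, start=1):
        gamma = gamma0 * (1.0 + a * gamma0 * t) ** (-c)
        resid = y - sum(xi * bi for xi, bi in zip(x, beta))
        # The gradient of the loss is -resid * x, so a descent step adds gamma * resid * x.
        beta = [bi + gamma * resid * xi for bi, xi in zip(beta, x)]
        # Running mean: beta_bar_t = beta_bar_{t-1} + (beta_t - beta_bar_{t-1}) / t
        beta_bar = [bb + (bi - bb) / t for bb, bi in zip(beta_bar, beta)]
    return beta, beta_bar


beta_true = [1.0, -2.0, 0.5]
data = simulate_linear(20000, beta_true)
sgd_est, asgd_est = sgd_and_asgd(data, dim=len(beta_true))
print("SGD :", [round(b, 3) for b in sgd_est])
print("ASGD:", [round(b, 3) for b in asgd_est])
```

For 0.5 < c ≤ 1 this schedule satisfies the classical Robbins-Monro conditions (Σ γ_t = ∞ and Σ γ_t² < ∞). Rerunning with a much larger γ0, or with c close to 0 (an almost constant rate), is a quick way to see the unstable or noisy convergence that the abstract attributes to improper learning rates; the averaged estimate is typically much less noisy than the last iterate.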

dc.contributor.advisor	翁久幸; 林士貴	zh_TW
dc.contributor.author (Author)	陳建佑	zh_TW
dc.creator (Author)	陳建佑	zh_TW
dc.date (Date)	2021	en_US
dc.date.accessioned	4-Aug-2021 14:41:46 (UTC+8)	-
dc.date.available	4-Aug-2021 14:41:46 (UTC+8)	-
dc.date.issued (Upload time)	4-Aug-2021 14:41:46 (UTC+8)	-
dc.identifier (Other identifier)	G0108354011	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/136317	-
dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Statistics	zh_TW
dc.description (Description)	108354011	zh_TW
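dc.description.tableofcontents	Chapter 1 Introduction; Chapter 2 Literature Review; Chapter 3 Methodology; 3.1 Gradient descent and related algorithms; 3.2 Variance of SGD and ASGD estimates; 3.2.1 Variance of ASGD in linear models; 3.2.2 Variance of SGD estimates; 3.3 Ordinal regression model; 3.4 SVD OrdRec & SVD++ OrdRec Model; 3.5 Evaluation metrics for recommender system models; Chapter 4 Experimental Results; 4.1 Simulation studies; 4.1.1 Estimation accuracy and variance of SGD and ASGD on finite data; 4.1.2 Estimation accuracy and variance of SGD and ASGD on stream data; 4.1.3 Parameter estimation for ordinal regression; 4.1.4 Parameter estimation for the SVD OrdRec and SVD++ OrdRec models; 4.2 Validation on real data; 4.2.1 Data description; 4.2.2 Training data split; 4.2.3 Parameter settings and comparison of results; Chapter 5 Conclusion; Appendix 1: Parameter settings for the real data; References	zh_TW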
dc.format.extent	1503629 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0108354011	en_US
dc.subject (Keywords)	Stochastic Gradient Descent	en_US
dc.subject (Keywords)	Averaged Stochastic Gradient Descent	en_US
dc.subject (Keywords)	Mini-Batch Stochastic Gradient Descent	en_US
dc.subject (Keywords)	Linear model	en_US
dc.subject (Keywords)	Ordinal Regression	en_US
dc.subject (Keywords)	Matrix Factorization	en_US
dc.title (Title)	隨機梯度下降法的學習率與收斂探討	zh_TW
dc.title (Title)	On learning rate and convergence of stochastic gradient descent methods	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU202100823	en_US
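
Sections 3.3 and 4.1.3 concern SGD estimation for ordinal regression. As a hedged illustration only, the sketch below assumes the cumulative-logit (proportional-odds) form P(Y ≤ k | x) = F(θ_k - x'β), with F the logistic CDF and ordered cutpoints θ_1 < θ_2 < ... < θ_{K-1}; the thesis's exact parameterization, learning rate, and constraints are not given in this record. Each step ascends the log-likelihood of one observation, log(F(θ_y - η) - F(θ_{y-1} - η)) with η = x'β, θ_0 = -∞ and θ_K = +∞, using an assumed decaying schedule γ_t = γ0 (1 + a γ0 t)^(-c), and the function returns averaged (ASGD-style) estimates. The names `simulate_ordinal` and `ordinal_sgd` and all default values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic CDF F(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_ordinal(n, beta_true, theta_true, seed=1):
    """Latent form: y* = x.beta + logistic noise, y = 1 + #{k : theta_k < y*},
    so that P(y <= k | x) = F(theta_k - x.beta)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta_true]
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        latent = sum(xi * bi for xi, bi in zip(x, beta_true)) + math.log(u / (1.0 - u))
        y = 1 + sum(1 for th in theta_true if th < latent)
        data.append((x, y))
    return data


def ordinal_sgd(data, dim, n_cat, gamma0=0.1, a=0.01, c=0.75):
    """One pass of SGD ascent on log(F(theta_y - eta) - F(theta_{y-1} - eta)), eta = x.beta.
    Categories are 1..n_cat; returns running averages of beta and theta."""
    beta = [0.0] * dim
    theta = [k - (n_cat - 2) / 2.0 for k in range(n_cat - 1)]  # evenly spaced start
    beta_bar, theta_bar = beta[:], theta[:]
    for t, (x, y) in enumerate(data, start=1):
        gamma = gamma0 * (1.0 + a * gamma0 * t) ** (-c)  # assumed schedule
        eta = sum(xi * bi for xi, bi in zip(x, beta))
        # Upper cut theta_y (absent when y = n_cat); lower cut theta_{y-1} (absent when y = 1).
        cdf_hi = sigmoid(theta[y - 1] - eta) if y < n_cat else 1.0
        cdf_lo = sigmoid(theta[y - 2] - eta) if y > 1 else 0.0
        # Logistic density is F * (1 - F); it is 0 at the infinite cuts.
        pdf_hi, pdf_lo = cdf_hi * (1.0 - cdf_hi), cdf_lo * (1.0 - cdf_lo)
        p = max(cdf_hi - cdf_lo, 1e-12)  # P(y | x); cut ordering is not enforced here
        g_eta = -(pdf_hi - pdf_lo) / p  # derivative of the log-likelihood w.r.t. eta
        beta = [bi + gamma * g_eta * xi for bi, xi in zip(beta, x)]
        if y < n_cat:
            theta[y - 1] += gamma * pdf_hi / p
        if y > 1:
            theta[y - 2] -= gamma * pdf_lo / p
        beta_bar = [bb + (bi - bb) / t for bb, bi in zip(beta_bar, beta)]
        theta_bar = [tb + (th - tb) / t for tb, th in zip(theta_bar, theta)]
    return beta_bar, theta_bar


beta_true = [1.0, -0.5]
theta_true = [-1.0, 0.5, 2.0]
data = simulate_ordinal(40000, beta_true, theta_true)
beta_hat, theta_hat = ordinal_sgd(data, dim=len(beta_true), n_cat=len(theta_true) + 1)
print("beta :", [round(b, 3) for b in beta_hat])
print("theta:", [round(th, 3) for th in theta_hat])
```

The sketch does not enforce the ordering of the cutpoints; it relies on well-separated starting values and small steps. A reparameterization such as θ_k = θ_{k-1} + exp(δ_k) would be a more robust choice when cutpoints are close together.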