Publications-Theses
Title: 消除深度學習目標函數中局部極小值之研究 (A Survey on Eliminating Local Minima of the Objective Function in Deep Learning)
Author: 季佳琪 (Chi, Chia-Chi)
Advisor: 蔡炎龍 (Tsai, Yen-Lung)
Keywords: Deep Learning; Neural Network; Objective Function; Loss Function; Local Minima
Date: 2019
Uploaded: 5-Sep-2019 16:13:48 (UTC+8)
Abstract:
In this thesis, we survey methods and theorems for eliminating suboptimal local minima of the objective function in deep learning. More specifically, given an original neural network, we can construct a modified network by adding external layers to it; if the objective function of the modified network achieves a local minimum, then the objective function of the original network attains a global minimum. We first review related literature, give an overview of deep learning, and prove the convexity of common loss functions so that the assumptions of the theorems are satisfied. Next, we prove some details of these theorems, discuss the effects of the method, and investigate its limitations. Finally, we perform a series of experiments to show that the method can be used in practical work.

References
[1] Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In Proceedings of the 31st International Conference on Machine Learning, ICML'14, pages II-1908–II-1916. JMLR.org, 2014.
[2] Avrim L. Blum and Ronald L. Rivest. Training a 3-node neural network is NP-complete. Neural Networks, 5(1):117–127, 1992.
[3] Alon Brutzkus and Amir Globerson. Globally optimal gradient descent for a ConvNet with Gaussian inputs. February 2017.
[4] Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 103–111, Doha, Qatar, October 2014. Association for Computational Linguistics.
[5] Anna Choromanska, Mikael Henaff, Michael Mathieu, Gerard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. Journal of Machine Learning Research, 38:192–204, 2015.
[6] Simon S. Du and Jason D. Lee. On the power of over-parametrization in neural networks with quadratic activation. CoRR, abs/1803.01206, 2018.
[7] Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.
[8] Rong Ge, Jason D. Lee, and Tengyu Ma. Learning one-hidden-layer neural networks with landscape design. CoRR, abs/1711.00501, 2017.
[9] Surbhi Goel and Adam R. Klivans. Learning depth-three neural networks in polynomial time. CoRR, abs/1709.06010, 2017.
[10] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[11] Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
[12] Moritz Hardt and Tengyu Ma. Identity matters in deep learning. CoRR, abs/1611.04231, 2017.
[13] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[14] Kenji Kawaguchi. Deep learning without poor local minima. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 586–594. Curran Associates, Inc., 2016.
[15] Kenji Kawaguchi and Yoshua Bengio. Depth with nonlinearity creates no bad local minima in ResNets. arXiv preprint arXiv:1810.09038, 2018.
[16] Kenji Kawaguchi, Jiaoyang Huang, and Leslie Pack Kaelbling. Effect of depth and width on local minima in deep learning. CoRR, abs/1811.08150, 2018.
[17] Kenji Kawaguchi and Leslie Pack Kaelbling. Elimination of all bad local minima in deep learning. CoRR, abs/1901.00279, 2019.
[18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[19] Solomon Kullback and Richard A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, 1951.
[20] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
[21] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[22] Yuanzhi Li and Yang Yuan. Convergence analysis of two-layer neural networks with ReLU activation. CoRR, abs/1705.09886, 2017.
[23] Shiyu Liang, Ruoyu Sun, Jason D. Lee, and Rayadurgam Srikant. Adding one neuron can eliminate all bad local minima. Advances in Neural Information Processing Systems, 2018-December:4350–4360, 2018.
[24] Katta G. Murty and Santosh N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming. Mathematical Programming, 39(2):117–129, June 1987.
[25] Quynh Nguyen and Matthias Hein. Optimization landscape and expressivity of deep CNNs. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3730–3739, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.
[26] Quynh N. Nguyen and Matthias Hein. The loss surface of deep and wide neural networks. CoRR, abs/1704.08045, 2017.
[27] Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
[28] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
[29] Hanie Sedghi and Anima Anandkumar. Provable methods for training neural networks with sparse connectivity. arXiv preprint arXiv:1412.2693, 2014.
[30] Ohad Shamir. Are ResNets provably better than linear predictors? CoRR, abs/1804.06739, 2018.
[31] Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
[32] Mahdi Soltanolkotabi. Learning ReLUs via gradient descent. CoRR, abs/1705.04591, 2017.
[33] Daniel Soudry and Elad Hoffer. Exponentially vanishing sub-optimal local minima in multilayer neural networks, 2018.
[34] Paul J. Werbos et al. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.
[35] Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, and Inderjit S. Dhillon. Recovery guarantees for one-hidden-layer neural networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 4140–4149, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.

Description: Master's thesis
National Chengchi University
Department of Applied Mathematics
Student ID: 1057510163
Source: http://thesis.lib.nccu.edu.tw/record/#G1057510163
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/125637
Type: thesis
Table of Contents
1 Introduction 1
2 Deep Learning 3
2.1 Definition of Deep Learning 3
2.2 Standard Neural Network 4
2.2.1 The Structure of The Neural Network 4
2.2.2 The Operation of The Neural Network 5
2.2.3 Activation Function 6
2.3 Optimization for Training Deep Network 7
2.4 Convolutional Neural Network 10
2.4.1 Definition of Convolution 10
2.4.2 The Structure of The Convolutional Neural Network 12
2.5 Recurrent Neural Network 15
2.5.1 The Structure of The Recurrent Neural Network 16
2.5.2 The Operation of The Recurrent Neural Network 16
3 Model Description 19
3.1 The Architecture 19
3.2 Loss and Objective Functions 21
3.2.1 Construction of Objective Functions 21
3.2.2 Convexity of Loss Functions 21
4 Main Theorems 25
4.1 Lemmas 25
4.2 Theorems 34
5 Effects of Eliminating Local Minima 43
5.1 Effects of All Situations 43
5.2 Examples 44
6 Challenges of Eliminating Local Minima 49
6.1 Theorem 49
6.2 Example 51
7 Experiments 52
7.1 Standard Neural Network Results 52
7.2 Convolutional Neural Network Results 54
8 Conclusion 57
Appendix A Code of The Models 58
A.1 The NN Model 58
A.2 The mNN Model 59
A.3 The CNN Model 62
A.4 The nCNN Model 63
A.5 The mnCNN Model 65
Bibliography 69

Format: application/pdf (5,183,578 bytes)
DOI: 10.6814/NCCU201900936
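As a rough, hypothetical illustration of the construction the abstract describes (in the spirit of Kawaguchi and Kaelbling [17] and Liang et al. [23]): the original network's output is augmented with one auxiliary exponential unit whose scale is penalized, so that at any stationary point of the modified objective the auxiliary scale is driven to zero and the original loss is recovered. All function names below are invented for the sketch, which assumes NumPy and a squared loss; it is not the thesis's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def original_output(X, theta):
    """A toy linear 'original' network: f(x; theta) = x . theta."""
    return X @ theta

def modified_output(X, theta, a, w, b):
    """Original output plus one auxiliary exponential unit a * exp(x . w + b)."""
    return original_output(X, theta) + a * np.exp(X @ w + b)

def modified_objective(X, y, theta, a, w, b, lam=1.0):
    """Squared loss of the modified network plus the regularizer lam * a**2
    on the auxiliary unit's scale."""
    residual = modified_output(X, theta, a, w, b) - y
    return np.mean(residual ** 2) + lam * a ** 2

# Tiny regression problem to exercise the construction.
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)

# When the auxiliary scale a is 0, both the extra unit and its penalty
# vanish, so the modified objective equals the original squared loss.
base = np.mean((original_output(X, theta) - y) ** 2)
mod = modified_objective(X, y, theta, a=0.0, w=np.zeros(3), b=0.0)
assert np.isclose(base, mod)
```

This coincidence at a = 0 is the mechanism the theorems exploit: a local minimum of the modified objective forces a = 0, where the modified objective reduces to the original one, letting local optimality of the modified network transfer to global optimality of the original. The precise assumptions (convexity of the loss, the form of the added layers) are those stated in Chapters 3 and 4 of the thesis.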