Title: 應用PPO深度強化學習演算法於投資組合之資產配置優化
Title (English): Applying Deep Reinforcement Learning Algorithm PPO for Portfolio Optimization
Author: 林上人 (Lin, Shang-Jen)
Advisor: 胡毓忠 (Hu, Yuh-Jong)
Keywords: Deep Reinforcement Learning; Proximal Policy Optimization; Portfolio Management; Asset Allocation; Robo-Advisor
Date: 2020
Uploaded: 2-Mar-2020 11:38:14 (UTC+8)

Abstract:
This research combines deep reinforcement learning (DRL) with financial technology to examine how effectively DRL can handle asset allocation. The goal is a model that can both judge and learn portfolio optimization: reinforcement learning supplies the learning process, while deep-learning feature extraction strengthens the judgment. The Proximal Policy Optimization (PPO) algorithm is combined with a GRU recurrent neural network and applied to data from the Refinitiv database. The end product is a smart financial software agent that unites data, judgment, and learning: based on experience and historical data it decides whether to invest and how to allocate assets, thereby testing whether PPO can allocate assets effectively and increase total asset value. When daily trading was compared with trading every 30 days, daily trading incurred so much commission that its return fell far below that of 30-day trading, so the trading interval was fixed at 30 days. The study then varied the number of GRU layers and the number of days composing each data sample, training the model on stock data from 2006 to 2016 and testing it on data from 2017 to 2018. Under these experimental settings, the commissions incurred changed the reward too little for the agent to learn a strategy that deliberately reduces commission, and the initial capital was usually too small for the agent to buy even one lot of a high-priced stock, so changes in holdings concentrated in low-priced stocks. The most stable and best-performing configuration traded once every 30 days, used PPO alone, and composed each data sample from 7 days of data; this agent achieved an annualized return of 7.39%.
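To make the trading setup described in the abstract concrete, the following is a minimal, self-contained sketch of such a market simulation: the agent observes a 7-day price window, rebalances every 30 trading days, pays a commission on turnover, and is rewarded by the change in total asset value. It is plain Python/NumPy; the fee rate, initial capital, class and method names, and the random price data are illustrative assumptions, not values or code from the thesis.

```python
# Illustrative sketch only: a toy portfolio environment with the mechanics the
# abstract describes (7-day observation window, 30-day rebalancing, commission
# on turnover, reward = change in total asset value).
import numpy as np


class PortfolioEnv:
    def __init__(self, prices, window=7, trade_every=30,
                 fee_rate=0.001425, initial_cash=1_000_000.0):
        self.prices = prices                # shape: (num_days, num_assets)
        self.window = window                # days per observation
        self.trade_every = trade_every      # rebalance interval in days
        self.fee_rate = fee_rate            # assumed commission per traded value
        self.initial_cash = initial_cash
        self.reset()

    def reset(self):
        self.t = self.window                # start after the first full window
        self.value = self.initial_cash
        self.weights = np.zeros(self.prices.shape[1])  # start fully in cash
        return self._observe()

    def _observe(self):
        # Observation: the last `window` days of prices for every asset.
        return self.prices[self.t - self.window:self.t]

    def step(self, target_weights):
        # Commission is charged on the value that actually changes hands.
        turnover = np.abs(target_weights - self.weights).sum() * self.value
        fee = turnover * self.fee_rate
        self.weights = target_weights

        # Hold the new allocation for `trade_every` days, then mark to market.
        t_next = min(self.t + self.trade_every, len(self.prices) - 1)
        asset_return = self.prices[t_next] / self.prices[self.t] - 1.0
        new_value = (self.value - fee) * (1.0 + self.weights @ asset_return)

        reward = new_value - self.value     # reward: change in total assets
        self.value, self.t = new_value, t_next
        done = self.t >= len(self.prices) - 1
        return self._observe(), reward, done


# Usage with random-walk prices as stand-in data (illustrative only).
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(500, 5)), axis=0))
env = PortfolioEnv(prices)
obs, done = env.reset(), False
while not done:
    w = rng.dirichlet(np.ones(prices.shape[1]))   # random policy placeholder
    obs, reward, done = env.step(w)
print(f"final portfolio value: {env.value:.2f}")
```

In the thesis the allocation would come from the PPO agent rather than the random weights used as a placeholder here.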
Description: Master's thesis, 國立政治大學 (National Chengchi University), In-Service Master's Program, Department of Computer Science, 106971001
Source: http://thesis.lib.nccu.edu.tw/record/#G0106971001
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/128992
Identifier: G0106971001
Type: thesis
Table of Contents:
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Objectives
  1.3 Research Value
Chapter 2 Research Background
  2.1 Asset Allocation
    2.1.1 The Meaning of Asset Allocation
    2.1.2 Factors Considered in Asset Allocation
    2.1.3 Investment Strategies for Asset Allocation
  2.2 Deep Learning Applied to Asset Allocation
  2.3 Reinforcement Learning Applied to Asset Allocation
  2.4 Deep Reinforcement Learning Applied to Asset Allocation
  2.5 Financial Databases
    2.5.1 Datastream
    2.5.2 元大台灣卓越50 (Yuanta Taiwan Top 50 ETF)
Chapter 3 Related Work
Chapter 4 Research Architecture and Methods
  4.1 Research and Experiment Workflow
  4.2 Data Collection
  4.3 Defining the Financial Market Model
  4.4 Defining the Smart Financial Software Agent Model
  4.5 Defining the Model Testing Method
Chapter 5 Implementation and Comparison
  5.1 Data Preprocessing
    5.1.1 Stock Selection
    5.1.2 Data Field Composition
  5.2 Model Training
    5.2.1 Selecting the Trading Frequency
    5.2.2 Adjusting the Number of GRU Layers
    5.2.3 Modifying the Number of Days per Data Sample
  5.3 Model Testing
    5.3.1 Testing Models with Different Numbers of GRU Layers
    5.3.2 Testing Models with Different Numbers of Days per Data Sample
Chapter 6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
References

DOI: 10.6814/NCCU202000267