Title: 應用PPO深度強化學習演算法於投資組合之資產配置優化
Title (English): Applying Deep Reinforcement Learning Algorithm PPO for Portfolio Optimization
Author: 林上人 (Lin, Shang-Jen)
Advisor: 胡毓忠 (Hu, Yuh-Jong)
Keywords: Deep Reinforcement Learning; Proximal Policy Optimization; Portfolio Management; Asset Allocation; Robo-Advisor
Date: 2020
Uploaded: 2-Mar-2020 11:38:14 (UTC+8)

Abstract:
This research combines deep reinforcement learning (DRL) with financial technology to examine how effectively DRL can handle asset allocation. The goal is a model that can both judge and learn portfolio optimization: reinforcement learning supplies the learning process, while deep-learning feature extraction strengthens the judgment. The Proximal Policy Optimization (PPO) algorithm is combined with a GRU recurrent neural network and applied to data from the Refinitiv database. The end product is a smart financial software agent that unites data, judgment, and learning: based on experience and historical data it decides whether to invest and how to allocate assets, thereby testing whether PPO can allocate assets effectively and increase total asset value. When daily trading was compared with trading every 30 days, daily trading incurred so much commission that its return fell far below that of 30-day trading, so the trading interval was fixed at 30 days. The study then varied the number of GRU layers and the number of days composing each data sample, training the model on stock data from 2006 to 2016 and testing it on data from 2017 to 2018. Under these experimental settings, the commissions incurred changed the reward too little for the agent to learn a strategy that deliberately reduces commission, and the initial capital was usually too small for the agent to buy even one lot of a high-priced stock, so changes in holdings concentrated in low-priced stocks. The most stable and best-performing configuration traded once every 30 days, used PPO alone, and composed each data sample from 7 days of data; this agent achieved an annualized return of 7.39%.
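To make the trading setup described in the abstract concrete, the following is a minimal, self-contained sketch of such a market simulation: the agent observes a 7-day price window, rebalances every 30 trading days, pays a commission on turnover, and is rewarded by the change in total asset value. It is plain Python/NumPy; the fee rate, initial capital, class and method names, and the random price data are illustrative assumptions, not values or code from the thesis.

```python
# Illustrative sketch only: a toy portfolio environment with the mechanics the
# abstract describes (7-day observation window, 30-day rebalancing, commission
# on turnover, reward = change in total asset value).
import numpy as np


class PortfolioEnv:
    def __init__(self, prices, window=7, trade_every=30,
                 fee_rate=0.001425, initial_cash=1_000_000.0):
        self.prices = prices                # shape: (num_days, num_assets)
        self.window = window                # days per observation
        self.trade_every = trade_every      # rebalance interval in days
        self.fee_rate = fee_rate            # assumed commission per traded value
        self.initial_cash = initial_cash
        self.reset()

    def reset(self):
        self.t = self.window                # start after the first full window
        self.value = self.initial_cash
        self.weights = np.zeros(self.prices.shape[1])  # start fully in cash
        return self._observe()

    def _observe(self):
        # Observation: the last `window` days of prices for every asset.
        return self.prices[self.t - self.window:self.t]

    def step(self, target_weights):
        # Commission is charged on the value that actually changes hands.
        turnover = np.abs(target_weights - self.weights).sum() * self.value
        fee = turnover * self.fee_rate
        self.weights = target_weights

        # Hold the new allocation for `trade_every` days, then mark to market.
        t_next = min(self.t + self.trade_every, len(self.prices) - 1)
        asset_return = self.prices[t_next] / self.prices[self.t] - 1.0
        new_value = (self.value - fee) * (1.0 + self.weights @ asset_return)

        reward = new_value - self.value     # reward: change in total assets
        self.value, self.t = new_value, t_next
        done = self.t >= len(self.prices) - 1
        return self._observe(), reward, done


# Usage with random-walk prices as stand-in data (illustrative only).
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(500, 5)), axis=0))
env = PortfolioEnv(prices)
obs, done = env.reset(), False
while not done:
    w = rng.dirichlet(np.ones(prices.shape[1]))   # random policy placeholder
    obs, reward, done = env.step(w)
print(f"final portfolio value: {env.value:.2f}")
```

In the thesis the allocation would come from the PPO agent rather than the random weights used as a placeholder here.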
Description: Master's thesis, 國立政治大學 (National Chengchi University), In-Service Master's Program, Department of Computer Science, 106971001
Source: http://thesis.lib.nccu.edu.tw/record/#G0106971001
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/128992
Identifier: G0106971001
Type: thesis
Table of Contents:
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Objectives
  1.3 Research Value
Chapter 2 Research Background
  2.1 Asset Allocation
    2.1.1 The Meaning of Asset Allocation
    2.1.2 Factors Considered in Asset Allocation
    2.1.3 Investment Strategies for Asset Allocation
  2.2 Deep Learning Applied to Asset Allocation
  2.3 Reinforcement Learning Applied to Asset Allocation
  2.4 Deep Reinforcement Learning Applied to Asset Allocation
  2.5 Financial Databases
    2.5.1 Datastream
    2.5.2 元大台灣卓越50 (Yuanta Taiwan Top 50 ETF)
Chapter 3 Related Work
Chapter 4 Research Architecture and Methods
  4.1 Research and Experiment Workflow
  4.2 Data Collection
  4.3 Defining the Financial Market Model
  4.4 Defining the Smart Financial Software Agent Model
  4.5 Defining the Model Testing Method
Chapter 5 Implementation and Comparison
  5.1 Data Preprocessing
    5.1.1 Stock Selection
    5.1.2 Data Field Composition
  5.2 Model Training
    5.2.1 Selecting the Trading Frequency
    5.2.2 Adjusting the Number of GRU Layers
    5.2.3 Modifying the Number of Days per Data Sample
  5.3 Model Testing
    5.3.1 Testing Models with Different Numbers of GRU Layers
    5.3.2 Testing Models with Different Numbers of Days per Data Sample
Chapter 6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
References

DOI: 10.6814/NCCU202000267