Title: 應用TD3深度強化學習演算法進行資產優化管理配置 (Applying DRL TD3 Algorithm for Portfolio Management Optimization)
Author: Wu, Yu-Hsiang (吳宇翔)
Advisor: Hu, Yuh-Jong (胡毓忠)
Keywords: LSTM; DRL; TD3; portfolio management (資產配置)
Date: 2020
Uploaded: 2-Mar-2020 11:38:39 (UTC+8)

Abstract (translated from the Chinese): Deep reinforcement learning (DRL), a branch of AI, learns by interacting continuously with its environment, learning from mistakes so as to maximize the reward of each decision; it is widely used for decision optimization, and AlphaGo is its best-known recent example. DRL is well suited to modeling sequential decision tasks, and to verify this property, this study applies the concept to optimal portfolio management. Focusing on the investment decision process in portfolio optimization, the study implements the Twin Delayed DDPG (TD3) algorithm and a variant (TD3+LSTM) to find the allocation weights that maximize investment returns, and examines the suitability of TD3 for optimizing dynamic portfolio management strategies. The investment universe is the constituent stocks of Taiwan's 0050 ETF; across multiple experiments, the approach outperforms both buy-and-hold and dollar-cost-averaging strategies.
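The abstract describes an agent that outputs allocation weights over the 0050 ETF constituents so as to maximize returns, with the environment built on OpenAI Gym (per the thesis outline). The thesis code is not part of this record, so the following is only a minimal sketch of what such a gym-style environment could look like; the class name `PortfolioEnv`, the softmax mapping from raw action scores to weights, and the log-return reward are all illustrative assumptions, not the author's actual design.

```python
import math


class PortfolioEnv:
    """Minimal gym-style portfolio environment (an illustrative sketch).

    state  : the most recent `window` steps of per-asset returns, flattened
    action : one raw score per asset, mapped to allocation weights via softmax
    reward : log return of the weighted portfolio over the next step
    """

    def __init__(self, returns, window=5):
        self.returns = returns  # list of per-step return vectors, one per asset
        self.window = window
        self.t = window

    def reset(self):
        self.t = self.window
        return self._obs()

    def _obs(self):
        # Flatten the trailing window of return vectors into one observation.
        recent = self.returns[self.t - self.window:self.t]
        return [r for step in recent for r in step]

    def softmax(self, scores):
        # Map unconstrained scores to weights that are positive and sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def step(self, action):
        weights = self.softmax(action)
        step_returns = self.returns[self.t]
        portfolio_return = sum(w * r for w, r in zip(weights, step_returns))
        reward = math.log(1.0 + portfolio_return)
        self.t += 1
        done = self.t >= len(self.returns)
        return (self._obs() if not done else None), reward, done, {}
```

A TD3 agent would be trained on `reset`/`step` transitions from such an interface; even a random agent can exercise it, which is useful for sanity-checking the environment before any learning.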
Abstract (English): Deep reinforcement learning (DRL), a branch of AI, learns by interacting continuously with the environment and learning from errors, maximizing the reward of every step; it is commonly applied to decision optimization, and AlphaGo is its best-known example. This study applies the concept to portfolio management optimization. It implements the Twin Delayed DDPG (TD3) algorithm and a TD3+LSTM variant to find the allocation weights that maximize investment returns, and checks whether TD3 is suitable for optimizing dynamic portfolio management strategies. The investment universe is the constituent stocks of Taiwan's 0050 ETF. After several experiments, TD3 outperforms both the buy-and-hold strategy and a systematic investment plan.

References:
[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with deep reinforcement learning. arXiv:1312.5602, 2013.
[2] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. arXiv:1709.06560, 2017.
[3] Alex Irpan. Deep reinforcement learning doesn't work yet. https://www.alexirpan.com/2018/02/14/rl-hard.html, 2018.
[4] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[5] Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.
[6] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
[7] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. Dueling network architectures for deep reinforcement learning. In ICML, 2015.
[8] Sham M. Kakade. A natural policy gradient. In Advances in Neural Information Processing Systems, pages 1531–1538, 2002.
[9] Vijay R. Konda and John N. Tsitsiklis. Actor-critic algorithms. In Advances in Neural Information Processing Systems, pages 1008–1014, 2000.
[10] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. 2014.
[11] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv:1509.02971, 2015.
[12] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In ICML, 2018.
[13] Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
[14] William Sharpe. Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3):425–442, 1964.
[15] Gary P. Brinson, L. Randolph Hood, and Gilbert L. Beebower. Determinants of portfolio performance. Financial Analysts Journal, 42(4):39–44, 1986.
[16] Gary P. Brinson, Brian D. Singer, and Gilbert L. Beebower. Determinants of portfolio performance II: An update. Financial Analysts Journal, 47(3):40–48, 1991.
[17] André F. Perold and William F. Sharpe. Dynamic strategies for asset allocation. Financial Analysts Journal, 140, 1995.
[18] François Balloux and Nicolas Lugon-Moulin. The estimation of population differentiation with microsatellite markers. Molecular Ecology, 11(2):155–165, 2002.
[19] William F. Sharpe. Integrated asset allocation. Financial Analysts Journal, 43(5):25–32, 1987.
[20] Zhipeng Liang, Hao Chen, Junhao Zhu, Kangkang Jiang, and Yanran Li. Adversarial deep reinforcement learning in portfolio management. arXiv:1808.09940, 2018.
[21] Shashank Hegde, Vishal Kumar, and Atul Singh. Risk aware portfolio construction using deep deterministic policy gradients. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1861–1867, 2018.
[22] Zhuoran Xiong, Xiao-Yang Liu, Shan Zhong, Hongyang Yang, and Anwar Elwalid. Practical deep reinforcement learning approach for stock trading. arXiv:1811.07522, 2018.
[23] Pengqian Yu, Joon Sern Lee, Ilya Kulyatin, Zekun Shi, and Sakyasingha Dasgupta. Model-based deep reinforcement learning for dynamic portfolio optimization. arXiv:1901.08740, 2019.
[24] Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai. Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3):653–664, March 2017.
[25] C. T. Chen, A. Chen, and S. Huang. Cloning strategies from trading records using agent-based reinforcement learning algorithm. In 2018 IEEE International Conference on Agents (ICA), pages 34–37, July 2018.
[26] Qinma Kang, Huizhuo Zhou, and Yunfan Kang. An asynchronous advantage actor-critic reinforcement learning method for stock selection and portfolio management. In ICBDR 2018, 2018.
[27] Xiang Gao. Deep reinforcement learning for time series: playing idealized trading games. arXiv:1803.03916, 2018.
[28] Yue Deng, Feng Bao, Youyong Kong, Zhiquan Ren, and Qionghai Dai. Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3):653–664, 2016.
[29] Shashank Hegde, Vishal Kumar, and Atul Singh. Risk aware portfolio construction using deep deterministic policy gradients. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1861–1867. IEEE, 2018.

Description: Master's thesis
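Reference [12] (Fujimoto et al.) is the TD3 paper this thesis builds on. Two of its core tricks — taking the minimum of two critics (clipped double-Q learning) and smoothing the target action with clipped noise — can be shown in isolation. This is an illustrative scalar sketch, not the thesis implementation; `q1` and `q2` stand in for the two target critic networks.

```python
import random


def td3_target(reward, next_action, q1, q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Compute a TD3-style critic target for one transition.

    Target-policy smoothing: add clipped Gaussian noise to the next action,
    so the critic is not fit to a single sharp point of the target policy.
    Clipped double-Q: take the minimum of the two critic estimates, which
    counters the overestimation bias that plagues a single critic in DDPG.
    """
    noise = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
    smoothed_action = next_action + noise
    return reward + gamma * min(q1(smoothed_action), q2(smoothed_action))
```

TD3's third trick, delayed policy updates (updating the actor and target networks only every few critic updates), is a training-loop scheduling decision and is not shown here.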
National Chengchi University (國立政治大學)
In-service Master's Program, Department of Computer Science
Student ID: 106971009
Source: http://thesis.lib.nccu.edu.tw/record/#G0106971009
Type: thesis
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/128994
Other identifier: G0106971009
Table of Contents (translated):
Contents
List of Tables
List of Figures
Chapter 1  Introduction
  1.1 Motivation
  1.2 Objectives
Chapter 2  Background
  2.1 Long Short-Term Memory (LSTM)
  2.2 Reinforcement Learning
    2.2.1 Q-Learning
    2.2.2 Deep Q-Learning
    2.2.3 Policy Gradients
    2.2.4 Actor-Critic Algorithms
    2.2.5 DPG and DDPG
  2.3 Twin Delayed DDPG (TD3)
  2.4 Asset Allocation and Management
    2.4.1 Modern Portfolio Theory
    2.4.2 The Sharpe Ratio
    2.4.3 Asset Allocation
Chapter 3  Related Work
  3.1 Portfolio Management with the DDPG Algorithm
  3.2 Portfolio Management without the DDPG Algorithm
  3.3 Comparison of Experimental Methods
Chapter 4  Experimental Design for Portfolio Management
  4.1 Data Collection
  4.2 Data Preprocessing
    4.2.1 Target Selection
    4.2.2 Environment Construction
    4.2.3 The OpenAI Gym Toolkit
    4.2.4 Action Environment Design
    4.2.5 Observation Environment Design
    4.2.6 Agent Actions and Rewards
  4.3 Model Design
    4.3.1 TD3 Training Model
    4.3.2 TD3 with LSTM
    4.3.3 Fixed Trading Frequency
  4.4 Testing and Result Analysis
Chapter 5  Implementation and Comparison
  5.1 Data Preprocessing
    5.1.1 Target Filtering
    5.1.2 Data Field Processing
  5.2 Model Construction and Parameters
  5.3 Experimental Results
    5.3.1 TD3 Training Results
    5.3.2 TD3+LSTM Training Results
    5.3.3 Model Comparison
    5.3.4 Result Analysis
Chapter 6  Conclusion and Future Work
  6.1 Conclusions
  6.2 Future Work
References

File: 5843345 bytes, application/pdf
DOI: 10.6814/NCCU202000257
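The record states that the TD3 agent outperformed buy-and-hold and a systematic investment plan (dollar-cost averaging), and the thesis outline lists the Sharpe ratio among its background material. These baselines and the per-period Sharpe ratio are simple enough to sketch; the function names and the omission of annualization and transaction costs are my assumptions, not the thesis's exact formulation.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its volatility.
    Annualization (e.g. multiplying by sqrt(252) for daily data) is omitted."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def buy_and_hold(prices, capital=1.0):
    """Invest all capital at the first price; value the position at the last."""
    shares = capital / prices[0]
    return shares * prices[-1]


def dollar_cost_average(prices, per_period=1.0):
    """Systematic investment plan: invest a fixed amount at every price.
    Returns (final value, total capital invested)."""
    shares = sum(per_period / p for p in prices)
    return shares * prices[-1], per_period * len(prices)
```

Over a price path that dips and recovers, dollar-cost averaging buys more shares at the low and ends with a higher return on invested capital than buy-and-hold — the kind of baseline comparison the thesis runs against its TD3 agent.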