Please use this identifier to cite or link to this item: https://ah.nccu.edu.tw/handle/140.119/128994


Title: 應用TD3深度強化學習演算法進行資產優化管理配置
Applying DRL TD3 Algorithm for Portfolio Management Optimization
Authors: 吳宇翔
Wu, Yu-Hsiang
Contributors: 胡毓忠
Hu, Yuh-Jong
吳宇翔
Wu, Yu-Hsiang
Keywords: LSTM
DRL
TD3
Portfolio management
Date: 2020
Issue Date: 2020-03-02 11:38:39 (UTC+8)
Abstract: Deep reinforcement learning (DRL), a branch of AI, learns by interacting continuously with an environment and from its own mistakes, so as to maximize the reward of each decision step. It is commonly used for decision optimization; AlphaGo, the best-known system of recent years, is its most representative example. DRL is well suited to modeling sequential decision-making tasks, and to verify this property, this study applies the concept to the problem of optimal portfolio management.
This study focuses on the investment decision process in portfolio optimization. It implements the Twin Delayed DDPG (TD3) deep reinforcement learning algorithm and a variant (TD3+LSTM) to find the optimal allocation weights, with the aim of maximizing investment return, and examines whether TD3 is suitable for optimizing dynamic portfolio management strategies. The investment universe is the constituent stocks of Taiwan's 0050 ETF. Across several experiments, the approach outperformed both the Buy and Hold strategy and the Systematic Investment Plan (regular fixed-amount investing).
Description: Master's thesis
National Chengchi University
Department of Computer Science, In-service Master's Program
106971009
Source URI: http://thesis.lib.nccu.edu.tw/record/#G0106971009
Data Type: thesis
Appears in Collections: [Department of Computer Science, In-service Master's Program] Theses

Files in This Item:

File        Size    Format
100901.pdf  5706Kb  Adobe PDF


All items in 學術集成 are protected by copyright, with all rights reserved.