Title: 應用TD3深度強化學習演算法進行資產優化管理配置 (Applying DRL TD3 Algorithm for Portfolio Management Optimization)
Author: Wu, Yu-Hsiang (吳宇翔)
Advisor: Hu, Yuh-Jong (胡毓忠)
Keywords: LSTM; DRL; TD3; portfolio management (資產配置)
Date: 2020
Uploaded: 2-Mar-2020 11:38:39 (UTC+8)

Abstract (translated from the Chinese): Deep reinforcement learning (DRL), a branch of AI, learns by interacting continuously with its environment, learning from mistakes so as to maximize the reward of each decision; it is widely used for decision optimization, and AlphaGo is its best-known recent example. DRL is well suited to modeling sequential decision tasks, and to verify this property, this study applies the concept to optimal portfolio management. Focusing on the investment decision process in portfolio optimization, the study implements the Twin Delayed DDPG (TD3) algorithm and a variant (TD3+LSTM) to find the allocation weights that maximize investment returns, and examines the suitability of TD3 for optimizing dynamic portfolio management strategies. The investment universe is the constituent stocks of Taiwan's 0050 ETF; across multiple experiments, the approach outperforms both buy-and-hold and dollar-cost-averaging strategies.
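The abstract describes an agent that outputs allocation weights over the 0050 ETF constituents so as to maximize returns, with the environment built on OpenAI Gym (per the thesis outline). The thesis code is not part of this record, so the following is only a minimal sketch of what such a gym-style environment could look like; the class name `PortfolioEnv`, the softmax mapping from raw action scores to weights, and the log-return reward are all illustrative assumptions, not the author's actual design.

```python
import math


class PortfolioEnv:
    """Minimal gym-style portfolio environment (an illustrative sketch).

    state  : the most recent `window` steps of per-asset returns, flattened
    action : one raw score per asset, mapped to allocation weights via softmax
    reward : log return of the weighted portfolio over the next step
    """

    def __init__(self, returns, window=5):
        self.returns = returns  # list of per-step return vectors, one per asset
        self.window = window
        self.t = window

    def reset(self):
        self.t = self.window
        return self._obs()

    def _obs(self):
        # Flatten the trailing window of return vectors into one observation.
        recent = self.returns[self.t - self.window:self.t]
        return [r for step in recent for r in step]

    def softmax(self, scores):
        # Map unconstrained scores to weights that are positive and sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def step(self, action):
        weights = self.softmax(action)
        step_returns = self.returns[self.t]
        portfolio_return = sum(w * r for w, r in zip(weights, step_returns))
        reward = math.log(1.0 + portfolio_return)
        self.t += 1
        done = self.t >= len(self.returns)
        return (self._obs() if not done else None), reward, done, {}
```

A TD3 agent would be trained on `reset`/`step` transitions from such an interface; even a random agent can exercise it, which is useful for sanity-checking the environment before any learning.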
Abstract (English): Deep reinforcement learning (DRL), a branch of AI, learns by interacting continuously with the environment and learning from errors, maximizing the reward of every step; it is commonly applied to decision optimization, and AlphaGo is its best-known example. This study applies the concept to portfolio management optimization. It implements the Twin Delayed DDPG (TD3) algorithm and a TD3+LSTM variant to find the allocation weights that maximize investment returns, and checks whether TD3 is suitable for optimizing dynamic portfolio management strategies. The investment universe is the constituent stocks of Taiwan's 0050 ETF. After several experiments, TD3 outperforms both the buy-and-hold strategy and a systematic investment plan.

References:
[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with deep reinforcement learning. arXiv:1312.5602, 2013.
[2] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. arXiv:1709.06560, 2017.
[3] Alex Irpan. Deep reinforcement learning doesn't work yet. https://www.alexirpan.com/2018/02/14/rl-hard.html, 2018.
[4] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[5] Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.
[6] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
[7] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. Dueling network architectures for deep reinforcement learning. In ICML, 2015.
[8] Sham M. Kakade. A natural policy gradient. In Advances in Neural Information Processing Systems, pages 1531–1538, 2002.
[9] Vijay R. Konda and John N. Tsitsiklis. Actor-critic algorithms. In Advances in Neural Information Processing Systems, pages 1008–1014, 2000.
[10] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. 2014.
[11] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv:1509.02971, 2015.
[12] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In ICML, 2018.
[13] Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
[14] William Sharpe. Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3):425–442, 1964.
[15] Gary P. Brinson, L. Randolph Hood, and Gilbert L. Beebower. Determinants of portfolio performance. Financial Analysts Journal, 42(4):39–44, 1986.
[16] Gary P. Brinson, Brian D. Singer, and Gilbert L. Beebower. Determinants of portfolio performance II: An update. Financial Analysts Journal, 47(3):40–48, 1991.
[17] André F. Perold and William F. Sharpe. Dynamic strategies for asset allocation. Financial Analysts Journal, 140, 1995.
[18] François Balloux and Nicolas Lugon-Moulin. The estimation of population differentiation with microsatellite markers. Molecular Ecology, 11(2):155–165, 2002.
[19] William F. Sharpe. Integrated asset allocation. Financial Analysts Journal, 43(5):25–32, 1987.
[20] Zhipeng Liang, Hao Chen, Junhao Zhu, Kangkang Jiang, and Yanran Li. Adversarial deep reinforcement learning in portfolio management. arXiv:1808.09940, 2018.
[21] Shashank Hegde, Vishal Kumar, and Atul Singh. Risk aware portfolio construction using deep deterministic policy gradients. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1861–1867, 2018.
[22] Zhuoran Xiong, Xiao-Yang Liu, Shan Zhong, Hongyang Yang, and Anwar Elwalid. Practical deep reinforcement learning approach for stock trading. arXiv:1811.07522, 2018.
[23] Pengqian Yu, Joon Sern Lee, Ilya Kulyatin, Zekun Shi, and Sakyasingha Dasgupta. Model-based deep reinforcement learning for dynamic portfolio optimization. arXiv:1901.08740, 2019.
[24] Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai. Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3):653–664, March 2017.
[25] C. T. Chen, A. Chen, and S. Huang. Cloning strategies from trading records using agent-based reinforcement learning algorithm. In 2018 IEEE International Conference on Agents (ICA), pages 34–37, July 2018.
[26] Qinma Kang, Huizhuo Zhou, and Yunfan Kang. An asynchronous advantage actor-critic reinforcement learning method for stock selection and portfolio management. In ICBDR 2018, 2018.
[27] Xiang Gao. Deep reinforcement learning for time series: playing idealized trading games. arXiv:1803.03916, 2018.
[28] Yue Deng, Feng Bao, Youyong Kong, Zhiquan Ren, and Qionghai Dai. Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3):653–664, 2016.
[29] Shashank Hegde, Vishal Kumar, and Atul Singh. Risk aware portfolio construction using deep deterministic policy gradients. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1861–1867. IEEE, 2018.

Description: Master's thesis
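Reference [12] (Fujimoto et al.) is the TD3 paper this thesis builds on. Two of its core tricks — taking the minimum of two critics (clipped double-Q learning) and smoothing the target action with clipped noise — can be shown in isolation. This is an illustrative scalar sketch, not the thesis implementation; `q1` and `q2` stand in for the two target critic networks.

```python
import random


def td3_target(reward, next_action, q1, q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Compute a TD3-style critic target for one transition.

    Target-policy smoothing: add clipped Gaussian noise to the next action,
    so the critic is not fit to a single sharp point of the target policy.
    Clipped double-Q: take the minimum of the two critic estimates, which
    counters the overestimation bias that plagues a single critic in DDPG.
    """
    noise = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
    smoothed_action = next_action + noise
    return reward + gamma * min(q1(smoothed_action), q2(smoothed_action))
```

TD3's third trick, delayed policy updates (updating the actor and target networks only every few critic updates), is a training-loop scheduling decision and is not shown here.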
National Chengchi University (國立政治大學)
In-service Master's Program, Department of Computer Science
Student ID: 106971009
Source: http://thesis.lib.nccu.edu.tw/record/#G0106971009
Type: thesis
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/128994
Other identifier: G0106971009
Table of Contents (translated):
Contents
List of Tables
List of Figures
Chapter 1  Introduction
  1.1 Motivation
  1.2 Objectives
Chapter 2  Background
  2.1 Long Short-Term Memory (LSTM)
  2.2 Reinforcement Learning
    2.2.1 Q-Learning
    2.2.2 Deep Q-Learning
    2.2.3 Policy Gradients
    2.2.4 Actor-Critic Algorithms
    2.2.5 DPG and DDPG
  2.3 Twin Delayed DDPG (TD3)
  2.4 Asset Allocation and Management
    2.4.1 Modern Portfolio Theory
    2.4.2 The Sharpe Ratio
    2.4.3 Asset Allocation
Chapter 3  Related Work
  3.1 Portfolio Management with the DDPG Algorithm
  3.2 Portfolio Management without the DDPG Algorithm
  3.3 Comparison of Experimental Methods
Chapter 4  Experimental Design for Portfolio Management
  4.1 Data Collection
  4.2 Data Preprocessing
    4.2.1 Target Selection
    4.2.2 Environment Construction
    4.2.3 The OpenAI Gym Toolkit
    4.2.4 Action Environment Design
    4.2.5 Observation Environment Design
    4.2.6 Agent Actions and Rewards
  4.3 Model Design
    4.3.1 TD3 Training Model
    4.3.2 TD3 with LSTM
    4.3.3 Fixed Trading Frequency
  4.4 Testing and Result Analysis
Chapter 5  Implementation and Comparison
  5.1 Data Preprocessing
    5.1.1 Target Filtering
    5.1.2 Data Field Processing
  5.2 Model Construction and Parameters
  5.3 Experimental Results
    5.3.1 TD3 Training Results
    5.3.2 TD3+LSTM Training Results
    5.3.3 Model Comparison
    5.3.4 Result Analysis
Chapter 6  Conclusion and Future Work
  6.1 Conclusions
  6.2 Future Work
References

File: 5843345 bytes, application/pdf
DOI: 10.6814/NCCU202000257
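The record states that the TD3 agent outperformed buy-and-hold and a systematic investment plan (dollar-cost averaging), and the thesis outline lists the Sharpe ratio among its background material. These baselines and the per-period Sharpe ratio are simple enough to sketch; the function names and the omission of annualization and transaction costs are my assumptions, not the thesis's exact formulation.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its volatility.
    Annualization (e.g. multiplying by sqrt(252) for daily data) is omitted."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def buy_and_hold(prices, capital=1.0):
    """Invest all capital at the first price; value the position at the last."""
    shares = capital / prices[0]
    return shares * prices[-1]


def dollar_cost_average(prices, per_period=1.0):
    """Systematic investment plan: invest a fixed amount at every price.
    Returns (final value, total capital invested)."""
    shares = sum(per_period / p for p in prices)
    return shares * prices[-1], per_period * len(prices)
```

Over a price path that dips and recovers, dollar-cost averaging buys more shares at the low and ends with a higher return on invested capital than buy-and-hold — the kind of baseline comparison the thesis runs against its TD3 agent.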