Title: 運用Soft Actor-Critic深度強化學習演算法優化投資配置組合
A Deep Reinforcement Learning Algorithms of Soft Actor-Critic for Optimizing Stock Portfolio Allocation
Authors: 王衍晰
Wang, Yen-Hsi
Contributors: 胡毓忠
Hu, Yuh-Jong
Wang, Yen-Hsi
Keywords: Deep Reinforcement Learning
SAC Algorithm
Soft Actor-Critic Algorithm
Stock Portfolio
Portfolio Allocation
Date: 2020
Issue Date: 2020-09-02 13:14:56 (UTC+8)
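Abstract: Automated trading driven by artificial intelligence algorithms is a current trend in stock market investment management research. This study combines deep reinforcement learning with financial technology to examine the benefit of applying the Soft Actor-Critic (SAC) algorithm to asset allocation in the stock market, and to verify whether the algorithm can be applied effectively to financial trading markets and can raise total investment value through asset allocation. Five stocks from the Taiwan stock market were selected from the Datastream database as experimental targets, and the algorithm was trained, run, and validated in an OpenAI Gym environment to assess its effectiveness at allocating stock investments. The experimental results show that the algorithm can learn from historical data to predict the future performance of the target stocks, automatically manage risk and adjust asset allocation weights, and produce an optimal portfolio model. In addition, compared with the Universal Portfolio strategy, the results show better and more stable returns, which provides preliminary evidence that deep reinforcement learning can be applied effectively to financial trading markets.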
The application of artificial intelligence algorithms to automated trading has become a prominent area of portfolio management research. This study combines key concepts from deep reinforcement learning and financial technology to evaluate how well the soft actor-critic (SAC) algorithm performs at optimal stock portfolio allocation.
In this thesis, we select five stocks from the Taiwan stock market, using data from the Datastream database, as our experimental targets. Running in Docker containers, we apply the SAC algorithm to train on this data and compute an optimal stock portfolio allocation. We then compare the deep reinforcement learning based portfolio optimization against the more traditional “Universal Portfolio”, “Best so Far”, and “Buy and Hold” strategies to verify the effectiveness and stability of our SAC model's overall performance.
The preliminary results show that, through off-policy updates with a stable stochastic actor-critic formulation, the SAC approach can learn from historical data to predict future stock performance. Furthermore, its automated learning process manages risk and asset allocation weights dynamically, generating an optimal stock portfolio with better and more stable performance than the other traditional quantitative strategies.
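As a rough illustration of what "asset allocation weights" and a "Buy and Hold" baseline mean in this setting, the sketch below computes portfolio value from per-period price relatives. It is not taken from the thesis: the function names, the softmax mapping from raw scores to weights, the toy numbers, and the absence of transaction costs are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Map unconstrained scores to long-only weights that sum to 1 (assumed mapping)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rebalanced_value(price_relatives, weights_per_step, start=1.0):
    """Portfolio value when the weights are reset at the start of every period.

    price_relatives[t][i] is the ratio close_t / close_(t-1) for asset i.
    """
    value = start
    for relatives, weights in zip(price_relatives, weights_per_step):
        value *= sum(w * x for w, x in zip(weights, relatives))
    return value

def buy_and_hold_value(price_relatives, initial_weights, start=1.0):
    """Portfolio value when the initial split is never rebalanced."""
    holdings = [start * w for w in initial_weights]
    for relatives in price_relatives:
        holdings = [h * x for h, x in zip(holdings, relatives)]
    return sum(holdings)

# Toy data (made up): two assets over two periods.
relatives = [[1.10, 0.90], [0.90, 1.10]]
uniform = softmax([0.0, 0.0])  # equal scores give equal weights
print(rebalanced_value(relatives, [uniform, uniform]))
print(buy_and_hold_value(relatives, uniform))
```

In a learned policy the weights passed at each step would come from the agent's action rather than a fixed uniform split; the two functions only show how a sequence of weights turns into a value that can be compared against a passive baseline.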
Reference: [1] T. M. Cover and E. Ordentlich, "Universal portfolios with side information," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 348-363, 1996.
[2] S. Zhang, S. Wang, and X. Deng, "Portfolio selection theory with different interest rates for borrowing and lending," Journal of Global Optimization, vol. 28, no. 1, pp. 67-95, 2004.
[3] B. Li and S. C. Hoi, "Online portfolio selection: A survey," ACM Computing Surveys (CSUR), vol. 46, no. 3, pp. 1-36, 2014.
[4] F. D. Freitas, A. F. De Souza, and A. R. de Almeida, "Prediction-based portfolio optimization model using neural networks," Neurocomputing, vol. 72, no. 10-12, pp. 2155-2170, 2009.
[5] S. T. A. Niaki and S. Hoseinzade, "Forecasting S&P 500 index using artificial neural networks and design of experiments," Journal of Industrial Engineering International, vol. 9, no. 1, p. 1, 2013.
[6] J. Heaton, N. Polson, and J. H. Witte, "Deep learning for finance: deep portfolios," Applied Stochastic Models in Business and Industry, vol. 33, no. 1, pp. 3-12, 2017.
[7] Z. Jiang, D. Xu, and J. Liang, "A deep reinforcement learning framework for the financial portfolio management problem," arXiv preprint arXiv:1706.10059, 2017.
[8] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," arXiv preprint arXiv:1801.01290, 2018.
[9] T. Haarnoja et al., "Soft actor-critic algorithms and applications," arXiv preprint arXiv:1812.05905, 2018.
[10] H. Markowitz, "Portfolio selection," The Journal of Finance, vol. 7, no. 1, Mar. 1952.
[11] A.-H. Chang and J.-D. Kung, "Applying Grey forecasting model on the investment performance of Markowitz efficiency frontier: A case of the Taiwan securities markets," in First International Conference on Innovative Computing, Information and Control-Volume I (ICICIC'06), 2006, vol. 2, pp. 254-257: IEEE.
[12] C.-F. Lee, A. C. Lee, and J. Lee, "Overview of Finance Theory and Quantitative Finance: Past, Present, and Future," 臺灣金融財務季刊, vol. 10, no. 4, pp. 1-85, 2009.
[13] A. Agarwal, E. Hazan, S. Kale, and R. E. Schapire, "Algorithms for portfolio management based on the newton method," in Proceedings of the 23rd international conference on Machine learning, 2006, pp. 9-16.
[14] Z. Jiang and J. Liang, "Cryptocurrency portfolio management with deep reinforcement learning," in 2017 Intelligent Systems Conference (IntelliSys), 2017, pp. 905-913: IEEE.
[15] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of artificial intelligence research, vol. 4, pp. 237-285, 1996.
[16] G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural computation, vol. 6, no. 2, pp. 215-219, 1994.
[17] M. I. Shapiai, Z. Ibrahim, M. Khalid, L. W. Jau, and V. Pavlovich, "A non-linear function approximation from small samples based on Nadaraya-Watson kernel regression," in 2010 2nd International Conference on Computational Intelligence, Communication Systems and Networks, 2010, pp. 28-32: IEEE.
[18] T.-I. Tsai and D.-C. Li, "Approximate modeling for high order non-linear functions using small sample sets," Expert Systems with Applications, vol. 34, no. 1, pp. 564-569, 2008.
[19] V. Mnih et al., "Playing atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[20] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.
[21] M. E. Mangram, "A simplified perspective of the Markowitz portfolio theory," Global journal of business research, vol. 7, no. 1, pp. 59-70, 2013.
[22] Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai, "Deep direct reinforcement learning for financial signal representation and trading," IEEE transactions on neural networks and learning systems, vol. 28, no. 3, pp. 653-664, 2016.
[23] P. Necchi, "Reinforcement Learning for Automated Trading," Mathematical Engineering, Politecnico di Milano, Milano, Italy, 2016.
[24] X. Li, Y. Li, Y. Zhan, and X.-Y. Liu, "Optimistic bull or pessimistic bear: adaptive deep reinforcement learning for stock portfolio allocation," arXiv preprint arXiv:1907.01503, 2019.
[25] T. Haarnoja, S. Ha, A. Zhou, J. Tan, G. Tucker, and S. Levine, "Learning to walk via deep reinforcement learning," arXiv preprint arXiv:1812.11103, 2018.
[26] Free Stock Charts, Stock Quotes, and Trade Ideas ─ TradingView (
Description: Master's degree
Source URI:
Data Type: thesis
Appears in Collections: [In-service Master's Program, Department of Computer Science] Theses

Files in This Item:

File: 100801.pdf
Size: 2826Kb
Format: Adobe PDF

All items in 學術集成 are protected by copyright, with all rights reserved.
