Research Output - Theses

Title 運用Soft Actor-Critic深度強化學習演算法優化投資配置組合
A Deep Reinforcement Learning Algorithm of Soft Actor-Critic for Optimizing Stock Portfolio Allocation
Author 王衍晰 (Wang, Yen-Hsi)
Contributors 胡毓忠 (Hu, Yuh-Jong), advisor; 王衍晰 (Wang, Yen-Hsi)
Keywords 深度強化學習 (Deep Reinforcement Learning)
SAC 演算法 (Soft Actor-Critic Algorithm)
投資組合 (Stock Portfolio)
資產配置 (Portfolio Allocation)
Date 2020
Uploaded 2-Sep-2020 13:14:56 (UTC+8)
Abstract Automated trading driven by artificial-intelligence algorithms is a current trend in stock-market portfolio-management research. This study combines deep reinforcement learning with financial technology, exploring the performance of the soft actor-critic (SAC) algorithm for optimal stock portfolio allocation, and examines whether the algorithm can be applied effectively to financial markets to raise total portfolio value.
In this thesis, we select five stocks from the Taiwan stock market via the Datastream database as our experimental targets. We then train and evaluate the SAC algorithm in an OpenAI Gym environment, run inside Docker containers, to derive an optimal stock portfolio allocation. A comparative analysis of the deep-reinforcement-learning-based portfolio optimization against the more traditional "Universal Portfolio", "Best so Far", and "Buy and Hold" strategies is conducted to verify the effectiveness and stability of the SAC model's overall performance.
The preliminary results show that, through off-policy updates with a stable stochastic actor-critic formulation, the SAC approach can learn from historical data to predict future stock performance. Furthermore, its automated learning process dynamically manages risk and asset-allocation weights, generating an optimal stock portfolio whose returns are better and more stable than those of the traditional quantitative strategies.
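Although this record carries no code, the pipeline the abstract describes (a Gym trading environment, portfolio weights as actions, SAC as the learner) can be sketched compactly. The sketch below is illustrative only and not the thesis implementation: the PortfolioEnv class, the synthetic price data, the 30-day observation window, and the 0.1% turnover fee are all assumptions introduced here. It assumes the gymnasium API and the SAC implementation in stable-baselines3 (v2.x).

import numpy as np
import gymnasium as gym
from stable_baselines3 import SAC

class PortfolioEnv(gym.Env):
    """Toy Gym-style environment: allocate wealth across cash + n stocks each day."""

    def __init__(self, prices, window=30, fee=0.001):
        super().__init__()
        self.rel = prices[1:] / prices[:-1]              # daily price relatives
        self.window, self.fee = window, fee
        n = prices.shape[1]
        # state: the last `window` price-relative vectors, flattened
        self.observation_space = gym.spaces.Box(
            low=0.0, high=np.inf, shape=(window * n,), dtype=np.float32)
        # raw action: unnormalized scores for cash + n stocks
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=(n + 1,), dtype=np.float32)

    def _obs(self):
        return self.rel[self.t - self.window:self.t].ravel().astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = self.window
        self.w = np.eye(self.action_space.shape[0])[0]   # start fully in cash
        return self._obs(), {}

    def step(self, action):
        new_w = np.exp(action) / np.exp(action).sum()    # softmax -> valid weights
        cost = self.fee * np.abs(new_w - self.w).sum()   # turnover-proportional fee
        growth = new_w @ np.concatenate(([1.0], self.rel[self.t]))  # cash relative = 1
        reward = float(np.log(max(growth - cost, 1e-8))) # log portfolio return
        self.w, self.t = new_w, self.t + 1
        terminated = self.t >= len(self.rel)
        return self._obs(), reward, terminated, False, {}

# Stand-in price data: 500 days of 5 synthetic stocks (the thesis instead used
# five Taiwan-listed stocks pulled from Datastream).
prices = np.cumprod(1.0 + 0.01 * np.random.randn(500, 5), axis=0)
model = SAC("MlpPolicy", PortfolioEnv(prices), verbose=0)
model.learn(total_timesteps=5_000)

The softmax in step() keeps the continuous-action agent inside the portfolio simplex: whatever scores the policy emits, the executed action is a vector of non-negative weights summing to one, which is the standard way to encode an allocation decision for an off-policy learner such as SAC.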
References [1] T. M. Cover and E. Ordentlich, "Universal portfolios with side information," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 348-363, 1996.
[2] S. Zhang, S. Wang, and X. Deng, "Portfolio selection theory with different interest rates for borrowing and lending," Journal of Global Optimization, vol. 28, no. 1, pp. 67-95, 2004.
[3] B. Li and S. C. Hoi, "Online portfolio selection: A survey," ACM Computing Surveys (CSUR), vol. 46, no. 3, pp. 1-36, 2014.
[4] F. D. Freitas, A. F. De Souza, and A. R. de Almeida, "Prediction-based portfolio optimization model using neural networks," Neurocomputing, vol. 72, no. 10-12, pp. 2155-2170, 2009.
[5] S. T. A. Niaki and S. Hoseinzade, "Forecasting S&P 500 index using artificial neural networks and design of experiments," Journal of Industrial Engineering International, vol. 9, no. 1, p. 1, 2013.
[6] J. Heaton, N. Polson, and J. H. Witte, "Deep learning for finance: deep portfolios," Applied Stochastic Models in Business and Industry, vol. 33, no. 1, pp. 3-12, 2017.
[7] Z. Jiang, D. Xu, and J. Liang, "A deep reinforcement learning framework for the financial portfolio management problem," arXiv preprint arXiv:1706.10059, 2017.
[8] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," arXiv preprint arXiv:1801.01290, 2018.
[9] T. Haarnoja et al., "Soft actor-critic algorithms and applications," arXiv preprint arXiv:1812.05905, 2018.
[10] H. Markowitz, "Portfolio selection," The Journal of Finance, vol. 7, no. 1, pp. 77-91, 1952.
[11] A.-H. Chang and J.-D. Kung, "Applying Grey forecasting model on the investment performance of Markowitz efficiency frontier: A case of the Taiwan securities markets," in First International Conference on Innovative Computing, Information and Control - Volume I (ICICIC '06), 2006, vol. 2, pp. 254-257: IEEE.
[12] C.-F. Lee, A. C. Lee, and J. Lee, "Overview of finance theory and quantitative finance: Past, present, and future," 臺灣金融財務季刊, vol. 10, no. 4, pp. 1-85, 2009.
[13] A. Agarwal, E. Hazan, S. Kale, and R. E. Schapire, "Algorithms for portfolio management based on the Newton method," in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 9-16.
[14] Z. Jiang and J. Liang, "Cryptocurrency portfolio management with deep reinforcement learning," in 2017 Intelligent Systems Conference (IntelliSys), 2017, pp. 905-913: IEEE.
[15] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[16] G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Computation, vol. 6, no. 2, pp. 215-219, 1994.
[17] M. I. Shapiai, Z. Ibrahim, M. Khalid, L. W. Jau, and V. Pavlovich, "A non-linear function approximation from small samples based on Nadaraya-Watson kernel regression," in 2010 2nd International Conference on Computational Intelligence, Communication Systems and Networks, 2010, pp. 28-32: IEEE.
[18] T.-I. Tsai and D.-C. Li, "Approximate modeling for high order non-linear functions using small sample sets," Expert Systems with Applications, vol. 34, no. 1, pp. 564-569, 2008.
[19] V. Mnih et al., "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[20] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.
[21] M. E. Mangram, "A simplified perspective of the Markowitz portfolio theory," Global Journal of Business Research, vol. 7, no. 1, pp. 59-70, 2013.
[22] Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai, "Deep direct reinforcement learning for financial signal representation and trading," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 653-664, 2016.
[23] P. Nechchi, "Reinforcement learning for automated trading," Mathematical Engineering, Politecnico di Milano, Milano, Italy, 2016.
[24] X. Li, Y. Li, Y. Zhan, and X.-Y. Liu, "Optimistic bull or pessimistic bear: adaptive deep reinforcement learning for stock portfolio allocation," arXiv preprint arXiv:1907.01503, 2019.
[25] T. Haarnoja, S. Ha, A. Zhou, J. Tan, G. Tucker, and S. Levine, "Learning to walk via deep reinforcement learning," arXiv preprint arXiv:1812.11103, 2018.
[26] TradingView, "Free stock charts, stock quotes, and trade ideas," https://www.tradingview.com
Description Master's thesis
National Chengchi University
In-service Master's Program, Department of Computer Science
104971008
Source http://thesis.lib.nccu.edu.tw/record/#G0104971008
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131935
DOI 10.6814/NCCU202001560
Table of contents Abstract (Chinese)
Abstract (English)
Table of contents
List of tables
List of figures
Chapter 1 Introduction
1.1 Research background and motivation
1.2 Research objectives
1.3 Research framework and process
Chapter 2 Literature review
2.1 Development of portfolio theory
2.2 Machine learning algorithms
2.3 Basic architecture of the SAC algorithm
Chapter 3 Related work
3.1 Quantitative trading methods applied to asset allocation
3.2 Deep reinforcement learning applied to asset allocation
3.3 Case analysis of SAC algorithm applications
Chapter 4 Research framework and methods
4.1 Data collection and processing
4.1.1 Data sources and scope
4.1.2 Investment target selection method
4.1.3 Training data preprocessing and filtering
4.2 Premises and assumptions for asset allocation
4.3 Building the simulation test environment
Chapter 5 Implementation
5.1 Test methods and model definition
5.1.1 Training input samples
5.1.2 Stock price change data
5.1.3 Transaction fee calculation
5.1.4 Total asset liquidation method
5.1.5 Reward function
5.2 Training parameters and allocation period adjustment
5.2.1 Training parameter settings
5.2.2 Fee ratio adjustment
5.2.3 Model parameter testing and comparison
5.3 Training results and model performance evaluation
5.3.1 Comparative analysis of training results for time period 1
5.3.2 Comparative analysis of training results for time period 2
5.4 Overall analysis of experimental results
Chapter 6 Conclusion
6.1 Research conclusions
6.2 Future outlook
References