Research Output - Theses

Title 運用Soft Actor-Critic深度強化學習演算法優化投資配置組合
A Deep Reinforcement Learning Algorithm of Soft Actor-Critic for Optimizing Stock Portfolio Allocation
Author 王衍晰 (Wang, Yen-Hsi)
Contributors 胡毓忠 (Hu, Yuh-Jong), advisor; 王衍晰 (Wang, Yen-Hsi)
Keywords 深度強化學習 (Deep Reinforcement Learning)
SAC 演算法 (Soft Actor-Critic Algorithm)
投資組合 (Stock Portfolio)
資產配置 (Portfolio Allocation)
Date 2020
Uploaded 2-Sep-2020 13:14:56 (UTC+8)
Abstract Automated trading driven by artificial-intelligence algorithms is a current trend in stock-market portfolio-management research. This study combines deep reinforcement learning with financial technology, exploring the performance of the soft actor-critic (SAC) algorithm for optimal stock portfolio allocation, and examines whether the algorithm can be applied effectively to financial markets to raise total portfolio value.
In this thesis, we select five stocks from the Taiwan stock market via the Datastream database as our experimental targets. We then train and evaluate the SAC algorithm in an OpenAI Gym environment, run inside Docker containers, to derive an optimal stock portfolio allocation. A comparative analysis of the deep-reinforcement-learning-based portfolio optimization against the more traditional "Universal Portfolio", "Best so Far", and "Buy and Hold" strategies is conducted to verify the effectiveness and stability of the SAC model's overall performance.
The preliminary results show that, through off-policy updates with a stable stochastic actor-critic formulation, the SAC approach can learn from historical data to predict future stock performance. Furthermore, its automated learning process dynamically manages risk and asset-allocation weights, generating an optimal stock portfolio whose returns are better and more stable than those of the traditional quantitative strategies.
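Although this record carries no code, the pipeline the abstract describes (a Gym trading environment, portfolio weights as actions, SAC as the learner) can be sketched compactly. The sketch below is illustrative only and not the thesis implementation: the PortfolioEnv class, the synthetic price data, the 30-day observation window, and the 0.1% turnover fee are all assumptions introduced here. It assumes the gymnasium API and the SAC implementation in stable-baselines3 (v2.x).

import numpy as np
import gymnasium as gym
from stable_baselines3 import SAC

class PortfolioEnv(gym.Env):
    """Toy Gym-style environment: allocate wealth across cash + n stocks each day."""

    def __init__(self, prices, window=30, fee=0.001):
        super().__init__()
        self.rel = prices[1:] / prices[:-1]              # daily price relatives
        self.window, self.fee = window, fee
        n = prices.shape[1]
        # state: the last `window` price-relative vectors, flattened
        self.observation_space = gym.spaces.Box(
            low=0.0, high=np.inf, shape=(window * n,), dtype=np.float32)
        # raw action: unnormalized scores for cash + n stocks
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=(n + 1,), dtype=np.float32)

    def _obs(self):
        return self.rel[self.t - self.window:self.t].ravel().astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = self.window
        self.w = np.eye(self.action_space.shape[0])[0]   # start fully in cash
        return self._obs(), {}

    def step(self, action):
        new_w = np.exp(action) / np.exp(action).sum()    # softmax -> valid weights
        cost = self.fee * np.abs(new_w - self.w).sum()   # turnover-proportional fee
        growth = new_w @ np.concatenate(([1.0], self.rel[self.t]))  # cash relative = 1
        reward = float(np.log(max(growth - cost, 1e-8))) # log portfolio return
        self.w, self.t = new_w, self.t + 1
        terminated = self.t >= len(self.rel)
        return self._obs(), reward, terminated, False, {}

# Stand-in price data: 500 days of 5 synthetic stocks (the thesis instead used
# five Taiwan-listed stocks pulled from Datastream).
prices = np.cumprod(1.0 + 0.01 * np.random.randn(500, 5), axis=0)
model = SAC("MlpPolicy", PortfolioEnv(prices), verbose=0)
model.learn(total_timesteps=5_000)

The softmax in step() keeps the continuous-action agent inside the portfolio simplex: whatever scores the policy emits, the executed action is a vector of non-negative weights summing to one, which is the standard way to encode an allocation decision for an off-policy learner such as SAC.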
References [1] T. M. Cover and E. Ordentlich, "Universal portfolios with side information," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 348-363, 1996.
[2] S. Zhang, S. Wang, and X. Deng, "Portfolio selection theory with different interest rates for borrowing and lending," Journal of Global Optimization, vol. 28, no. 1, pp. 67-95, 2004.
[3] B. Li and S. C. Hoi, "Online portfolio selection: A survey," ACM Computing Surveys (CSUR), vol. 46, no. 3, pp. 1-36, 2014.
[4] F. D. Freitas, A. F. De Souza, and A. R. de Almeida, "Prediction-based portfolio optimization model using neural networks," Neurocomputing, vol. 72, no. 10-12, pp. 2155-2170, 2009.
[5] S. T. A. Niaki and S. Hoseinzade, "Forecasting S&P 500 index using artificial neural networks and design of experiments," Journal of Industrial Engineering International, vol. 9, no. 1, p. 1, 2013.
[6] J. Heaton, N. Polson, and J. H. Witte, "Deep learning for finance: deep portfolios," Applied Stochastic Models in Business and Industry, vol. 33, no. 1, pp. 3-12, 2017.
[7] Z. Jiang, D. Xu, and J. Liang, "A deep reinforcement learning framework for the financial portfolio management problem," arXiv preprint arXiv:1706.10059, 2017.
[8] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," arXiv preprint arXiv:1801.01290, 2018.
[9] T. Haarnoja et al., "Soft actor-critic algorithms and applications," arXiv preprint arXiv:1812.05905, 2018.
[10] H. Markowitz, "Portfolio selection," The Journal of Finance, vol. 7, no. 1, pp. 77-91, 1952.
[11] A.-H. Chang and J.-D. Kung, "Applying Grey forecasting model on the investment performance of Markowitz efficiency frontier: A case of the Taiwan securities markets," in First International Conference on Innovative Computing, Information and Control - Volume I (ICICIC '06), 2006, vol. 2, pp. 254-257: IEEE.
[12] C.-F. Lee, A. C. Lee, and J. Lee, "Overview of finance theory and quantitative finance: Past, present, and future," 臺灣金融財務季刊, vol. 10, no. 4, pp. 1-85, 2009.
[13] A. Agarwal, E. Hazan, S. Kale, and R. E. Schapire, "Algorithms for portfolio management based on the Newton method," in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 9-16.
[14] Z. Jiang and J. Liang, "Cryptocurrency portfolio management with deep reinforcement learning," in 2017 Intelligent Systems Conference (IntelliSys), 2017, pp. 905-913: IEEE.
[15] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[16] G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Computation, vol. 6, no. 2, pp. 215-219, 1994.
[17] M. I. Shapiai, Z. Ibrahim, M. Khalid, L. W. Jau, and V. Pavlovich, "A non-linear function approximation from small samples based on Nadaraya-Watson kernel regression," in 2010 2nd International Conference on Computational Intelligence, Communication Systems and Networks, 2010, pp. 28-32: IEEE.
[18] T.-I. Tsai and D.-C. Li, "Approximate modeling for high order non-linear functions using small sample sets," Expert Systems with Applications, vol. 34, no. 1, pp. 564-569, 2008.
[19] V. Mnih et al., "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[20] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.
[21] M. E. Mangram, "A simplified perspective of the Markowitz portfolio theory," Global Journal of Business Research, vol. 7, no. 1, pp. 59-70, 2013.
[22] Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai, "Deep direct reinforcement learning for financial signal representation and trading," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 653-664, 2016.
[23] P. Nechchi, "Reinforcement learning for automated trading," Mathematical Engineering, Politecnico di Milano, Milano, Italy, 2016.
[24] X. Li, Y. Li, Y. Zhan, and X.-Y. Liu, "Optimistic bull or pessimistic bear: adaptive deep reinforcement learning for stock portfolio allocation," arXiv preprint arXiv:1907.01503, 2019.
[25] T. Haarnoja, S. Ha, A. Zhou, J. Tan, G. Tucker, and S. Levine, "Learning to walk via deep reinforcement learning," arXiv preprint arXiv:1812.11103, 2018.
[26] TradingView, "Free stock charts, stock quotes, and trade ideas," https://www.tradingview.com
Description Master's thesis
National Chengchi University
In-service Master's Program, Department of Computer Science
104971008
Source http://thesis.lib.nccu.edu.tw/record/#G0104971008
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131935
DOI 10.6814/NCCU202001560
Table of contents Abstract (Chinese)
Abstract (English)
Table of contents
List of tables
List of figures
Chapter 1 Introduction
1.1 Research background and motivation
1.2 Research objectives
1.3 Research framework and process
Chapter 2 Literature review
2.1 Development of portfolio theory
2.2 Machine learning algorithms
2.3 Basic architecture of the SAC algorithm
Chapter 3 Related work
3.1 Quantitative trading methods applied to asset allocation
3.2 Deep reinforcement learning applied to asset allocation
3.3 Case analysis of SAC algorithm applications
Chapter 4 Research framework and methods
4.1 Data collection and processing
4.1.1 Data sources and scope
4.1.2 Investment target selection method
4.1.3 Training data preprocessing and filtering
4.2 Premises and assumptions for asset allocation
4.3 Building the simulation test environment
Chapter 5 Implementation
5.1 Test methods and model definition
5.1.1 Training input samples
5.1.2 Stock price change data
5.1.3 Transaction fee calculation
5.1.4 Total asset liquidation method
5.1.5 Reward function
5.2 Training parameters and allocation period adjustment
5.2.1 Training parameter settings
5.2.2 Fee ratio adjustment
5.2.3 Model parameter testing and comparison
5.3 Training results and model performance evaluation
5.3.1 Comparative analysis of training results for time period 1
5.3.2 Comparative analysis of training results for time period 2
5.4 Overall analysis of experimental results
Chapter 6 Conclusion
6.1 Research conclusions
6.2 Future outlook
References