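Title: Comparison of Deep Reinforcement Learning Algorithms for Optimizing Portfolio Management (應用深度強化學習演算法於資產配置優化之比較)
Author: Huang, Mu-Tien (黃牧天)
Contributors: Hu, Yuh-Jong (胡毓忠); Huang, Mu-Tien (黃牧天)
Keywords: Financial Engineering; Deep Learning; Reinforcement Learning; Deep Reinforcement Learning
Date: 2021
Upload time: 2-Sep-2021 18:17:58 (UTC+8)
Abstract:
This thesis addresses three propositions. First, does applying deep reinforcement learning (DRL) models to asset allocation require background knowledge of financial time series and statistics? Second, how do different DRL algorithms compare under different market conditions? Third, how does the performance of DRL algorithms compare with that of modern portfolio theory, and do DRL algorithms have practical value? The thesis analyzes the application of DRL algorithms to asset allocation through these three comparisons. For the first proposition, the results show that when the feature data satisfy the Markov property assumed by DRL models, the models become markedly more effective for the same effort. For the second, the results show that different DRL models exhibit different bias-variance trade-offs, which correspond to the trade-off between performance and model stability in practical asset management. For the third, the results show that DRL models significantly outperform the mean-variance model of modern portfolio theory, and the thesis further argues for their value from the perspective of customer experience. These three comparisons run through the thesis, which aims to deliver meaningful comparative results in an objective and fair manner and thereby improve the effectiveness of DRL models in asset allocation.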
The thesis has three aims. First, it asks whether applying deep reinforcement learning (DRL) to asset allocation requires statistical (time-series) knowledge; the results show that using data that meet the model's assumptions makes the model more effective. Second, it compares the strengths and weaknesses of DRL algorithms in different markets; the results show that building DRL algorithms forces a choice in the bias-variance trade-off, and asset management companies ultimately have to find the right balance for their customers. Third, it asks what value DRL offers, comparing the performance of DRL and mean-variance optimization (MVO) in detail; the results show that DRL performs significantly better than MVO and can address the pain points of current customers.
Description: Master's thesis; National Chengchi University (國立政治大學); Department of Computer Science, in-service master's program (資訊科學系碩士在職專班); 108971007
Source: http://thesis.lib.nccu.edu.tw/record/#G0108971007
Type: thesis
dc.contributor.advisor 胡毓忠 zh_TW
dc.contributor.advisor Hu, Yuh-Jong en_US
dc.contributor.author (Authors) 黃牧天 zh_TW
dc.contributor.author (Authors) Huang, Mu-Tien en_US
dc.creator (Author) 黃牧天 zh_TW
dc.creator (Author) Huang, Mu-Tien en_US
dc.date (Date) 2021 en_US
dc.date.accessioned 2-Sep-2021 18:17:58 (UTC+8)
dc.date.available 2-Sep-2021 18:17:58 (UTC+8)
dc.date.issued (Upload time) 2-Sep-2021 18:17:58 (UTC+8)
dc.identifier (Other Identifiers) G0108971007 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/137167
dc.description.tableofcontents
1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Structure
2 Literature Review
2.1 Modern Portfolio Theory
2.2 Information Theory
2.3 Reinforcement Learning Theory
2.4 Actor-Critic Algorithms
3 Related Work
3.1 Modern Portfolio Theory
3.2 Deep Reinforcement Learning Theory
4 Research Method
4.1 Experimental Propositions
4.2 Experimental Procedure
4.3 Experimental Design
5 Implementation
5.1 Data Collection
5.2 Feature Engineering
5.3 Model Training
5.4 Model Testing
5.5 Evaluation of Results
6 Conclusion
6.1 Research Conclusions
6.2 Future Outlook
References
dc.format.extent 2336327 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0108971007 en_US
dc.identifier.doi (DOI) 10.6814/NCCU202101194 en_US
