
Title 應用強化學習於股票的投資選擇-以台灣股市為例
Applying Reinforcement Learning to Stock Investment–Taiwan Stock Market as an Example
Author 彭志偉
Phang, Chee-Wai
Contributors 蕭明福; 蔡瑞煌
彭志偉
Phang, Chee-Wai
Keywords 金融股票市場
機器學習
強化學習
神經網路
股票選擇
Stock Market
Machine Learning
Reinforcement Learning
Neural Networks
Stock Selection
Date 2021
Date uploaded 4-Aug-2021 16:01:10 (UTC+8)
Abstract Reinforcement learning has become an indispensable discipline in many fields, and its practical applications in finance already include credit lending and default assessment, risk control, AI customer service, and stock market forecasting, while financial technology more broadly applies mathematical models to solve problems in the financial environment. This study applies a reinforcement learning framework to the Taiwan stock market: it designs a stock investment learning environment and simulates an investor running hyperparameter-tuning experiments in that environment, with the agent's ultimate goal being to maximize investment returns while controlling investment risk. Stocks that have been listed for 21 years and rank among the top 15 in the Taiwan market by total market capitalization serve as the training targets for the simulated environment; historical stock data from 2000 to 2016 form the training set and data from 2017 to 2021 form the test set. Finally, the study evaluates the experimental results and compares their investment-return performance with other investment strategies.
During environment-simulation training, the intelligent agent trained within the reinforcement learning framework captures, to a certain extent, the movements of stock prices in the market and achieves effective self-improvement through training, as detailed in the experimental results presented later. The results show that some of the experiments outperform the weighted stock price index and a random-allocation investment strategy; after hyperparameter tuning, Experiment 2 remains the best choice, and the test results indicate that during training the agent effectively learned to generate investment profits while controlling investment risk.
Reinforcement learning is an indispensable subject in many fields, and its practical applications in the financial sector include credit lending, default assessment, risk control, artificial-intelligence customer service, and stock market forecasting, while financial technology uses mathematical tools to address problems in the financial environment. This research applies a reinforcement learning framework to the Taiwan stock market: it designs a stock investment learning environment, simulates investors adjusting the algorithm's hyperparameters within that environment, and sets the agent's ultimate objective as maximizing investment returns while minimizing investment risk. The data set spans 21 years; stock histories from 2000 to 2016 are used as the training set, and data from 2017 to 2021 are treated as the test set. Finally, this research evaluates the experimental results and compares their return on investment with other investment strategies.
During environment-simulation training, the intelligent agent trained under the reinforcement learning framework is able to capture, to a certain extent, the price movements of stocks in the market and achieves effective self-improvement. The test results of experiments two, five, and ten are better than those of the weighted stock price index and a random-allocation investment strategy. The test results further show that the agent learned, during training, to make investment profits while controlling investment risk.
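
The setup outlined in the abstract (a simulated multi-stock environment, an agent that allocates capital to maximize return while controlling risk, training on 2000-2016 data, testing on 2017-2021 data, and comparison against index and random-allocation baselines) can be illustrated with a short, hedged sketch. The sketch below is not the thesis implementation: it assumes the gymnasium and stable-baselines3 libraries and substitutes synthetic returns for the actual Taiwan stock histories; the PortfolioEnv class, the risk-penalized reward, and all hyperparameters are illustrative choices rather than the thesis configuration.

# Minimal sketch, NOT the thesis code: synthetic data, assumed libraries
# (gymnasium, stable-baselines3), and illustrative hyperparameters throughout.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import TD3

N_ASSETS, WINDOW = 15, 20                      # 15 stocks, 20-day observation window (assumed)
rng = np.random.default_rng(0)
# Synthetic daily returns standing in for the 2000-2021 price histories (~5,200 trading days).
returns = rng.normal(0.0003, 0.015, size=(5200, N_ASSETS))
train_rets, test_rets = returns[:4000], returns[4000:]   # chronological split: "2000-2016" / "2017-2021"

class PortfolioEnv(gym.Env):
    """Observe a window of past returns; act by choosing portfolio weights."""
    def __init__(self, rets, risk_penalty=0.1):
        super().__init__()
        self.rets, self.risk_penalty = rets, risk_penalty
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(WINDOW * N_ASSETS,), dtype=np.float32)
        self.action_space = spaces.Box(0.0, 1.0, shape=(N_ASSETS,), dtype=np.float32)

    def _obs(self):
        return self.rets[self.t - WINDOW:self.t].astype(np.float32).ravel()

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = WINDOW
        return self._obs(), {}

    def step(self, action):
        w = action / (action.sum() + 1e-8)                # normalize the action into portfolio weights
        day = self.rets[self.t]
        # Reward = portfolio return minus a crude dispersion penalty (illustrative risk control).
        reward = float(w @ day - self.risk_penalty * np.std(w * day))
        self.t += 1
        done = self.t >= len(self.rets)
        obs = np.zeros(WINDOW * N_ASSETS, np.float32) if done else self._obs()
        return obs, reward, done, False, {}

# Train TD3 on the "2000-2016" portion only.
model = TD3("MlpPolicy", PortfolioEnv(train_rets), learning_rate=1e-3, verbose=0)
model.learn(total_timesteps=20_000)                       # tiny budget, just to show the loop

# Out-of-sample comparison against an equal-weight index proxy and a random allocation.
env = PortfolioEnv(test_rets, risk_penalty=0.0)           # pure returns for evaluation
obs, _ = env.reset()
agent_value, done = 1.0, False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, r, done, _, _ = env.step(action)
    agent_value *= 1 + r
equal_weight = np.prod(1 + test_rets.mean(axis=1))
random_weights = rng.dirichlet(np.ones(N_ASSETS), len(test_rets))
random_alloc = np.prod(1 + (test_rets * random_weights).sum(axis=1))
print(f"agent {agent_value:.3f}  equal-weight {equal_weight:.3f}  random {random_alloc:.3f}")

The chronological split mirrors the 2000-2016 training / 2017-2021 testing division described in the abstract, and the equal-weight and random-allocation figures stand in for the weighted-index and random-allocation benchmarks the thesis compares against; the thesis itself works with real price data and reports Experiment 2 as its best hyperparameter configuration.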
References
Chinese-language works
[1] 蔡岳霖(2013),一個使用遺傳演算法改良之投資組合保險模型之研究,國立高雄大學資訊工程學系碩士論文。
[2] 施承和(2016),機構投資人與散戶的投資策略之探討,朝陽科技大學財務金融系碩士論文。
[3] 劉俞含(2018),XGBoost模型、隨機森林模型、彈性網模型於股價指數趨勢之預測—以台灣、日本、美國為例,國立中山大學財務管理學系碩士論文。
[4] 陳人豪(2018),台股股利完全填權息關鍵影響因素之研究,國立政治大學資訊科學系碩士在職專班碩士論文。
[5] 陳昱安(2020),資產配置基於集成學習的多因子模型-以台灣股市為例,國立政治大學金融學系碩士論文。

English-language works
[1] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance 7(1): 77-91.
[2] Ahmadi, H. (1990). Testability of the arbitrage pricing theory by neural network. IJCNN International Joint Conference on Neural Networks, 1990, pp. 385-393 vol. 1. doi: 10.1109/IJCNN.1990.137598.
[3] Nison, S. (1991). Japanese Candlestick Charting Techniques: A Contemporary Guide to the Ancient Investment Techniques of the Far East. New York Institute of Finance.
[4] Sharpe, W. (1994). The Sharpe Ratio. Journal of Portfolio Management 21, No.1, Fall: 49-58.
[5] Acar, E. and S. James (1997). Maximum loss and maximum drawdown in financial markets. Proceedings of International Conference on Forecasting Financial Markets.
[6] Hochreiter, S. and J. Schmidhuber (1997). LSTM can solve hard long time lag problems. Advances in neural information processing systems.
[7] Moody, J. and L. Wu (1997). Optimization of trading systems and portfolios. Proceedings of the IEEE/IAFE Computational Intelligence for Financial Engineering: 300-307.
[8] Powell, N., et al. (2008). Supervised and unsupervised methods for stock trend forecasting. Southeastern Symposium on System Theory (SSST): 203-205. doi: 10.1109/SSST.2008.4480220.
[9] Chung, J., et al. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[10] Kingma, D. P. and J. Ba (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[11] Cumming, J., et al. (2015). An investigation into the use of reinforcement learning techniques within the algorithmic trading domain, Imperial College London: London, UK.
[12] Gabrielsson, P. and U. Johansson (2015). High-frequency equity index futures trading using recurrent reinforcement learning with candlesticks. 2015 IEEE Symposium Series on Computational Intelligence, IEEE.
[13] Lillicrap, T. P., et al. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
[14] Fujimoto, S., et al. (2018). Addressing function approximation error in actor-critic methods. International Conference on Machine Learning (PMLR): 1587-1596.
[15] Pendharkar, P. C. and P. Cusatis (2018). Trading financial indices with reinforcement learning agents. Expert Systems with Applications 103: 1-13.
[16] Kanwar, N. (2019). Deep Reinforcement Learning-based Portfolio Management, Ph.D. Dissertation, The University of Texas at Arlington: Arlington, TX, USA.
[17] Liu, L., et al. (2019). On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265.
[18] Misra, D. (2019). Mish: A self-regularized non-monotonic neural activation function. arXiv preprint arXiv:1908.08681.
[19] Zhang, M., et al. (2019). Lookahead optimizer: k steps forward, 1 step back. Advances in Neural Information Processing Systems.
[20] Corazza, M., et al. (2019). A comparison among reinforcement learning algorithms in financial trading systems. Working Paper No. 2019:33, Department of Economics, Ca' Foscari University of Venice.
Description Master's thesis
National Chengchi University
Department of Economics
108258044
Source http://thesis.lib.nccu.edu.tw/record/#G0108258044
Type thesis
Identifier G0108258044
URI http://nccur.lib.nccu.edu.tw/handle/140.119/136570
Table of Contents
Chapter 1  Introduction
Section 1  Research Background
Section 2  Research Motivation
Section 3  Research Objectives
Section 4  Thesis Structure
Chapter 2  Literature Review
Section 1  Reinforcement Learning
2.1.1. Actions
2.1.2. Rewards
2.1.3. States and the Environment
2.1.4. The TD3 Algorithm
Section 2  Optimizers and Activation Functions
2.2.1. Optimizers
2.2.2. Activation Functions
Chapter 3  Experimental Design
Section 1  Variable Settings
Section 2  Data Collection and Preprocessing
3.2.1. Stock Selection Targets
3.2.2. Descriptive Statistics
Section 3  TD3 Application and Settings
3.3.1. Hyperparameter Settings
3.3.2. Hardware Environment and Software Tools
Chapter 4  Experimental Results
Section 1  Test Results
Section 2  Comparison of Strategy Performance
Chapter 5  Conclusions and Future Outlook
Section 1  Conclusions
Section 2  Future Outlook
References
Format application/pdf (2,781,282 bytes)
DOI 10.6814/NCCU202100964