Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/136570
DC Field | Value | Language
dc.contributor.advisor蕭明福zh_TW
dc.contributor.advisor蔡瑞煌zh_TW
dc.contributor.author彭志偉zh_TW
dc.contributor.authorPhang, Chee-Waien_US
dc.creator彭志偉zh_TW
dc.creatorPhang, Chee-Waien_US
dc.date2021en_US
dc.date.accessioned2021-08-04T08:01:10Z-
dc.date.available2021-08-04T08:01:10Z-
dc.date.issued2021-08-04T08:01:10Z-
dc.identifierG0108258044en_US
dc.identifier.urihttp://nccur.lib.nccu.edu.tw/handle/140.119/136570-
dc.description碩士zh_TW
dc.description國立政治大學zh_TW
dc.description經濟學系zh_TW
dc.description108258044zh_TW
dc.description.abstract強化學習在各領域都是一門不可或缺的學科,在金融界的實際應用已有信用借貸/違約評估、風險控管、人工智慧客服及股市預測等;金融科技則是運用數學模型來解決金融環境中的問題。本研究將強化學習演算法的學習框架套用於臺灣股票市場,設計一個股票投資的學習環境,並模擬投資人在該環境中進行演算法超參數調整的實驗,代理人的最終目的在於控制投資風險的情況下將投資報酬最大化。本研究採用已上市達21年、且為臺灣股市總市值前15大之股票作為環境模擬的訓練對象,使用2000年至2016年的股票歷史資料作為訓練資料集,2017年至2021年作為測試資料集;最後本研究評估實驗結果,並與其他投資策略進行投資報酬績效的比較。
本研究在強化學習框架中所訓練之智慧代理人,在環境模擬訓練的過程中透過模擬學習在一定程度上捕捉到股票價格的變動,並藉由訓練達到有效的自我提升,詳細的實驗測試結果將於後續章節介紹。研究結果顯示,部分實驗測試的成果優於加權股價指數及隨機分配投資策略;在經過超參數調整後,以實驗二的成果為最佳,且測試結果顯示代理人在訓練過程中有效學習到在控制投資風險的情況下進行投資獲利。zh_TW
dc.description.abstractReinforcement learning has become an indispensable subject in many fields, and its practical applications in finance include credit lending and default assessment, risk control, artificial-intelligence customer service, and stock market forecasting; financial technology applies mathematical models to solve problems in the financial environment. This research applies a reinforcement learning framework to the Taiwan stock market: it designs a stock-investment learning environment and simulates an investor tuning the algorithm's hyperparameters in that environment, where the agent's ultimate goal is to maximize investment returns while controlling investment risk. The study selects stocks that have been listed for 21 years and rank among the top 15 in the Taiwan stock market by total market capitalization; stock price history from 2000 to 2016 serves as the training data set, and data from 2017 to 2021 as the test data set. Finally, the experimental results are evaluated and their return on investment is compared with other investment strategies.
During environment-simulation training, the agent trained under the reinforcement learning framework captures, to a certain extent, the price movements in the stock market and achieves effective self-improvement. The results of experiments two, five, and ten outperform the weighted stock price index and the random-allocation investment strategy. The test results further show that, during training, the agent learns to make investment profits while controlling investment risk.en_US
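The abstracts describe the overall setup (a simulated stock-investment environment, a reward that trades off return against risk, a 2000-2016 training span and a 2017-2021 test span, and a random-allocation benchmark) but not the implementation. The following minimal Python sketch only illustrates what such an environment and the random-allocation baseline could look like; it uses synthetic prices, and all names (StockEnv, RISK_PENALTY, random_allocation_baseline) and the exact reward form are hypothetical, not the thesis's actual TD3 environment or code.

import numpy as np

# Hypothetical weight on return volatility; the thesis's actual risk-control
# mechanism is not specified in the abstract.
RISK_PENALTY = 0.1

class StockEnv:
    """Minimal daily-rebalancing environment (illustrative only).
    The action is a weight vector over the stocks; the reward is the
    portfolio return minus a penalty on its rolling volatility."""

    def __init__(self, prices, window=30):
        self.prices = prices                                  # shape (T, n_stocks)
        self.returns = prices[1:] / prices[:-1] - 1.0         # daily simple returns
        self.window = window
        self.t = window
        self.history = []

    def reset(self):
        self.t = self.window
        self.history = []
        return self.returns[self.t - self.window:self.t].flatten()

    def step(self, action):
        # Normalize the raw action into non-negative portfolio weights.
        weights = np.abs(action) / (np.abs(action).sum() + 1e-8)
        port_ret = float(weights @ self.returns[self.t])
        self.history.append(port_ret)
        risk = float(np.std(self.history[-self.window:]))
        reward = port_ret - RISK_PENALTY * risk               # return vs. risk trade-off
        self.t += 1
        done = self.t >= len(self.returns)
        obs = None if done else self.returns[self.t - self.window:self.t].flatten()
        return obs, reward, done, {}

def random_allocation_baseline(env, rng):
    """Random-allocation strategy, one of the benchmarks named in the abstract."""
    env.reset()
    done, total = False, 0.0
    n_stocks = env.returns.shape[1]
    while not done:
        _, r, done, _ = env.step(rng.random(n_stocks))
        total += r
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic prices standing in for the 15 large-cap Taiwan stocks, 2000-2021.
    prices = np.cumprod(1.0 + 0.0005 + 0.02 * rng.standard_normal((5000, 15)), axis=0)
    train_env = StockEnv(prices[:4000])    # stand-in for the 2000-2016 training span
    test_env = StockEnv(prices[4000:])     # stand-in for the 2017-2021 test span
    print("random-allocation baseline, cumulative reward on test span:",
          round(random_allocation_baseline(test_env, rng), 4))

In the thesis's setting, train_env is where a TD3 agent would be trained before being evaluated on test_env and compared against the weighted stock index and a random-allocation baseline like the one above.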
dc.description.tableofcontents第一章 緒論 1
第一節 研究背景 1
第二節 研究動機 5
第三節 研究目的 7
第四節 論文架構 9
第二章 文獻回顧 10
第一節 強化學習 10
2.1.1. 行動 11
2.1.2. 獎勵 12
2.1.3. 狀態和環境 12
2.1.4. TD3演算法 13
第二節 優化器與激勵函數 17
2.2.1. 優化器 17
2.2.2. 激勵函數 18
第三章 實驗設計 19
第一節 變數設定 22
第二節 資料收集及資料前置處理 25
3.2.1. 選股標的 25
3.2.2. 敘述統計 26
第三節 TD3應用及設定 27
3.3.1. 超參數設定 29
3.3.2. 硬體環境與程式工具 31
第四章 實驗結果 32
第一節 測試結果 35
第二節 績效策略比較 36
第五章 結論與未來展望 39
第一節 結論 39
第二節 未來展望 41
參考文獻 42zh_TW
dc.format.extent2781282 bytes-
dc.format.mimetypeapplication/pdf-
dc.source.urihttp://thesis.lib.nccu.edu.tw/record/#G0108258044en_US
dc.subject金融股票市場zh_TW
dc.subject機器學習zh_TW
dc.subject強化學習zh_TW
dc.subject神經網路zh_TW
dc.subject股票選擇zh_TW
dc.subjectStock Marketen_US
dc.subjectMachine Learningen_US
dc.subjectReinforcement Learningen_US
dc.subjectNeural Networksen_US
dc.subjectStock Selectionen_US
dc.title應用強化學習於股票的投資選擇-以台灣股市為例zh_TW
dc.titleApplying Reinforcement Learning to Stock Investment–Taiwan Stock Market as an Exampleen_US
dc.typethesisen_US
dc.relation.reference中文部分
[1] 蔡岳霖(2013),一個使用遺傳演算法改良之投資組合保險模型之研究,國立高雄大學資訊工程學系碩士論文。
[2] 施承和(2016),機構投資人與散戶的投資策略之探討,朝陽科技大學財務金融系碩士論文。
[3] 劉俞含(2018),XGBoost模型、隨機森林模型、彈性網模型於股價指數趨勢之預測—以台灣、日本、美國為例,國立中山大學財務管理學系碩士論文。
[4] 陳人豪(2018),台股股利完全填權息關鍵影響因素之研究,國立政治大學資訊科學系碩士在職專班碩士論文。
[5] 陳昱安(2020),資產配置基於集成學習的多因子模型-以台灣股市為例,國立政治大學金融學系碩士論文。

英文部分
[1] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance 7(1): 77-91.
[2] Ahmadi, H. (1990). Testability of the arbitrage pricing theory by neural network. IJCNN International Joint Conference on Neural Networks, vol. 1, pp. 385-393. doi: 10.1109/IJCNN.1990.137598.
[3] Nison, S. (1991). Japanese Candlestick Charting Techniques: A Contemporary Guide to the Ancient Investment Techniques of the Far East. New York Institute of Finance.
[4] Sharpe, W. (1994). The Sharpe Ratio. Journal of Portfolio Management 21(1), Fall: 49-58.
[5] Acar, E. and S. James (1997). Maximum loss and maximum drawdown in financial markets. Proceedings of the International Conference on Forecasting Financial Markets.
[6] Hochreiter, S. and J. Schmidhuber (1997). LSTM can solve hard long time lag problems. Advances in Neural Information Processing Systems.
[7] Moody, J. and L. Wu (1997). Optimization of trading systems and portfolios. Proceedings of the IEEE/IAFE Computational Intelligence for Financial Engineering: 300-307.
[8] Powell, N., et al. (2008). Supervised and unsupervised methods for stock trend forecasting: 203-205. doi: 10.1109/SSST.2008.4480220.
[9] Chung, J., et al. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[10] Kingma, D. P. and J. Ba (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[11] Cumming, J., et al. (2015). An investigation into the use of reinforcement learning techniques within the algorithmic trading domain. Imperial College London: London, UK.
[12] Gabrielsson, P. and U. Johansson (2015). High-frequency equity index futures trading using recurrent reinforcement learning with candlesticks. 2015 IEEE Symposium Series on Computational Intelligence, IEEE.
[13] Lillicrap, T. P., et al. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
[14] Fujimoto, S., et al. (2018). Addressing function approximation error in actor-critic methods. International Conference on Machine Learning (PMLR): 1587-1596.
[15] Pendharkar, P. C. and P. Cusatis (2018). Trading financial indices with reinforcement learning agents. Expert Systems with Applications 103: 1-13.
[16] Kanwar, N. (2019). Deep Reinforcement Learning-based Portfolio Management. Ph.D. Dissertation, The University of Texas at Arlington: Arlington, TX, USA.
[17] Liu, L., et al. (2019). On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265.
[18] Misra, D. (2019). Mish: A self regularized non-monotonic neural activation function. arXiv preprint arXiv:1908.08681.
[19] Zhang, M., et al. (2019). Lookahead optimizer: k steps forward, 1 step back. Advances in Neural Information Processing Systems.
[20] Corazza, M., et al. (2019). A comparison among reinforcement learning algorithms in financial trading systems. Working Papers No. 2019:33, Department of Economics, University of Venice "Ca' Foscari".zh_TW
dc.identifier.doi10.6814/NCCU202100964en_US
item.grantfulltextembargo_20260719-
item.openairecristypehttp://purl.org/coar/resource_type/c_46ec-
item.fulltextWith Fulltext-
item.cerifentitytypePublications-
item.openairetypethesis-
Appears in Collections:學位論文
Files in This Item:
File: 804401.pdf | Size: 2.72 MB | Format: Adobe PDF