Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/136570
DC Field | Value | Language
dc.contributor.advisor蕭明福zh_TW
dc.contributor.advisor蔡瑞煌zh_TW
dc.contributor.author彭志偉zh_TW
dc.contributor.authorPhang, Chee-Waien_US
dc.creator彭志偉zh_TW
dc.creatorPhang, Chee-Waien_US
dc.date2021en_US
dc.date.accessioned2021-08-04T08:01:10Z-
dc.date.available2021-08-04T08:01:10Z-
dc.date.issued2021-08-04T08:01:10Z-
dc.identifierG0108258044en_US
dc.identifier.urihttp://nccur.lib.nccu.edu.tw/handle/140.119/136570-
dc.description碩士zh_TW
dc.description國立政治大學zh_TW
dc.description經濟學系zh_TW
dc.description108258044zh_TW
dc.description.abstract強化學習在各領域都是一門不可或缺的學科,在金融界的實際應用已有信用借貸/違約評估、風險控管、人工智慧客服及股市預測等;金融科技則是運用數學模型來解決金融環境中的問題。本研究將強化學習演算法的學習框架套用於臺灣股票市場,設計一個股票投資的學習環境,並模擬投資人在該環境中進行演算法超參數調整的實驗,代理人的最終目的在於控制投資風險的情況下將投資報酬最大化。本研究採用已上市達21年、且為臺灣股市總市值前15大之股票作為環境模擬的訓練對象,使用2000年至2016年的股票歷史資料作為訓練資料集,2017年至2021年作為測試資料集;最後本研究評估實驗結果,並與其他投資策略進行投資報酬績效的比較。
本研究在強化學習框架中所訓練之智慧代理人,在環境模擬訓練的過程中透過模擬學習在一定程度上捕捉到股票價格的變動,並藉由訓練達到有效的自我提升,詳細的實驗測試結果將於後續章節介紹。研究結果顯示,部分實驗測試的成果優於加權股價指數及隨機分配投資策略;在經過超參數調整後,以實驗二的成果為最佳,且測試結果顯示代理人在訓練過程中有效學習到在控制投資風險的情況下進行投資獲利。zh_TW
dc.description.abstractReinforcement learning has become an indispensable subject in many fields, and its practical applications in finance include credit lending and default assessment, risk control, artificial-intelligence customer service, and stock market forecasting; financial technology applies mathematical models to solve problems in the financial environment. This research applies a reinforcement learning framework to the Taiwan stock market: it designs a stock-investment learning environment and simulates an investor tuning the algorithm's hyperparameters in that environment, where the agent's ultimate goal is to maximize investment returns while controlling investment risk. The study selects stocks that have been listed for 21 years and rank among the top 15 in the Taiwan stock market by total market capitalization; stock price history from 2000 to 2016 serves as the training data set, and data from 2017 to 2021 as the test data set. Finally, the experimental results are evaluated and their return on investment is compared with other investment strategies.
During environment-simulation training, the agent trained under the reinforcement learning framework captures, to a certain extent, the price movements in the stock market and achieves effective self-improvement. The results of experiments two, five, and ten outperform the weighted stock price index and the random-allocation investment strategy. The test results further show that, during training, the agent learns to make investment profits while controlling investment risk.en_US
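The abstracts describe the overall setup (a simulated stock-investment environment, a reward that trades off return against risk, a 2000-2016 training span and a 2017-2021 test span, and a random-allocation benchmark) but not the implementation. The following minimal Python sketch only illustrates what such an environment and the random-allocation baseline could look like; it uses synthetic prices, and all names (StockEnv, RISK_PENALTY, random_allocation_baseline) and the exact reward form are hypothetical, not the thesis's actual TD3 environment or code.

import numpy as np

# Hypothetical weight on return volatility; the thesis's actual risk-control
# mechanism is not specified in the abstract.
RISK_PENALTY = 0.1

class StockEnv:
    """Minimal daily-rebalancing environment (illustrative only).
    The action is a weight vector over the stocks; the reward is the
    portfolio return minus a penalty on its rolling volatility."""

    def __init__(self, prices, window=30):
        self.prices = prices                                  # shape (T, n_stocks)
        self.returns = prices[1:] / prices[:-1] - 1.0         # daily simple returns
        self.window = window
        self.t = window
        self.history = []

    def reset(self):
        self.t = self.window
        self.history = []
        return self.returns[self.t - self.window:self.t].flatten()

    def step(self, action):
        # Normalize the raw action into non-negative portfolio weights.
        weights = np.abs(action) / (np.abs(action).sum() + 1e-8)
        port_ret = float(weights @ self.returns[self.t])
        self.history.append(port_ret)
        risk = float(np.std(self.history[-self.window:]))
        reward = port_ret - RISK_PENALTY * risk               # return vs. risk trade-off
        self.t += 1
        done = self.t >= len(self.returns)
        obs = None if done else self.returns[self.t - self.window:self.t].flatten()
        return obs, reward, done, {}

def random_allocation_baseline(env, rng):
    """Random-allocation strategy, one of the benchmarks named in the abstract."""
    env.reset()
    done, total = False, 0.0
    n_stocks = env.returns.shape[1]
    while not done:
        _, r, done, _ = env.step(rng.random(n_stocks))
        total += r
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic prices standing in for the 15 large-cap Taiwan stocks, 2000-2021.
    prices = np.cumprod(1.0 + 0.0005 + 0.02 * rng.standard_normal((5000, 15)), axis=0)
    train_env = StockEnv(prices[:4000])    # stand-in for the 2000-2016 training span
    test_env = StockEnv(prices[4000:])     # stand-in for the 2017-2021 test span
    print("random-allocation baseline, cumulative reward on test span:",
          round(random_allocation_baseline(test_env, rng), 4))

In the thesis's setting, train_env is where a TD3 agent would be trained before being evaluated on test_env and compared against the weighted stock index and a random-allocation baseline like the one above.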
dc.description.tableofcontents第一章 緒論 1
第一節 研究背景 1
第二節 研究動機 5
第三節 研究目的 7
第四節 論文架構 9
第二章 文獻回顧 10
第一節 強化學習 10
2.1.1. 行動 11
2.1.2. 獎勵 12
2.1.3. 狀態和環境 12
2.1.4. TD3演算法 13
第二節 優化器與激勵函數 17
2.2.1. 優化器 17
2.2.2. 激勵函數 18
第三章 實驗設計 19
第一節 變數設定 22
第二節 資料收集及資料前置處理 25
3.2.1. 選股標的 25
3.2.2. 敘述統計 26
第三節 TD3應用及設定 27
3.3.1. 超參數設定 29
3.3.2. 硬體環境與程式工具 31
第四章 實驗結果 32
第一節 測試結果 35
第二節 績效策略比較 36
第五章 結論與未來展望 39
第一節 結論 39
第二節 未來展望 41
參考文獻 42zh_TW
dc.format.extent2781282 bytes-
dc.format.mimetypeapplication/pdf-
dc.source.urihttp://thesis.lib.nccu.edu.tw/record/#G0108258044en_US
dc.subject金融股票市場zh_TW
dc.subject機器學習zh_TW
dc.subject強化學習zh_TW
dc.subject神經網路zh_TW
dc.subject股票選擇zh_TW
dc.subjectStock Marketen_US
dc.subjectMachine Learningen_US
dc.subjectReinforcement Learningen_US
dc.subjectNeural Networksen_US
dc.subjectStock Selectionen_US
dc.title應用強化學習於股票的投資選擇-以台灣股市為例zh_TW
dc.titleApplying Reinforcement Learning to Stock Investment–Taiwan Stock Market as an Exampleen_US
dc.typethesisen_US
dc.relation.reference中文部分
[1] 蔡岳霖(2013),一個使用遺傳演算法改良之投資組合保險模型之研究,國立高雄大學資訊工程學系碩士論文。
[2] 施承和(2016),機構投資人與散戶的投資策略之探討,朝陽科技大學財務金融系碩士論文。
[3] 劉俞含(2018),XGBoost模型、隨機森林模型、彈性網模型於股價指數趨勢之預測—以台灣、日本、美國為例,國立中山大學財務管理學系碩士論文。
[4] 陳人豪(2018),台股股利完全填權息關鍵影響因素之研究,國立政治大學資訊科學系碩士在職專班碩士論文。
[5] 陳昱安(2020),資產配置基於集成學習的多因子模型-以台灣股市為例,國立政治大學金融學系碩士論文。

英文部分
[1] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance 7(1): 77-91.
[2] Ahmadi, H. (1990). Testability of the arbitrage pricing theory by neural network. IJCNN International Joint Conference on Neural Networks, vol. 1, pp. 385-393. doi: 10.1109/IJCNN.1990.137598.
[3] Nison, S. (1991). Japanese Candlestick Charting Techniques: A Contemporary Guide to the Ancient Investment Techniques of the Far East. New York Institute of Finance.
[4] Sharpe, W. (1994). The Sharpe Ratio. Journal of Portfolio Management 21(1), Fall: 49-58.
[5] Acar, E. and S. James (1997). Maximum loss and maximum drawdown in financial markets. Proceedings of the International Conference on Forecasting Financial Markets.
[6] Hochreiter, S. and J. Schmidhuber (1997). LSTM can solve hard long time lag problems. Advances in Neural Information Processing Systems.
[7] Moody, J. and L. Wu (1997). Optimization of trading systems and portfolios. Proceedings of the IEEE/IAFE Computational Intelligence for Financial Engineering: 300-307.
[8] Powell, N., et al. (2008). Supervised and unsupervised methods for stock trend forecasting: 203-205. doi: 10.1109/SSST.2008.4480220.
[9] Chung, J., et al. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[10] Kingma, D. P. and J. Ba (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[11] Cumming, J., et al. (2015). An investigation into the use of reinforcement learning techniques within the algorithmic trading domain. Imperial College London: London, UK.
[12] Gabrielsson, P. and U. Johansson (2015). High-frequency equity index futures trading using recurrent reinforcement learning with candlesticks. 2015 IEEE Symposium Series on Computational Intelligence, IEEE.
[13] Lillicrap, T. P., et al. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
[14] Fujimoto, S., et al. (2018). Addressing function approximation error in actor-critic methods. International Conference on Machine Learning (PMLR): 1587-1596.
[15] Pendharkar, P. C. and P. Cusatis (2018). Trading financial indices with reinforcement learning agents. Expert Systems with Applications 103: 1-13.
[16] Kanwar, N. (2019). Deep Reinforcement Learning-based Portfolio Management. Ph.D. Dissertation, The University of Texas at Arlington: Arlington, TX, USA.
[17] Liu, L., et al. (2019). On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265.
[18] Misra, D. (2019). Mish: A self regularized non-monotonic neural activation function. arXiv preprint arXiv:1908.08681.
[19] Zhang, M., et al. (2019). Lookahead optimizer: k steps forward, 1 step back. Advances in Neural Information Processing Systems.
[20] Corazza, M., et al. (2019). A comparison among reinforcement learning algorithms in financial trading systems. Working Papers No. 2019:33, Department of Economics, University of Venice "Ca' Foscari".zh_TW
dc.identifier.doi10.6814/NCCU202100964en_US
item.grantfulltextembargo_20260719-
item.openairecristypehttp://purl.org/coar/resource_type/c_46ec-
item.fulltextWith Fulltext-
item.cerifentitytypePublications-
item.openairetypethesis-
Appears in Collections:學位論文
Files in This Item:
File: 804401.pdf | Size: 2.72 MB | Format: Adobe PDF