Title 應用AI強化學習於建立股票交易代理人之研究-以台積電股票為例
A Study on Establishing a Stock Trading Agent with AI Reinforcement Learning: The Case of Taiwan Semiconductor Manufacturing Company Stock
Author 林睿峰 (Lin, Jui-Feng)
Advisors 姜國輝; 季延平
Keywords Machine learning
Reinforcement learning
Q-learning
Deep Q-learning
Date 2018
Uploaded 17-Jul-2018 11:25:51 (UTC+8)
Abstract  Among machine learning techniques, reinforcement learning is inspired by behaviorism in psychology: like living creatures interacting with their environment, the agent gradually adjusts its behavior by pursuing rewards and avoiding punishment. Reinforcement learning excels at sequential, multi-step decision control, and stock trading fits the nature of this class of problems.
  However, the states of the stock market environment are highly diverse and hard to summarize with a finite set of state types, so teaching an agent a response to every possible state would incur enormous training cost. This study therefore adopts two training models: the first uses the clustering ability of unsupervised learning to group environment states before training with the Q-learning algorithm; the second trains the value function with Deep Q-learning, which combines reinforcement learning with deep learning, exploiting the function-approximation ability of deep networks to build a stock trading agent based on a Deep Q Network (DQN).
  In the system design, the trading agent observes the market state through multiple technical indicators, including MA, MACD, RSI, BIAS, and KD. To determine which indicators best represent the market state, this study designed seven indicator combinations and tested and compared their performance. The difference between the funds held at the end of the investment and the funds held at the beginning, i.e., the total profit or loss, serves as the reward signal that drives the agent to adjust its trading behavior in pursuit of higher profit.
  Taking Taiwan Semiconductor Manufacturing Company (TSMC) stock as an example, this study uses six years of after-hours data published on the Taiwan Stock Exchange website, from November 3, 2011 to December 1, 2017, to train and test the trading agents. In the best-performing model, the agent achieved an average annual return of 16.14% and formed a stable, effectively profitable trading strategy.
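The first training model described above, tabular Q-learning over clustered market states, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the action set, learning rate, discount factor, and integer cluster labels here are assumptions.

```python
import random
from collections import defaultdict

# Illustrative sketch of tabular Q-learning over clustered market states.
# In the study, states come from unsupervised clustering of technical
# indicators; here an integer cluster id stands in for a clustered state.

ACTIONS = ["buy", "sell", "hold"]  # assumed action set

def q_learning_step(Q, state, action, reward, next_state,
                    alpha=0.1, gamma=0.95):
    """One temporal-difference update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

def epsilon_greedy(Q, state, epsilon=0.1):
    """Balance exploration (random action) against exploitation
    (current best action) when choosing a trade."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

# One update: in cluster 3 the agent bought and was rewarded with a profit.
Q = defaultdict(float)
q_learning_step(Q, state=3, action="buy", reward=1.0, next_state=5)
```

With an empty table, the update above moves Q(3, "buy") from 0 toward the reward by a step of `alpha`, and a greedy policy in state 3 now prefers "buy".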
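The abstract also describes building the observed market state from technical indicators. A toy version of that idea, using two of the indicators named above (MA and RSI), might look like this; the window lengths and bucketing thresholds are illustrative assumptions, not the settings used in the thesis.

```python
# Illustrative sketch: turning a series of closing prices into a small
# discrete state for a tabular trading agent, via MA and RSI.

def moving_average(closes, window=5):
    """Simple moving average of the last `window` closing prices."""
    return sum(closes[-window:]) / window

def rsi(closes, window=5):
    """Relative Strength Index over the last `window` price changes."""
    deltas = [closes[i] - closes[i - 1]
              for i in range(len(closes) - window, len(closes))]
    gains = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)

def market_state(closes):
    """Discretize the indicators into a state tuple: whether price is
    above its moving average, and which RSI zone it falls in."""
    above_ma = closes[-1] > moving_average(closes)
    r = rsi(closes)
    rsi_zone = "oversold" if r < 30 else "overbought" if r > 70 else "neutral"
    return (above_ma, rsi_zone)
```

In a full system, vectors of such indicator values would be what the unsupervised clustering step groups into the finite state set that Q-learning then operates on.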
References [1] 吳欣曄. (2004). 以增強式學習法設計機台派工法則之研究. 臺灣大學電機工程學研究所學位論文, 1-77.
[2] 林典南. (2008). 使用 AdaBoost 之臺股指數期貨當沖交易系統. 臺灣大學資訊網路與多媒體研究所學位論文, 1-55.
[3] 周俊志. (2008). 自動交易系統與策略評價之研究. 臺灣大學資訊工程學研究所學位論文, 1-48.
[4] 賴怡玲. (2009). 使用增強式學習法建立臺灣股價指數期貨當沖交易策略. 臺灣大學資訊工程學研究所學位論文, 1-24.
[5] Hsiao, Y. W., Liu, H. J., & Liao, Y. F. (2016). Reinforcement training for deep neural networks-based language recognition (基於增強式深層類神經網路之語言辨認系統) [in Chinese]. In Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016) (pp. 325-341).
[6] Lee, J. W. (2001). Stock price prediction using reinforcement learning. In Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE 2001) (Vol. 1, pp. 690-695). IEEE.
[7] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
[8] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
[9] Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Description Master's thesis (碩士)
National Chengchi University (國立政治大學)
Department of Management Information Systems (資訊管理學系)
105356036
Source http://thesis.lib.nccu.edu.tw/record/#G0105356036
Type thesis
dc.contributor.advisor 姜國輝; 季延平 zh_TW
dc.contributor.author (Authors) 林睿峰zh_TW
dc.contributor.author (Authors) Lin, Jui-Fengen_US
dc.creator (作者) 林睿峰zh_TW
dc.creator (作者) Lin, Jui-Fengen_US
dc.date (日期) 2018en_US
dc.date.accessioned 17-Jul-2018 11:25:51 (UTC+8)-
dc.date.available 17-Jul-2018 11:25:51 (UTC+8)-
dc.date.issued (上傳時間) 17-Jul-2018 11:25:51 (UTC+8)-
dc.identifier (Other Identifiers) G0105356036en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/118697-
dc.description (描述) 碩士zh_TW
dc.description (描述) 國立政治大學zh_TW
dc.description (描述) 資訊管理學系zh_TW
dc.description (描述) 105356036zh_TW
dc.description.tableofcontents Chapter 1  Introduction 1
  Section 1  Research Background and Motivation 1
  Section 2  Research Objectives 3
Chapter 2  Literature Review 4
  Section 1  Reinforcement Learning 4
    1. The Reinforcement Learning Problem 4
    2. Elements of Reinforcement Learning 5
    3. Temporal-Difference Methods 6
    4. Balancing Exploration and Exploitation 8
  Section 2  The Q-learning Algorithm 8
  Section 3  The Deep Q-learning Algorithm 10
Chapter 3  Methodology 15
  Section 1  Data Collection 15
  Section 2  Investment Process Design 16
  Section 3  System Design 17
  Section 4  Learning Procedure 19
  Section 5  Experimental Model Design 26
  Section 6  Testing 27
Chapter 4  Results 29
  Section 1  Performance Combined with Unsupervised Learning 29
  Section 2  Performance of Deep Reinforcement Learning 36
  Section 3  Comparative Analysis of Model Performance 38
  Section 4  Analysis of Agent Trading Behavior 41
Chapter 5  Conclusions and Future Research 43
Chapter 6  References 45
zh_TW
dc.format.extent 2334541 bytes-
dc.format.mimetype application/pdf-
dc.source.uri (資料來源) http://thesis.lib.nccu.edu.tw/record/#G0105356036en_US
dc.subject (關鍵詞) 機器學習zh_TW
dc.subject (關鍵詞) 強化學習zh_TW
dc.subject (關鍵詞) Q-learningzh_TW
dc.subject (關鍵詞) Deep Q-learningzh_TW
dc.subject (關鍵詞) Machine learningen_US
dc.subject (關鍵詞) Reinforcement learningen_US
dc.subject (關鍵詞) Q-learningen_US
dc.subject (關鍵詞) Deep Q-learningen_US
dc.title (題名) 應用AI強化學習於建立股票交易代理人之研究-以台積電股票為例zh_TW
dc.title (題名) A study on establishing trading agent of stocks by AI reinforcement learning in Taiwan semiconductor manufacturing company stocksen_US
dc.type (資料類型) thesisen_US
dc.identifier.doi (DOI) 10.6814/THE.NCCU.MIS.004.2018.A05-