Title 透過文字探勘預測台股報酬
Predicting Taiwan Stocks Returns with Text Data
Author 郭亭佑
Kuo, Ting-You
Contributors 翁久幸; 林士貴
Weng, Chiu-Hsing; Lin, Shih-Kuei
郭亭佑
Kuo, Ting-You
Keywords Unstructured Data
Text Mining
Stock News
Machine Learning
Predict Stock Returns
Sentiment Analysis
Efficient-Market Hypothesis
Abnormal Returns
Date 2021
Upload time 4-Aug-2021 14:43:11 (UTC+8)
Abstract In recent years, unstructured data has grown rapidly, prompting many scholars to study how news media affect stock returns. News articles are the “public information” that ordinary investors most commonly encounter when they trade. Unlike financial reports, however, news articles do not provide explicit numerical data that investors can analyze and use as a reference for investment decisions. This study uses text mining to extract sentiment information from Taiwan stock news and uses the resulting news sentiment scores to predict Taiwan stock returns. Following the text-mining methodology proposed by Ke, Kelly & Xiu (2019), we construct a Taiwan stock news sentiment score model (Taiwan Stocks Sentiment Extraction via Screening and Topic Modeling, Taiwan SESTM). We find that this methodology is particularly well suited to analyzing the relationship between news articles and stock price movements. We therefore extend it to the Taiwan stock market and use it to test the efficient-market hypothesis in Taiwan empirically. We find that a portfolio trading strategy built on the news sentiment scores estimated by Taiwan SESTM also yields substantial economic benefits in the Taiwan stock market, and that the sentiment scores have significant predictive and explanatory power for individual stock returns. Comparing the performance of the U.S. and Taiwan SESTM trading strategies, we find that Taiwan SESTM has higher predictive power for stock returns before news is released. We also find that, although Taiwan SESTM predicts stock returns significantly, our portfolio performance evaluation shows that the influence of news on the decisions of Taiwanese investors differs significantly from that in the United States. These results are consistent with our economic intuition about the Taiwan stock market. We hope that the Taiwan SESTM constructed in this study can help establish a research foundation for financial text mining in Taiwan.
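The "screening" step named in the model's title can be illustrated with a minimal sketch. It follows the general idea described for SESTM in Ke, Kelly & Xiu (2019): a word is kept as a sentiment word when the share of articles containing it that coincide with a positive return is far enough from one half, and the word appears in enough articles. This is not the thesis code. The function name `screen_sentiment_words`, the thresholds `alpha_plus`, `alpha_minus` and `kappa`, and the toy articles are all assumptions made for illustration.

```python
from collections import Counter


def screen_sentiment_words(articles, returns, alpha_plus=0.1, alpha_minus=0.1, kappa=2):
    """Select candidate sentiment words by co-occurrence with the return sign.

    articles: list of token lists, one per news article (already segmented)
    returns:  list of returns for the stock tagged in each article, same order
    The three thresholds are hypothetical tuning parameters.
    """
    doc_count = Counter()  # number of articles that contain each word
    pos_count = Counter()  # number of those articles with a positive return
    for tokens, ret in zip(articles, returns):
        for word in set(tokens):  # count each word once per article
            doc_count[word] += 1
            if ret > 0:
                pos_count[word] += 1

    positive, negative = set(), set()
    for word, k in doc_count.items():
        if k < kappa:  # skip words seen in too few articles
            continue
        share = pos_count[word] / k  # share of positive-return articles
        if share >= 0.5 + alpha_plus:
            positive.add(word)
        elif share <= 0.5 - alpha_minus:
            negative.add(word)
    return positive, negative


# Toy data, invented for illustration only.
articles = [
    ["revenue", "growth", "record"],
    ["growth", "expansion", "market"],
    ["loss", "lawsuit", "market"],
    ["loss", "decline", "revenue"],
]
returns = [0.02, 0.01, -0.03, -0.01]

positive, negative = screen_sentiment_words(articles, returns)
print(positive, negative)
```

In this toy sample, "growth" appears only alongside positive returns and "loss" only alongside negative ones, so they are the two words that pass the screen. Words that appear once are dropped by `kappa`, and words split evenly between signs are treated as neutral.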
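References 1. 李昱穎. (2019). 新聞輿情分析在台灣股票市場之應用: 文字轉向量與動能策略 [Applying news sentiment analysis to the Taiwan stock market: Text-to-vector and momentum strategies]. 政治大學金融學系學位論文 [Thesis, Department of Money and Banking, National Chengchi University], 1-40.
2. 陳信宏, 陳昱志, & 鄭舜仁. (2006). 以時間數列模型檢定台灣股票市場弱式效率性之研究 [Testing the weak-form efficiency of the Taiwan stock market with time series models]. 管理科學與統計決策 [Management Science and Statistical Decision], 3(4), 8-17.
3. 鍾任明, 李維平, & 吳澤民. (2005). 運用文字探勘於日內股價漲跌趨勢預測之研究 [Using text mining to predict intraday stock price trends] (Doctoral dissertation, 撰者 [the author]).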
4. Azar, P. D., & Lo, A. W. (2016). The wisdom of Twitter crowds: Predicting stock market reactions to FOMC meetings via Twitter feeds. The Journal of Portfolio Management, 42(5), 123-134.
5. Alvarez-Ramirez, J., Rodriguez, E., & Espinosa-Paredes, G. (2012). Is the US stock market becoming weakly efficient over time? Evidence from 80-year-long data. Physica A: Statistical Mechanics and its Applications, 391(22), 5643-5647.
6. Bernard, V. L., & Thomas, J. K. (1990). Evidence that stock prices do not fully reflect the implications of current earnings for future earnings. Journal of Accounting and Economics, 13(4), 305-340.
7. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493-2537.
8. Cowles 3rd, A. (1933). Can stock market forecasters forecast? Econometrica: Journal of the Econometric Society, 309-324.
9. Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-XL: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860.
10. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
11. Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383-417.
12. Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 849-911.
13. Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017, July). Convolutional sequence to sequence learning. In International Conference on Machine Learning (pp. 1243-1252). PMLR.
14. Heston, S. L., & Sinha, N. R. (2017). News vs. sentiment: Predicting stock returns from news stories. Financial Analysts Journal, 73(3), 67-83.
15. Hutchins, R. M. (1954). Great books. Western World.
16. Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. The Journal of Finance, 48(1), 65-91.
17. Jegadeesh, N., & Wu, D. (2013). Word power: A new approach for content analysis. Journal of Financial Economics, 110(3), 712-729.
18. Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188.
19. Ke, Z. T., Kelly, B. T., & Xiu, D. (2019). Predicting returns with text data (No. w26186). National Bureau of Economic Research.
20. Lakonishok, J., & Vermaelen, T. (1990). Anomalous price behavior around repurchase tender offers. The Journal of Finance, 45(2), 455-477.
21. Le, Q., & Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International Conference on Machine Learning (pp. 1188-1196). PMLR.
22. Loper, E., & Bird, S. (2002). NLTK: the natural language toolkit. arXiv preprint cs/0205028.
23. Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10‐Ks. The Journal of Finance, 66(1), 35-65.
24. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 3111-3119.
25. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
26. Ritter, J. R. (1991). The long‐run performance of initial public offerings. The Journal of Finance, 46(1), 3-27.
27. Spiess, D. K., & Affleck-Graves, J. (1995). Underperformance in long-run stock returns following seasoned equity offerings. Journal of Financial Economics, 38(3), 243-267.
28. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215.
29. Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
30. Tetlock, P. C. (2014). Information transmission in finance. Annual Review of Financial Economics, 6(1), 365-384.
31. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.
32. Wilson, D. S. (1975). A theory of group selection. Proceedings of the National Academy of Sciences, 72(1), 143-146.
33. Yang, B., Yih, W. T., He, X., Gao, J., & Deng, L. (2014). Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
34. Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
35. Zhang, Y., & Wallace, B. (2015). A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820.
Description Master's
National Chengchi University
Department of Statistics
108354023
Source http://thesis.lib.nccu.edu.tw/record/#G0108354023
Type thesis
dc.contributor.advisor 翁久幸; 林士貴 zh_TW
dc.contributor.advisor Weng, Chiu-Hsing; Lin, Shih-Kuei en_US
dc.contributor.author (Authors) 郭亭佑 zh_TW
dc.contributor.author (Authors) Kuo, Ting-You en_US
dc.creator (Author) 郭亭佑 zh_TW
dc.creator (Author) Kuo, Ting-You en_US
dc.date (Date) 2021 en_US
dc.date.accessioned 4-Aug-2021 14:43:11 (UTC+8) -
dc.date.available 4-Aug-2021 14:43:11 (UTC+8) -
dc.date.issued (Upload time) 4-Aug-2021 14:43:11 (UTC+8) -
dc.identifier (Other Identifiers) G0108354023 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/136324 -
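dc.description (Description) 碩士 (Master's) zh_TW
dc.description (Description) 國立政治大學 (National Chengchi University) zh_TW
dc.description (Description) 統計學系 (Department of Statistics) zh_TW
dc.description (Description) 108354023 zh_TW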
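dc.description.tableofcontents Table of Contents
1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
2 Literature Review
2.1 Natural Language Processing
2.1.1 Text Mining and Quantification
2.1.2 Applications of Text Mining in Finance
2.2 Efficient-Market Hypothesis
3 Methodology
3.1 Model Setup
3.1.1 Data Structure
3.1.2 Distribution of Stock Returns
3.1.3 Distribution of News Text
3.2 Model Estimation
3.2.1 Screening Sentiment Words
3.2.2 Constructing the News Sentiment Score Model
3.2.3 Estimating Sentiment Scores for New Articles
3.3 Estimation Procedure for the Taiwan Stock News Sentiment Score Model
4 Empirical Analysis
4.1 Data Sources and Descriptive Statistics
4.2 Data Preprocessing
4.2.1 Natural Language Processing
4.2.2 Normalization
4.2.3 Examples of News Sentiment Scores
4.3 Empirical Results
4.3.1 Training and Predicting Stock Returns
4.3.2 Sentiment Words
4.3.3 Empirical Test of the Efficient-Market Hypothesis in Taiwan
4.3.4 Relationship Between News and Price Delay
4.3.5 Speed of Reaction to News
5 Conclusions and Suggestions
6 References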
zh_TW
dc.format.extent 3157934 bytes -
dc.format.mimetype application/pdf -
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0108354023 en_US
dc.subject (Keywords) 非結構化數據 zh_TW
dc.subject (Keywords) 文字探勘 zh_TW
dc.subject (Keywords) 股票新聞 zh_TW
dc.subject (Keywords) 機器學習 zh_TW
dc.subject (Keywords) 預測股票報酬 zh_TW
dc.subject (Keywords) 情緒分析 zh_TW
dc.subject (Keywords) 效率市場假說 zh_TW
dc.subject (Keywords) 超額報酬 zh_TW
dc.subject (Keywords) Unstructured Data en_US
dc.subject (Keywords) Text Mining en_US
dc.subject (Keywords) Stock News en_US
dc.subject (Keywords) Machine Learning en_US
dc.subject (Keywords) Predict Stock Returns en_US
dc.subject (Keywords) Sentiment Analysis en_US
dc.subject (Keywords) Efficient-Market Hypothesis en_US
dc.subject (Keywords) Abnormal Returns en_US
dc.title (Title) 透過文字探勘預測台股報酬 zh_TW
dc.title (Title) Predicting Taiwan Stocks Returns with Text Data en_US
dc.type (Type) thesis en_US
dc.identifier.doi (DOI) 10.6814/NCCU202101087 en_US