Predicting Stock Returns via Greedy Algorithm with Taiwanese News Data

程長磊; Cheng, Chang-Lei

Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/146908

Title:	Predicting Stock Returns via Greedy Algorithm with Taiwanese News Data (在臺灣新聞資料下透過貪婪演算法預測股票報酬)
Author:	程長磊 (Cheng, Chang-Lei)
Contributors:	林士貴 (Lin, Shi-Kui); 翁久幸 (Weng, Chiu-Hsing); 程長磊 (Cheng, Chang-Lei)
Keywords:	Text mining; Statistical learning; News sentiment analysis; Stock returns prediction; OGA; CGA
Date:	2023
Uploaded:	1-Sep-2023
Abstract:	With the development of big data and natural language processing, unstructured data, and textual data in particular, have acquired great value for academic research. Many studies examine how textual information affects asset returns, which has made this an important research target in finance. Textual data are high-dimensional, however (the number of distinct words in the news articles far exceeds the number of articles), so correctly analyzing the relationship between text and returns is a central issue for this line of research. News articles are the textual data that investors most commonly encounter when trading. Unlike financial statements, they provide no quantitative information on which to base an investment decision. This study therefore applies two high-dimensional model selection methods, the Orthogonal Greedy Algorithm (OGA) proposed by Ing and Lai (2011) and the Chebyshev Greedy Algorithm (CGA), a refinement due to Chen, Dai, Ing, Lai (2019), to select frequently used words from news articles and thereby quantify a sentiment score for each article. We estimate the relationship between the news sentiment factor and firm returns after excluding firm return factors, and we compare the returns of portfolios built from the news sentiment scores under a linear and a nonlinear assumption on the dependent variable (returns).
When returns are treated as a continuous variable under the linear assumption, we use OGA and extend it to an OGA Predict model. Under the nonlinear assumption, we use CGA and extend it to a CGA Predict model. Both are novel applications of these selection methods to financial text analysis. We find that the CGA Predict model earns a better excess return than the OGA Predict model. Performance evaluation also shows that the effect of news sentiment on the Taiwanese market, which is dominated by retail investors, differs significantly from its effect on the US market, which is dominated by institutional investors. This result is consistent with our economic intuition about the Taiwanese stock market.
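The record does not reproduce the thesis's implementation. As a rough illustration of the selection step that OGA performs, the following is a minimal sketch of a generic orthogonal greedy (forward stepwise) search. It assumes that the word-count columns and the return vector are already centered and uses a fixed number of steps instead of the stopping criterion in Ing and Lai (2011). The function `oga`, the helper `dot`, and the toy data are hypothetical names introduced only for this example.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def oga(columns, y, n_steps):
    """Greedy selection sketch (assumes centered inputs, fixed step count).

    columns: list of predictor columns, e.g. centered word counts per article
    y:       centered response, e.g. returns
    Returns the selected column indices and the final residual.
    """
    residual = list(y)
    basis = []      # orthonormal vectors spanning the selected columns
    selected = []
    for _ in range(n_steps):
        # Pick the unselected column most correlated with the current residual.
        best_j, best_score = None, 0.0
        for j, col in enumerate(columns):
            norm = math.sqrt(dot(col, col))
            if j in selected or norm == 0.0:
                continue
            score = abs(dot(residual, col)) / norm
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None or best_score < 1e-12:
            break   # nothing left that explains the residual
        # Orthogonalize the chosen column against earlier picks (Gram-Schmidt).
        q = list(columns[best_j])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        q_norm = math.sqrt(dot(q, q))
        if q_norm < 1e-12:
            break   # chosen column is numerically collinear with earlier picks
        q = [qi / q_norm for qi in q]
        basis.append(q)
        selected.append(best_j)
        # Remove the newly explained component from the residual.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return selected, residual

# Toy data (made up): three centered "word" columns over five articles.
word_columns = [
    [1.0, -1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -1.0, 0.0],
    [1.0, 1.0, -1.0, -1.0, 0.0],
]
# Returns driven by the first two words only.
returns = [2.0, -2.0, 0.5, -0.5, 0.0]
selected, residual = oga(word_columns, returns, n_steps=2)
```

In this toy setting the search picks the two columns that generate the response and leaves the third unselected. A sentiment score in the spirit of the abstract would then be built from the selected words, but how the thesis does this is not stated in the record.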
References:
1. 郭亭佑 (2021). Predicting Taiwan stock returns through text mining (透過文字探勘預測台股報酬). Degree thesis, Department of Money and Banking, National Chengchi University.
2. Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
3. Chen, Y. L., Dai, C. S., and Ing, C. K. (2019). High dimensional model selection via Chebyshev greedy algorithms. Working Paper.
4. Fan, J., Xue, L., and Zhou, Y. (2021). How much can machines learn finance from Chinese text data? Working Paper.
5. Gentzkow, M., Kelly, B., and Taddy, M. (2019). Text as data. Journal of Economic Literature, 57(3), 535-74.
6. Henry, E. (2008). Are investors influenced by how earnings press releases are written? The Journal of Business Communication, 45(4), 363-407.
7. Ing, C. K., and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 1473-1513.
8. Jegadeesh, N., and Wu, D. (2013). Word power: A new approach for content analysis. Journal of Financial Economics, 110(3), 712-729.
9. Ke, Z. T., Kelly, B. T., and Xiu, D. (2019). Predicting returns with text data. Working Paper.
10. Loughran, T., and McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. Journal of Finance, 66(1), 35-65.
11. Manela, A., and Moreira, A. (2017). News implied volatility and disaster concerns. Journal of Financial Economics, 123(1), 137-162.
12. Temlyakov, V. N. (2015). Greedy approximation in convex optimization. Constructive Approximation, 41(2), 269-296.
13. Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
14. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
15. You, J., Zhang, B., and Zhang, L. (2018). Who captures the power of the pen? Review of Financial Studies, 31(1), 43-96.
Description:	Master's thesis; National Chengchi University; Department of Statistics; 110354030
Source:	http://thesis.lib.nccu.edu.tw/record/#G0110354030
Type:	thesis
Appears in Collections:	Theses (學位論文)

Files in This Item:

File	Description	Size	Format
403001.pdf		3.3 MB	Adobe PDF
