Citation Information

Title 輔以機器學習的新聞文本情緒分類於投資組合建構
Machine-learning assisted portfolio construction based on news sentiment classification
Author 李晨瑜
Lee, Chen-Yu
Contributors 江彌修
Chiang, Mi-Hsiu
李晨瑜
Lee, Chen-Yu
Keywords Machine learning
Text mining
Text classification
Sentiment analysis
Asset allocation
Portfolio construction
Date 2020
Upload time 3-Aug-2020 17:37:55 (UTC+8)
Abstract Traditional financial theory held that demand shocks caused by changes in sentiment could not affect asset prices. With the development of behavioral finance, however, grasping sentiment has come to be seen as key to profitable investing, and recent advances in processing unstructured data make it possible to use text as a source of sentiment. This study takes the constituent stocks of the Taiwan Top 50 Tracker Fund (Taiwan 50 ETF) as its targets and extracts news sentiment from Chinese-language news about each stock, using the predictions of three classification models: a naïve Bayes classifier, a support vector machine, and a random forest. We first compare the predictive performance of the classification models, then build a sentiment index from the model predictions and use it as the basis for portfolio construction, and finally examine the performance of the resulting portfolios. The empirical results show that, for news sentiment classification, the random forest model generally performs better than the other models. They also show that using the news sentiment index to adjust stock weights, increasing a stock's weight when its news is mostly positive and reducing the holding when its news is mostly negative, does deliver excess returns relative to the market, with cumulative returns that can exceed those of both the Taiwan Top 50 Tracker Fund and an equally weighted portfolio.
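The weight-adjustment idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual method: the net-sentiment formula, the `strength` parameter, the function names, and the tickers and labels below are all hypothetical assumptions chosen only to show how predicted news labels might tilt an equally weighted portfolio.

```python
from collections import Counter


def sentiment_score(labels):
    """Net sentiment in [-1, 1] from predicted news labels.

    Labels are assumed to be +1 (positive), -1 (negative), or 0 (neutral).
    The formula (positives - negatives) / total is an illustrative
    assumption, not the definition used in the thesis.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    return (counts[1] - counts[-1]) / len(labels)


def tilt_weights(base_weights, scores, strength=0.5):
    """Scale each base weight by (1 + strength * score), then renormalize.

    `strength` is a hypothetical tuning parameter; keeping it in [0, 1)
    guarantees every scaled weight stays positive for scores in [-1, 1].
    """
    if not 0.0 <= strength < 1.0:
        raise ValueError("strength must be in [0, 1)")
    raw = {
        ticker: weight * (1.0 + strength * scores.get(ticker, 0.0))
        for ticker, weight in base_weights.items()
    }
    total = sum(raw.values())
    return {ticker: weight / total for ticker, weight in raw.items()}


# Made-up predicted labels for three made-up tickers.
predicted = {
    "AAA": [1, 1, 1, -1],   # mostly positive news
    "BBB": [-1, -1, 0, 1],  # mostly negative news
    "CCC": [0, 0],          # neutral news
}
scores = {ticker: sentiment_score(labels) for ticker, labels in predicted.items()}
base = {ticker: 1.0 / len(predicted) for ticker in predicted}  # equal weights
weights = tilt_weights(base, scores)
```

In this toy example the stock with mostly positive news ends up above its equal weight and the stock with mostly negative news below it, while the weights still sum to one.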
Description Master's
National Chengchi University (國立政治大學)
Department of Money and Banking (金融學系)
107352018
Source http://thesis.lib.nccu.edu.tw/record/#G0107352018
Type thesis
dc.contributor.advisor 江彌修 [zh_TW]
dc.contributor.advisor Chiang, Mi-Hsiu [en_US]
dc.contributor.author (Authors) 李晨瑜 [zh_TW]
dc.contributor.author (Authors) Lee, Chen-Yu [en_US]
dc.creator (Author) 李晨瑜 [zh_TW]
dc.creator (Author) Lee, Chen-Yu [en_US]
dc.date (Date) 2020 [en_US]
dc.date.accessioned 3-Aug-2020 17:37:55 (UTC+8)
dc.date.available 3-Aug-2020 17:37:55 (UTC+8)
dc.date.issued (Upload time) 3-Aug-2020 17:37:55 (UTC+8)
dc.identifier (Other Identifiers) G0107352018 [en_US]
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/130989
dc.description (Description) Master's (碩士) [zh_TW]
dc.description (Description) National Chengchi University (國立政治大學) [zh_TW]
dc.description (Description) Department of Money and Banking (金融學系) [zh_TW]
dc.description (Description) 107352018 [zh_TW]
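dc.description.tableofcontents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Objectives
Chapter 2 Literature Review
Section 1 Text Mining and Applications of Text Sentiment
Section 2 Text Classification Models
Chapter 3 Methodology
Section 1 Data Collection and Construction
1. News Data Collection
2. News Labeling Settings
3. Dataset Split
Section 2 Data Preprocessing
1. Jieba Word Segmentation
2. TF-IDF (Term Frequency-Inverse Document Frequency) Keyword Extraction
Section 3 News Text Classification Models
1. Naïve Bayes Classifier (NBC)
2. Support Vector Machine (SVM) Model
3. Random Forest (RF) Model
Section 4 Metrics for Classification Model Predictive Ability
1. Confusion Matrix
2. Accuracy
3. F1-score
Section 5 Sentiment Quantification Methods
1. Single Sentiment Score
2. Composite Sentiment Index
Section 6 Portfolio Construction
1. Weight Adjustment
2. Performance Evaluation
Chapter 4 Empirical Results
Section 1 Description of Empirical Data
Section 2 Evaluation of News Classification Model Performance
1. Confusion Matrices and Predictive Ability under Each Labeling Setting
2. Confusion Matrices and Predictive Ability under Different Numbers of TF-IDF Keyword Features
Section 3 Portfolio Performance
1. Comparison of the Single Sentiment Score and the Composite Sentiment Index
2. Comparison of Sentiment Indices Generated by Each Model and Labeling Setting
3. Comparison of Sentiment Indices Generated with Different Numbers of TF-IDF Feature Terms
Chapter 5 Conclusions and Suggestions
Section 1 Conclusions
Section 2 Suggestions for Future Work
References
Appendix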
dc.format.extent 3124160 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0107352018 [en_US]
dc.subject (Keywords) Machine learning (機器學習) [zh_TW]
dc.subject (Keywords) Text mining (文字探勘) [zh_TW]
dc.subject (Keywords) Text classification (文本分類) [zh_TW]
dc.subject (Keywords) Sentiment analysis (情緒分析) [zh_TW]
dc.subject (Keywords) Asset allocation (資產配置) [zh_TW]
dc.subject (Keywords) Portfolio construction (投資組合) [zh_TW]
dc.subject (Keywords) Machine learning [en_US]
dc.subject (Keywords) Text mining [en_US]
dc.subject (Keywords) Text classification [en_US]
dc.subject (Keywords) Sentiment analysis [en_US]
dc.subject (Keywords) Asset allocation [en_US]
dc.subject (Keywords) Portfolio construction [en_US]
dc.title (Title) 輔以機器學習的新聞文本情緒分類於投資組合建構 [zh_TW]
dc.title (Title) Machine-learning assisted portfolio construction based on news sentiment classification [en_US]
dc.type (Type) thesis [en_US]
dc.identifier.doi (DOI) 10.6814/NCCU202000689 [en_US]