Title 結合數值資訊與文字資訊的股價預測模型: 以有考慮ESG因子的股票為例
Stock price prediction model combining numerical information and text information: taking stocks that consider ESG factors as an example
Author 蔡青穎
Contributors 黃泓智
蔡青穎
Keywords ESG investment
Embedding model
Discrete wavelet transform
Ensemble learning
Date 2024
Upload time 5-Aug-2024 14:03:16 (UTC+8)
Abstract This study combines numerical and textual information to predict the next-day returns of stocks that take ESG factors into account, and proposes a multi-faceted prediction model. ESG investing is increasingly valued not only for its social responsibility and environmental impact but also for its potential contribution to long-term financial performance. However, existing ESG databases in Taiwan suffer from missing data and inconsistent standards, which prompted us to use alternative textual data sources for prediction. The methodology covers data preprocessing and model training. For preprocessing, text is vectorized with the M3-Embedding model and, departing from the usual sentiment-analysis framework, all embedding vectors are used directly as features, while the numerical data are processed with a discrete wavelet transform. For training and comparison, the study uses several machine learning models (extreme learning machines, random forests, multi-layer perceptrons, support vector machines, and convolutional neural networks) together with ensemble learning methods. Empirical results show that models using only numerical information still achieve lower error values. Overall, however, models that combine numerical and textual information perform better in both return prediction and risk control: they outperform models using a single type of information on performance metrics such as the Sharpe ratio, maximum drawdown, and return, and they make more effective use of real-time market information. In sum, this study demonstrates the feasibility and advantages of combining textual and numerical information in stock prediction and offers new directions and references for research on ESG investing.
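The abstract names the Sharpe ratio and maximum drawdown as performance metrics but this record does not give the thesis's exact definitions. The following is a minimal sketch of the conventional forms of both, not the author's implementation. The function names, the zero risk-free rate, and the 252-period annualization factor are assumptions made for illustration.

```python
import math


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    Assumes at least two returns that are not all identical;
    risk_free and periods_per_year are illustrative defaults.
    """
    excess = [r - risk_free for r in returns]
    n = len(excess)
    mean = sum(excess) / n
    # Sample variance (n - 1 in the denominator).
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of cumulative wealth, as a positive fraction."""
    wealth = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r          # compound the per-period return
        peak = max(peak, wealth)   # running maximum of the wealth curve
        worst = max(worst, (peak - wealth) / peak)
    return worst


if __name__ == "__main__":
    # Hypothetical daily returns, for demonstration only.
    daily = [0.01, -0.02, 0.015, -0.005, 0.02]
    print("Sharpe ratio:", sharpe_ratio(daily))
    print("Max drawdown:", max_drawdown(daily))
```

A higher Sharpe ratio and a smaller maximum drawdown both indicate better risk-adjusted performance, which is the sense in which the abstract compares the combined model with the single-source models.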
Description Master's thesis
National Chengchi University
Department of Risk Management and Insurance
111358028
Source http://thesis.lib.nccu.edu.tw/record/#G0111358028
Type thesis
dc.contributor.advisor 黃泓智
dc.identifier (Other Identifiers) G0111358028
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/152791
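dc.description.tableofcontents List of Figures
List of Tables
Chapter 1 Introduction
Section 1 Research Motivation and Background
Section 2 Research Objectives
Section 3 Research Process
Chapter 2 Literature Review
Section 1 Feature Selection
Section 2 Data Preprocessing
Section 3 Machine Learning Methods
Section 4 Ensemble Learning Methods
Chapter 3 Methodology
Section 1 Research Framework
Section 2 Data Preprocessing
Section 3 Machine Learning Model Construction
Section 4 Ensemble Learning Methods
Section 5 Error Metrics
Section 6 Performance Metrics
Chapter 4 Empirical Results
Section 1 Analysis of Model Training Results
Section 2 Comparative Performance Analysis
Chapter 5 Conclusions and Recommendations
Section 1 Conclusions
Section 2 Suggestions for Future Work
References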
dc.format.extent 2731913 bytes
dc.format.mimetype application/pdf