Title: Corporate Default Prediction Model with ESG Sentiment: Transfer Learning-Based Sentiment Analysis of 10-K Reports (基於10-K報表ESG情緒萃取之企業違約預測模型:應用語意分析遷移學習)
Author: Chen, Ke-Ying (陳科穎)
Contributors: Chiang, Mi-Hsiu (江彌修); Chen, Ke-Ying (陳科穎)
Keywords: BERT; FinBERT; 10-K; Machine Learning; Text Sentiment; ESG; Corporate Bankruptcy Prediction
Date: 2023
Uploaded: 6-Jul-2023 16:46:19 (UTC+8)
Abstract:
Bankruptcy prediction has long been an important topic in the finance literature. Earlier studies investigated corporate default risk and potential bankruptcy factors mainly by applying regression analysis to accounting data from financial statements, and paid little attention to the role of unstructured data. More recent research has gradually incorporated text feature extraction, using text mining to extract sentiment from sources such as central bank meeting records (for example, Fed minutes), news headlines and articles, industry research reports, 10-K filings, and sustainability reports. The resulting sentiment scores are added as factors during training in the hope of improving a model's predictive power.

This study builds machine learning models on both structured and unstructured data to predict corporate bankruptcy and default. For the unstructured data, BERT (Bidirectional Encoder Representations from Transformers) and FinBERT (BERT for Financial Text Mining) are applied to the MD&A sections of 10-K reports filed by US listed companies to extract two factors: a positive/negative score for the sentiment a company expresses about its operations, and a sentiment score for the degree of importance management attaches to ESG-related discussion. We then examine whether these two factors improve the predictive power of the machine learning models.

The empirical results show that adding the positive/negative sentiment scores and the ESG sentiment scores raises AUC and recall. In a comparison of Logistic regression, SVM, Random Forest, and XGBoost models, predictive power improves for every model. Oversampling with SMOTE addresses the class imbalance in the sample and strengthens overall predictive power. Ensemble learning outperforms the linear models, and XGBoost is the best-performing model overall.

Description: Master's thesis
National Chengchi University (國立政治大學)
Department of Money and Banking (金融學系)
110352008
Source: http://thesis.lib.nccu.edu.tw/record/#G0110352008
Type: thesis
Advisor: Chiang, Mi-Hsiu (江彌修)
Other identifiers: G0110352008
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/145857
Format: application/pdf, 2623139 bytes
Table of contents:
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  Section 1 Research Motivation
  Section 2 Research Objectives
Chapter 2 Literature Review
  Section 1 Research on Bankruptcy Prediction
  Section 2 The Impact of ESG on Business Operations
  Section 3 Research Applying Machine Learning Methods
  Section 4 Text Mining Techniques
  Section 5 Transfer Learning for Text
Chapter 3 Methodology
  Section 1 The BERT Model
  Section 2 The FinBERT Model
  Section 3 Bankruptcy Prediction Models
  Section 4 Sampling
  Section 5 Model Evaluation Metrics
Chapter 4 Data Processing and Transfer Analysis
  Section 1 Data Processing
  Section 2 BERT Model Training and Transfer
  Section 3 Text Sentiment Scores
  Section 4 Variable Selection
Chapter 5 Empirical Analysis
  Section 1 Predicting Corporate Bankruptcy with Financial Variables Only
  Section 2 Financial Variables plus BERT 10-K Text Sentiment
  Section 3 Financial Variables plus FinBERT ESG Sentiment
  Section 4 Financial Variables plus Positive/Negative Sentiment and ESG Sentiment
  Section 5 Results without SMOTE Sample Balancing
Chapter 6 Conclusions and Recommendations
References
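The abstract names SMOTE oversampling and the AUC and recall metrics without showing how they work. The record does not say which implementations the thesis used, so the following is only a minimal pure-Python sketch of the three ideas, not the author's code. The function names `smote`, `recall`, and `auc`, the feature layout, and all toy numbers are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    sampled row and one of its k nearest minority-class neighbours.
    Needs at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to the base row.
        others = sorted((row for row in minority if row is not base),
                        key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def recall(y_true, y_pred):
    """Share of actual positives (defaults) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical rows: [leverage ratio, 10-K sentiment score, ESG sentiment score]
defaulted = [[0.9, -0.6, 0.1], [0.8, -0.4, 0.2], [0.95, -0.7, 0.05]]
survived_count = 8  # assumed size of the majority (non-default) class

# Oversample the default class until both classes have the same size.
new_rows = smote(defaulted, n_new=survived_count - len(defaulted))
balanced_defaults = defaulted + new_rows

# Hypothetical model scores for five held-out firms (1 = default).
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(len(balanced_defaults), recall(y_true, y_pred), auc(y_true, scores))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic rows do not leak into the data used to compute AUC and recall. Recall depends on the chosen 0.5 cutoff, while AUC is threshold-free, which is one reason the two are often reported together for imbalanced default data.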