Title 基於10-K報表ESG情緒萃取之企業違約預測模型:應用語意分析遷移學習
Corporate Default Prediction Model with ESG Sentiment: Transfer Learning-Based Sentiment Analysis of 10-K Reports
Author 陳科穎
Chen, Ke-Ying
Contributors 江彌修
Chiang, Mi-Hsiu
陳科穎
Chen, Ke-Ying
Keywords BERT
FinBERT
10-K
Machine Learning
Text Sentiment
ESG
Corporate Bankruptcy Prediction
Date 2023
Uploaded 6-Jul-2023 16:46:19 (UTC+8)
Abstract Bankruptcy prediction has long been an important topic in the finance literature. Past studies have investigated corporate default risk and potential bankruptcy factors with a range of methods, typically applying regression analysis to accounting data from financial statements. Early work, however, paid little attention to unstructured data as a source of bankruptcy factors. In recent years, research has increasingly used text feature extraction and text mining to measure sentiment in sources such as central bank meeting minutes, news headlines and articles, industry research reports, 10-K filings, and sustainability reports, adding the resulting sentiment scores as model features in the hope of improving predictive power. This study builds machine learning models on both structured and unstructured data to predict corporate bankruptcy and default. For the unstructured data, BERT (Bidirectional Encoder Representations from Transformers) is used to extract a positive/negative sentiment score reflecting how US listed companies describe their operations in the MD&A section of their 10-K reports, and FinBERT (BERT for Financial Text Mining) is used to extract an ESG sentiment score reflecting how much weight management gives to ESG-related discussion in the same section. We then test whether these two factors improve the predictive power of the machine learning models. Empirically, adding the positive/negative sentiment score and the ESG sentiment score raises AUC and recall, and predictive power improves for every model compared: logistic regression, SVM, Random Forest, and XGBoost. Oversampling with SMOTE addresses the class imbalance in the sample and improves overall predictive power. Ensemble learning methods outperform the linear models, and XGBoost performs best of all the models.
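The abstract reports that SMOTE oversampling helps with the imbalance between defaulted and healthy firms. As an illustration only, the sketch below implements the interpolation idea of Chawla et al. (2002) in plain Python: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority neighbours. It is not the thesis's code. The names `smote` and `balance`, the default k = 5, labelling defaulted firms as class 1, and drawing the base sample at random (instead of generating a fixed number of samples per minority point as in the original paper) are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    # For each minority sample, store the indices of its k nearest minority neighbours.
    neighbours: List[List[int]] = []
    for i, x in enumerate(minority):
        dists = sorted((_sq_dist(x, minority[j]), j)
                       for j in range(len(minority)) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))           # random base sample (assumed variant)
        base = minority[i]
        nb = minority[rng.choice(neighbours[i])]   # one of its k nearest neighbours
        gap = rng.random()                         # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X: Sequence[Sequence[float]], y: Sequence[int],
            k: int = 5, seed: int = 0) -> Tuple[list, List[int]]:
    """Oversample class 1 (assumed: defaulted firms) up to the size of class 0."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_majority = sum(1 for label in y if label == 0)
    extra = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    return list(X) + extra, list(y) + [1] * len(extra)


if __name__ == "__main__":
    # Toy data: two "defaulted" firms (label 1) and four healthy firms (label 0).
    X = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0], [5.5, 5.2], [6.0, 5.0]]
    y = [1, 1, 0, 0, 0, 0]
    X_bal, y_bal = balance(X, y, seed=42)
    print(len(X_bal), y_bal.count(1), y_bal.count(0))  # 8 4 4
```

Because the neighbour search uses Euclidean distance, features are normally standardised before oversampling, and SMOTE is applied to the training split only so that synthetic points built from test firms cannot leak into the evaluation.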
References Albuquerque, R., Koskinen, Y., & Zhang, C. (2019). Corporate Social Responsibility and Firm Risk: Theory and Empirical Evidence. Management Science, 65(10), 4451-4469.
Altman, E. I. (1968). Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. The Journal of Finance, 23(4), 589-609.
Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.
Barboza, F., Kimura, H., & Altman, E. (2017). Machine learning models and bankruptcy prediction. Expert Systems with Applications, 83, 405-417.
Beaver, W. H. (1966). Financial Ratios as Predictors of Failure. Journal of Accounting Research, 4, 71-111.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research, 16, 321-357.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Friede, G., Busch, T., & Bassen, A. (2015). ESG and financial performance: Aggregated evidence from more than 2000 empirical studies. Journal of Sustainable Finance & Investment, 5(4), 210-233.
Huang, A. H., Wang, H., & Yang, Y. (2022). FinBERT: A Large Language Model for Extracting Information from Financial Text. Contemporary Accounting Research.
Ionescu, G. H., Firoiu, D., Pirvu, R., & Vilag, R. D. (2019). The impact of ESG factors on market value of companies from travel and tourism industry. Technological and Economic Development of Economy, 25(5), 820-849.
Kim, A. G., & Yoon, S. (2021). Corporate Bankruptcy Prediction with Domain-Adapted BERT. EMNLP 2021, 3rd Workshop on ECONLP.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2020). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR, OpenReview.net.
Lin, W. L., Law, S. H., Ho, J. A., & Sambasivan, M. (2019). The causality direction of the corporate social responsibility—corporate financial performance nexus: Application of Panel Vector Autoregression approach. The North American Journal of Economics and Finance, 48, 401-418.
Loughran, T., & McDonald, B. (2011). When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65.
Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 30 (pp. 4765-4774). Curran Associates, Inc.
Mai, F., Tian, S., Lee, C., & Ma, L. (2019). Deep learning models for bankruptcy prediction using textual disclosures. European Journal of Operational Research, 274(2), 743-758.
Narvekar, A., & Guha, D. (2021). Bankruptcy prediction using machine learning and an application to the case of the COVID-19 recession. Data Science in Finance and Economics, 1(2), 180-195.
Ohlson, J. A. (1980). Financial Ratios and the Probabilistic Prediction of Bankruptcy. Journal of Accounting Research, 18(1), 109-131.
Premachandra, I. M., Chen, Y., & Watson, J. (2011). DEA as a tool for predicting corporate failure and success: A case of bankruptcy assessment. Omega, 39(6), 620-626.
Salton, G., & McGill, M. J. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.
Shetty, S., Musa, M., & Brédart, X. (2022). Bankruptcy Prediction Using Machine Learning Techniques. Journal of Risk and Financial Management, 15(1), 35.
Shumway, T. (2001). Forecasting Bankruptcy More Accurately: A Simple Hazard Model. Journal of Business, 74(1), 101-124.
Velte, P. (2017). Does ESG performance have an impact on financial performance? Evidence from Germany. Journal of Global Responsibility, 8(2), 169-178.
Wang, N. (2017). Bankruptcy prediction using machine learning. Journal of Mathematical Finance, 7, 908-918.
Wilson, D. L. (1972). Asymptotic Properties of Nearest Neighbor Rules Using Edited Data. IEEE Transactions on Systems, Man, and Cybernetics, 2(3), 408-421.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., & Le, Q. V. (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding. Advances in Neural Information Processing Systems, 32, 5754-5764.
Description Master's thesis
National Chengchi University
Department of Money and Banking
110352008
Source http://thesis.lib.nccu.edu.tw/record/#G0110352008
Type thesis
dc.contributor.advisor 江彌修
dc.contributor.advisor Chiang, Mi-Hsiu
dc.identifier (Other Identifiers) G0110352008
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/145857
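dc.description.tableofcontents Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures iv
List of Tables vi
Chapter 1 Introduction 1
Section 1 Research Motivation 1
Section 2 Research Objectives 2
Chapter 2 Literature Review 4
Section 1 Studies on Bankruptcy Prediction 4
Section 2 The Impact of ESG on Corporate Operations 5
Section 3 Studies Applying Machine Learning Methods 6
Section 4 Text Mining Techniques 7
Section 5 Transfer Learning for Text 8
Chapter 3 Methodology 9
Section 1 The BERT Model 9
Section 2 The FinBERT Model 13
Section 3 Bankruptcy Prediction Models 14
Section 4 Sampling 22
Section 5 Model Evaluation Metrics 24
Chapter 4 Data Processing and Transfer Analysis 29
Section 1 Data Processing 29
Section 2 BERT Model Training and Transfer 32
Section 3 Text Sentiment Scores 35
Section 4 Variable Selection 39
Chapter 5 Empirical Analysis 41
Section 1 Predicting Corporate Bankruptcy with Financial Variables Only 42
Section 2 Predicting Corporate Bankruptcy with Financial Variables and BERT 10-K Text Sentiment 44
Section 3 Predicting Corporate Bankruptcy with Financial Variables and FinBERT ESG Sentiment 46
Section 4 Predicting Corporate Bankruptcy with Financial Variables, Positive/Negative Sentiment, and ESG Sentiment 48
Section 5 Results Without SMOTE Sample Balancing 60
Chapter 6 Conclusions and Recommendations 64
References 67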
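Chapter 3, Section 5 of the thesis covers model evaluation metrics, and the abstract reports its results as AUC and recall. The plain-Python sketch below shows how those two numbers are commonly defined for a default classifier. It is an illustration, not the thesis's evaluation code: labelling defaulted firms as 1, the 0.5 decision threshold, and the function names are assumptions. AUC is computed through its pairwise (Mann-Whitney) interpretation, which costs O(n_pos × n_neg) comparisons; faster sort-based computations give the same value.

```python
from typing import List, Sequence


def predict_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn predicted default probabilities into 0/1 labels (threshold is assumed)."""
    return [1 if s >= threshold else 0 for s in scores]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual defaults (label 1) the model flags: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random default is scored above a random non-default.

    Ties count as one half, so this equals the Mann-Whitney U statistic
    divided by n_pos * n_neg, which is the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]            # 1 = defaulted firm
    scores = [0.1, 0.4, 0.35, 0.8]   # model's predicted default probabilities
    print(roc_auc(y_true, scores))                 # 0.75
    print(recall(y_true, predict_labels(scores)))  # 0.5
```

AUC depends only on how the scores rank firms, while recall depends on the chosen threshold, so the two can move differently when sentiment features or oversampling are added.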
dc.format.extent 2623139 bytes
dc.format.mimetype application/pdf