Corporate Default Prediction Model with ESG Sentiment: Transfer Learning-Based Sentiment Analysis of 10-K Reports


Title	Corporate Default Prediction Model with ESG Sentiment: Transfer Learning-Based Sentiment Analysis of 10-K Reports
Author	Chen, Ke-Ying (陳科穎)
Contributors	Chiang, Mi-Hsiu (江彌修); Chen, Ke-Ying (陳科穎)
Keywords	BERT; FinBERT; 10-K; Machine Learning; Text Sentiment; ESG; Corporate Bankruptcy Prediction
Date	2023
Upload time	6-Jul-2023 16:46:19 (UTC+8)
Abstract	Bankruptcy prediction has long been an important topic in the finance literature. Earlier studies examined corporate default risk and potential bankruptcy factors with a variety of methods, mostly by fitting econometric regression models to accounting data from financial statements. That early work paid little attention to the role of unstructured data in bankruptcy prediction. More recent research has increasingly incorporated text feature extraction, applying text mining to extract sentiment from sources such as central bank meeting minutes, news headlines and articles, industry research reports, 10-K filings, and sustainability reports, and then adding the resulting sentiment scores as model features in the hope of improving predictive power. This study builds machine learning models on both structured and unstructured data to predict corporate bankruptcy and default. For the unstructured data, BERT (Bidirectional Encoder Representations from Transformers) and FinBERT (BERT for Financial Text Mining) are applied, respectively, to the MD&A sections of 10-K reports filed by US listed companies to extract two factors: a positive/negative score for the sentiment a company expresses about its operations, and a sentiment score reflecting how much weight management places on ESG-related discussion. The study then tests whether these two factors improve the predictive power of the machine learning models. Empirically, adding the positive/negative sentiment scores and the ESG sentiment scores raises AUC and recall, and predictive power improves for every model compared (Logistic, SVM, Random Forest, and XGBoost). Oversampling with SMOTE is found to address the sample imbalance problem and strengthen overall predictive power. Ensemble learning outperforms the linear models, and XGBoost performs best among all models.
Description	Master's; National Chengchi University; Department of Money and Banking; 110352008
Source	http://thesis.lib.nccu.edu.tw/record/#G0110352008
Type	thesis
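
The abstract names SMOTE oversampling and the AUC and recall metrics but, as a record summary, gives no implementation details. The following is a minimal pure-Python sketch of those three ideas under stated assumptions, not the thesis's actual code: the function names (`smote`, `recall`, `auc`), the three toy features (leverage, sentiment score, ESG score), and every data value are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority rows by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # how far to move from base toward the neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def recall(y_true, y_pred):
    """Share of actual positives (defaults) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy rows: [leverage, sentiment_score, esg_score]
defaulted = [[0.90, -0.6, 0.1], [0.80, -0.4, 0.2], [0.85, -0.5, 0.0]]
healthy = [[0.30, 0.5, 0.6], [0.40, 0.3, 0.5], [0.35, 0.4, 0.7],
           [0.25, 0.6, 0.4], [0.45, 0.2, 0.5], [0.20, 0.7, 0.6]]

# Oversample the defaulted class until both classes have the same size.
synthetic = smote(defaulted, len(healthy) - len(defaulted), k=2)
balanced_defaulted = defaulted + synthetic
print(len(balanced_defaulted), len(healthy))

# Metrics on made-up labels, hard predictions, and scores.
print(recall([1, 1, 0, 0], [1, 0, 0, 1]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In practice, oversampling of this kind is normally applied only to the training split, so that the evaluation data still reflects the original class imbalance; whether the thesis does so is not stated in this record.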

dc.contributor.advisor	江彌修	zh_TW
dc.contributor.advisor	Chiang, Mi-Hsiu	en_US
dc.contributor.author (Authors)	陳科穎	zh_TW
dc.contributor.author (Authors)	Chen, Ke-Ying	en_US
dc.creator (Author)	陳科穎	zh_TW
dc.creator (Author)	Chen, Ke-Ying	en_US
dc.date (Date)	2023	en_US
dc.date.accessioned	6-Jul-2023 16:46:19 (UTC+8)	-
dc.date.available	6-Jul-2023 16:46:19 (UTC+8)	-
dc.date.issued (Upload time)	6-Jul-2023 16:46:19 (UTC+8)	-
dc.identifier (Other Identifiers)	G0110352008	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/145857	-
dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Money and Banking	zh_TW
dc.description (Description)	110352008	zh_TW
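dc.description.tableofcontents	Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction: 1.1 Research Motivation, 1.2 Research Objectives; Chapter 2 Literature Review: 2.1 Research on Bankruptcy Prediction, 2.2 Impact of ESG on Corporate Operations, 2.3 Research Applying Machine Learning Methods, 2.4 Text Mining Techniques, 2.5 Transfer Learning for Text; Chapter 3 Methodology: 3.1 BERT Model, 3.2 FinBERT Model, 3.3 Bankruptcy Prediction Models, 3.4 Sampling, 3.5 Model Evaluation Metrics; Chapter 4 Data Processing and Transfer Analysis: 4.1 Data Processing, 4.2 BERT Model Training and Transfer, 4.3 Text Sentiment Scores, 4.4 Variable Selection; Chapter 5 Empirical Analysis: 5.1 Predicting Bankruptcy with Financial Variables Only, 5.2 Financial Variables plus BERT 10-K Text Sentiment, 5.3 Financial Variables plus FinBERT ESG Sentiment, 5.4 Financial Variables plus Positive/Negative Sentiment and ESG Sentiment, 5.5 Results without SMOTE Sample Balancing; Chapter 6 Conclusions and Recommendations; References	zh_TW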
dc.format.extent	2623139 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0110352008	en_US
dc.subject (Keywords)	BERT	en_US
dc.subject (Keywords)	FinBERT	en_US
dc.subject (Keywords)	10-K	en_US
dc.subject (Keywords)	Machine Learning	en_US
dc.subject (Keywords)	Text Sentiment	en_US
dc.subject (Keywords)	ESG	en_US
dc.subject (Keywords)	Corporate Bankruptcy Prediction	en_US
dc.title (Title)	基於10-K報表ESG情緒萃取之企業違約預測模型:應用語意分析遷移學習	zh_TW
dc.title (Title)	Corporate Default Prediction Model with ESG Sentiment: Transfer Learning-Based Sentiment Analysis of 10-K Reports	en_US
dc.type (Type)	thesis	en_US
