Title 基於自然語言分析建構預測企業信用評等變動之模型
Construction of Corporate Credit Rating Prediction Model Based on Natural Language Analysis
Author 陳明勝
Chen, Ming-Sheng
Contributors 江彌修; 趙世偉 (advisors)
Chiang, Mi-Hsiu; Chao, Shih-Wei
陳明勝 (author)
Chen, Ming-Sheng
Keywords Natural Language Analysis (自然語言分析)
Neural Network (神經網路)
Domain Adaption (領域遷移)
Corporate Credit Prediction (企業信用預警)
Date 2022
Uploaded 1-Aug-2022 17:30:32 (UTC+8)
Abstract Earlier language analysis models cannot resolve polysemy, and they perform poorly when the training domain differs from the prediction domain. To address both problems, this study applies domain adaptation to a BERT (Bidirectional Encoder Representations from Transformers) model using financial text and compares model performance with and without adaptation. The adapted model is then used to analyze news about US companies in the RavenPack database, and the resulting sentiment is used to construct an early warning model for credit rating changes.

The empirical results show that the adapted model predicts the sentiment of financial texts with 30.47% higher accuracy than the non-adapted model, and that news sentiment identified after domain adaptation improves the prediction of future corporate credit rating changes. In addition, four random forest models are constructed to show that media sentiment about a company's finances contains useful information about possible changes in the company's future rating.
Description Master's thesis
National Chengchi University
Department of Money and Banking
109352029
Source http://thesis.lib.nccu.edu.tw/record/#G0109352029
Type thesis
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/141068
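dc.description.tableofcontents Chapter 1 Introduction
1.1 Research Motivation and Background
1.2 Research Objectives
Chapter 2 Literature Review
2.1 Measuring Corporate Credit Risk
2.2 Text Analysis Models
Chapter 3 Methodology
3.1 BERT Model
3.2 Random Forest
3.3 Model Performance Metrics
Chapter 4 Empirical Analysis
4.1 Data Processing
4.2 Feature Generation
4.3 Constructing the Credit Rating Early Warning Model
4.4 Early Warning Performance of Each Model
Chapter 5 Conclusions and Recommendations
References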
dc.format.extent 2437979 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU202200901