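Title: 應用LDA主題模型於美國企業破產預測之研究
Applying LDA Topic Modeling to U.S. Corporate Bankruptcy Prediction
Author: 許哲維 (Hsu, Che-Wei)
Contributors: 江彌修 (Chiang, Mi-Hsiu); 許哲維 (Hsu, Che-Wei)
Keywords (zh_TW, translated): LDA; topic model; corporate bankruptcy early warning; 10-K financial reports
Keywords (en_US): LDA; Topic modeling; Corporate bankruptcy prediction; 10-K
Date: 2019
Uploaded: 6-Dec-2019 09:25:54 (UTC+8)
Abstract (zh_TW, translated): In recent years, text mining has made it increasingly convenient to extract features from textual information. A growing number of studies in finance and accounting apply text mining techniques to textual data such as company-related news and corporate announcements, hoping to extract more precise and timely information from the sentiment implicit in the text and thereby strengthen a model's explanatory power, predictive power, and the stability of its results. This study uses Latent Dirichlet Allocation (LDA), a topic model, to classify the textual information in 10-K financial reports into topics, and examines whether a variable formed by standardizing the words under risk-related topics effectively increases the accuracy of bankruptcy prediction models. According to the empirical results, after building an LDA topic classification from 10-K reports, selecting and testing the frequencies of risk-related words, and standardizing them into a risk-related topic variable, including that variable improves U.S. corporate bankruptcy prediction under both the Logit and the Probit model.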
Abstract (en_US): In recent years, text mining techniques have become far less time-consuming to apply, and more researchers combine them with their own field of expertise to extract features from soft data, capture real-time textual information, and improve their research. However, little research in finance focuses on topic modeling or on the latent topics present in each document. This study applies LDA topic modeling, a way of performing latent semantic analysis, to categorize the soft information in 10-K financial reports into several topics. Its goal is to test whether a variable built by standardizing the frequencies of words under risk-related topics improves the accuracy of corporate bankruptcy prediction. The empirical results show that adding this standardized risk-related topic variable improves the accuracy of U.S. corporate bankruptcy prediction over the period from 1998 to 2017 under both Logit and Probit models.
References:
[1] Altman, E. I. (1968). Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. The Journal of Finance, 23(4), 589-609.
[2] Aziz, S., Dowling, M. M., Hammami, H., & Piepenbrink, A. (2019). Machine Learning in Finance: A Topic Modeling Approach. Available at SSRN 3327277.
[3] Beaver, W. H., McNichols, M. F., & Rhie, J. W. (2005). Have Financial Statements Become Less Informative? Evidence from the Ability of Financial Ratios to Predict Bankruptcy. Review of Accounting Studies, 10(1), 93-122.
[4] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
[5] Bodnaruk, A., Loughran, T., & McDonald, B. (2015). Using 10-K Text to Gauge Financial Constraints. Journal of Financial and Quantitative Analysis, 50(4), 623-646.
[6] Crosbie, P. J., & Bohn, J. R. (1999). Modeling Default Risk (KMV LLC).
[7] Dyer, T., Lang, M., & Stice-Lawrence, L. (2017). The Evolution of 10-K Textual Disclosure: Evidence from Latent Dirichlet Allocation. Journal of Accounting and Economics, 64(2-3), 221-245.
[8] Edison, H., & Carcel, H. (2019). Text Data Analysis Using Latent Dirichlet Allocation: An Application to FOMC Transcripts (No. 11). Bank of Lithuania.
[9] Griffiths, T. L., & Steyvers, M. (2004). Finding Scientific Topics. Proceedings of the National Academy of Sciences, 101(suppl 1), 5228-5235.
[10] Hansen, S., McMahon, M., & Prat, A. (2017). Transparency and Deliberation Within the FOMC: A Computational Linguistics Approach. The Quarterly Journal of Economics, 133(2), 801-870.
[11] Hofmann, T. (1999, July). Probabilistic Latent Semantic Analysis. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (pp. 289-296). Morgan Kaufmann Publishers Inc.
[12] Loughran, T., & McDonald, B. (2011). When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65.
[13] Merton, R. C. (1974). On the Pricing of Corporate Debt: The Risk Structure of Interest Rates. The Journal of Finance, 29(2), 449-470.
[14] Moro, S., Cortez, P., & Rita, P. (2015). Business Intelligence in Banking: A Literature Analysis from 2002 to 2013 Using Text Mining and Latent Dirichlet Allocation. Expert Systems with Applications, 42(3), 1314-1324.
[15] Odom, M. D., & Sharda, R. (1990, June). A Neural Network Model for Bankruptcy Prediction. In 1990 IJCNN International Joint Conference on Neural Networks (pp. 163-168). IEEE.
[16] Ohlson, J. A. (1980). Financial Ratios and the Probabilistic Prediction of Bankruptcy. Journal of Accounting Research, 109-131.
[17] Tsai, F. T., Lu, H. M., & Hung, M. W. (2016). The Impact of News Articles and Corporate Disclosure on Credit Risk Valuation. Journal of Banking & Finance, 68, 100-116.
[18] Tsai, M. F., & Wang, C. J. (2017). On the Risk Prediction and Analysis of Soft Information in Finance Reports. European Journal of Operational Research, 257(1), 243-250.
[19] Salton, G., & McGill, M. J. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.
[20] Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6(2), 461-464.
[21] Shumway, T. (2001). Forecasting Bankruptcy More Accurately: A Simple Hazard Model. The Journal of Business, 74(1), 101-124.
[22] Timmermans, M., & Finance, M. (2014). US Corporate Bankruptcy Predicting Models (Doctoral Dissertation, Master's thesis. [online]. Tilburg University, Tilburg. Available from: http://arno.uvt.nl/show.cgi).
[23] Zmijewski, M. E. (1984). Methodological Issues Related to the Estimation of Financial Distress Prediction Models. Journal of Accounting Research, 59-82.
Description: Master's
National Chengchi University (國立政治大學)
Department of Money and Banking (金融學系)
106352002
Source: http://thesis.lib.nccu.edu.tw/record/#G0106352002
Type: thesis
Other identifier: G0106352002
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/127745
DOI: 10.6814/NCCU201901264
Format: application/pdf, 1975464 bytes
Table of contents:
Chapter 1 Introduction
  Section 1 Research Motivation and Background
  Section 2 Research Objectives
Chapter 2 Literature Review
  Section 1 Research on Bankruptcy Prediction
  Section 2 Text Mining and Topic Models
  Section 3 Literature on Applying Text Mining to 10-K Reports
Chapter 3 Methodology
  Section 1 The LDA Topic Model
  Section 2 Bankruptcy Prediction Models
  Section 3 Measuring Model Performance
Chapter 4 Data Sources and Processing
  Section 1 Altman, Ohlson, and Zmijewski Variables
  Section 2 Risk Indicators under the LDA Model
  Section 3 Variable Selection
Chapter 5 Empirical Analysis
  Section 1 Building the Bankruptcy Prediction Models
  Section 2 Model Performance Evaluation
Chapter 6 Conclusions and Suggestions
References
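Illustrative note: the abstract describes standardizing the frequencies of risk-related words and adding the resulting variable to a Logit model. The Python sketch below shows only that last step, under loud assumptions: the word list RISK_WORDS, the toy documents and labels, and all helper names are hypothetical, the LDA step that would select the risk-related words is omitted, and the thesis's own variables and estimation procedure are not reproduced. The small L2 penalty is an addition of this sketch, included only so the fit stays finite on a tiny, perfectly separable sample.

```python
import math
from statistics import mean, pstdev

# Hypothetical risk-related words; in the thesis these would come from LDA topics.
RISK_WORDS = {"default", "loss", "litigation", "impairment"}


def risk_frequency(tokens):
    """Share of the tokens in one document that are risk-related words."""
    if not tokens:
        return 0.0
    return sum(t in RISK_WORDS for t in tokens) / len(tokens)


def standardize(values):
    """Z-score a list of values using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, lr=0.1, epochs=2000, l2=0.1):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the
    log-likelihood, with an L2 penalty on the slope b1 only."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals) / n
        g1 = sum(r * x for r, x in zip(residuals, xs)) / n
        b0 += lr * g0
        b1 += lr * (g1 - l2 * b1)
    return b0, b1


# Toy corpus: made-up token lists and bankruptcy labels (1 = bankrupt).
docs = [
    "revenue growth strong demand".split(),
    "stable revenue loss small".split(),
    "default loss litigation impairment risk".split(),
    "loss default covenant breach loss".split(),
    "growth margin expansion dividend".split(),
    "litigation loss default going concern".split(),
]
labels = [0, 0, 1, 1, 0, 1]

freqs = [risk_frequency(d) for d in docs]
z_scores = standardize(freqs)          # the standardized risk variable
b0, b1 = fit_logit(z_scores, labels)   # one-variable Logit

for z, y in zip(z_scores, labels):
    print(f"z={z:+.2f}  label={y}  p_hat={sigmoid(b0 + b1 * z):.2f}")
```

A Probit variant would swap the logistic link for the standard normal CDF; in practice a statistics library would replace the hand-written fit, and the risk variable would sit alongside the accounting ratios rather than stand alone.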