Title Applying LDA Topic Modeling to U.S. Corporate Bankruptcy Prediction
應用LDA主題模型於美國企業破產預測之研究
Author Hsu, Che-Wei (許哲維)
Contributors Chiang, Mi-Hsiu (江彌修)
Hsu, Che-Wei (許哲維)
Keywords LDA
Topic modeling
Corporate bankruptcy prediction
10-K financial reports
Date 2019
Upload time 6-Dec-2019 09:25:54 (UTC+8)
Abstract In recent years, text mining has made it increasingly easy to extract features from textual information. A growing number of studies in finance and accounting combine text mining techniques with textual data such as company-related news and corporate announcements, hoping to extract more precise and timely information from the sentiment implicit in the text and thereby improve the explanatory power, predictive power, and stability of their models. However, little research in finance has focused on topic modeling or on the latent topics present in each document. This study applies Latent Dirichlet Allocation (LDA), a topic model used for latent semantic analysis, to classify the textual information in 10-K financial reports into topics. The goal is to examine whether a variable formed by standardizing the frequencies of words under risk-related topics can improve the accuracy of bankruptcy prediction models. The empirical results show that, after building the LDA topic classification from 10-K reports, selecting the frequencies of risk-related words, and standardizing them into a risk-related topic variable, including this variable improves U.S. corporate bankruptcy prediction over the period 1998 to 2017 under both the Logit and the Probit model.
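The pipeline described in the abstract (risk-related word frequencies, standardization, then a binary-response model) can be illustrated with a minimal sketch. It is not the thesis's implementation: the risk word list is assumed to be given rather than derived from a fitted LDA model, the four documents and their bankruptcy labels are made-up toy data, and the function names `risk_word_share`, `standardize`, and `fit_logit` are hypothetical. Only the Logit case is shown, fitted by plain gradient ascent with the standard library.

```python
import math
from collections import Counter

# Hypothetical risk-related word list; in the thesis such words come from
# risk-related LDA topics, which this sketch does not estimate.
RISK_WORDS = {"risk", "default", "litigation"}


def risk_word_share(tokens, risk_words):
    """Fraction of a document's tokens that appear in the risk word list."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in risk_words) / len(tokens)


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def fit_logit(x, y, lr=0.1, epochs=2000):
    """One-regressor logistic regression via batch gradient ascent
    on the average log-likelihood. Returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Toy tokenized "10-K" excerpts and made-up labels (1 = went bankrupt).
docs = [
    ["revenue", "growth", "product", "market"],
    ["risk", "default", "revenue", "growth"],
    ["market", "growth", "risk", "product"],
    ["default", "litigation", "risk", "debt"],
]
labels = [0, 0, 1, 1]

shares = [risk_word_share(d, RISK_WORDS) for d in docs]
z = standardize(shares)          # the standardized risk-related variable
b0, b1 = fit_logit(z, labels)    # slope on the risk variable
```

In a fuller setup the standardized variable would enter the model alongside accounting ratios such as the Altman, Ohlson, and Zmijewski variables listed in the table of contents, and a Probit link would replace the logistic function for the second specification.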
Description Master's thesis
National Chengchi University (國立政治大學)
Department of Money and Banking (金融學系)
106352002
Source http://thesis.lib.nccu.edu.tw/record/#G0106352002
Type thesis
dc.contributor.advisor Chiang, Mi-Hsiu (江彌修)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/127745
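dc.description.tableofcontents Chapter 1 Introduction
Section 1 Research Motivation and Background
Section 2 Research Objectives
Chapter 2 Literature Review
Section 1 Research on Bankruptcy Prediction
Section 2 Text Mining and Topic Models
Section 3 Literature on Applying Text Mining to 10-K Reports
Chapter 3 Methodology
Section 1 The LDA Topic Model
Section 2 Bankruptcy Prediction Models
Section 3 Model Performance Measurement
Chapter 4 Data Sources and Processing
Section 1 Altman, Ohlson, and Zmijewski Variables
Section 2 Risk Indicators under the LDA Model
Section 3 Variable Selection
Chapter 5 Empirical Analysis
Section 1 Building the Bankruptcy Prediction Models
Section 2 Model Performance Evaluation
Chapter 6 Conclusions and Suggestions
References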
dc.format.extent 1975464 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU201901264