Applying LDA Topic Modeling to U.S. Corporate Bankruptcy Prediction

Title	Applying LDA Topic Modeling to U.S. Corporate Bankruptcy Prediction (應用LDA主題模型於美國企業破產預測之研究)
Author	Hsu, Che-Wei (許哲維)
Contributors	Chiang, Mi-Hsiu (江彌修), advisor; Hsu, Che-Wei (許哲維), author
Keywords	LDA; Topic modeling; Corporate bankruptcy prediction; 10-K
Date	2019
Uploaded	6-Dec-2019 09:25:54 (UTC+8)
Abstract	In recent years, text mining has made it far less time-consuming to extract features from textual data. A growing body of research in finance and accounting therefore applies text mining to material such as company-related news and corporate announcements, aiming to draw more precise and timely information from the sentiment implicit in the text and so improve the explanatory power, predictive power, and stability of models. However, little research in finance has focused on topic modeling and the latent topics present in each document. This study applies Latent Dirichlet Allocation (LDA), a topic model used for latent semantic analysis, to classify the textual content of 10-K financial reports into several topics. Its goal is to examine whether a variable formed by standardizing the frequencies of words under risk-related topics improves the accuracy of corporate bankruptcy prediction. The empirical results show that, once LDA topics are built from 10-K reports and the frequencies of risk-related words are standardized into a risk-related topic variable, including that variable improves the prediction of U.S. corporate bankruptcy over the period 1998 to 2017 under both the Logit and the Probit model.
References	[1] Altman, E. I. (1968). Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. The Journal of Finance, 23(4), 589-609.
[2] Aziz, S., Dowling, M. M., Hammami, H., & Piepenbrink, A. (2019). Machine Learning in Finance: A Topic Modeling Approach. Available at SSRN 3327277.
[3] Beaver, W. H., McNichols, M. F., & Rhie, J. W. (2005). Have Financial Statements Become Less Informative? Evidence from the Ability of Financial Ratios to Predict Bankruptcy. Review of Accounting Studies, 10(1), 93-122.
[4] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
[5] Bodnaruk, A., Loughran, T., & McDonald, B. (2015). Using 10-K Text to Gauge Financial Constraints. Journal of Financial and Quantitative Analysis, 50(4), 623-646.
[6] Crosbie, P. J., & Bohn, J. R. (1999). Modeling Default Risk (KMV LLC).
[7] Dyer, T., Lang, M., & Stice-Lawrence, L. (2017). The Evolution of 10-K Textual Disclosure: Evidence from Latent Dirichlet Allocation. Journal of Accounting and Economics, 64(2-3), 221-245.
[8] Edison, H., & Carcel, H. (2019). Text Data Analysis Using Latent Dirichlet Allocation: An Application to FOMC Transcripts (No. 11). Bank of Lithuania.
[9] Griffiths, T. L., & Steyvers, M. (2004). Finding Scientific Topics. Proceedings of the National Academy of Sciences, 101(suppl 1), 5228-5235.
[10] Hansen, S., McMahon, M., & Prat, A. (2017). Transparency and Deliberation Within the FOMC: A Computational Linguistics Approach. The Quarterly Journal of Economics, 133(2), 801-870.
[11] Hofmann, T. (1999, July). Probabilistic Latent Semantic Analysis. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (pp. 289-296). Morgan Kaufmann Publishers Inc.
[12] Loughran, T., & McDonald, B. (2011). When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65.
[13] Merton, R. C. (1974). On the Pricing of Corporate Debt: The Risk Structure of Interest Rates. The Journal of Finance, 29(2), 449-470.
[14] Moro, S., Cortez, P., & Rita, P. (2015). Business Intelligence in Banking: A Literature Analysis from 2002 to 2013 Using Text Mining and Latent Dirichlet Allocation. Expert Systems with Applications, 42(3), 1314-1324.
[15] Odom, M. D., & Sharda, R. (1990, June). A Neural Network Model for Bankruptcy Prediction. In 1990 IJCNN International Joint Conference on Neural Networks (pp. 163-168). IEEE.
[16] Ohlson, J. A. (1980). Financial Ratios and the Probabilistic Prediction of Bankruptcy. Journal of Accounting Research, 109-131.
[17] Tsai, F. T., Lu, H. M., & Hung, M. W. (2016). The Impact of News Articles and Corporate Disclosure on Credit Risk Valuation. Journal of Banking & Finance, 68, 100-116.
[18] Tsai, M. F., & Wang, C. J. (2017). On the Risk Prediction and Analysis of Soft Information in Finance Reports. European Journal of Operational Research, 257(1), 243-250.
[19] Salton, G., & McGill, M. J. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.
[20] Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6(2), 461-464.
[21] Shumway, T. (2001). Forecasting Bankruptcy More Accurately: A Simple Hazard Model. The Journal of Business, 74(1), 101-124.
[22] Timmermans, M., & Finance, M. (2014). US Corporate Bankruptcy Predicting Models (Master's thesis, Tilburg University, Tilburg). Available from: http://arno.uvt.nl/show.cgi
[23] Zmijewski, M. E. (1984). Methodological Issues Related to the Estimation of Financial Distress Prediction Models. Journal of Accounting Research, 59-82.
Description	Master's thesis; National Chengchi University; Department of Money and Banking; 106352002
Source	http://thesis.lib.nccu.edu.tw/record/#G0106352002
Type	thesis

dc.contributor.advisor	江彌修	zh_TW
dc.contributor.advisor	Chiang, Mi-Hsiu	en_US
dc.contributor.author (Authors)	許哲維	zh_TW
dc.contributor.author (Authors)	Hsu, Che-Wei	en_US
dc.creator (Author)	許哲維	zh_TW
dc.creator (Author)	Hsu, Che-Wei	en_US
dc.date (Date)	2019	en_US
dc.date.accessioned	6-Dec-2019 09:25:54 (UTC+8)	-
dc.date.available	6-Dec-2019 09:25:54 (UTC+8)	-
dc.date.issued (Upload time)	6-Dec-2019 09:25:54 (UTC+8)	-
dc.identifier (Other Identifiers)	G0106352002	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/127745	-
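dc.description (Description)	Master's (碩士)	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Money and Banking (金融學系)	zh_TW
dc.description (Description)	106352002	zh_TW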
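dc.description.tableofcontents	Chapter 1 Introduction: 1.1 Research Motivation and Background; 1.2 Research Objectives. Chapter 2 Literature Review: 2.1 Research on Bankruptcy Prediction; 2.2 Text Mining and Topic Models; 2.3 Literature on Applying Text Mining to 10-K Reports. Chapter 3 Methodology: 3.1 The LDA Topic Model; 3.2 Bankruptcy Prediction Models; 3.3 Model Performance Measurement. Chapter 4 Data Sources and Processing: 4.1 Altman, Ohlson, and Zmijewski Variables; 4.2 Risk Indicators under the LDA Model; 4.3 Variable Selection. Chapter 5 Empirical Analysis: 5.1 Building the Bankruptcy Prediction Models; 5.2 Model Performance Evaluation. Chapter 6 Conclusions and Suggestions. References.	zh_TW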
dc.format.extent	1975464 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0106352002	en_US
dc.subject (Keywords)	LDA	zh_TW
dc.subject (Keywords)	Topic model (主題模型)	zh_TW
dc.subject (Keywords)	Corporate bankruptcy early warning (企業破產預警)	zh_TW
dc.subject (Keywords)	10-K financial reports (10-K財報)	zh_TW
dc.subject (Keywords)	LDA	en_US
dc.subject (Keywords)	Topic modeling	en_US
dc.subject (Keywords)	Corporate bankruptcy prediction	en_US
dc.subject (Keywords)	10-K	en_US
dc.title (Title)	應用LDA主題模型於美國企業破產預測之研究	zh_TW
dc.title (Title)	Applying LDA Topic Modeling to U.S. Corporate Bankruptcy Prediction	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU201901264	en_US
