Citation Information

Title: 基於集成學習框架之信用違約預測-以信用卡客戶為例
The Credit Default Prediction Based on Ensemble Learning-The Case of Credit Card Customers
Author: 陳靜怡 (Chen, Ching-Yi)
Contributors: 江彌修 (Chiang, Mi-Hsiu); 陳靜怡 (Chen, Ching-Yi)
Keywords: Credit risk; Default risk; Credit card clients; Ensemble learning; Machine learning
Date: 2019
Uploaded: 1-Jul-2019 10:47:37 (UTC+8)
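Abstract: Credit risk, the risk that a counterparty or borrower defaults, is one of the most important sources of risk for financial institutions. Using the Blending and Stacking ensemble learning frameworks, this study builds an early-warning model of default risk for credit card customers that estimates the likelihood that existing customers will default, so that countermeasures can be taken before a default occurs; the predictive performance of single models serves as the benchmark. The subjects are credit card customers of a large bank in Taiwan, and the sample covers April to September 2005. It includes each cardholder's transaction information over this period, such as spending amounts, payment amounts, and default records, together with the cardholder's personal information. Besides preprocessing and feature engineering on the raw data, the study uses the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance. Model performance is measured with metrics better suited to credit risk evaluation in practice, such as the Type II error and the area under the ROC curve (AUC). The empirical results show that, compared with the single models and the Blending framework, the model built with the Stacking framework performs best on these metrics. This confirms that ensemble learning can effectively improve model performance, provided that two criteria are observed when choosing the first-layer classifiers of the ensemble: (1) the models should preferably differ from one another, and (2) their predictive performance should not differ too much.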
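The Blending and Stacking frameworks in the abstract differ mainly in how the first-layer (base) classifiers' predictions become training data for the second-layer (meta) classifier. The following standard-library Python sketch illustrates that difference under stated assumptions; it is not the thesis's implementation. Stacking scores every training row out-of-fold with k-fold cross-validation, while Blending fits the base models once and scores a single held-out slice. The fold count, hold-out size, the toy learners `fit_prior` and `fit_centroids`, and the data are all hypothetical; the record does not say which base classifiers or meta-learner were used.

```python
# Contrast of how Stacking and Blending build meta-learner training data.
# A base learner is a pair (fit, predict): fit(X, y) -> model and
# predict(model, x) -> score. All names here are illustrative.


def stacking_features(X, y, learners, k=5):
    """Stacking: out-of-fold base-model scores for every training row.

    Row i is scored only by models fit on the other k - 1 folds, so no
    score comes from a model that saw row i. Rows are assumed to be
    shuffled already; fold membership is simply row index mod k.
    """
    n = len(X)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        val_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for m, (fit, predict) in enumerate(learners):
            model = fit(X_tr, y_tr)
            for i in val_idx:
                meta[i][m] = predict(model, X[i])
    return meta


def blending_features(X, y, learners, n_holdout):
    """Blending: fit base models once on the leading rows, score only the
    last n_holdout rows; that slice alone trains the meta-learner."""
    cut = len(X) - n_holdout
    fitted = [(fit(X[:cut], y[:cut]), predict) for fit, predict in learners]
    meta = [[predict(model, x) for model, predict in fitted] for x in X[cut:]]
    return meta, list(y[cut:])


# Two toy base learners on feature 0 (hypothetical stand-ins for real classifiers).
def fit_prior(X, y):
    return sum(y) / len(y)  # default rate among the training rows


def predict_prior(model, x):
    return model


def fit_centroids(X, y):
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must appear in every training split")
    return sum(pos) / len(pos), sum(neg) / len(neg)


def predict_centroids(model, x):
    c_pos, c_neg = model
    # 1.0 if the row is nearer the defaulter centroid than the non-defaulter one.
    return 1.0 if abs(x[0] - c_pos) < abs(x[0] - c_neg) else 0.0


LEARNERS = [(fit_prior, predict_prior), (fit_centroids, predict_centroids)]
# Hypothetical data: feature = number of recent late payments; 1 = default.
X = [[3.0], [0.0], [2.0], [0.0], [1.0], [4.0], [0.0], [1.0], [0.0], [3.0]]
y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]

stack_meta = stacking_features(X, y, LEARNERS, k=5)                  # 10 rows x 2 scores
blend_meta, blend_y = blending_features(X, y, LEARNERS, n_holdout=2)  # 2 rows x 2 scores
```

A meta-learner (for example a logistic regression) would then be fit on `stack_meta` against `y` for Stacking, or on `blend_meta` against `blend_y` for Blending; to score new customers, the base models are refit on all training rows (or their fold models averaged) to produce the same columns. Stacking gives the meta-learner every training row at the cost of k fits per base model, whereas Blending needs one fit per base model but trains the meta-learner on the hold-out slice only.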
Credit risk is the risk of default on a debt, arising when a borrower or counterparty fails to make required payments; it is one of the main sources of risk for most financial institutions. This research builds ensemble-learning-based credit risk models, using the Blending and Stacking approaches, to predict credit card default payments. With the default alerts these models provide, financial institutions can take countermeasures against losses from existing customers who default. We also benchmark the ensemble models against their base classifiers. The data come from a major bank in Taiwan and cover its existing credit card holders: account histories from April to September 2005, with default payment status in October 2005. The customer data include bill statement amounts, previous payment amounts, past monthly repayment records, and personal information. In addition to data preprocessing and feature engineering, we apply the Synthetic Minority Oversampling Technique (SMOTE) to handle the class imbalance. The classifiers are evaluated with three metrics suited to credit risk management in practice: Type II error, F1-score, and the area under the ROC curve (AUC). The results show that the Stacking-based model outperforms both its base classifiers and the Blending-based model. The experiments also suggest that ensemble learning can effectively improve overall classification performance, provided that the base classifiers are highly diverse and of comparable accuracy.
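The three metrics named in the abstract can be computed from a confusion matrix and the model's default scores. The following minimal standard-library Python sketch assumes default is the positive class (label 1), takes Type II error to be the share of actual defaulters predicted as non-defaulters, FN / (TP + FN), and computes AUC as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The Type II convention is an assumption, since credit-scoring studies differ on which misclassification they call Type I or Type II; the 0.5 cut-off and the sample labels and scores are hypothetical.

```python
# Evaluation metrics with default (label 1) treated as the positive class.


def confusion(y_true, y_pred):
    """Return (TP, FN, FP, TN) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def type_ii_error(y_true, y_pred):
    """Share of actual defaulters predicted as non-defaulters: FN / (TP + FN)."""
    tp, fn, _, _ = confusion(y_true, y_pred)
    return fn / (tp + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fn, fp, _ = confusion(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def auc(y_true, scores):
    """P(random defaulter scores higher than random non-defaulter); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical labels and default scores for eight customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print(type_ii_error(y_true, y_pred))  # 1 of 3 defaulters missed -> 0.333...
print(f1_score(y_true, y_pred))       # -> 0.666...
print(auc(y_true, scores))            # 12 of 15 pairs ranked correctly -> 0.8
```

The pairwise AUC loop is quadratic in the number of customers and only makes the definition concrete; on a full dataset a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.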
Description: Master's thesis
National Chengchi University
Department of Money and Banking
106352010
Source: http://thesis.lib.nccu.edu.tw/record/#G0106352010
Type: thesis
dc.contributor.advisor: 江彌修 (Chiang, Mi-Hsiu)
dc.date.accessioned: 1-Jul-2019 10:47:37 (UTC+8)
dc.date.available: 1-Jul-2019 10:47:37 (UTC+8)
dc.date.issued (Upload Time): 1-Jul-2019 10:47:37 (UTC+8)
dc.identifier (Other Identifiers): G0106352010
dc.identifier.uri (URI): http://nccur.lib.nccu.edu.tw/handle/140.119/124140
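dc.description.tableofcontents:
Chapter 1: Introduction
Chapter 2: Literature Review
  Section 1: Traditional Credit Default Prediction Methods
  Section 2: Applications of Machine Learning and Ensemble Learning
Chapter 3: Research Methods
  Section 1: Analysis of the Research Data
  Section 2: Data Processing and Feature Engineering
  Section 3: Dataset Splitting
  Section 4: Handling Class Imbalance
  Section 5: Model Construction
  Section 6: Metrics of Model Predictive Ability
Chapter 4: Empirical Results
  Section 1: Model Parameter Configuration
  Section 2: Predictive Performance of Single Models
  Section 3: Discussion of Ensemble Model Predictive Performance
Chapter 5: Conclusions and Recommendations
  Section 1: Conclusions
  Section 2: Suggestions for Future Research
References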
dc.format.extent: 3744808 bytes
dc.format.mimetype: application/pdf
dc.identifier.doi (DOI): 10.6814/NCCU201900090