The Application of Big Data on Credit Scoring Model (應用大數據於信用評等之模型探討)

Title	應用大數據於信用評等之模型探討 (The Application of Big Data on Credit Scoring Model)
Author	林瑀甯
Contributors	Advisors: 鄭宇庭 (Cheng, Yu Ting), 郭訓志 (Kuo, Hsun Chih); Author: 林瑀甯
Keywords	Credit risk; Logistic regression; Credit scoring model
Date	2018
Upload time	1-Jun-2018 17:33:42 (UTC+8)
Abstract	Credit risk, or credit default, is the probability that a bank or other financial institution is not repaid for the services it has provided to a customer. Because credit risk matters so much to financial institutions, and is often hard for commercial banks to interpret or control, it is studied intensively in the field of bank lending decisions. Thanks to modern technology, however, institutions can now develop robust, well-trained systems and models for predicting and managing credit risk at lower cost than in the past. Since the arrival of the big data era, even students leave traces by which they can be scored. Building on the facets that earlier studies relied on, such as customers' personal information, financial condition, and past and current borrowing with the institution, and supplementing them with information from open data, this study proposes new variables that may affect the default rate, such as a customer's record under external regulations. It then tests whether this is a direction worth developing and, if so, how large its effect is.
Many methods have been proposed to predict credit default, and the choice depends on the complexity and size of the institution and on the type of loan. The most common is discriminant analysis, but it imposes strict assumptions on the variables. Neural networks overcome these shortcomings and predict well, but they return only predictions while the computation itself stays opaque, which does not help in understanding how the variables relate. This study therefore uses logistic regression, which both predicts a binary outcome and, through its coefficients, shows the relationship between the dependent and independent variables. Following earlier work on logistic regression for credit risk, the thesis gives a complete account of how the data are cleaned, how variables are transformed, and how models are built and finally screened. Although the process is long, the results confirm the research hypothesis: the new aspect considered does affect credit risk, and the model coefficients show the size of that effect. To highlight the information logistic regression provides about the variables, the results are finally presented in a visual form that is easier to read than text.
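The abstract chooses logistic regression because it predicts a binary default outcome and its coefficients show how each variable shifts the odds of default. The following is a minimal sketch of that idea only, assuming synthetic data and two hypothetical features, `debt_ratio` and a binary `violation` flag standing in for an "external regulatory record"; it is not the thesis's data, variables, or estimation routine, and it fits by plain gradient ascent on the log-likelihood rather than the Newton-type maximum-likelihood solvers statistical packages usually use.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function: avoids overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(b0, b, x):
    # P(default = 1 | x) = sigmoid(b0 + sum_j b_j * x_j)
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


def fit_logistic(X, y, lr=1.0, epochs=1000):
    """Fit intercept b0 and coefficients b by batch gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = yi - predict_proba(b0, b, xi)  # residual y - p_hat drives the gradient
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        # Step along the averaged gradient of the log-likelihood
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


# Hypothetical synthetic portfolio: true model has b0 = -2, debt_ratio = 3, violation = 1.5
random.seed(0)
TRUE_B0, TRUE_B = -2.0, [3.0, 1.5]
X, y = [], []
for _ in range(500):
    debt_ratio = random.random()                        # hypothetical continuous feature in [0, 1)
    violation = 1.0 if random.random() < 0.3 else 0.0   # hypothetical binary flag
    x = [debt_ratio, violation]
    X.append(x)
    y.append(1 if random.random() < predict_proba(TRUE_B0, TRUE_B, x) else 0)

b0, b = fit_logistic(X, y)
for name, coef in zip(["debt_ratio", "violation"], b):
    # exp(coef) is the multiplicative change in the odds of default per one-unit increase
    print(f"{name}: coef={coef:.3f}, odds ratio={math.exp(coef):.3f}")
```

A positive coefficient raises the predicted default probability, and `math.exp(coef)` turns it into an odds ratio; that reading of the coefficients is what lets logistic regression report how strongly a variable affects credit risk, which a neural network's output alone does not.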
Description	Master's thesis, Department of Statistics, National Chengchi University; 105354001
Source	http://thesis.lib.nccu.edu.tw/record/#G0105354001
Type	thesis

dc.contributor.advisor	鄭宇庭; 郭訓志	zh_TW
dc.contributor.advisor	Cheng, Yu Ting; Kuo, Hsun Chih	en_US
dc.contributor.author (Authors)	林瑀甯	zh_TW
dc.creator (Author)	林瑀甯	zh_TW
dc.date (Date)	2018	en_US
dc.date.accessioned	1-Jun-2018 17:33:42 (UTC+8)	-
dc.date.available	1-Jun-2018 17:33:42 (UTC+8)	-
dc.date.issued (Upload time)	1-Jun-2018 17:33:42 (UTC+8)	-
dc.identifier (Other Identifiers)	G0105354001	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/117439	-
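dc.description (Description)	碩士 (Master's)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	統計學系 (Department of Statistics)	zh_TW
dc.description (Description)	105354001	zh_TW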
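dc.description.tableofcontents	Contents; List of Tables; List of Figures; Chapter 1 Introduction: 1.1 Research Background and Motivation; 1.2 Research Objectives; 1.3 Research Process; Chapter 2 Literature Review: 2.1 Review of the Credit Scoring Literature; 2.2 Overview of Common Methods for Building Credit Scoring Models; Chapter 3 Research Methods: 3.1 Data Source; 3.2 Research Framework; 3.3 Operational Definitions of Variables; 3.4 Analysis Methods; Chapter 4 Empirical Analysis: 4.1 Exploratory Analysis; 4.2 Test of Sample Representativeness (Chi-Square Goodness-of-Fit Test); 4.3 Fine Classing & Coarse Classing; 4.4 Logistic Regression Analysis; Chapter 5 Conclusions and Recommendations: 5.1 Conclusions; 5.2 Recommendations; References	zh_TW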
dc.format.extent	1380458 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0105354001	en_US
dc.subject (Keywords)	信用風險 (Credit risk)	zh_TW
dc.subject (Keywords)	羅吉斯迴歸 (Logistic regression)	zh_TW
dc.subject (Keywords)	信用評等模型 (Credit scoring model)	zh_TW
dc.subject (Keywords)	Credit risk	en_US
dc.subject (Keywords)	Logistic regression	en_US
dc.subject (Keywords)	Credit scoring model	en_US
dc.title (Title)	應用大數據於信用評等之模型探討	zh_TW
dc.title (Title)	The Application of Big Data on Credit Scoring Model	en_US
dc.type (Type)	thesis	en_US