Academic Output - Theses
Title | 機器學習在P2P借貸信用風險模型之應用:以Lending Club為例 (Application of Machine Learning in P2P Lending Credit Risk Model - A Case of Lending Club) |
Author | Chen, Po-Wen (陳勃文) |
Contributors | Lin, Shih-Kuei (林士貴); Tsai, Rua-Huan (蔡瑞煌); Chen, Po-Wen (陳勃文) |
Keywords | P2P lending; Neural network; Logistic regression; Credit risk; Default prediction |
Date | 2018 |
Upload time | 31-Jul-2018 13:46:04 (UTC+8) |
Abstract | This study uses traditional and machine learning methods to build default prediction models for loans on a P2P lending platform and compares their performance. The data come from the public database of Lending Club, the largest P2P lending platform in the United States. We first review recent research on P2P loan default factors and examine the correlations among factors to select the independent variables for logistic regression, then build four logistic regression models that differ in their input features. For the neural network, we tune four hyperparameters (batch size, number of training iterations, number of hidden layers, and number of neurons per hidden layer) by varying one or two at a time. The best combination uses the hyperbolic tangent (tanh) activation function, a batch size of 70, one hidden layer with 8 neurons, and at least 200 training iterations. Finally, we compare the logistic regression, neural network, and support vector machine models and apply statistical tests to their predictions; the prediction accuracy of the neural network is significantly higher than that of the other two. |
Description | Master's thesis; National Chengchi University (國立政治大學); Department of Money and Banking (金融學系); 105352033 |
Source | http://thesis.lib.nccu.edu.tw/record/#G0105352033 |
Type | thesis |
dc.contributor.advisor | 林士貴; 蔡瑞煌 | zh_TW |
dc.contributor.advisor | Lin, Shih-Kuei; Tsai, Rua-Huan | en_US |
dc.contributor.author (Author) | 陳勃文 | zh_TW |
dc.contributor.author (Author) | Chen, Po-Wen | en_US |
dc.creator (Author) | 陳勃文 | zh_TW |
dc.creator (Author) | Chen, Po-Wen | en_US |
dc.date (Date) | 2018 | en_US |
dc.date.accessioned | 31-Jul-2018 13:46:04 (UTC+8) | - |
dc.date.available | 31-Jul-2018 13:46:04 (UTC+8) | - |
dc.date.issued (Upload time) | 31-Jul-2018 13:46:04 (UTC+8) | - |
dc.identifier (Other identifier) | G0105352033 | en_US |
dc.identifier.uri (URI) | http://nccur.lib.nccu.edu.tw/handle/140.119/119092 | - |
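dc.description (Description) | 碩士 (Master's) | zh_TW |
dc.description (Description) | 國立政治大學 (National Chengchi University) | zh_TW |
dc.description (Description) | 金融學系 (Department of Money and Banking) | zh_TW |
dc.description (Description) | 105352033 | zh_TW |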
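dc.description.tableofcontents | Chapter 1 Introduction: Section 1 Research Background and Motivation; Section 2 Research Objectives. Chapter 2 Literature Review: Section 1 Introduction to P2P Lending; Section 2 Advantages and Disadvantages of P2P Lending (1. Advantages of P2P lending platforms for borrowers; 2. Advantages of P2P lending platforms for lenders; 3. Disadvantages and risks of P2P lending platforms for both parties); Section 3 Default Factors; Section 4 Neural Networks. Chapter 3 Methodology: Section 1 Logistic Regression; Section 2 Support Vector Machine (1. Constructing the support vector machine; 2. Kernel functions); Section 3 Neural Networks (1. Architecture; 2. Activation function; 3. Learning rule). Chapter 4 Empirical Results: Section 1 Variable Selection; Section 2 Data Preprocessing and Dataset Splitting (1. Handling missing values and outliers; 2. Handling class imbalance; 3. Standardizing numerical data; 4. Converting categorical data to dummy variables; 5. Splitting into two datasets); Section 3 Logistic Regression for Loan Default Prediction and Model Selection; Section 4 Neural Networks for Loan Default Prediction and Hyperparameter Selection (1. Number of training iterations; 2. Activation function; 3. Batch size; 4. Number of hidden-layer neurons and number of hidden layers); Section 5 Comparison of Logistic Regression, Support Vector Machine, and Neural Network. Chapter 5 Conclusions and Future Work: Section 1 Conclusions; Section 2 Future Work. | zh_TW |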
dc.format.extent | 3886226 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.source.uri (Source) | http://thesis.lib.nccu.edu.tw/record/#G0105352033 | en_US |
dc.subject (Keyword) | P2P借貸 (P2P lending) | zh_TW |
dc.subject (Keyword) | 類神經網路 (Neural network) | zh_TW |
dc.subject (Keyword) | 羅吉斯迴歸 (Logistic regression) | zh_TW |
dc.subject (Keyword) | 信用風險 (Credit risk) | zh_TW |
dc.subject (Keyword) | 違約預測 (Default prediction) | zh_TW |
dc.subject (Keyword) | P2P lending | en_US |
dc.subject (Keyword) | Neural network | en_US |
dc.subject (Keyword) | Logistic regression | en_US |
dc.subject (Keyword) | Credit risk | en_US |
dc.subject (Keyword) | Default prediction | en_US |
dc.title (Title) | 機器學習在P2P借貸信用風險模型之應用:以Lending Club為例 | zh_TW |
dc.title (Title) | Application of Machine Learning in P2P Lending Credit Risk Model - A Case of Lending Club | en_US |
dc.type (Type) | thesis | en_US |
dc.identifier.doi (DOI) | 10.6814/THE.NCCU.MB.024.2018.F06 | - |
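
The network configuration reported in the abstract (one hidden layer, 8 tanh neurons, batch size 70, at least 200 training iterations) can be illustrated with a minimal pure-Python sketch. This is not the thesis's code: the toy data generator, the learning rate, the sigmoid output with cross-entropy loss, and the reading of "training iterations" as full passes over the data are all assumptions made for illustration.

```python
import math
import random

# Hyperparameters taken from the abstract.
HIDDEN_UNITS = 8     # neurons in the single hidden layer
BATCH_SIZE = 70      # samples per mini-batch
EPOCHS = 200         # assumption: "training iterations" read as full passes
LEARNING_RATE = 0.1  # assumption: not stated in the abstract


def make_toy_loans(n, rng):
    """Hypothetical stand-in data: 2 standardized features, label 1 = default."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = 1 if x[0] - x[1] + rng.gauss(0, 0.3) > 0 else 0
        rows.append((x, y))
    return rows


def init_params(n_in, rng):
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(HIDDEN_UNITS)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_UNITS)]
    return {"w1": w1, "b1": [0.0] * HIDDEN_UNITS, "w2": w2, "b2": 0.0}


def forward(p, x):
    """Return (hidden activations, predicted default probability)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["w1"], p["b1"])]
    z = sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"]
    z = max(-30.0, min(30.0, z))  # clamp the logit to keep exp() well-behaved
    return h, 1.0 / (1.0 + math.exp(-z))


def train_batch(p, batch):
    """One mini-batch gradient step on the mean cross-entropy loss."""
    n_in = len(batch[0][0])
    g_w1 = [[0.0] * n_in for _ in range(HIDDEN_UNITS)]
    g_b1 = [0.0] * HIDDEN_UNITS
    g_w2 = [0.0] * HIDDEN_UNITS
    g_b2 = 0.0
    for x, y in batch:
        h, prob = forward(p, x)
        err = prob - y  # gradient of cross-entropy w.r.t. the output logit
        g_b2 += err
        for j in range(HIDDEN_UNITS):
            g_w2[j] += err * h[j]
            dh = err * p["w2"][j] * (1.0 - h[j] ** 2)  # tanh'(a) = 1 - tanh(a)^2
            g_b1[j] += dh
            for i in range(n_in):
                g_w1[j][i] += dh * x[i]
    step = LEARNING_RATE / len(batch)
    p["b2"] -= step * g_b2
    for j in range(HIDDEN_UNITS):
        p["w2"][j] -= step * g_w2[j]
        p["b1"][j] -= step * g_b1[j]
        for i in range(n_in):
            p["w1"][j][i] -= step * g_w1[j][i]


def train(data, seed=0):
    rng = random.Random(seed)
    p = init_params(len(data[0][0]), rng)
    rows = list(data)  # copy so the caller's list is not reordered
    for _ in range(EPOCHS):
        rng.shuffle(rows)
        for start in range(0, len(rows), BATCH_SIZE):
            train_batch(p, rows[start:start + BATCH_SIZE])
    return p


def accuracy(p, data):
    hits = sum((forward(p, x)[1] >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


data_rng = random.Random(42)
train_set = make_toy_loans(210, data_rng)  # three mini-batches of 70 per pass
test_set = make_toy_loans(90, data_rng)
params = train(train_set)
print("hold-out accuracy:", round(accuracy(params, test_set), 3))
```

The toy labels are nearly linearly separable, so this sketch says nothing about the accuracy figures or the statistical comparison with logistic regression and support vector machines reported in the thesis; it only shows how the four tuned hyperparameters enter a training loop.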