Predictive Analysis of Credit Card Fraud via Machine Learning: Evidence from the United States (機器學習下信用卡詐欺之預測分析: 以美國市場為例)


Title	Predictive Analysis of Credit Card Fraud via Machine Learning: Evidence from the United States (機器學習下信用卡詐欺之預測分析: 以美國市場為例)
Author	Chen, Yen-Lin (陳彥霖)
Contributors	Advisors: Hong, Jyy-I (洪芷漪); Lin, Shih-Kuei (林士貴). Author: Chen, Yen-Lin (陳彥霖)
Keywords	Credit Card Fraud Model; Machine Learning; Nonlinear Problem; Recall
Date	2024
Upload time	1-Feb-2024 11:25:27 (UTC+8)
Abstract	This study uses a U.S. credit card fraud dataset of 1.8 million records to examine fraudulent spending behavior. By modeling two major categories of variables, customer transaction behavior and personal information, we explore how each variable affects fraudulent transactions. We compare tree-based machine learning models with logistic regression; the results show that for this kind of nonlinear problem, Random Forest and XGBoost deliver superior predictive performance. We also find that three variables (transaction amount, merchant category, and the day of the week of the transaction) are important for predicting fraud, and we build a model with relatively high recall.
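The abstract's emphasis on recall can be illustrated with a small worked example. The sketch below is not taken from the thesis: the toy labels, the `always_legit` baseline, and the `detector` predictions are all made-up assumptions, chosen only to show why accuracy alone can mislead on rare-fraud data while recall exposes missed fraud.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # fraud correctly flagged
    fp: int  # legitimate transaction flagged as fraud
    fn: int  # fraud that was missed
    tn: int  # legitimate transaction correctly passed


def confusion(y_true, y_pred):
    """Count outcomes for binary labels where 1 = fraud and 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def recall(c):
    """Share of actual fraud that was caught; 0.0 if there is no fraud."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def precision(c):
    """Share of flagged transactions that were fraud; 0.0 if nothing was flagged."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def accuracy(c):
    """Share of all transactions classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


# Hypothetical toy data: 1,000 transactions, 10 of them fraudulent.
y_true = [1] * 10 + [0] * 990
# Baseline that never flags anything.
always_legit = [0] * 1000
# Hypothetical detector: catches 8 of 10 frauds and raises 20 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

for name, pred in [("always_legit", always_legit), ("detector", detector)]:
    c = confusion(y_true, pred)
    print(f"{name}: accuracy={accuracy(c):.3f} "
          f"recall={recall(c):.3f} precision={precision(c):.3f}")
```

On this toy data the do-nothing baseline scores higher accuracy (0.990) than the detector (0.978) yet has zero recall, which is one reason a fraud study might report recall rather than accuracy alone.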
Description	Master's; National Chengchi University (國立政治大學); Department of Applied Mathematics (應用數學系); 110751015
Source	http://thesis.lib.nccu.edu.tw/record/#G0110751015
Type	thesis

dc.contributor.advisor	Hong, Jyy-I (洪芷漪); Lin, Shih-Kuei (林士貴)	en_US
dc.contributor.author (Authors)	Chen, Yen-Lin (陳彥霖)	en_US
dc.creator (Author)	Chen, Yen-Lin (陳彥霖)	en_US
dc.date (Date)	2024	en_US
dc.date.accessioned	1-Feb-2024 11:25:27 (UTC+8)	-
dc.date.available	1-Feb-2024 11:25:27 (UTC+8)	-
dc.date.issued (Upload time)	1-Feb-2024 11:25:27 (UTC+8)	-
dc.identifier (Other Identifiers)	G0110751015	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/149594	-
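dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Applied Mathematics (應用數學系)	zh_TW
dc.description (Description)	110751015	zh_TW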
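dc.description.tableofcontents	1 Introduction; 2 Literature Review; 2.1 Financial Fraud Detection; 2.2 Credit Card Spending Behavior; 2.3 Applications of Machine Learning in Finance; 3 Methodology; 3.1 Models; 3.1.1 Penalized Logistic Regression (LR); 3.1.2 Random Forest (RF); 3.1.3 eXtreme Gradient Boosting; 3.2 Model Performance; 3.3 Interpretable Machine Learning; 4 Empirical Study; 4.1 Data Description and Preprocessing; 4.2 Model Training Procedure; 4.3 Model Performance Results; 4.4 Stability Analysis; 5 Conclusion and Future Work; 5.1 Conclusion; 5.2 Future Work; References; Appendix	zh_TW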
dc.format.extent	4446252 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0110751015	en_US
dc.subject (Keywords)	Credit Card Fraud Model (信用卡詐欺模型)	en_US
dc.subject (Keywords)	Machine Learning (機器學習)	en_US
dc.subject (Keywords)	Nonlinear Problem (非線性問題)	en_US
dc.subject (Keywords)	Recall (召回率)	en_US
dc.title (Title)	Predictive Analysis of Credit Card Fraud via Machine Learning: Evidence from the United States (機器學習下信用卡詐欺之預測分析: 以美國市場為例)	en_US
dc.type (Type)	thesis	en_US
