Title 零售藥妝顧客購買頻率與利潤之分析
Analysis of Customer Purchase Frequency and Profitability in Retail Pharmacy Stores
Author 黃兆椿
Contributors 莊皓鈞 (advisor)
黃兆椿
Keywords 零售業
RFM
集中度
廣度
資料分析
Retailing
RFM
Clumpiness
Breadth
Data Analytics
Date 2017
Date uploaded 28-Aug-2017 13:38:41 (UTC+8)
Abstract This study investigates models and methods for better predicting customer behavior in the retail pharmacy industry, using the RFM model as its foundation. RFM is widely used in marketing and performs well for predicting and segmenting customers; this study adds two new metrics to the model, clumpiness (C) and breadth (B), and analyzes customers' purchase frequency and purchase profit in order to identify metric combinations that outperform RFM. First, the five metrics R, F, M, C, and B are combined in all possible ways, and regression analysis verifies that the two new metrics significantly improve the models' explanatory power. Next, the RFM and RFMCB metric sets are used separately as explanatory variables in machine learning methods to predict customer behavior. For purchase frequency, adding C and B significantly improves predictive accuracy; for purchase profit, the new metrics improve predictive accuracy on average, but on some of the data they increase the error, raising the maximum overall error.
This research proposes modeling techniques to better predict customer behaviors in the retail industry. Extending the widely adopted RFM model in marketing, we introduce two new metrics: clumpiness (C) and breadth (B). Using more than two million transaction records from over 100 retail pharmacy stores in Taiwan, we fit a set of regression models in which we assess the explanatory power of different combinations of R, F, M, C, and B for customer purchase frequency and profitability. Our analysis shows that the RFM model is significantly inferior to models with C and/or B, suggesting that C and B are indeed promising metrics. We then apply machine learning methods to incorporate C and B into predictive models and assess their out-of-sample prediction performance. On average, RFMCB outperforms RFM in predicting both frequency and profit, although in some cases RFMCB leads to larger prediction errors.
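As an illustration of how the five metrics might be derived from raw transactions, the Python sketch below builds a per-customer RFMCB table. It is not the thesis's actual code: the column names (customer_id, date, profit, category) are assumed, clumpiness follows the entropy-based measure in the spirit of Zhang, Bradlow, and Small (2014), and breadth is read here as the number of distinct product categories purchased; the thesis may define and operationalize these metrics differently.

```python
# Hedged sketch: per-customer R, F, M, C, B from a transaction table.
# Assumed columns: customer_id, date (datetime), profit (float), category (str).
import numpy as np
import pandas as pd


def clumpiness(event_days: np.ndarray, horizon: int) -> float:
    """Entropy-based clumpiness of purchase days within a window of `horizon` days.

    Roughly 0 when purchases are evenly spread, close to 1 when they are clumped
    (following the entropy measure of Zhang, Bradlow, and Small, 2014).
    """
    days = np.sort(np.unique(event_days))
    n = len(days)
    if n <= 1:
        return 0.0  # a single visit carries no spacing information
    # Gaps between consecutive purchase days, plus the gaps to the window's start and end.
    gaps = np.diff(np.concatenate(([0], days, [horizon + 1])))
    x = gaps / (horizon + 1)          # normalize so the gaps sum to 1
    x = x[x > 0]                      # treat 0 * log(0) as 0
    return 1 + np.sum(x * np.log(x)) / np.log(n + 1)


def rfmcb_table(tx: pd.DataFrame, as_of: pd.Timestamp, horizon_days: int) -> pd.DataFrame:
    """Per-customer R, F, M, C, B over the `horizon_days` window ending at `as_of`."""
    start = as_of - pd.Timedelta(days=horizon_days)
    window = tx[(tx["date"] > start) & (tx["date"] <= as_of)].copy()
    window["day"] = (window["date"] - start).dt.days   # day index within the window

    out = window.groupby("customer_id").agg(
        recency=("date", lambda d: (as_of - d.max()).days),   # days since last purchase
        frequency=("date", "count"),                           # number of transactions
        monetary=("profit", "sum"),                            # total profit contributed
        clumpiness=("day", lambda d: clumpiness(d.to_numpy(), horizon_days)),
        breadth=("category", "nunique"),                       # distinct categories bought
    )
    return out.reset_index()
```

With tx holding one row per transaction, rfmcb_table(tx, pd.Timestamp("2016-12-31"), 365) yields one row per customer; such a table could then serve as the explanatory variables for the regression and machine-learning comparisons described above.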
References Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Auria, L., & Moro, R. A. (2008). Support vector machines (SVM) as a technique for solvency analysis.
Bell, D. R., & Lattin, J. M. (1998). Shopping behavior and consumer preference for store price format: Why “large basket” shoppers prefer EDLP. Marketing Science, 17(1), 66-88.
Berger, P., & Magliozzi, T. (1992). The effect of sample size and proportion of buyers in the sample on the performance of list segmentation equations generated by regression analysis. Journal of Direct Marketing, 6(1), 13-22.
Bhattacharyya, S. (1999). Direct marketing performance modeling using genetic algorithms. INFORMS Journal on Computing, 11(3), 248-257.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.
Chai, T., & Draxler, R. R. (2014). Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature. Geoscientific Model Development, 7(3), 1247-1250.
Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). ACM.
Colombo, R., & Jiang, W. (1999). A stochastic RFM model. Journal of Interactive Marketing, 13(3), 2-12.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
Coussement, K., Van den Bossche, F. A., & De Bock, K. W. (2014). Data accuracy's impact on segmentation performance: Benchmarking RFM analysis, logistic regression, and decision trees. Journal of Business Research, 67(1), 2751-2758.
Cui, G., Wong, M. L., & Lui, H. K. (2006). Machine learning for direct marketing response models: Bayesian networks with evolutionary programming. Management Science, 52(4), 597-612.
Drucker, H., Burges, C. J., Kaufman, L., Smola, A., & Vapnik, V. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9, 155-161.
Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.
Hastie, T. J., & Tibshirani, R. J. (1990). Generalized additive models (Vol. 43). CRC Press.
Haughton, D., & Oulabi, S. (1997). Direct marketing modeling with CART and CHAID. Journal of Interactive Marketing, 11(4), 42-52.
Hosseini, S. M. S., Maleki, A., & Gholamian, M. R. (2010). Cluster analysis using data mining approach to develop CRM methodology to assess the customer loyalty. Expert Systems with Applications, 37(7), 5259-5264.
Halliday, J. (2002). Database marketing: GM plays cards right. Retrieved January 14, 2017, from http://adage.com/article/interactive-media-marketing/database-marketing-gm-plays-cards/52084/
Jiang, W. (2002). On weak base hypotheses and their implications for boosting regression and classification. The Annals of Statistics, 51-73.
Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36(1/2), 149-176.
Kahan, R. (1998). Using database marketing techniques to enhance your one-to-one marketing initiatives. Journal of Consumer Marketing, 15(5), 491-493.
Khajvand, M., Zolfaghar, K., Ashoori, S., & Alizadeh, S. (2011). Estimating customer lifetime value based on RFM analysis of customer purchase behavior: Case study. Procedia Computer Science, 3, 57-63.
Kohavi, R. (1995, August). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, No. 2, pp. 1137-1145).
Kumar, V., Srinivasan, K., Rao, V. R., Zhang, Y., Bradlow, E. T., & Small, D. S. (2015). Commentaries and Reply on “Predicting Customer Value Using Clumpiness: From RFM to RFMC” by Yao Zhang, Eric T. Bradlow, and Dylan S. Small. Marketing Science, 34(2), 209-217.
Ling, C. X., & Li, C. (1998, August). Data Mining for Direct Marketing: Problems and Solutions. In KDD (Vol. 98, pp. 73-79).
Marcus, C. (1998). A practical yet meaningful approach to customer segmentation. Journal of Consumer Marketing, 15(5), 494-504.
McCarty, J. A., & Hastak, M. (2007). Segmentation approaches in data-mining: A comparison of RFM, CHAID, and logistic regression. Journal of Business Research, 60(6), 656-662.
Natekin, A., & Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7, 21.
Netzer, O., Lattin, J. M., & Srinivasan, V. (2008). A hidden Markov model of customer relationship dynamics. Marketing Science, 27(2), 185-204.
Petrison, L. A., Blattberg, R. C., & Wang, P. (1997). Database marketing: Past, present, and future. Journal of Interactive Marketing, 11(4), 109-125.
Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems, 9(2), 181-199.
Ridgeway, G. (2002). Looking for lumps: Boosting and bagging for density estimation. Computational Statistics & Data Analysis, 38(4), 379-392.
Ridgeway, G. (2007). Generalized Boosted Models: A guide to the gbm package. Update, 1(1), 2007.
Schweidel, D. A., Bradlow, E. T., & Fader, P. S. (2011). Portfolio dynamics for customers of a multiservice provider. Management Science, 57(3), 471-486.
Sohrabi, B., & Khanlari, A. (2007). Customer lifetime value (CLV) measurement based on RFM model. Iranian Accounting & Auditing Review, 14(47), 7-20.
Verhoef, P. C., Spring, P. N., Hoekstra, J. C., & Leeflang, P. S. (2003). The commercial use of segmentation and predictive modeling techniques for database marketing in the Netherlands. Decision Support Systems, 34(4), 471-481.
Wagenmakers, E. J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11(1), 192-196.
Yeh, I. C., Yang, K. J., & Ting, T. M. (2009). Knowledge discovery on RFM model using Bernoulli sequence. Expert Systems with Applications, 36(3), 5866-5871.
Zhang, Y., Bradlow, E. T., & Small, D. S. (2013). New measures of clumpiness for incidence data. Journal of Applied Statistics, 40(11), 2533-2548.
Zhang, Y., Bradlow, E. T., & Small, D. S. (2014). Predicting customer value using clumpiness: From RFM to RFMC. Marketing Science, 34(2), 195-208.
Zwilling, M. L. (2013). Negative binomial regression. The Mathematica Journal. https://dx.doi.org/10.3888/tmj.15-6
林軒田 (2015, December 8). Machine Learning Foundations (機器學習基石) [Video playlist]. Retrieved from https://www.youtube.com/playlist?list=PLXVfgk9fNX2I7tB6oIINGBmW50rrmFTqf
Description Master's degree
國立政治大學
資訊管理學系
104356032
Source http://thesis.lib.nccu.edu.tw/record/#G0104356032
Type thesis
Identifier G0104356032
URI http://nccur.lib.nccu.edu.tw/handle/140.119/112274
Table of contents Chapter 1: Introduction 4
Chapter 2: Literature Review 8
Chapter 3: The Clumpiness Metric 10
Chapter 4: Data and Regression Models 13
Chapter 5: Explanatory Power Validation Framework and Evaluation of Results 19
Section 1: Validation Framework for Explanatory Power 19
Section 2: Evaluation of Results 22
Chapter 6: Predictive Power Validation Framework and Evaluation of Results 24
Section 1: Validation Framework for Predictive Power and Machine Learning Methods 24
Section 2: Evaluation of Results 29
Chapter 7: Conclusion 36
Appendix 1 39
References 46
Extent 933012 bytes
Format application/pdf