Title 零售藥妝顧客購買頻率與利潤之分析
Analysis of Customer Purchase Frequency and Profitability in Retail Pharmacy Stores
Author 黃兆椿
Contributors 莊皓鈞 (advisor)
黃兆椿
Keywords 零售業
RFM
集中度
廣度
資料分析
Retailing
RFM
Clumpiness
Breadth
Data Analytics
Date 2017
Date uploaded 28-Aug-2017 13:38:41 (UTC+8)
Abstract This study investigates models and methods for better predicting customer behavior in the retail pharmacy industry, using the RFM model as its foundation. RFM is widely used in marketing and performs well for predicting and segmenting customers; this study adds two new metrics to the model, clumpiness (C) and breadth (B), and analyzes customers' purchase frequency and purchase profit in order to identify metric combinations that outperform RFM. First, the five metrics R, F, M, C, and B are combined in all possible ways, and regression analysis verifies that the two new metrics significantly improve the models' explanatory power. Next, the RFM and RFMCB metric sets are used separately as explanatory variables in machine learning methods to predict customer behavior. For purchase frequency, adding C and B significantly improves predictive accuracy; for purchase profit, the new metrics improve predictive accuracy on average, but on some of the data they increase the error, raising the maximum overall error.
This research proposes modeling techniques to better predict customer behaviors in the retail industry. Extending the widely adopted RFM model in marketing, we introduce two new metrics: clumpiness (C) and breadth (B). Using more than two million transaction records from over 100 retail pharmacy stores in Taiwan, we fit a set of regression models in which we assess the explanatory power of different combinations of R, F, M, C, and B for customer purchase frequency and profitability. Our analysis shows that the RFM model is significantly inferior to models with C and/or B, suggesting that C and B are indeed promising metrics. We then apply machine learning methods to incorporate C and B into predictive models and assess their out-of-sample prediction performance. On average, RFMCB outperforms RFM in predicting both frequency and profit, although in some cases RFMCB leads to larger prediction errors.
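As an illustration of how the five metrics might be derived from raw transactions, the Python sketch below builds a per-customer RFMCB table. It is not the thesis's actual code: the column names (customer_id, date, profit, category) are assumed, clumpiness follows the entropy-based measure in the spirit of Zhang, Bradlow, and Small (2014), and breadth is read here as the number of distinct product categories purchased; the thesis may define and operationalize these metrics differently.

```python
# Hedged sketch: per-customer R, F, M, C, B from a transaction table.
# Assumed columns: customer_id, date (datetime), profit (float), category (str).
import numpy as np
import pandas as pd


def clumpiness(event_days: np.ndarray, horizon: int) -> float:
    """Entropy-based clumpiness of purchase days within a window of `horizon` days.

    Roughly 0 when purchases are evenly spread, close to 1 when they are clumped
    (following the entropy measure of Zhang, Bradlow, and Small, 2014).
    """
    days = np.sort(np.unique(event_days))
    n = len(days)
    if n <= 1:
        return 0.0  # a single visit carries no spacing information
    # Gaps between consecutive purchase days, plus the gaps to the window's start and end.
    gaps = np.diff(np.concatenate(([0], days, [horizon + 1])))
    x = gaps / (horizon + 1)          # normalize so the gaps sum to 1
    x = x[x > 0]                      # treat 0 * log(0) as 0
    return 1 + np.sum(x * np.log(x)) / np.log(n + 1)


def rfmcb_table(tx: pd.DataFrame, as_of: pd.Timestamp, horizon_days: int) -> pd.DataFrame:
    """Per-customer R, F, M, C, B over the `horizon_days` window ending at `as_of`."""
    start = as_of - pd.Timedelta(days=horizon_days)
    window = tx[(tx["date"] > start) & (tx["date"] <= as_of)].copy()
    window["day"] = (window["date"] - start).dt.days   # day index within the window

    out = window.groupby("customer_id").agg(
        recency=("date", lambda d: (as_of - d.max()).days),   # days since last purchase
        frequency=("date", "count"),                           # number of transactions
        monetary=("profit", "sum"),                            # total profit contributed
        clumpiness=("day", lambda d: clumpiness(d.to_numpy(), horizon_days)),
        breadth=("category", "nunique"),                       # distinct categories bought
    )
    return out.reset_index()
```

With tx holding one row per transaction, rfmcb_table(tx, pd.Timestamp("2016-12-31"), 365) yields one row per customer; such a table could then serve as the explanatory variables for the regression and machine-learning comparisons described above.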
References Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Auria, L., & Moro, R. A. (2008). Support vector machines (SVM) as a technique for solvency analysis.
Bell, D. R., & Lattin, J. M. (1998). Shopping behavior and consumer preference for store price format: Why “large basket” shoppers prefer EDLP. Marketing Science, 17(1), 66-88.
Berger, P., & Magliozzi, T. (1992). The effect of sample size and proportion of buyers in the sample on the performance of list segmentation equations generated by regression analysis. Journal of Direct Marketing, 6(1), 13-22.
Bhattacharyya, S. (1999). Direct marketing performance modeling using genetic algorithms. INFORMS Journal on Computing, 11(3), 248-257.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.
Chai, T., & Draxler, R. R. (2014). Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature. Geoscientific Model Development, 7(3), 1247-1250.
Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). ACM.
Colombo, R., & Jiang, W. (1999). A stochastic RFM model. Journal of Interactive Marketing, 13(3), 2-12.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
Coussement, K., Van den Bossche, F. A., & De Bock, K. W. (2014). Data accuracy's impact on segmentation performance: Benchmarking RFM analysis, logistic regression, and decision trees. Journal of Business Research, 67(1), 2751-2758.
Cui, G., Wong, M. L., & Lui, H. K. (2006). Machine learning for direct marketing response models: Bayesian networks with evolutionary programming. Management Science, 52(4), 597-612.
Drucker, H., Burges, C. J., Kaufman, L., Smola, A., & Vapnik, V. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9, 155-161.
Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.
Hastie, T. J., & Tibshirani, R. J. (1990). Generalized additive models (Vol. 43). CRC Press.
Haughton, D., & Oulabi, S. (1997). Direct marketing modeling with CART and CHAID. Journal of Interactive Marketing, 11(4), 42-52.
Hosseini, S. M. S., Maleki, A., & Gholamian, M. R. (2010). Cluster analysis using data mining approach to develop CRM methodology to assess the customer loyalty. Expert Systems with Applications, 37(7), 5259-5264.
Halliday, J. (2002). Database marketing: GM plays cards right. Retrieved January 14, 2017, from http://adage.com/article/interactive-media-marketing/database-marketing-gm-plays-cards/52084/
Jiang, W. (2002). On weak base hypotheses and their implications for boosting regression and classification. The Annals of Statistics, 51-73.
Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36(1/2), 149-176.
Kahan, R. (1998). Using database marketing techniques to enhance your one-to-one marketing initiatives. Journal of Consumer Marketing, 15(5), 491-493.
Khajvand, M., Zolfaghar, K., Ashoori, S., & Alizadeh, S. (2011). Estimating customer lifetime value based on RFM analysis of customer purchase behavior: Case study. Procedia Computer Science, 3, 57-63.
Kohavi, R. (1995, August). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, No. 2, pp. 1137-1145).
Kumar, V., Srinivasan, K., Rao, V. R., Zhang, Y., Bradlow, E. T., & Small, D. S. (2015). Commentaries and Reply on “Predicting Customer Value Using Clumpiness: From RFM to RFMC” by Yao Zhang, Eric T. Bradlow, and Dylan S. Small. Marketing Science, 34(2), 209-217.
Ling, C. X., & Li, C. (1998, August). Data Mining for Direct Marketing: Problems and Solutions. In KDD (Vol. 98, pp. 73-79).
Marcus, C. (1998). A practical yet meaningful approach to customer segmentation. Journal of Consumer Marketing, 15(5), 494-504.
McCarty, J. A., & Hastak, M. (2007). Segmentation approaches in data-mining: A comparison of RFM, CHAID, and logistic regression. Journal of Business Research, 60(6), 656-662.
Natekin, A., & Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7, 21.
Netzer, O., Lattin, J. M., & Srinivasan, V. (2008). A hidden Markov model of customer relationship dynamics. Marketing Science, 27(2), 185-204.
Petrison, L. A., Blattberg, R. C., & Wang, P. (1997). Database marketing: Past, present, and future. Journal of Interactive Marketing, 11(4), 109-125.
Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems, 9(2), 181-199.
Ridgeway, G. (2002). Looking for lumps: Boosting and bagging for density estimation. Computational Statistics & Data Analysis, 38(4), 379-392.
Ridgeway, G. (2007). Generalized Boosted Models: A guide to the gbm package. Update, 1(1), 2007.
Schweidel, D. A., Bradlow, E. T., & Fader, P. S. (2011). Portfolio dynamics for customers of a multiservice provider. Management Science, 57(3), 471-486.
Sohrabi, B., & Khanlari, A. (2007). Customer lifetime value (CLV) measurement based on RFM model. Iranian Accounting & Auditing Review, 14(47), 7-20.
Verhoef, P. C., Spring, P. N., Hoekstra, J. C., & Leeflang, P. S. (2003). The commercial use of segmentation and predictive modeling techniques for database marketing in the Netherlands. Decision Support Systems, 34(4), 471-481.
Wagenmakers, E. J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11(1), 192-196.
Yeh, I. C., Yang, K. J., & Ting, T. M. (2009). Knowledge discovery on RFM model using Bernoulli sequence. Expert Systems with Applications, 36(3), 5866-5871.
Zhang, Y., Bradlow, E. T., & Small, D. S. (2013). New measures of clumpiness for incidence data. Journal of Applied Statistics, 40(11), 2533-2548.
Zhang, Y., Bradlow, E. T., & Small, D. S. (2014). Predicting customer value using clumpiness: From RFM to RFMC. Marketing Science, 34(2), 195-208.
Zwilling, M. L. (2013). Negative binomial regression. The Mathematica Journal. https://dx.doi.org/10.3888/tmj.15-6
林軒田 (2015, December 8). Machine Learning Foundations (機器學習基石) [Video playlist]. Retrieved from https://www.youtube.com/playlist?list=PLXVfgk9fNX2I7tB6oIINGBmW50rrmFTqf
Description Master's degree
國立政治大學
資訊管理學系
104356032
Source http://thesis.lib.nccu.edu.tw/record/#G0104356032
Type thesis
Identifier G0104356032
URI http://nccur.lib.nccu.edu.tw/handle/140.119/112274
Table of contents Chapter 1: Introduction 4
Chapter 2: Literature Review 8
Chapter 3: The Clumpiness Metric 10
Chapter 4: Data and Regression Models 13
Chapter 5: Explanatory Power Validation Framework and Evaluation of Results 19
Section 1: Validation Framework for Explanatory Power 19
Section 2: Evaluation of Results 22
Chapter 6: Predictive Power Validation Framework and Evaluation of Results 24
Section 1: Validation Framework for Predictive Power and Machine Learning Methods 24
Section 2: Evaluation of Results 29
Chapter 7: Conclusion 36
Appendix 1 39
References 46
Extent 933012 bytes
Format application/pdf