Title 整體學習應用於線上零售的回購預測
Ensemble learning for customer retention prediction in online retailing
Author 佘欣玲
SHE, XIN-LING
Contributors 莊皓鈞、周彥君
Chuang, Hao-Chun; Chou, Yen-Chun
佘欣玲
SHE, XIN-LING
Keywords Ensemble learning
Retail industry
Repurchase prediction
Online retailers
Customer retention
Date 2019
Uploaded 7-Aug-2019 16:09:20 (UTC+8)
Abstract Customer retention plays an important role in customer relationship management. To reduce the cost of communicating with customers and to avoid over-marketing, understanding customer repurchase has become key to the operating performance of online retailers. This research addresses four questions about customer retention. First, how can online retailers construct customer behaviors and characteristics from records of transactions, returns, and cancellations? Second, how can two ensemble learning algorithms, XGBoost and LightGBM, be used to predict customer retention, and which performs better? Third, how can ensemble learning be combined with a Bayesian network to establish causal diagrams of how customer characteristics drive customer retention? Finally, how can the results of predictive models be evaluated from a business perspective, including a cost-benefit analysis, so as to provide a complete method for analyzing customer retention?
Whereas past research used far fewer features to predict customer retention, this research carries out extensive feature engineering that yields a total of 167 variables describing customer behaviors and characteristics. In addition, it reports prediction results for both XGBoost and LightGBM, with prediction accuracy of up to 90%, and compares the models in depth. Furthermore, this study integrates ensemble learning with a Bayesian network to explore the relationship between important features and customer retention. Doing so helps retailers understand which characteristics affect customer retention and provides a list of potential repurchasers based on model predictions. Finally, this study conducts a cost-effectiveness analysis of the model predictions to help online retailers make profit-oriented decisions for digital marketing. Beyond avoiding customers' aversion to over-marketing, this can lower the cost of communicating with members, help retailers understand customer needs, and improve operating performance.
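The profit-oriented decision described in the abstract can be illustrated with a minimal sketch. It is not the thesis's cost model: the `Customer` class, the `expected_profit` rule, and the margin and contact-cost figures are all hypothetical, and the repurchase probabilities stand in for the output of any classifier such as XGBoost or LightGBM. The sketch assumes a customer is worth contacting only when the predicted repurchase probability times the margin exceeds the cost of the contact.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    repurchase_prob: float  # hypothetical model output in [0, 1]


def expected_profit(prob: float, margin: float, contact_cost: float) -> float:
    """Expected profit of contacting one customer (assumed rule, not from the thesis)."""
    return prob * margin - contact_cost


def select_targets(customers, margin: float, contact_cost: float):
    """Return the IDs of customers whose expected profit is positive."""
    return [
        c.customer_id
        for c in customers
        if expected_profit(c.repurchase_prob, margin, contact_cost) > 0
    ]


# Hypothetical predictions and costs, chosen only for illustration.
customers = [Customer("A", 0.9), Customer("B", 0.3), Customer("C", 0.05)]
targets = select_targets(customers, margin=200.0, contact_cost=20.0)
print(targets)
```

Under these assumed figures the break-even probability is 20 / 200 = 0.1, so customers A and B are contacted and customer C is skipped, which is how such a rule limits over-marketing.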
Description Master's thesis
National Chengchi University
Department of Management Information Systems
107356005
Source http://thesis.lib.nccu.edu.tw/record/#G0107356005
Type thesis
dc.contributor.advisor 莊皓鈞、周彥君 [zh_TW]
dc.contributor.advisor Chuang, Hao-Chun; Chou, Yen-Chun [en_US]
dc.contributor.author (Author) 佘欣玲 [zh_TW]
dc.contributor.author (Author) SHE, XIN-LING [en_US]
dc.creator (Author) 佘欣玲 [zh_TW]
dc.creator (Author) SHE, XIN-LING [en_US]
dc.date (Date) 2019 [en_US]
dc.date.accessioned 7-Aug-2019 16:09:20 (UTC+8)
dc.date.available 7-Aug-2019 16:09:20 (UTC+8)
dc.date.issued (Uploaded) 7-Aug-2019 16:09:20 (UTC+8)
dc.identifier (Other identifier) G0107356005 [en_US]
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/124723
dc.description (Description) Master's thesis
dc.description (Description) National Chengchi University
dc.description (Description) Department of Management Information Systems
dc.description (Description) 107356005
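dc.description.tableofcontents Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Objectives
Chapter 2 Literature Review
Section 1 Repurchase in Retailing
Section 2 Ensemble Learning
Section 3 Introduction to and Comparison of XGBoost and LightGBM
Chapter 3 Data Processing and Feature Engineering
Section 1 Data Preprocessing
Section 2 Description of Feature Variables
Section 3 Descriptive Statistics
Chapter 4 Results
Section 1 Repurchase Prediction
Section 2 Bayesian Network
Section 3 Profit Performance Analysis
Chapter 5 Conclusion
References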
dc.format.extent 1300313 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0107356005 [en_US]
dc.subject (Keywords) Retail industry
dc.subject (Keywords) Repurchase prediction
dc.subject (Keywords) Ensemble learning [en_US]
dc.subject (Keywords) Online retailers [en_US]
dc.subject (Keywords) Customer retention [en_US]
dc.title (Title) 整體學習應用於線上零售的回購預測 [zh_TW]
dc.title (Title) Ensemble learning for customer retention prediction in online retailing [en_US]
dc.type (Type) thesis [en_US]
dc.identifier.doi (DOI) 10.6814/NCCU201900463 [en_US]