Title: A Recommendation Framework for Live-Streaming Commerce: Leveraging Reinforcement Learning to Model Tripartite Evolving Preference (直播電商推薦框架:利用強化學習建模三方演化偏好)
Author: Lin, Yu-En (林語恩)
Contributors: Lin, Yi-Ling (林怡伶); Hsiao, Shun-Wen (蕭舜文); Lin, Yu-En (林語恩)
Keywords: Live-streaming commerce; Actor-critic; Recommendation system; Deep reinforcement learning; Long-term and short-term preference; Tripartite recommendation
Date: 2025
Uploaded: 1-Sep-2025 15:03:29 (UTC+8)
Abstract:
In recent years, the gross merchandise volume (GMV) of live-streaming e-commerce has risen steadily, shifting online shopping away from traditional static models based on user-item interactions toward a more dynamic and socially immersive experience, and thereby reshaping the e-commerce landscape. On live-streaming e-commerce platforms, customers, streamers, and products interact in real time, and streamers play a crucial role in shaping customer purchasing behavior through their influence. This thesis proposes TriRec-RL, a recommendation framework that incorporates interactions among customers, products, and streamers to better capture dynamic user preferences in live-streaming e-commerce. Experimental results show that TriRec-RL effectively models both long-term and short-term preferences and outperforms state-of-the-art recommendation models.
Description: Master's thesis
National Chengchi University
Department of Management Information Systems
112356003
Source: http://thesis.lib.nccu.edu.tw/record/#G0112356003
Type: thesis
dc.contributor.advisor: Lin, Yi-Ling; Hsiao, Shun-Wen
dc.identifier (Other Identifiers): G0112356003
dc.identifier.uri (URI): https://nccur.lib.nccu.edu.tw/handle/140.119/159088
dc.description.tableofcontents:
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Literature Review
  2.1 Online Live-streaming E-commerce
  2.2 Tripartite Recommendation
  2.3 Modeling Long-Term and Short-Term Preference in Recommendation System
  2.4 Reinforcement Learning-Based Recommendation System
3 Methodology
  3.1 Problem Formulation
  3.2 Overall Framework
  3.3 Data Preprocessing Phase
  3.4 Long-Term and Short-Term Preference Modeling Phase
  3.5 Product Recommendation Phase
4 Experiments
  4.1 Dataset and Settings
    4.1.1 Dataset
    4.1.2 Settings
    4.1.3 Baselines
    4.1.4 Metrics
  4.2 Which language model can best transform product names into high-quality vectors?
    4.2.1 Supervised Learning Task
    4.2.2 Unsupervised Learning Task
  4.3 How do we choose the best model and preference length to model the long-term and short-term preferences of both customers and streamers?
    4.3.1 Customer Short-Term Preference Length
    4.3.2 Customer Long-Term Preference Length
    4.3.3 Streamer Short-Term Preference Length
    4.3.4 Streamer Long-Term Preference Length
  4.4 How do we choose the most optimal reinforcement learning model for the framework?
  4.5 Do our designs really contribute to the performance improvement of our framework?
    4.5.1 Ablation Study of Long-Length Transaction Data on Model Effectiveness
    4.5.2 Ablation Study of Middle-Length Transaction Data on Model Effectiveness
    4.5.3 Ablation Study of Short-Length Transaction Data on Model Effectiveness
  4.6 Does our proposed recommendation framework outperform other state-of-art recommendation models in terms of recommendation performance?
  4.7 Case Study
    4.7.1 Case Study for Long-Term and Short-Term Preference Modeling Phase
    4.7.2 Case Study for Product Recommendation Phase
5 Conclusion
Reference
dc.format.extent: 1812460 bytes
dc.format.mimetype: application/pdf
