Title 應用於直播電商的探索性演員評論家推薦系統
Exploratory Actor-Critic Recommender for Online Streaming Retailing
Author 簡巧恩
Advisors 林怡伶; 蕭舜文
Keywords recommender system
exploration–exploitation balance
reinforcement learning
deep learning
streaming retailing
actor-critic
recommendation system
deep reinforcement learning
exploration
Date 2023
Upload time 1-Sep-2023 14:55:39 (UTC+8)
Abstract The development of interactive recommender systems has received attention. Moreover, the products offered differ from stream to stream, which allows them to be modeled in a continuous action space. We therefore use an actor-critic architecture to recommend products in the large item space of online streaming environments, learning users' preferences as they watch live streams. Based on the item embedding generated by the actor, the closest few items are selected as the basis for the recommendation. At the same time, to ensure that the information users receive is sufficiently diverse, we propose two exploration strategies applied before the actor generates the result embeddings. We conduct corresponding experiments to examine whether the proposed exploration strategies can outperform the baseline model or general recommenders.
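The abstract describes two concrete mechanisms: an actor that outputs a continuous action in item-embedding space, and a nearest-neighbor step that maps that action to a short recommendation list. The following is a minimal sketch of that pipeline, assuming a small DDPG-style actor and Euclidean distance for "closest"; the network sizes, distance metric, and all names are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a user-state vector (e.g., pooled embeddings of recently
    viewed items) to a point in the item-embedding space."""
    def __init__(self, state_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
            nn.Tanh(),  # keep the "proto-item" embedding in a bounded range
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def recommend_top_k(action: np.ndarray, item_embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k catalog items whose embeddings lie
    closest to the actor's output (the basis for the recommendation)."""
    dists = np.linalg.norm(item_embeddings - action, axis=1)
    return np.argsort(dists)[:k]

# Usage: score one user state against a catalog of 1,000 items.
actor = Actor(state_dim=32, embed_dim=16)
state = torch.randn(1, 32)
with torch.no_grad():
    action = actor(state).squeeze(0).numpy()
catalog = np.random.randn(1000, 16).astype(np.float32)
print(recommend_top_k(action, catalog, k=5))
```

In a production-sized item space, the nearest-neighbor lookup would likely use an approximate index rather than the full scan shown here.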
References Cai, J., Wohn, D. Y., Mittal, A., and Sureshbabu, D. (2018). Utilitarian and hedonic motivations for live streaming shopping. In Proceedings of the 2018 ACM International Conference on Interactive Experiences for TV and Online Video, pages 81–88.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Fu, W. (2021). Consumer choices in live streaming retailing, evidence from Taobao e-commerce. In The 2021 12th International Conference on E-business, Management and Economics, pages 12–20.
Han, J., Yu, Y., Liu, F., Tang, R., and Zhang, Y. (2019). Optimizing ranking algorithm in recommender system via deep reinforcement learning. In 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), pages 22–26. IEEE.
Hofmann, K., Whiteson, S., and de Rijke, M. (2013). Fidelity, soundness, and efficiency of interleaved comparison methods. ACM Transactions on Information Systems (TOIS), 31(4):1–43.
Howard, R. A. (1960). Dynamic Programming and Markov Processes.
Jambo Live Streaming Platform (2020). Jambo live streaming platform. https://jambolive.tv/.
Katehakis, M. N. and Veinott, Jr., A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research, 12(2):262–268.
Kulesza, A. and Taskar, B. (2012). Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 5(2–3):123–286.
Ladosz, P., Weng, L., Kim, M., and Oh, H. (2022). Exploration in deep reinforcement learning: A survey. Information Fusion.
Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., Guo, H., and Zhang, Y. (2018). Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027.
Liu, Y., Shen, Z., Zhang, Y., and Cui, L. (2021). Diversity-promoting deep reinforcement learning for interactive recommendation. In 5th International Conference on Crowd Science and Engineering, pages 132–139.
Meta Platforms (2023). Ax: Adaptive Experimentation Platform. https://ax.dev/.
Gimelfarb, M. (2020). Adaptive epsilon-greedy exploration policy using Bayesian ensembles. https://github.com/mike-gimelfarb/bayesian-epsilon-greedy.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
OpenAI (2023). Deep deterministic policy gradient - Spinning Up documentation. https://spinningup.openai.com/en/latest/algorithms/ddpg.html.
Plappert, M., Houthooft, R., Dhariwal, P., Sidor, S., Chen, R. Y., Chen, X., Asfour, T., Abbeel, P., and Andrychowicz, M. (2017). Parameter space noise for exploration. arXiv preprint arXiv:1706.01905.
Rafailidis, D. and Nanopoulos, A. (2015). Modeling users preference dynamics and side information in recommender systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(6):782–792.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.
Tokic, M. (2010). Adaptive ε-greedy exploration in reinforcement learning based on value differences. In KI 2010: Advances in Artificial Intelligence: 33rd Annual German Conference on AI, Karlsruhe, Germany, September 21-24, 2010. Proceedings 33, pages 203–210. Springer.
Wikipedia (2022). Ornstein–Uhlenbeck process. https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process.
Wongkitrungrueng, A., Dehouche, N., and Assarut, N. (2020). Live streaming commerce from the sellers' perspective: implications for online relationship marketing. Journal of Marketing Management, 36(5-6):488–518.
Wu, Q., Liu, Y., Miao, C., Zhao, Y., Guan, L., and Tang, H. (2019). Recent advances in diversified recommendation. arXiv preprint arXiv:1905.06589.
Yuyan, Z., Xiayao, S., and Yong, L. (2019). A novel movie recommendation system based on deep reinforcement learning with prioritized experience replay. In 2019 IEEE 19th International Conference on Communication Technology (ICCT), pages 1496–1500. IEEE.
Zhao, X., Zhang, L., Ding, Z., Xia, L., Tang, J., and Yin, D. (2018). Recommendations with negative feedback via pairwise deep reinforcement learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1040–1048.
Zheng, G., Zhang, F., Zheng, Z., Xiang, Y., Yuan, N. J., Xie, X., and Li, Z. (2018). DRN: A deep reinforcement learning framework for news recommendation. In Proceedings of the 2018 World Wide Web Conference, pages 167–176.
Description Master's thesis
National Chengchi University
Department of Management Information Systems
110356045
Source http://thesis.lib.nccu.edu.tw/record/#G0110356045
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/146897
Format application/pdf, 1689591 bytes
Table of Contents
Abstract i
Contents ii
List of Figures iv
1 Introduction 1
2 Literature Review 4
2.1 Online Streaming Retailing 4
2.2 Reinforcement Learning-based Recommender System 5
2.3 Exploration Strategies 6
3 Proposed Framework 9
3.1 Dataset Information and Feature Construction 9
3.2 Proposed Model Framework 9
3.2.1 Baseline Structure 10
3.2.2 Action Noise Structure 13
3.2.3 Parameter Noise Structure 13
3.2.4 Derived Exploration Strategies: Multipliers of Action Noise 16
3.2.5 Derived Exploration Strategies: Interleaving with Parameter Noise 19
4 Experiments 23
4.1 Comparisons of Difference Vector Vdiff 23
4.1.1 Comparison of General Performance for Difference Vectors 23
4.1.2 The Effect of Difference Vector 25
4.2 Feasibility of Parameter Noise-based Model 25
4.3 Comparison of Multipliers for Action Noise 27
4.3.1 Comparison of Constant Multipliers 27
4.3.2 Comparison of All Multipliers 28
4.4 Comparison for Interleaving with Parameter Noise Framework 29
4.4.1 Comparisons among Vanilla Interleaving Frameworks 31
4.4.2 Comparisons among Interleaving Frameworks with Difference Vector 32
4.5 Comparison for All Proposed Frameworks 33
5 Discussion and Conclusion 36
5.1 Findings and Discussion 36
5.1.1 Effect of Difference Vector Vdiff 36
5.1.2 Feasibility of Parameter Noise-based Framework 36
5.1.3 The Effect of Hybrid Methods: Multipliers 37
5.1.4 The Effect of Hybrid Methods: Interleaving 38
5.2 Conclusion and Future Works 39
References 42
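Sections 3.2.2 and 3.2.3 of the outline contrast an action-noise structure with a parameter-noise structure. Since the thesis body is not reproduced in this record, the sketch below only illustrates how these two exploration strategies are conventionally realized for a DDPG-style actor (Lillicrap et al., 2015; Plappert et al., 2017), reusing the Actor from the earlier sketch; the function names and noise scales are assumptions, not the thesis's code.

```python
import copy
import numpy as np
import torch

def act_with_action_noise(actor, state, sigma=0.1):
    """Action-space exploration: perturb the actor's output embedding
    directly, here with Gaussian noise (classic DDPG instead uses an
    Ornstein-Uhlenbeck process for temporally correlated noise)."""
    with torch.no_grad():
        action = actor(state).squeeze(0).numpy()
    return action + np.random.normal(0.0, sigma, size=action.shape)

def act_with_parameter_noise(actor, state, sigma=0.05):
    """Parameter-space exploration (Plappert et al., 2017): perturb a
    copy of the actor's weights once, then act greedily with the
    perturbed policy."""
    noisy_actor = copy.deepcopy(actor)
    with torch.no_grad():
        for p in noisy_actor.parameters():
            p.add_(torch.randn_like(p) * sigma)  # Gaussian weight perturbation
        return noisy_actor(state).squeeze(0).numpy()
```

The practical difference is that action noise perturbs each output independently, while parameter noise commits to one perturbed policy, so the resulting exploration stays state-dependent and consistent across the interactions in a stream.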