Title: Exploratory Actor-Critic Recommender for Online Streaming Retailing (應用於直播電商的探索性演員評論家推薦系統)
Author: 簡巧恩
Advisors: 林怡伶, 蕭舜文
Keywords: recommendation system; exploration-exploitation trade-off; reinforcement learning; deep learning; streaming retailing; actor-critic; deep reinforcement learning; exploration
Date: 2023
Uploaded: 1-Sep-2023 14:55:39 (UTC+8)

Abstract:
The development of interactive recommender systems has received increasing attention. Moreover, the products offered differ from stream to stream, so they can be modeled in a continuous action space. We therefore use an actor-critic architecture to recommend products in the large item space of online streaming environments, learning users' preferences as they watch live streams. Based on the item embedding generated by the actor, the few items closest to it are selected as the basis for the recommendation. At the same time, to ensure that the information users receive is sufficiently diverse, we propose two exploration strategies applied before the actor generates the result embeddings. We conduct corresponding experiments to examine whether the proposed exploration strategies can outperform the baseline model and general recommenders.
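The record does not include the thesis's implementation, but the recommendation step the abstract describes — the actor emits a point in the continuous item-embedding space, and the catalogue items closest to that point are recommended — can be illustrated with a minimal sketch. Everything below (array sizes, the linear stand-in for the actor, the names actor and recommend) is an illustrative assumption, not the author's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical catalogue: 10,000 items with 32-dim embeddings.
ITEM_EMB = rng.normal(size=(10_000, 32)).astype(np.float32)

# Stand-in for the trained actor network: a fixed linear map from a
# 64-dim user/stream state into the 32-dim continuous action space.
W_ACTOR = rng.normal(size=(64, 32)).astype(np.float32)

def actor(state):
    """Map a state to a continuous action, i.e. an 'ideal item' embedding."""
    return state @ W_ACTOR

def recommend(action, k=5):
    """Return indices of the k items whose embeddings lie closest to the
    actor's action vector (Euclidean distance here; the thesis may use a
    different similarity)."""
    dists = np.linalg.norm(ITEM_EMB - action, axis=1)
    return np.argsort(dists)[:k]

state = rng.normal(size=64).astype(np.float32)  # toy user state
print(recommend(actor(state)))                  # ids of the 5 nearest items
```

Mapping a generated embedding back to its nearest catalogue items is what lets a continuous-control algorithm of the DDPG family operate over a large discrete item space.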
Description: Master's thesis, National Chengchi University, Department of Management Information Systems (student no. 110356045)
Source: http://thesis.lib.nccu.edu.tw/record/#G0110356045
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/146897
Type: thesis
Table of Contents:
Abstract
Contents
List of Figures
1 Introduction
2 Literature Review
  2.1 Online Streaming Retailing
  2.2 Reinforcement Learning-based Recommender System
  2.3 Exploration Strategies
3 Proposed Framework
  3.1 Dataset Information and Feature Construction
  3.2 Proposed Model Framework
    3.2.1 Baseline Structure
    3.2.2 Action Noise Structure
    3.2.3 Parameter Noise Structure
    3.2.4 Derived Exploration Strategies: Multipliers of Action Noise
    3.2.5 Derived Exploration Strategies: Interleaving with Parameter Noise
4 Experiments
  4.1 Comparisons of Difference Vector Vdiff
    4.1.1 Comparison of General Performance for Difference Vectors
    4.1.2 The Effect of Difference Vector
  4.2 Feasibility of Parameter Noise-based Model
  4.3 Comparison of Multipliers for Action Noise
    4.3.1 Comparison of Constant Multipliers
    4.3.2 Comparison of All Multipliers
  4.4 Comparison for Interleaving with Parameter Noise Framework
    4.4.1 Comparisons among Vanilla Interleaving Frameworks
    4.4.2 Comparisons among Interleaving Frameworks with Difference Vector
  4.5 Comparison for All Proposed Frameworks
5 Discussion and Conclusion
  5.1 Findings and Discussion
    5.1.1 Effect of Difference Vector Vdiff
    5.1.2 Feasibility of Parameter Noise-based Framework
    5.1.3 The Effect of Hybrid Methods: Multipliers
    5.1.4 The Effect of Hybrid Methods: Interleaving
  5.2 Conclusion and Future Works
References
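Sections 3.2.2–3.2.5 of the outline name the two exploration mechanisms (action noise and parameter noise) and the two strategies derived from them (noise multipliers and interleaving), but the record gives no implementation details. The sketch below therefore shows only generic versions of these ideas under stated assumptions: Gaussian action noise scaled by a constant multiplier, weight perturbation for parameter-space noise, and a naive alternate-pick interleaving of two ranked lists. The thesis's difference vector Vdiff and its adaptive multipliers are not specified in this record and are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(1)

def action_noise(action, sigma=0.1, multiplier=1.0):
    """Action-space exploration: perturb the actor's output embedding with
    Gaussian noise. A 'multiplier of action noise' is modeled here as a
    simple constant scale on the perturbation."""
    return action + multiplier * rng.normal(scale=sigma, size=action.shape)

def perturb_parameters(weights, sigma=0.05):
    """Parameter-space exploration (in the spirit of Plappert et al., 2017):
    perturb the actor's weights *before* it produces the embedding, so a
    single draw shifts the whole policy consistently."""
    return [w + rng.normal(scale=sigma, size=w.shape) for w in weights]

def interleave(list_a, list_b, k=5):
    """Alternate picks from two recommendation lists (e.g. one from the
    unperturbed actor, one from a parameter-noised copy), skipping
    duplicates, until k items are collected."""
    out = []
    for a, b in zip(list_a, list_b):
        for item in (a, b):
            if item not in out:
                out.append(item)
            if len(out) == k:
                return out
    return out

print(interleave([3, 1, 4, 1, 5], [9, 2, 6, 5, 3]))  # -> [3, 9, 1, 2, 4]
```

Alternate-pick interleaving is the simplest member of the family of interleaved comparison methods; whether the thesis uses this variant or a probabilistic one cannot be determined from the record.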