Title 應用於直播電商的探索性演員評論家推薦系統
Exploratory Actor-Critic Recommender for Online Streaming Retailing
Author 簡巧恩
Advisors 林怡伶; 蕭舜文
Keywords recommender system
exploration–exploitation balance
reinforcement learning
deep learning
streaming retailing
actor-critic
recommendation system
deep reinforcement learning
exploration
Date 2023
Upload time 1-Sep-2023 14:55:39 (UTC+8)
Abstract The development of interactive recommender systems has received attention. Moreover, the products offered differ from stream to stream, which allows them to be modeled in a continuous action space. We therefore use an actor-critic architecture to recommend products in the large item space of online streaming environments, learning users' preferences as they watch live streams. Based on the item embedding generated by the actor, the closest few items are selected as the basis for the recommendation. At the same time, to ensure that the information users receive is sufficiently diverse, we propose two exploration strategies applied before the actor generates the result embeddings. We conduct corresponding experiments to examine whether the proposed exploration strategies can outperform the baseline model or general recommenders.
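The abstract describes two concrete mechanisms: an actor that outputs a continuous action in item-embedding space, and a nearest-neighbor step that maps that action to a short recommendation list. The following is a minimal sketch of that pipeline, assuming a small DDPG-style actor and Euclidean distance for "closest"; the network sizes, distance metric, and all names are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a user-state vector (e.g., pooled embeddings of recently
    viewed items) to a point in the item-embedding space."""
    def __init__(self, state_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
            nn.Tanh(),  # keep the "proto-item" embedding in a bounded range
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def recommend_top_k(action: np.ndarray, item_embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k catalog items whose embeddings lie
    closest to the actor's output (the basis for the recommendation)."""
    dists = np.linalg.norm(item_embeddings - action, axis=1)
    return np.argsort(dists)[:k]

# Usage: score one user state against a catalog of 1,000 items.
actor = Actor(state_dim=32, embed_dim=16)
state = torch.randn(1, 32)
with torch.no_grad():
    action = actor(state).squeeze(0).numpy()
catalog = np.random.randn(1000, 16).astype(np.float32)
print(recommend_top_k(action, catalog, k=5))
```

In a production-sized item space, the nearest-neighbor lookup would likely use an approximate index rather than the full scan shown here.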
References Cai, J., Wohn, D. Y., Mittal, A., and Sureshbabu, D. (2018). Utilitarian and hedonic motivations for live streaming shopping. In Proceedings of the 2018 ACM International Conference on Interactive Experiences for TV and Online Video, pages 81–88.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Fu, W. (2021). Consumer choices in live streaming retailing, evidence from Taobao e-commerce. In The 2021 12th International Conference on E-business, Management and Economics, pages 12–20.
Han, J., Yu, Y., Liu, F., Tang, R., and Zhang, Y. (2019). Optimizing ranking algorithm in recommender system via deep reinforcement learning. In 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), pages 22–26. IEEE.
Hofmann, K., Whiteson, S., and de Rijke, M. (2013). Fidelity, soundness, and efficiency of interleaved comparison methods. ACM Transactions on Information Systems (TOIS), 31(4):1–43.
Howard, R. A. (1960). Dynamic Programming and Markov Processes.
Jambo Live Streaming Platform (2020). Jambo live streaming platform. https://jambolive.tv/.
Katehakis, M. N. and Veinott, Jr., A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research, 12(2):262–268.
Kulesza, A. and Taskar, B. (2012). Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 5(2–3):123–286.
Ladosz, P., Weng, L., Kim, M., and Oh, H. (2022). Exploration in deep reinforcement learning: A survey. Information Fusion.
Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., Guo, H., and Zhang, Y. (2018). Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027.
Liu, Y., Shen, Z., Zhang, Y., and Cui, L. (2021). Diversity-promoting deep reinforcement learning for interactive recommendation. In 5th International Conference on Crowd Science and Engineering, pages 132–139.
Meta Platforms (2023). Ax: Adaptive Experimentation Platform. https://ax.dev/.
Gimelfarb, M. (2020). Adaptive epsilon-greedy exploration policy using Bayesian ensembles. https://github.com/mike-gimelfarb/bayesian-epsilon-greedy.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
OpenAI (2023). Deep deterministic policy gradient - Spinning Up documentation. https://spinningup.openai.com/en/latest/algorithms/ddpg.html.
Plappert, M., Houthooft, R., Dhariwal, P., Sidor, S., Chen, R. Y., Chen, X., Asfour, T., Abbeel, P., and Andrychowicz, M. (2017). Parameter space noise for exploration. arXiv preprint arXiv:1706.01905.
Rafailidis, D. and Nanopoulos, A. (2015). Modeling users preference dynamics and side information in recommender systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(6):782–792.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.
Tokic, M. (2010). Adaptive ε-greedy exploration in reinforcement learning based on value differences. In KI 2010: Advances in Artificial Intelligence: 33rd Annual German Conference on AI, Karlsruhe, Germany, September 21-24, 2010. Proceedings 33, pages 203–210. Springer.
Wikipedia (2022). Ornstein–Uhlenbeck process. https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process.
Wongkitrungrueng, A., Dehouche, N., and Assarut, N. (2020). Live streaming commerce from the sellers' perspective: implications for online relationship marketing. Journal of Marketing Management, 36(5-6):488–518.
Wu, Q., Liu, Y., Miao, C., Zhao, Y., Guan, L., and Tang, H. (2019). Recent advances in diversified recommendation. arXiv preprint arXiv:1905.06589.
Yuyan, Z., Xiayao, S., and Yong, L. (2019). A novel movie recommendation system based on deep reinforcement learning with prioritized experience replay. In 2019 IEEE 19th International Conference on Communication Technology (ICCT), pages 1496–1500. IEEE.
Zhao, X., Zhang, L., Ding, Z., Xia, L., Tang, J., and Yin, D. (2018). Recommendations with negative feedback via pairwise deep reinforcement learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1040–1048.
Zheng, G., Zhang, F., Zheng, Z., Xiang, Y., Yuan, N. J., Xie, X., and Li, Z. (2018). DRN: A deep reinforcement learning framework for news recommendation. In Proceedings of the 2018 World Wide Web Conference, pages 167–176.
Description Master's thesis
National Chengchi University
Department of Management Information Systems
110356045
Source http://thesis.lib.nccu.edu.tw/record/#G0110356045
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/146897
Format application/pdf, 1689591 bytes
Table of Contents
Abstract i
Contents ii
List of Figures iv
1 Introduction 1
2 Literature Review 4
2.1 Online Streaming Retailing 4
2.2 Reinforcement Learning-based Recommender System 5
2.3 Exploration Strategies 6
3 Proposed Framework 9
3.1 Dataset Information and Feature Construction 9
3.2 Proposed Model Framework 9
3.2.1 Baseline Structure 10
3.2.2 Action Noise Structure 13
3.2.3 Parameter Noise Structure 13
3.2.4 Derived Exploration Strategies: Multipliers of Action Noise 16
3.2.5 Derived Exploration Strategies: Interleaving with Parameter Noise 19
4 Experiments 23
4.1 Comparisons of Difference Vector Vdiff 23
4.1.1 Comparison of General Performance for Difference Vectors 23
4.1.2 The Effect of Difference Vector 25
4.2 Feasibility of Parameter Noise-based Model 25
4.3 Comparison of Multipliers for Action Noise 27
4.3.1 Comparison of Constant Multipliers 27
4.3.2 Comparison of All Multipliers 28
4.4 Comparison for Interleaving with Parameter Noise Framework 29
4.4.1 Comparisons among Vanilla Interleaving Frameworks 31
4.4.2 Comparisons among Interleaving Frameworks with Difference Vector 32
4.5 Comparison for All Proposed Frameworks 33
5 Discussion and Conclusion 36
5.1 Findings and Discussion 36
5.1.1 Effect of Difference Vector Vdiff 36
5.1.2 Feasibility of Parameter Noise-based Framework 36
5.1.3 The Effect of Hybrid Methods: Multipliers 37
5.1.4 The Effect of Hybrid Methods: Interleaving 38
5.2 Conclusion and Future Works 39
References 42
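Sections 3.2.2 and 3.2.3 of the outline contrast an action-noise structure with a parameter-noise structure. Since the thesis body is not reproduced in this record, the sketch below only illustrates how these two exploration strategies are conventionally realized for a DDPG-style actor (Lillicrap et al., 2015; Plappert et al., 2017), reusing the Actor from the earlier sketch; the function names and noise scales are assumptions, not the thesis's code.

```python
import copy
import numpy as np
import torch

def act_with_action_noise(actor, state, sigma=0.1):
    """Action-space exploration: perturb the actor's output embedding
    directly, here with Gaussian noise (classic DDPG instead uses an
    Ornstein-Uhlenbeck process for temporally correlated noise)."""
    with torch.no_grad():
        action = actor(state).squeeze(0).numpy()
    return action + np.random.normal(0.0, sigma, size=action.shape)

def act_with_parameter_noise(actor, state, sigma=0.05):
    """Parameter-space exploration (Plappert et al., 2017): perturb a
    copy of the actor's weights once, then act greedily with the
    perturbed policy."""
    noisy_actor = copy.deepcopy(actor)
    with torch.no_grad():
        for p in noisy_actor.parameters():
            p.add_(torch.randn_like(p) * sigma)  # Gaussian weight perturbation
        return noisy_actor(state).squeeze(0).numpy()
```

The practical difference is that action noise perturbs each output independently, while parameter noise commits to one perturbed policy, so the resulting exploration stays state-dependent and consistent across the interactions in a stream.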