Title: 深度強化學習的視覺化分析—以橫向卷軸遊戲為例
Title (English): Visualization Analysis for Deep Reinforcement Learning – A Case Study of Side-scrolling Video Game
Author: 鄭緒辰 (Cheng, Hsu-Chen)
Advisor: 紀明德 (Chi, Ming-Te)
Keywords: 視覺化分析 (Visual Analytics); 深度強化學習 (Deep Reinforcement Learning); 橫向卷軸遊戲 (Side-scrolling Game)
Date: 2020
Uploaded: 2-Sep-2020 12:15:02 (UTC+8)

Abstract:
In recent years, deep reinforcement learning has become an essential topic in artificial intelligence (AI); it is used to train agents to cope with different computer game environments. Most deep reinforcement learning research focuses on the Atari 2600 series and other simple, well-defined game environments that make it convenient for researchers to observe and analyze AI behavior. This research focuses on side-scrolling game environments, in which the player can only see a limited portion of the scene as the character moves, which tests the AI's immediate reactions and accumulated experience. We use the simpler Flappy Bird and the more complex Super Mario Bros as testbeds. We aim to address the following problems: first, identify the tendencies and limitations of the AI; second, analyze the AI's action selection and play strategies; third, understand the AI's learning process by comparing models with different training times; fourth, verify whether the AI has learned the importance of direction and distance. To address these problems, we first apply the A3C deep reinforcement learning architecture and adjust the environment and reward mechanism to enhance the AI's flexibility and adaptability in playing the games. Next, we collect game histories and training data. Finally, we design a visual analysis workflow and tools that improve researchers' interpretation of the model's performance and lower the barrier to improving deep reinforcement learning models.
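The abstract notes that the environment and reward mechanism were adjusted before training with the A3C architecture (see [11], [14], [15]). The thesis's exact adjustments are not given in this record, so the following is only a minimal sketch of the general pattern, assuming the gym-super-mario-bros package from [15] and the classic Gym wrapper API; the x_pos and flag_get fields of info are documented by that package, while the ShapedReward wrapper and its bonus values are illustrative assumptions rather than the thesis's reward scheme:

import gym
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

class ShapedReward(gym.Wrapper):
    """Illustrative reward shaping: reward rightward progress, penalize failure."""

    def reset(self, **kwargs):
        self.prev_x = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)   # classic 4-tuple Gym API
        # Bonus proportional to horizontal progress since the previous step.
        reward += 0.1 * (info["x_pos"] - self.prev_x)
        self.prev_x = info["x_pos"]
        if done:
            # Terminal bonus for reaching the flag, penalty otherwise (illustrative values).
            reward += 50.0 if info.get("flag_get", False) else -50.0
        return obs, reward, done, info

env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
env = JoypadSpace(env, SIMPLE_MOVEMENT)   # restrict the NES controller to a small action set
env = ShapedReward(env)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())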
Description: Master's thesis, Department of Computer Science, National Chengchi University (國立政治大學 資訊科學系); student ID 106753016
Identifier: G0106753016
Source: http://thesis.lib.nccu.edu.tw/record/#G0106753016
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/131629
DOI: 10.6814/NCCU202001675
Type: thesis
Table of Contents:
Front matter: Abstract (Chinese); Abstract (English); Table of Contents; List of Figures
Chapter 1 Introduction: 1.1 Motivation and Purpose; 1.2 Problem Description; 1.3 Contributions; 1.4 Thesis Organization
Chapter 2 Related Work: 2.1 Deep Reinforcement Learning; 2.2 Deep Learning Visualization; 2.3 Dimensionality Reduction and Analysis of Deep Models
Chapter 3 Methods: 3.1 Environment Setup; 3.2 Model Training and Data Collection (3.2.1 A3C Model Architecture; 3.2.2 Flappy Bird; 3.2.3 Super Mario Bros); 3.3 Visualization of a Single Model's Results (3.3.1 Game-Flow Visualization; 3.3.2 Basic Model Statistics; 3.3.3 Model Experience and Strategy; 3.3.4 RL Grad-CAM++); 3.4 Comparative Visualization of Models with Different Training Times; 3.5 Region-of-Interest (ROI) Visualization
Chapter 4 Experimental Results and Discussion: 4.1 Implementation and Experimental Environment; 4.2 Flappy Bird Visual Analysis (4.2.1 Game Process and Basic Statistics; 4.2.2 Observing Model Experience and Strategy from Feature Scatter Plots; 4.2.3 Comparing Models with Different Training Times); 4.3 Super Mario Bros Visual Analysis (4.3.1 Game Process and Basic Statistics; 4.3.2 Observing Model Experience and Strategy from Feature Scatter Plots; 4.3.3 Comparing Models with Different Training Times; 4.3.4 Region-of-Interest (ROI) Visual Analysis); 4.4 Comparing Models with Different Reward Mechanisms; 4.5 Limitations
Chapter 5 Conclusion and Future Work: 5.1 Conclusion; 5.2 Future Work
References
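Sections 3.3.3, 4.2.2, and 4.3.2 in the outline above examine the model's experience and strategy through feature scatter plots, in the spirit of "Graying the black box" [12], i.e. by projecting the agent's per-frame hidden features to two dimensions and coloring each point by the chosen action. A minimal sketch of such a projection with scikit-learn's t-SNE is shown below; the 512-dimensional feature size, the frame count, and the random placeholder data stand in for features that would actually be taken from the trained A3C network, and are assumptions rather than the thesis's implementation:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def feature_scatter(features, actions, perplexity=30):
    """Project per-frame hidden features to 2-D and color each point by the chosen action."""
    embedding = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(features)
    plt.scatter(embedding[:, 0], embedding[:, 1], c=actions, cmap="tab10", s=8)
    plt.colorbar(label="chosen action")
    plt.title("Per-frame feature embedding (t-SNE)")
    plt.show()

# Placeholder data: in the thesis's setting these would come from the A3C network's
# hidden layer and the action it selected for each recorded frame.
features = np.random.rand(500, 512).astype(np.float32)   # 500 frames, 512-d features (assumed size)
actions = np.random.randint(0, 7, size=500)               # e.g. the 7 actions of SIMPLE_MOVEMENT
feature_scatter(features, actions)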
References:
[1] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision (ICCV). doi:10.1109/ICCV.2017.74
[2] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning Deep Features for Discriminative Localization. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2016.319
[3] Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018). Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. https://arxiv.org/abs/1710.11063v3
[4] Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature Visualization. Distill. https://distill.pub/2017/feature-visualization/
[5] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. https://arxiv.org/abs/1312.5602
[6] Zeiler, M. D., & Fergus, R. (2014). Visualizing and Understanding Convolutional Networks. Computer Vision – ECCV 2014, Lecture Notes in Computer Science, 818–833. doi:10.1007/978-3-319-10590-1_53
[7] Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2015). Striving for Simplicity: The All Convolutional Net. https://arxiv.org/abs/1412.6806
[8] Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized Experience Replay. https://arxiv.org/abs/1511.05952
[9] Hausknecht, M., & Stone, P. (2015). Deep Recurrent Q-Learning for Partially Observable MDPs. https://arxiv.org/abs/1507.06527
[10] hardlyrichie. (2019). pytorch-flappy-bird. GitHub repository. https://github.com/hardlyrichie/pytorch-flappy-bird
[11] uvipen. (2019). Super-mario-bros-A3C-pytorch. GitHub repository. https://github.com/uvipen/Super-mario-bros-A3C-pytorch
[12] Zahavy, T., Ben-Zrihem, N., & Mannor, S. (2016). Graying the Black Box: Understanding DQNs. https://arxiv.org/abs/1602.02658
[13] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level Control through Deep Reinforcement Learning. Nature, 518(7540), 529–533.
[14] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning. https://arxiv.org/abs/1602.01783
[15] Kautenja. (2019). gym-super-mario-bros. GitHub repository. https://github.com/Kautenja/gym-super-mario-bros
[16] https://www.romhacking.net/utilities/178/
[17] Lillicrap, T. P., et al. (2015). Continuous Control with Deep Reinforcement Learning. https://arxiv.org/abs/1509.02971
[18] Heess, N., et al. (2017). Emergence of Locomotion Behaviours in Rich Environments. https://arxiv.org/abs/1707.02286