Title 深度強化學習的視覺化分析—以橫向卷軸遊戲為例
Visualization Analysis for Deep Reinforcement Learning – A Case Study of Side-scrolling Video Game
Author 鄭緒辰 (Cheng, Hsu-Chen)
Contributors 紀明德 (Chi, Ming-Te), advisor; 鄭緒辰 (Cheng, Hsu-Chen)
Keywords 視覺化分析 (Visual Analytics)
深度強化學習 (Deep Reinforcement Learning)
橫向卷軸遊戲 (Side-scrolling Game)
Date 2020
Uploaded 2-Sep-2020 12:15:02 (UTC+8)
Abstract In recent years, deep reinforcement learning has become an essential topic in artificial intelligence (AI); it is widely used to train agents that cope with different computer game environments. Most deep reinforcement learning research focuses on the Atari 2600 series and other simple, well-defined game environments that make it easy for researchers to observe and analyze agent behavior. This research targets side-scrolling game environments, in which the player sees only a limited part of the scene as the character moves, which tests the agent's immediate responses and accumulated experience. We use the simpler Flappy Bird and the more complex Super Mario Bros. as testbeds and aim to address four problems: first, identify the agent's tendencies and limitations; second, analyze the agent's action selection and play strategy; third, understand the agent's learning process by comparing models trained for different lengths of time; and fourth, verify whether the agent has learned the importance of direction and distance. To address these problems, we first apply the A3C deep reinforcement learning architecture and adjust the environment and reward mechanism to enhance the agent's flexibility and adaptability in the games. Next, we collect gameplay history and training data. Finally, we design a visual analysis workflow and tools that improve researchers' interpretation of model behavior and lower the barrier to improving deep learning models.
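The environment and reward adjustments described in the abstract can be approximated with the gym-super-mario-bros package cited as reference [15]. The sketch below is only an illustration under stated assumptions: the ProgressReward wrapper, its reward weights, the chosen stage, and the random-action rollout are hypothetical stand-ins, not the exact A3C configuration or reward scheme used in the thesis.

```python
# Minimal sketch (assumptions: gym-super-mario-bros [15] with the classic Gym API;
# the reward shaping below is an illustrative guess, not the thesis's exact scheme).
import gym
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT


class ProgressReward(gym.Wrapper):
    """Reward horizontal progress and penalize dying, on top of the base reward."""

    def __init__(self, env):
        super().__init__(env)
        self.last_x = 0

    def reset(self, **kwargs):
        self.last_x = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        # Hypothetical shaping: reward the change in x position, penalize death.
        x_pos = info.get("x_pos", self.last_x)
        reward += 0.1 * (x_pos - self.last_x)
        self.last_x = x_pos
        if done and not info.get("flag_get", False):
            reward -= 15
        return state, reward, done, info


env = JoypadSpace(gym_super_mario_bros.make("SuperMarioBros-1-1-v0"), SIMPLE_MOVEMENT)
env = ProgressReward(env)

# Random-action rollout, standing in for the A3C workers that collect trajectories.
state, done = env.reset(), False
while not done:
    state, reward, done, info = env.step(env.action_space.sample())
env.close()
```

In the actual training setup, the random policy would be replaced by asynchronous A3C workers that share a global actor-critic network, as in references [11] and [14].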
References [1] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision (ICCV). doi:10.1109/iccv.2017.74
[2] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning Deep Features for Discriminative Localization. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2016.319
[3] Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018). Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. arXiv:1710.11063. https://arxiv.org/abs/1710.11063v3
[4] Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature Visualization. Distill. https://distill.pub/2017/feature-visualization/
[5] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602. https://arxiv.org/abs/1312.5602
[6] Zeiler, M. D., & Fergus, R. (2014). Visualizing and Understanding Convolutional Networks. Computer Vision – ECCV 2014, Lecture Notes in Computer Science, 818-833. doi:10.1007/978-3-319-10590-1_53
[7] Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2015). Striving for Simplicity: The All Convolutional Net. arXiv:1412.6806. https://arxiv.org/abs/1412.6806
[8] Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized Experience Replay. arXiv:1511.05952. https://arxiv.org/abs/1511.05952
[9] Hausknecht, M., & Stone, P. (2017). Deep Recurrent Q-Learning for Partially Observable MDPs. arXiv:1507.06527. https://arxiv.org/abs/1507.06527
[10] hardlyrichie. (2019). pytorch-flappy-bird. GitHub repository. https://github.com/hardlyrichie/pytorch-flappy-bird
[11] uvipen. (2019). Super-mario-bros-A3C-pytorch. GitHub repository. https://github.com/uvipen/Super-mario-bros-A3C-pytorch
[12] Zahavy, T., Ben-Zrihem, N., & Mannor, S. (2017). Graying the black box: Understanding DQNs. arXiv:1602.02658. https://arxiv.org/abs/1602.02658
[13] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.
[14] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning. arXiv:1602.01783. https://arxiv.org/abs/1602.01783
[15] Kautenja. (2019). gym-super-mario-bros. GitHub repository. https://github.com/Kautenja/gym-super-mario-bros
[16] https://www.romhacking.net/utilities/178/
[17] Lillicrap, T. P., et al. (2015). Continuous control with deep reinforcement learning. arXiv:1509.02971. https://arxiv.org/abs/1509.02971
[18] Heess, N., et al. (2017). Emergence of Locomotion Behaviours in Rich Environments. arXiv:1707.02286. https://arxiv.org/abs/1707.02286
Description Master's thesis
National Chengchi University
Department of Computer Science
106753016
Source http://thesis.lib.nccu.edu.tw/record/#G0106753016
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131629
Table of Contents
Abstract (Chinese) ii
Abstract (English) iii
Table of Contents v
List of Figures vii
Chapter 1 Introduction 1
1.1 Motivation and Objectives 1
1.2 Problem Description 2
1.3 Contributions 2
1.4 Thesis Organization 3
Chapter 2 Related Work 4
2.1 Deep Reinforcement Learning 4
2.2 Deep Learning Visualization 6
2.3 Dimensionality Reduction and Deep Model Analysis 8
Chapter 3 Research Methods and Procedures 10
3.1 Environment Setup 10
3.2 Model Training and Data Collection 12
3.2.1 A3C Model Architecture 12
3.2.2 Flappy Bird 13
3.2.3 Super Mario Bros 13
3.3 Visualization of Single-Model Results 15
3.3.1 Visualization of the Model's Gameplay 15
3.3.2 Basic Model Statistics 16
3.3.3 Model Experience and Strategy 18
3.3.4 RL Grad-CAM++ 20
3.4 Comparative Visualization of Models with Different Training Times 23
3.5 Region of Interest (ROI) Visualization 24
Chapter 4 Experimental Results and Discussion 26
4.1 Implementation and Experimental Environment 26
4.2 Visual Analysis of Flappy Bird 26
4.2.1 Gameplay and Basic Statistics Analysis 27
4.2.2 Observing Model Experience and Strategy from Feature Scatter Plots 28
4.2.3 Comparing Models with Different Training Times 30
4.3 Visual Analysis of Super Mario Bros 31
4.3.1 Gameplay and Basic Statistics Analysis 32
4.3.2 Observing Model Experience and Strategy from Feature Scatter Plots 33
4.3.3 Comparing Models with Different Training Times 36
4.3.4 Region of Interest (ROI) Visual Analysis 38
4.4 Comparing Models with Different Reward Mechanisms 40
4.5 Limitations 43
Chapter 5 Conclusion and Future Work 44
5.1 Conclusion 44
5.2 Future Work 45
References 46
Format application/pdf, 4136096 bytes
DOI 10.6814/NCCU202001675