Title 深度強化學習的視覺化分析—以橫向卷軸遊戲為例
Visualization Analysis for Deep Reinforcement Learning – A Case Study of Side-scrolling Video Game
Author 鄭緒辰 (Cheng, Hsu-Chen)
Contributors 紀明德 (Chi, Ming-Te), advisor; 鄭緒辰 (Cheng, Hsu-Chen)
Keywords 視覺化分析 (Visual Analytics)
深度強化學習 (Deep Reinforcement Learning)
橫向卷軸遊戲 (Side-scrolling Game)
Date 2020
Uploaded 2-Sep-2020 12:15:02 (UTC+8)
Abstract In recent years, deep reinforcement learning has become an essential topic in artificial intelligence (AI); it is widely used to train agents that cope with different computer game environments. Most deep reinforcement learning research focuses on the Atari 2600 series and other simple, well-defined game environments that make it easy for researchers to observe and analyze agent behavior. This research targets side-scrolling game environments, in which the player sees only a limited part of the scene as the character moves, which tests the agent's immediate responses and accumulated experience. We use the simpler Flappy Bird and the more complex Super Mario Bros. as testbeds and aim to address four problems: first, identify the agent's tendencies and limitations; second, analyze the agent's action selection and play strategy; third, understand the agent's learning process by comparing models trained for different lengths of time; and fourth, verify whether the agent has learned the importance of direction and distance. To address these problems, we first apply the A3C deep reinforcement learning architecture and adjust the environment and reward mechanism to enhance the agent's flexibility and adaptability in the games. Next, we collect gameplay history and training data. Finally, we design a visual analysis workflow and tools that improve researchers' interpretation of model behavior and lower the barrier to improving deep learning models.
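The environment and reward adjustments described in the abstract can be approximated with the gym-super-mario-bros package cited as reference [15]. The sketch below is only an illustration under stated assumptions: the ProgressReward wrapper, its reward weights, the chosen stage, and the random-action rollout are hypothetical stand-ins, not the exact A3C configuration or reward scheme used in the thesis.

```python
# Minimal sketch (assumptions: gym-super-mario-bros [15] with the classic Gym API;
# the reward shaping below is an illustrative guess, not the thesis's exact scheme).
import gym
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT


class ProgressReward(gym.Wrapper):
    """Reward horizontal progress and penalize dying, on top of the base reward."""

    def __init__(self, env):
        super().__init__(env)
        self.last_x = 0

    def reset(self, **kwargs):
        self.last_x = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        # Hypothetical shaping: reward the change in x position, penalize death.
        x_pos = info.get("x_pos", self.last_x)
        reward += 0.1 * (x_pos - self.last_x)
        self.last_x = x_pos
        if done and not info.get("flag_get", False):
            reward -= 15
        return state, reward, done, info


env = JoypadSpace(gym_super_mario_bros.make("SuperMarioBros-1-1-v0"), SIMPLE_MOVEMENT)
env = ProgressReward(env)

# Random-action rollout, standing in for the A3C workers that collect trajectories.
state, done = env.reset(), False
while not done:
    state, reward, done, info = env.step(env.action_space.sample())
env.close()
```

In the actual training setup, the random policy would be replaced by asynchronous A3C workers that share a global actor-critic network, as in references [11] and [14].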
References [1] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision (ICCV). doi:10.1109/iccv.2017.74
[2] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning Deep Features for Discriminative Localization. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2016.319
[3] Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018). Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. arXiv:1710.11063. https://arxiv.org/abs/1710.11063v3
[4] Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature Visualization. Distill. https://distill.pub/2017/feature-visualization/
[5] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602. https://arxiv.org/abs/1312.5602
[6] Zeiler, M. D., & Fergus, R. (2014). Visualizing and Understanding Convolutional Networks. Computer Vision – ECCV 2014, Lecture Notes in Computer Science, 818-833. doi:10.1007/978-3-319-10590-1_53
[7] Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2015). Striving for Simplicity: The All Convolutional Net. arXiv:1412.6806. https://arxiv.org/abs/1412.6806
[8] Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized Experience Replay. arXiv:1511.05952. https://arxiv.org/abs/1511.05952
[9] Hausknecht, M., & Stone, P. (2017). Deep Recurrent Q-Learning for Partially Observable MDPs. arXiv:1507.06527. https://arxiv.org/abs/1507.06527
[10] hardlyrichie. (2019). pytorch-flappy-bird. GitHub repository. https://github.com/hardlyrichie/pytorch-flappy-bird
[11] uvipen. (2019). Super-mario-bros-A3C-pytorch. GitHub repository. https://github.com/uvipen/Super-mario-bros-A3C-pytorch
[12] Zahavy, T., Ben-Zrihem, N., & Mannor, S. (2017). Graying the black box: Understanding DQNs. arXiv:1602.02658. https://arxiv.org/abs/1602.02658
[13] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.
[14] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning. arXiv:1602.01783. https://arxiv.org/abs/1602.01783
[15] Kautenja. (2019). gym-super-mario-bros. GitHub repository. https://github.com/Kautenja/gym-super-mario-bros
[16] https://www.romhacking.net/utilities/178/
[17] Lillicrap, T. P., et al. (2015). Continuous control with deep reinforcement learning. arXiv:1509.02971. https://arxiv.org/abs/1509.02971
[18] Heess, N., et al. (2017). Emergence of Locomotion Behaviours in Rich Environments. arXiv:1707.02286. https://arxiv.org/abs/1707.02286
Description Master's thesis
National Chengchi University
Department of Computer Science
106753016
Source http://thesis.lib.nccu.edu.tw/record/#G0106753016
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131629
Table of Contents
Abstract (Chinese) ii
Abstract (English) iii
Table of Contents v
List of Figures vii
Chapter 1 Introduction 1
1.1 Motivation and Objectives 1
1.2 Problem Description 2
1.3 Contributions 2
1.4 Thesis Organization 3
Chapter 2 Related Work 4
2.1 Deep Reinforcement Learning 4
2.2 Deep Learning Visualization 6
2.3 Dimensionality Reduction and Deep Model Analysis 8
Chapter 3 Research Methods and Procedures 10
3.1 Environment Setup 10
3.2 Model Training and Data Collection 12
3.2.1 A3C Model Architecture 12
3.2.2 Flappy Bird 13
3.2.3 Super Mario Bros 13
3.3 Visualization of Single-Model Results 15
3.3.1 Visualization of the Model's Gameplay 15
3.3.2 Basic Model Statistics 16
3.3.3 Model Experience and Strategy 18
3.3.4 RL Grad-CAM++ 20
3.4 Comparative Visualization of Models with Different Training Times 23
3.5 Region of Interest (ROI) Visualization 24
Chapter 4 Experimental Results and Discussion 26
4.1 Implementation and Experimental Environment 26
4.2 Visual Analysis of Flappy Bird 26
4.2.1 Gameplay and Basic Statistics Analysis 27
4.2.2 Observing Model Experience and Strategy from Feature Scatter Plots 28
4.2.3 Comparing Models with Different Training Times 30
4.3 Visual Analysis of Super Mario Bros 31
4.3.1 Gameplay and Basic Statistics Analysis 32
4.3.2 Observing Model Experience and Strategy from Feature Scatter Plots 33
4.3.3 Comparing Models with Different Training Times 36
4.3.4 Region of Interest (ROI) Visual Analysis 38
4.4 Comparing Models with Different Reward Mechanisms 40
4.5 Limitations 43
Chapter 5 Conclusion and Future Work 44
5.1 Conclusion 44
5.2 Future Work 45
References 46
Format application/pdf, 4136096 bytes
DOI 10.6814/NCCU202001675