Academic Output - Theses

Title: 無人機基於深度強化學習於虛擬環境之視覺化分析
Visual Analysis for drone with Reinforcement Learning in Virtual Environment
Author: 李宣毅 (Lee, Hsuan-I)
Advisor: 紀明德 (Chi, Ming-Te)
Keywords: Deep reinforcement learning; Drone racing; Virtual environment; Visual analytics
Date: 2022
Uploaded: 1-Apr-2022 15:04:57 (UTC+8)
Abstract: Autonomous drone racing has become very popular in recent years. In 2019, Microsoft's AirSim team held a drone gate-passing competition in a virtual environment at the NeurIPS conference, with the main goal of surpassing the performance of human pilots. Among the placing contestants, none had designed a method based on deep reinforcement learning (DRL) specifically for this competition. This research therefore applies DRL to train a model that successfully passes the gates and finishes this virtual race, and adopts the ROS system commonly used on real drones as the communication architecture for command transmission, narrowing the gap between the virtual and the real. DRL is well known to behave like a black box: the user does not know what the model has actually learned. This research therefore designs a visual interface that lets users analyze the model's performance, including a chart of the probability of each action choice, so users can see whether the model's reasoning in the current state matches common intuition. Finally, neural network visualization techniques are used to identify and fix the causes of poor model performance. In some cases the model's behavior is found to resemble human behavior, which greatly increases trust in DRL and the possibility of real-world applications.
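
To make the kind of analysis described in the abstract concrete, the sketch below shows two quantities the thesis visualizes: the per-action probabilities a policy network assigns to one camera observation, and a perturbation-based saliency map obtained by occluding image patches and measuring how much the action distribution shifts (in the spirit of refs. [17] and [19]). This is an illustrative sketch under assumptions, not the thesis implementation: the toy PyTorch network, the six-action head, and the 84x84 grayscale observation are placeholders, whereas the actual system uses an ACKTR agent with object detection inside AirSim.

```python
# Minimal sketch (not the thesis code): per-action probabilities and a
# perturbation-based saliency map for a policy network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    """Toy actor head: conv encoder -> action logits (stand-in for the ACKTR actor)."""

    def __init__(self, n_actions: int = 6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * 9 * 9, n_actions)  # 84x84 input -> 9x9 feature map

    def forward(self, x):
        return self.fc(self.conv(x).flatten(start_dim=1))  # action logits


def action_probs(model, obs):
    """Softmax over the action logits -- the quantity charted per time step."""
    with torch.no_grad():
        return F.softmax(model(obs), dim=-1).squeeze(0)


def perturbation_saliency(model, obs, patch=8):
    """Occlude each patch with the image mean and record how far the action
    distribution moves (L2 distance); a larger change marks a more salient region."""
    base = action_probs(model, obs)
    _, _, H, W = obs.shape
    sal = torch.zeros(-(-H // patch), -(-W // patch))  # ceil-divided patch grid
    fill = obs.mean()
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            occluded = obs.clone()
            occluded[:, :, i:i + patch, j:j + patch] = fill
            sal[i // patch, j // patch] = torch.norm(action_probs(model, occluded) - base)
    return sal


if __name__ == "__main__":
    model = PolicyNet().eval()          # untrained weights, for illustration only
    obs = torch.rand(1, 1, 84, 84)      # placeholder for a drone camera frame
    print("action probabilities:", action_probs(model, obs).tolist())
    print("saliency grid shape:", tuple(perturbation_saliency(model, obs).shape))
```

Regions whose occlusion changes the action distribution the most are the ones the policy is most sensitive to; this is the kind of evidence a saliency view in the dashboard is meant to surface when diagnosing poor behavior.
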
References:
[1] Gebhardt, C., Stevšić, S., & Hilliges, O. (2018). Optimizing for aesthetically
pleasing quadrotor camera motion. ACM Transactions on Graphics (TOG), 37(4),
1-11.

[2] Hepp, B., Dey, D., Sinha, S. N., Kapoor, A., Joshi, N., & Hilliges, O. (2018).
Learn-to-score: Efficient 3d scene exploration by predicting view utility. In
Proceedings of the European conference on computer vision (ECCV) (pp. 437-
452).

[3] Kaufmann, E., Loquercio, A., Ranftl, R., Dosovitskiy, A., Koltun, V., &
Scaramuzza, D. (2018, October). Deep drone racing: Learning agile flight in
dynamic environments. In Conference on Robot Learning (pp. 133-145). PMLR.

[4] Xu, J., Du, T., Foshey, M., Li, B., Zhu, B., Schulz, A., & Matusik, W. (2019).
Learning to fly: computational controller design for hybrid uavs with
reinforcement learning. ACM Transactions on Graphics (TOG), 38(4), 1-12.

[5] Shin, S. Y., Kang, Y. W., & Kim, Y. G. (2020). Reward-driven U-net training for
obstacle avoidance drone. Expert Systems with Applications, 143, 113064.

[6] Shin, S. Y., Kang, Y. W., & Kim, Y. G. (2019). Obstacle avoidance drone by deep
reinforcement learning and its racing with human pilot. Applied sciences, 9(24),
5571.

[7] Madaan, R., Gyde, N., Vemprala, S., Brown, M., Nagami, K., Taubner, T., ... &
Kapoor, A. (2020, August). Airsim drone racing lab. In NeurIPS 2019 Competition
and Demonstration Track (pp. 177-191). PMLR.

[8] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., &
Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv
preprint arXiv:1312.5602.

[9] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... & Wierstra,
D. (2015). Continuous control with deep reinforcement learning. arXiv preprint
arXiv:1509.02971.

[10] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., ... &
Kavukcuoglu, K. (2016, June). Asynchronous methods for deep reinforcement
learning. In International conference on machine learning (pp. 1928-1937).
PMLR.

[11] Wu, Y., Mansimov, E., Grosse, R. B., Liao, S., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored
approximation. Advances in neural information processing systems, 30, 5279-
5288.

[12] Won, J., Park, J., Kim, K., & Lee, J. (2017). How to train your dragon: example-guided control of flapping flight. ACM Transactions on Graphics (TOG), 36(6), 1-
13.

[13] Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional
networks: Visualising image classification models and saliency maps. arXiv
preprint arXiv:1312.6034.

[14] Iyer, R., Li, Y., Li, H., Lewis, M., Sundar, R., & Sycara, K. (2018, December).
Transparency and explanation in deep reinforcement learning neural networks. In
Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society (pp.
144-150).

[15] Wang, X., Li, H., Zhang, H., Lewis, M., & Sycara, K. (2020). Explanation of
Reinforcement Learning Model in Dynamic Multi-Agent System. arXiv preprint
arXiv:2008.01508.

[16] Deshpande, S., Eysenbach, B., & Schneider, J. (2020). Interactive Visualization
for Debugging RL. arXiv preprint arXiv:2008.07331.

[17] Greydanus, S., Koul, A., Dodge, J., & Fern, A. (2018, July). Visualizing and
understanding atari agents. In International Conference on Machine Learning (pp.
1792-1801). PMLR.

[18] Dabkowski, P., & Gal, Y. (2017). Real time image saliency for black box
classifiers. arXiv preprint arXiv:1705.07857.

[19] Fong, R. C., & Vedaldi, A. (2017). Interpretable explanations of black boxes by
meaningful perturbation. In Proceedings of the IEEE international conference on
computer vision (pp. 3429-3437).

[20] Rosynski, M., Kirchner, F., & Valdenegro-Toro, M. (2020). Are Gradient-based
Saliency Maps Useful in Deep Reinforcement Learning?. arXiv preprint
arXiv:2012.01281.

[21] Atrey, A., Clary, K., & Jensen, D. (2019). Exploratory not explanatory:
Counterfactual analysis of saliency maps for deep reinforcement learning. arXiv
preprint arXiv:1912.05743.

[22] Wang, J., Gou, L., Shen, H. W., & Yang, H. (2018). Dqnviz: A visual analytics
approach to understand deep q-networks. IEEE transactions on visualization and
computer graphics, 25(1), 288-298.

[23] Jaunet, T., Vuillemot, R., & Wolf, C. (2020, June). DRLViz: Understanding
decisions and memory in deep reinforcement learning. In Computer Graphics
Forum (Vol. 39, No. 3, pp. 49-61).

[24] Jaderberg, M., Czarnecki, W. M., Dunning, I., Marris, L., Lever, G., Castaneda, A.
G., ... & Graepel, T. (2019). Human-level performance in 3D multiplayer games
with population-based reinforcement learning. Science, 364(6443), 859-865.

[25] Deng, Z., Weng, D., Chen, J., Liu, R., Wang, Z., Bao, J., ... & Wu, Y. (2019). Airvis:
Visual analytics of air pollution propagation. IEEE transactions on visualization
and computer graphics, 26(1), 800-810.

[26] Ates, U. (2020, October). Long-Term Planning with Deep Reinforcement
Learning on Autonomous Drones. In 2020 Innovations in Intelligent Systems and
Applications Conference (ASYU) (pp. 1-6). IEEE.

[27] Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018,
March). Grad-cam++: Generalized gradient-based visual explanations for deep
convolutional networks. In 2018 IEEE winter conference on applications of
computer vision (WACV) (pp. 839-847). IEEE.

[28] Mott, A., Zoran, D., Chrzanowski, M., Wierstra, D., & Rezende, D. J. (2019).
Towards interpretable reinforcement learning using attention augmented agents.
arXiv preprint arXiv:1906.02500.

[29] Puri, N., Verma, S., Gupta, P., Kayastha, D., Deshmukh, S., Krishnamurthy, B., &
Singh, S. (2019). Explain your move: Understanding agent actions using specific
and relevant feature attribution. arXiv preprint arXiv:1912.12191.

[30] Kostrikov, I. (2018). PyTorch Implementations of Reinforcement Learning
Algorithms. https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail.

[31] Tzutalin. LabelImg. Git code (2015). https://github.com/tzutalin/labelImg

[32] Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., ... & Ng, A. Y.
(2009, May). ROS: an open-source Robot Operating System. In ICRA workshop
on open source software (Vol. 3, No. 3.2, p. 5).

[33] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., &
Zaremba, W. (2016). Openai gym. arXiv preprint arXiv:1606.01540.

[34] Amdegroot. (2017). SSD.PyTorch. https://github.com/amdegroot/ssd.pytorch

[35] Reinforcement learning basic architecture diagram https://www.newton.com.tw/wiki

[36] Actor critic architecture http://incompleteideas.net/book/ebook
Description: Master's thesis, National Chengchi University, Department of Computer Science, 108753130
Source: http://thesis.lib.nccu.edu.tw/record/#G0108753130
Type: thesis
Identifier: G0108753130
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/139558
Table of Contents:
Abstract (Chinese) i
Abstract (English) ii
Contents iii
List of Figures vi
List of Tables x
Chapter 1  Introduction 1
1.1 Research Motivation and Objectives 1
1.2 Problem Description 2
1.3 Contributions 3
1.4 Thesis Organization 3
Chapter 2  Related Work 4
2.1 Deep Reinforcement Learning 4
2.1.1 Deep Learning 4
2.1.2 Reinforcement Learning 5
2.1.3 Development of Deep Reinforcement Learning 6
2.2 Deep Reinforcement Learning and Drone Applications 8
2.3 Visual Analytics and Techniques 10
Chapter 3  Methodology 16
3.1 System Architecture 16
3.2 Environment Setup 17
3.3 Controlling the Drone with Deep Reinforcement Learning 18
3.3.1 ACKTR and Model Architecture 18
3.3.2 Object Detection 19
3.3.3 Drone Control 21
3.4 Reward Function Design 21
3.5 Data Collection 23
Chapter 4  Visualization Design 25
4.1 Design Motivation and Goals 25
4.2 Dashboard Overview 27
4.3 Neural Network Visualization 29
4.3.1 Backpropagation-Based Methods 30
4.3.2 Perturbation-Based Saliency Maps 31
4.3.3 Observing and Fixing Problems with Saliency Maps 32
4.4 Grad-CAM++ Analysis Visualization 34
Chapter 5  Experimental Results and Discussion 36
5.1 Implementation and Experimental Environment 36
5.2 Model Test Results 36
5.3 Model Visualization Analysis 39
5.3.1 Data Analysis 40
5.3.2 Analysis of the Model's Behavior and Reasoning 43
5.3.3 Analyzing the Model's Knowledge via Perturbation-Based Saliency Maps 47
5.3.4 Grad-CAM++ Result Analysis 49
5.4 Limitations 52
Chapter 6  Conclusion and Future Work 53
6.1 Conclusion 53
6.2 Future Work 54
References 55
Format: application/pdf, 3947156 bytes
DOI: 10.6814/NCCU202200384