Title: 使用深度學習於RGB-D影像之無人飛行載具避障模型
(Collision Avoidance Based on RGB-D Images in Unmanned Aerial Vehicles Using Deep Learning Techniques)
Author: Lin, Tsung-Hsien (林宗賢)
Advisor: Liao, Wen-Hung (廖文宏)
Keywords: UAV (無人機); Obstacle avoidance (避障); Deep learning (深度學習); RGB-D image (RGB-D影像)
Date: 2020
Uploaded: 2-Jun-2020 11:12:29 (UTC+8)
Abstract
UAV applications have expanded in recent years from the defense sector into commercial, agricultural, and disaster-relief domains, and obstacle avoidance is an essential component of UAV navigation. Manual piloting, however, is costly in training and human resources and does not scale. In this thesis, we propose automatic obstacle avoidance mechanisms based on deep learning techniques, both for UAVs without depth sensors and for UAVs equipped with a depth camera.
For UAVs not equipped with depth sensors, we employ a depth estimation model to compute depth maps from 2D color images of an open collision dataset. The depth information is used to label each image with dangerous and safe zones, and a real-time semantic segmentation model is trained to predict this zone distribution directly from color images. Given the predicted zones, our avoidance mechanism steers the UAV toward a suitable direction to maintain collision-free flight.
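The record describes this step only at a high level, so the following Python sketch is purely illustrative: the `choose_direction` helper, the 2 m danger threshold, and the three-sector split are assumptions, not the thesis's actual avoidance mechanism. It shows one simple way a dense depth map can be reduced to an avoidance direction:

```python
import numpy as np

def choose_direction(depth_map, danger_thresh=2.0, n_sectors=3):
    """Split a depth map (meters) into vertical sectors and pick the
    sector with the smallest fraction of 'dangerous' (too-close) pixels.

    depth_map: 2-D array of per-pixel depth estimates.
    Returns (sector_index, danger_ratios), sector 0 being leftmost.
    """
    danger = depth_map < danger_thresh           # True where an obstacle is close
    sectors = np.array_split(danger, n_sectors, axis=1)
    ratios = [s.mean() for s in sectors]         # fraction of dangerous pixels
    return int(np.argmin(ratios)), ratios

# Toy example: a close obstacle (1 m) fills the left half of the view.
depth = np.full((4, 6), 10.0)
depth[:, :3] = 1.0
best, ratios = choose_direction(depth)           # best == 2 (rightmost sector)
```

In the toy example the left sector is fully dangerous and the right sector fully safe, so the mechanism would turn right; a real system would also need temporal smoothing and a fallback when every sector is blocked.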
For UAVs with a depth camera, we combine a real-time semantic segmentation model with a clustering algorithm to obtain the class and location of obstacles, and then apply a path-planning algorithm to construct an optimal obstacle avoidance path.
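The record does not name the planning algorithm (the reference list cites X-means [49] only for the clustering step). As a hedged stand-in for the planning step alone, here is a minimal breadth-first search over a 2-D occupancy grid; `bfs_path`, the 4-connected moves, and the toy grid are illustrative assumptions, not the thesis's implementation:

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid (1 = obstacle cell).

    Breadth-first search guarantees a shortest path in cells when every
    move has equal cost; returns a list of (row, col), or None if blocked.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}                 # also serves as the visited set
    q = deque([start])
    while q:
        cur = q.popleft()
        if cur == goal:                  # reconstruct path back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cur
                q.append((nr, nc))
    return None                          # goal unreachable

# Obstacle wall with a gap at the bottom; the path must detour around it.
grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
path = bfs_path(grid, (0, 0), (0, 2))    # 7-cell route below the wall
```

On a real UAV the grid cells would come from the clustered obstacle positions, and a cost-aware planner (e.g. A* with a clearance penalty) would likely replace plain BFS.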
All the deep learning models employed in this work can perform inference efficiently on embedded systems, so the proposed obstacle avoidance methods are applicable to UAVs with limited computing resources.
References

[1] ImageNet. http://www.image-net.org/, last visited on Dec 2018.
[2] ImageNet Large Scale Visual Recognition Competition (ILSVRC). http://www.image-net.org/challenges/LSVRC/, last visited on Dec 2018.
[3] Warren S. McCulloch, Walter H. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115-133, 1943.
[4] Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65(6), 386-408, 1958.
[5] Rumelhart, D. E., Hinton, G. E., Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536, 1986.
[6] Michael Nielsen. Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com/index.html. Last visited on Dec 2018.
[7] Yann LeCun, Corinna Cortes, Christopher J.C. Burges. THE MNIST DATABASE of handwritten digits. http://yann.lecun.com/exdb/mnist/, last visited on Dec 2018.
[8] Yuanqing Lin, Fengjun Lv, Shenghuo Zhu, Ming Yang, Timothee Cour, Kai Yu, Liangliang Cao, Thomas Huang. Large-scale image classification: Fast feature extraction and SVM training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1689-1696, 2011.
[9] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in neural information processing systems, pages 1097-1105, 2012.
[10] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. Going Deeper with Convolutions. arXiv:1409.4842v1, 2014.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. IEEE, pages 770-778, 2016.
[12] D. H. Hubel and T. N. Wiesel. Receptive fields of single neurones in the cat's striate cortex. J. Physiol. London 148, 574–591, 1959.
[13] F. Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[14] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. arXiv:1707.07012, 2017.
[15] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4510–4520, 2018.
[16] Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for MobileNetV3. arXiv:1905.02244, 2019.
[17] Keras Documentation. https://keras.io/applications/, last visited on Feb 2020.
[18] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017.
[19] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. arXiv:1512.00567, 2015.
[20] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations (ICLR), 2017.
[21] CIFAR-10. https://www.cs.toronto.edu/~kriz/cifar.html, last visited on Dec 2019.
[22] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. arXiv:1709.01507, 2017.
[23] Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. arXiv:1808.00897, 2018.
[24] Ping Chao, Chao-Yang Kao, Yu-Shan Ruan, Chien-Hsiang Huang, and Youn-Long Lin. HarDNet: A low memory traffic network. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.
[25] Real-Time Semantic Segmentation on Cityscapes test. https://paperswithcode.com/sota/real-time-semantic-segmentation-on-cityscapes/, last visited on Feb 2020.
[26] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431-3440, 2015.
[27] A. Loquercio, A. I. Maqueda, C. R. del-Blanco, and D. Scaramuzza. DroNet: Learning to fly by driving. IEEE Robotics and Automation Letters 3, 1088-1095, 2018.
[28] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Machine Learning Res. 15, 1929–1958, 2014.
[29] Glorot, X., Bordes, A., Bengio. Y. Deep sparse rectifier neural networks. Proc. 14th International Conference on Artificial Intelligence and Statistics 315–323, 2011.
[30] Udacity. An Open Source Self-Driving Car. https://www.udacity.com/self-driving-car, 2016. Last visited on Dec 2018.
[31] A. Giusti, J. Guzzi, D. C. Cireşan, F. L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. D. Caro, D. Scaramuzza, and L. M. Gambardella. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters, 2016.
[32] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
[33] Zhengqi Li, Noah Snavely. MegaDepth: Learning Single-View Depth Prediction from Internet Photos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[34] W. Chen, Z. Fu, D. Yang, J. Deng. Single-image depth perception in the wild. Neural Information Processing Systems, pages 730–738, 2016.
[35] J. L. Schonberger, J.-M. Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4104–4113, 2016.
[36] J. L. Schonberger, E. Zheng, J.-M. Frahm, M. Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In Proc. European Conf. on Computer Vision (ECCV), pages 501–518, 2016.
[37] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[38] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, A. Torralba. Scene parsing through ade20k dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[39] D. Eigen, R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proc. Int. Conf. on Computer Vision (ICCV), pages 2650–2658, 2015.
[40] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, N. Navab. Deeper depth prediction with fully convolutional residual networks. In Int. Conf. on 3D Vision (3DV), pages 239–248, 2016.
[41] D. Eigen, C. Puhrsch, R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In Neural Information Processing Systems, pages 2366–2374, 2014.
[42] A. Saxena, S. H. Chung, A. Y. Ng. Learning depth from single monocular images. In Neural Information Processing Systems, volume 18, pages 1–8, 2005.
[43] C. Godard, O. Mac Aodha, G. J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[44] Geiger, Andreas, Lenz, Philip, Stiller, Christoph, and Urtasun, Raquel. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research, 32(11), 2013.
[45] R. P. Mihail, S. Workman, Z. Bessinger, and N. Jacobs. Sky segmentation in the wild: An empirical study. In Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1–6, 2016.
[46] Cordts, Marius, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[47] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.
[48] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes (VOC) Challenge. IJCV, pages 303–338, 2010.
[49] D. Pelleg and A. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In International Conference on Machine Learning, pages 727–734, 2000.
[50] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. arXiv:1610.02391, 2016.
[51] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning Deep Features for Discriminative Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Description: Master's thesis, National Chengchi University, Department of Computer Science (student ID 106753008)
Source: http://thesis.lib.nccu.edu.tw/record/#G0106753008
Type: thesis
Identifier: G0106753008
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/130078
Table of Contents:
Abstract (Chinese)
Abstract (English)
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Objectives
1.3 Thesis Organization
Chapter 2 Technical Background and Related Work
2.1 Background and Breakthroughs of Deep Learning
2.2 Overview of Convolutional Neural Networks
2.2.1 Fully Connected vs. Locally Connected Layers
2.2.2 Multiple Kernels and Weight Sharing
2.2.3 Pooling Layers
2.2.4 Stacked Convolutional Layers
2.3 Image Classification Models
2.3.1 Xception
2.3.2 NASNetMobile
2.3.3 MobileNet V2
2.3.4 MobileNet V3 Small
2.4 Real-Time Semantic Segmentation Models
2.4.1 BiSeNet
2.4.2 U-HarDNet-70
2.5 Related Work
2.5.1 DroNet
2.5.2 MegaDepth
2.5.3 Summary
Chapter 3 Depth Information and Datasets
3.1 Depth Information
3.1.1 CNN-Based Depth Estimation
3.1.2 Depth Sensors
3.2 Datasets
3.2.1 Collision Dataset
3.2.2 Zone Segmentation Dataset
3.2.3 SkyFinder Dataset
3.2.4 Object Distance Dataset
3.3 Deep Learning Frameworks and Development Environment
Chapter 4 Methodology
4.1 Equipment
4.1.1 Parrot Bebop 2
4.1.2 ZED Stereo Camera
4.2 Experimental Procedure
4.2.1 Collision Event Detection
4.2.2 Segmentation of Zones by Safety Level
4.2.3 Obstacle Avoidance Mechanism Design
4.2.4 Object Position Estimation
Chapter 5 Experimental Results and Analysis
5.1 Collision Event Detection
5.1.1 Model Testing and Comparison
5.1.2 CNN Visualization Analysis
5.2 Segmentation of Zones by Safety Level
5.2.1 Model Testing and Comparison
5.2.2 UAV Obstacle Avoidance Field Tests
5.3 Object Position Estimation
5.3.1 Segmentation of Overlapping Objects
5.3.2 Obstacle Avoidance Path Planning
Chapter 6 Conclusion and Future Work
References
Format: 6730940 bytes (application/pdf)
DOI: 10.6814/NCCU202000432