Academic Output - Theses


Title 基於深度學習之衛星圖像建物偵測
Detection of Buildings in Satellite Images Using Deep Learning Techniques
Author 陳芝宇 (Chen, Chih-Yu)
Advisors 李蔡彥 (Li, Tsai-Yen); 廖文宏 (Liao, Wen-Hung)
Keywords Satellite Images; Edge Detection; YOLOv5; Object Detection; Image Segmentation; Super-resolution
Date 2021
Uploaded 1-Nov-2021 12:01:23 (UTC+8)
Abstract Satellite images are used in an ever-widening range of applications, yet identifying the locations of different kinds of objects in them remains a challenging task. With the rapid development of artificial intelligence and deep learning in recent years, automatic object detection has made great strides; nevertheless, object recognition in satellite imagery, especially low-resolution imagery, still leaves room for improvement.
This study uses two satellite image datasets of different resolutions, Google Maps and Xview, to locate buildings with deep learning and to examine whether the best-suited methods differ between datasets. Because the Google Maps satellite images lack object labels, and to speed up data preparation, this thesis proposes an image segmentation algorithm that separates buildings from the background in the corresponding map view: it distinguishes foreground from background by color, removes noise with a median filter, then locates objects and computes their areas.
For object detection, after trying several deep learning frameworks we adopted YOLOv5x6 as the base model and designed image-enhancement pre-processing variants, including super-resolution, edge enhancement (versus none), and channel expansion. We also tuned the number of anchor boxes and the detection threshold, and compared each variant against the model trained on unmodified images in terms of precision, recall, and mAP. The best mAP reached 0.687 on the Google Maps dataset and 0.783 on the Xview dataset. The experiments show that image-enhancement pre-processing helps improve recognition in satellite imagery and that the best method varies by dataset type; these findings can serve as a reference for subsequent applications of satellite image recognition.
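The labeling pipeline described in the abstract (color-based foreground separation, median filtering, object localization, area computation) can be illustrated with a minimal pure-NumPy sketch. The function name, color thresholds, and minimum-area value below are illustrative assumptions, not the thesis's actual code, which likely relies on library routines:

```python
import numpy as np

def separate_buildings(img, fg_low, fg_high, min_area=20):
    """Illustrative sketch of the abstract's pipeline:
    1) threshold pixels whose RGB values fall in an assumed foreground range,
    2) denoise the binary mask with a 3x3 median filter,
    3) find connected components and keep those above a minimum area,
    returning one bounding box (x0, y0, x1, y1) per kept component."""
    h, w = img.shape[:2]

    # 1) color threshold -> binary foreground mask
    mask = np.all((img >= fg_low) & (img <= fg_high), axis=-1)

    # 2) 3x3 median filter on the binary mask (majority vote of 9 neighbors)
    padded = np.pad(mask.astype(np.uint8), 1)
    neigh = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3))
    mask = neigh >= 5

    # 3) connected components via flood fill; keep areas >= min_area
    labels = np.zeros(mask.shape, dtype=int)
    current, boxes = 0, []
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        current += 1
        labels[y, x] = current
        stack, pixels = [(y, x)], []
        while stack:
            cy, cx = stack.pop()
            pixels.append((cy, cx))
            for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    stack.append((ny, nx))
        if len(pixels) >= min_area:  # area = pixel count of the component
            ys, xs = zip(*pixels)
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

The resulting boxes can then be written out as ground-truth labels for detector training, which is the role this step plays in the thesis's data-preparation flow.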
References

[1] “xView Dataset,” DIU. [Online]. Available: http://xviewdataset.org/.
[2] H. Zhao, X. Kong, J. He, Y. Qiao, C. Dong, “Efficient Image Super-Resolution Using Pixel Attention,” arXiv preprint arXiv:2010.01073, Oct 2020.
[3] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, C. C. Loy, Y. Qiao, X. Tang, “ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks,” arXiv preprint arXiv:1809.00219, Sep 2018.
[4] H. Zhao, X. Kong, J. He, Y. Qiao, C. Dong, “Efficient Image Super-Resolution Using Pixel Attention,” in Proc. of European Conference on Computer Vision (ECCV) Workshops, 2020.
[5] Y. James, “[Data Analysis & Machine Learning] Lecture 5.1: Introduction to Convolutional Neural Networks,” 2017. [Online]. Available: https://medium.com/jameslearningnote/%E8%B3%87%E6%96%99%E5%88%86%E6%9E%90-%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E7%AC%AC5-1%E8%AC%9B-%E5%8D%B7%E7%A9%8D%E7%A5%9E%E7%B6%93%E7%B6%B2%E7%B5%A1%E4%BB%8B%E7%B4%B9-convolutional-neural-network-4f8249d65d4f.
[6] R. Girshick, “Fast R-CNN,” in Proc. of 2015 IEEE International Conference on Computer Vision (ICCV), Dec 2015.
[7] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, “SSD: Single Shot MultiBox Detector,” arXiv preprint arXiv:1512.02325, Dec 2016.
[8] A. Bochkovskiy, C.-Y. Wang, H.-Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” arXiv preprint arXiv:2004.10934, April 2020.
[9] J. Canny, “A Computational Approach to Edge Detection,” IEEE Trans. Pattern Analysis and Machine Intelligence, 1986.
[10] N. Kanopoulos, N. Vasanthavada, R. L. Baker, “Design of an Image Edge Detection Filter Using the Sobel Operator,” IEEE Journal of Solid-State Circuits, April 1988.
[11] T. Xishan, “A Novel Image Edge Detection Algorithm Based on Prewitt Operator and Wavelet Transform,” International Journal of Advancements in Computing Technology, 2012.
[12] E. Roopa, H. Ramachandra, P. Shubha, “Buildings Detection from Very High Resolution Satellite Images Using Segmentation and Morphological Operations,” in Proc. of International Conference on Design Innovations for 3Cs Compute Communicate Control, 2018.
[13] K. Zhang, J. Liang, L. Van Gool, R. Timofte, “Designing a Practical Degradation Model for Deep Blind Image Super-Resolution,” in Proc. of IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[14] “Image Quality Metrics,” MATLAB & Simulink documentation, MathWorks. [Online]. Available: https://ww2.mathworks.cn/help/images/image-quality-metrics.html.
[15] G. Jocher, “YOLOv5,” GitHub repository, 2020. [Online]. Available: https://github.com/ultralytics/YOLOv5.
[16] 白勇, online course, CSDN. [Online]. Available: https://edu.csdn.net/course/detail/31428.
[17] Y. Zhang, Y. Yin, R. Zimmermann, G. Wang, J. Varadarajan, S. Ng, “An Enhanced GAN Model for Automatic Satellite-to-Map Image Conversion,” IEEE Access, Sep 2020.
[18] D. Lam, R. Kuzma, K. McGee, S. Dooley, M. Laielli, “xView: Objects in Context in Overhead Imagery,” arXiv preprint arXiv:1802.07856, Feb 2018.
[19] “Precision and recall,” Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Precision_and_recall.
[20] O. Ronneberger, P. Fischer, T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” arXiv preprint arXiv:1505.04597, May 2015.
[21] J. Zhu, T. Park, P. Isola, A. A. Efros, “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” in Proc. of ICCV, 2017.
[22] I. J. Goodfellow, J. Pouget-Abadie, et al., “Generative Adversarial Nets,” in Advances in Neural Information Processing Systems (NIPS), 2014.
[23] A. Ramya, V. Pola, A. Vaishnavi, S. Karra, “Comparison of YOLOv3, YOLOv4 and YOLOv5 Performance for Detection of Blood Cells,” International Research Journal of Engineering and Technology (IRJET), April 2021.
[24] S. Wang, O. Wang, R. Zhang, A. Owens, A. A. Efros, “CNN-Generated Images Are Surprisingly Easy to Spot... for Now,” in Proc. of CVPR, 2020.
Description Master's thesis
National Chengchi University
Department of Computer Science
108971021
Source http://thesis.lib.nccu.edu.tw/record/#G0108971021
Type thesis
Other identifier G0108971021
URI http://nccur.lib.nccu.edu.tw/handle/140.119/137679
Table of contents
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives and Contributions
1.2.1 Google Maps Dataset Challenges
1.2.2 Xview Dataset Challenges
1.2.3 Overall Contributions
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Convolutional Neural Networks (CNN)
2.2 Object Detection
2.2.1 Image Edge Detection
2.2.2 Super-resolution Research
2.3 Object Detection
2.3.1 SSD [6]
2.3.2 YOLO [7]
2.4 Satellite Image Translation Research
Chapter 3 Datasets and Preliminary Study
3.1 Dataset Introduction and Comparison
3.1.1 Google Map Satellite Image Dataset
3.1.2 Xview
3.2 Labeling Software
3.3 Evaluation Metrics
3.3.1 Precision and Recall
3.3.2 IoU (Intersection over Union)
3.3.3 Average Precision (AP)
3.3.4 Mean Average Precision (mAP)
3.3.5 F1 Score
3.3.6 Data Augmentation
3.4 Preliminary Study: Image Segmentation Methods
3.3.7 U-Net
3.3.8 Cycle-GAN [21]
3.3.9 YOLOv3 & YOLOv5x6
3.3.10 SSD 512
3.3.11 Summary
Chapter 4 Methodology
4.1 Ground Truth Label Generation
4.1.1 Building Segmentation Algorithm
4.1.2 Label Generation Algorithm
4.2 Data Pre-processing Enhancement Methods
4.2.1 Image Compositing Enhancement
4.2.2 Image Compositing Ratio Experiments
4.2.3 Super-resolution Image Conversion
4.2.4 Image Edge Enhancement
4.2.5 Image Channel Expansion (Four-channel)
4.3 Model Tuning
4.4 Summary
Chapter 5 Experimental Results and Discussion
5.1 Test Environment
5.2 Data Augmentation Parameters
5.3 Experiment 1: Results of Different Pre-processing Enhancements
5.4 Experiment Parameters (Xview, Google Maps)
5.5 Experiment 2: Comparing Results with Different Numbers of Anchors
5.5.1 Anchor Comparison on the Google Maps Dataset
5.5.2 Anchor Comparison on the Xview Dataset
5.6 Experiment 3: Comparing Model Weights under Different Pre-processing Enhancements
5.7 Model Optimization on the Google Maps Dataset
5.7.1 Baseline
5.7.2 50% Compositing Model
5.7.3 Super-resolution Training Model
5.7.4 Image Edge Training Model
5.7.5 Image Channel Expansion Training Model
5.7.6 Test Results
5.8 Model Optimization on the Xview Dataset
5.8.1 Baseline
5.8.2 50% Compositing Model
5.8.3 Image Edge Training Model
5.8.4 Image Channel Expansion Training Model
5.8.5 Test Results
5.9 Summary
Chapter 6 Conclusions and Future Work
References
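Section 3.3 of the outline covers the standard detection metrics (precision, recall, IoU). These can be sketched compactly; the helper names and the greedy matching strategy below are illustrative assumptions, not the thesis's evaluation code:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def precision_recall(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Greedy one-to-one matching: a prediction counts as a true positive
    if it overlaps a not-yet-matched ground-truth box with IoU >= iou_thresh."""
    matched, tp = set(), 0
    for p in pred_boxes:
        best, best_iou = None, 0.0
        for i, g in enumerate(gt_boxes):
            v = iou(p, g)
            if i not in matched and v > best_iou:
                best, best_iou = i, v
        if best is not None and best_iou >= iou_thresh:
            matched.add(best)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall
```

AP then averages precision over recall levels for one class, and mAP averages AP over classes; at a single confidence threshold the pair returned above is what the thesis's precision/recall columns report.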
Format application/pdf (6,850,053 bytes)
DOI 10.6814/NCCU202101630