Title: 基於深度學習之衛星圖像建物偵測
Detection of Buildings in Satellite Images Using Deep Learning Techniques
Authors: 陳芝宇
Chen, Chih-Yu
Contributors: 李蔡彥

Li, Tsai-Yen
Liao, Wen-Hung

Keywords: Satellite Images
Object Detection
Image Segmentation
Date: 2021
Issue Date: 2021-11-01 12:01:23 (UTC+8)
Abstract: Satellite images are used in a wide range of applications, yet identifying the locations of different objects in them remains a challenging task. Thanks to the rapid development of artificial intelligence and deep learning in recent years, automatic object recognition and detection have made great strides. Object recognition in satellite images, however, still leaves room for improvement, especially for low-resolution data, and this thesis applies recent techniques to that problem.
Two satellite image datasets with different resolutions, Google Maps and Xview, are used to build deep learning models that quickly locate buildings and to examine whether the most suitable method differs between datasets. Because the Google Maps satellite images lack ground-truth labels, this thesis proposes an image segmentation algorithm to speed up data preparation: it separates foreground (buildings) from background in the street-map view by color, removes noise with a median filter, locates objects, and computes their areas.
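The labeling steps named above (color separation, median filtering, object localization, area computation) can be sketched as follows. This is a minimal pure-Python illustration and not the thesis's implementation: the function names, the building color `(240, 240, 240)`, the tolerance, the 3x3 window, and the minimum area are all assumptions chosen for the example.

```python
from collections import deque

def color_mask(image, target, tol):
    """Mark pixels whose RGB value is within `tol` of `target` on every channel."""
    return [[all(abs(c - t) <= tol for c, t in zip(px, target)) for px in row]
            for row in image]

def median_filter(mask, radius=1):
    """Median filter on a binary mask; for 0/1 values this is a majority vote."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = 2 * sum(window) > len(window)
    return out

def find_objects(mask, min_area):
    """Return (x0, y0, x1, y1, area) for each 4-connected region of at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            area, x0, y0, x1, y1 = 0, x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((x0, y0, x1, y1, area))
    return boxes

def label_buildings(image, building_color=(240, 240, 240), tol=10, min_area=4):
    """Hypothetical pipeline: color mask -> median filter -> regions filtered by area."""
    mask = median_filter(color_mask(image, building_color, tol))
    return find_objects(mask, min_area)
```

Here `image` is a list of rows of `(R, G, B)` tuples. The returned bounding boxes could then serve as building labels for the matching satellite tile, under the assumption that the map view and the satellite view are aligned.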
For object detection, after testing several deep learning frameworks, we chose YOLOv5x6 as the baseline model. We designed several image pre-processing variants, including super-resolution, edge enhancement, and augmented channels, and tuned the number of anchor boxes and the threshold values. Each variant was compared with a model trained on the original images to understand its effect on performance metrics such as accuracy, recall, and mAP. The best mAP was 0.687 on the Google Maps dataset and 0.783 on the Xview dataset. These experiments show that image pre-processing helps improve recognition in satellite images and that the best method differs between types of datasets. We expect the results to serve as a reference for subsequent applications of satellite image recognition.
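To make the role of the threshold values concrete, the sketch below computes precision and recall for one image at a given confidence threshold and IoU threshold. It is an illustration only, not the thesis's evaluation code: the greedy matching rule, the default thresholds of 0.25 and 0.5, and the function names are assumptions, and mAP would additionally average precision over recall levels and classes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(preds, truths, conf_thres=0.25, iou_thres=0.5):
    """preds: list of (box, score); truths: list of ground-truth boxes.

    Predictions below conf_thres are dropped. The rest are matched greedily,
    highest score first, to the unmatched ground-truth box with the best IoU.
    """
    kept = sorted((p for p in preds if p[1] >= conf_thres), key=lambda p: -p[1])
    matched = set()
    true_pos = 0
    for box, _score in kept:
        best, best_i = 0.0, None
        for i, truth in enumerate(truths):
            if i in matched:
                continue
            overlap = iou(box, truth)
            if overlap > best:
                best, best_i = overlap, i
        if best_i is not None and best >= iou_thres:
            matched.add(best_i)
            true_pos += 1
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / len(truths) if truths else 0.0
    return precision, recall
```

Lowering `conf_thres` keeps more low-scoring boxes, which typically raises recall and can lower precision; this is the trade-off that threshold tuning explores.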
