Academic Output - Theses and Dissertations

Title 以深度學習自動偵測航空真正射影像中的建物邊界像素
Applying deep learning to automatically detect building boundary pixels from aerial true orthoimages
Author 謝皓展 (Hsieh, Hao-Chan)
Advisor 邱式鴻 (Chio, Shih-Hong)
Keywords True orthoimage
Digital height model
Deep learning
Transfer learning
Building boundary detection
Date 2025
Uploaded 4-Aug-2025 15:08:20 (UTC+8)
Abstract This study focuses on leveraging deep learning techniques to propose a method for directly detecting building boundary pixels from aerial true orthoimages, and further evaluates the geometric accuracy of the post-processed results. In contrast to conventional approaches that detect building footprints as an intermediate step before extracting their boundaries through polygonal processing, this study bypasses such intermediate procedures by training models to predict boundary pixels directly. The training dataset is derived from the Xinyi District of Taipei City, while the test area is Ludao Township, Taitung County, a region with distinct differences in building types and density; this allows the model's segmentation ability to be evaluated across heterogeneous scenes. Aerial images were processed with Metashape to generate point clouds and true orthoimages, and the point clouds were further processed into Digital Height Model (DHM) rasters. To improve the quality of the height data, CloudCompare was used to filter gross errors from the point clouds, and ArcGIS was used for supplementary compilation of missing building vector data and for preparing the training labels. Training data, comprising 512×512-pixel patches of true orthoimages, the corresponding DHMs, and building labels, were augmented through cropping, flipping, and rotation. For the model architecture, U-Net was first adopted as the primary segmentation framework and trained on the Xinyi District data, with the RGB bands of the true orthoimages and the DHM combined as multi-channel inputs to integrate spectral and height information for improved boundary detection. The training process also investigated the impact of different data combinations and boundary label widths (1, 3, 5, and 7 pixels) on model accuracy. Results indicated that integrating filtered DHM data significantly improved boundary pixel detection, and labels 3–5 pixels wide yielded better boundary pixels than 1- or 7-pixel widths.
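The multi-channel RGB+DHM input and the flip/rotate augmentation described above can be sketched as follows. This is a minimal illustration with numpy only; the array names and zero-filled values are hypothetical stand-ins for one training patch, not data or code from the thesis:

```python
import numpy as np

# Hypothetical arrays standing in for one 512x512 training patch.
rgb = np.zeros((512, 512, 3), dtype=np.float32)   # spectral channels (true orthoimage)
dhm = np.zeros((512, 512, 1), dtype=np.float32)   # normalized height channel (DHM)
label = np.zeros((512, 512), dtype=np.uint8)      # boundary-pixel mask

# Stack RGB and DHM into a single 4-channel input so the network can learn
# from spectral and height cues jointly.
x = np.concatenate([rgb, dhm], axis=-1)
assert x.shape == (512, 512, 4)

# Flip/rotate augmentation of the kind mentioned in the abstract; image and
# label must be transformed identically so boundary pixels stay registered.
x_flip, y_flip = np.flip(x, axis=1), np.flip(label, axis=1)   # horizontal flip
x_rot, y_rot = np.rot90(x), np.rot90(label)                   # 90-degree rotation
assert x_rot.shape == (512, 512, 4) and y_rot.shape == (512, 512)
```

Keeping the DHM as an extra input channel, rather than a separate branch, lets a standard U-Net consume it with only a change to the first convolution's input depth.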
Additionally, a comparison between U-Net and ResUNet-34 revealed that ResUNet-34 outperformed the traditional U-Net on all segmentation accuracy metrics. Transfer learning experiments on Ludao Township showed that stable performance could be achieved with as little as about 10% of the original training dataset. The predicted boundary pixels were refined into thinned boundary lines through skeletonization, and geometric accuracy was assessed with two indicators: the Average Symmetric Surface Distance (ASSD) and the proportion of predicted pixels falling within a given range of the ground-truth labels. Overall, the proposed method, followed by these refinement steps, demonstrates the preliminary feasibility of automated building boundary line extraction, although the completeness of the extracted boundary lines still needs improvement.
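The two evaluation indicators can be sketched for small boundary-pixel sets as below. This is a brute-force illustration under the standard ASSD definition, not the thesis's implementation; the function name and the 2-pixel tolerance are illustrative assumptions:

```python
import numpy as np

def assd(a_pts, b_pts):
    """Average Symmetric Surface Distance between two boundary pixel sets.

    a_pts, b_pts: (N, 2) arrays of (row, col) boundary coordinates.
    For each pixel in one set, take the distance to its nearest pixel in the
    other set; ASSD averages these nearest-neighbour distances both ways.
    """
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)   # nearest b-pixel for every a-pixel
    b_to_a = d.min(axis=0)   # nearest a-pixel for every b-pixel
    return (a_to_b.sum() + b_to_a.sum()) / (len(a_pts) + len(b_pts))

# Toy example: a predicted boundary shifted one pixel from the reference.
ref = np.array([[0, c] for c in range(5)])
pred = np.array([[1, c] for c in range(5)])
print(assd(ref, pred))  # → 1.0

# Second indicator: proportion of predicted pixels within a tolerance
# (here 2 px, an assumed value) of the reference boundary.
nearest = np.linalg.norm(pred[:, None, :] - ref[None, :, :], axis=-1).min(axis=1)
print((nearest <= 2).mean())  # → 1.0
```

For full-image masks the pairwise-distance matrix becomes large; a distance-transform-based implementation would scale better, but the metric's definition is the same.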
Description Master's thesis
National Chengchi University
Department of Land Economics
112257029
Source http://thesis.lib.nccu.edu.tw/record/#G0112257029
Type thesis
dc.identifier (Other Identifier) G0112257029en_US
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/158703-
dc.description.tableofcontents Acknowledgements; Abstract; List of Figures; List of Tables; Chapter 1 Introduction: 1.1 Research Background and Motivation, 1.2 Research Objectives, 1.3 Research Structure; Chapter 2 Literature Review: 2.1 Traditional Image Edge-Pixel Detection, 2.2 Deep Learning, 2.3 Transfer Learning, 2.4 Image Segmentation, 2.5 Image Recognition, 2.6 Differences between Image Segmentation and Image Recognition, 2.7 Deep Learning Research on Building Boundary Extraction; Chapter 3 Methodology: 3.1 Study Areas, 3.2 Research Data and Processing Tools, 3.3 Research Workflow, 3.4 Methods and Theoretical Basis; Chapter 4 Experimental Results and Discussion: 4.1 Experimental Data, 4.2 Deep Learning Network Training, 4.3 Accuracy Assessment; Chapter 5 Conclusions and Suggestions: 5.1 Conclusions, 5.2 Suggestions; References zh_TW
dc.format.extent 5263014 bytes-
dc.format.mimetype application/pdf-