Title: 應用生成式資料擴增提升魚眼鏡頭物件偵測模型效能 (Enhancing Fisheye Lens Object Detection Using Generative Data Augmentation)
Author: Cheng, Pin-Chieh (程品潔)
Advisor: Liao, Wen-Hung (廖文宏)
Keywords: Fisheye Camera; Fisheye Correction; Object Detection; Diffusion Model; Generative Data Augmentation
Date: 2024
Uploaded: 5-Aug-2024 13:56:22 (UTC+8)

Abstract:
Smart cities aim to leverage innovative technologies to enhance urban operational efficiency, safety, and quality of life. Advanced surveillance systems and object detection are key components of smart cities, aiding in the management and optimization of public spaces. Overhead fisheye cameras, with their ultra-wide field of view, are well suited to large-area surveillance but introduce severe image distortion. Moreover, privacy requirements and the diversity of deployment scenes make it extremely difficult to acquire sufficient and varied public imagery, hindering research in this area.

To address these issues, this study focuses on the library, a common and important public venue, and tackles the two major challenges of object detection with overhead fisheye images: data scarcity and fisheye distortion. By augmenting the training data with text-to-image generative models and combining this with a distortion correction method based on preset camera intrinsic parameters, we improved object detection accuracy.

Experimental results show that training on images produced by generative AI models, while strategically and progressively increasing the number of synthetic instances, significantly enhances detection performance; after correcting the dataset, the gains are especially pronounced for small objects. Compared with the YOLOv8 baseline, our fisheye-corrected fine-tuned model improves overall mAP(0.5) from 0.246 to 0.688 and mAP(0.5-0.95) from 0.122 to 0.518; for a representative small-object category (beverages), mAP(0.5) rises from 0.507 to 0.795 and mAP(0.5-0.95) from 0.268 to 0.586. In addition, mixing an appropriate proportion of synthetic data with real data not only makes training more robust but also further improves model performance. These findings confirm the potential of the proposed approach for object detection with overhead fisheye cameras.
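As context for the intrinsic-parameter correction mentioned above, the following is a minimal sketch of overhead fisheye undistortion using OpenCV's cv2.fisheye module; the camera matrix K, distortion coefficients D, and file names are illustrative placeholders, not the thesis's calibrated values.

    import cv2
    import numpy as np

    # Illustrative placeholder intrinsics for a 1080x1080 overhead fisheye
    # camera; a real deployment would use calibrated values.
    K = np.array([[540.0,   0.0, 540.0],
                  [  0.0, 540.0, 540.0],
                  [  0.0,   0.0,   1.0]])           # camera matrix (fx, fy, cx, cy)
    D = np.array([[-0.05], [0.01], [-0.002], [0.0]])  # fisheye coefficients k1..k4

    img = cv2.imread("fisheye_frame.jpg")            # hypothetical input frame
    h, w = img.shape[:2]

    # Estimate a new camera matrix balancing field of view against cropping,
    # build the undistortion maps once, then remap every frame.
    new_K = cv2.fisheye.estimateNewCameraMatrixForUndistortRectify(
        K, D, (w, h), np.eye(3), balance=0.5)
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(
        K, D, np.eye(3), new_K, (w, h), cv2.CV_16SC2)
    rectified = cv2.remap(img, map1, map2, interpolation=cv2.INTER_LINEAR)

    cv2.imwrite("rectified_frame.jpg", rectified)

Note that the thesis also remaps annotation bounding boxes after correction (Section 3.2.2 of the table of contents); that step is omitted from this sketch.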
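For reference, the metrics quoted above follow the standard COCO-style definitions: per-class average precision is the area under the precision-recall curve at a fixed IoU threshold, and mAP averages over the C classes (and, for mAP(0.5-0.95), over ten IoU thresholds):

    \[
    \mathrm{AP}_c(\tau) = \int_0^1 p_c(r;\tau)\,dr,
    \qquad
    \mathrm{mAP}(0.5) = \frac{1}{C}\sum_{c=1}^{C}\mathrm{AP}_c(0.5)
    \]
    \[
    \mathrm{mAP}(0.5\text{-}0.95) = \frac{1}{C}\sum_{c=1}^{C}
    \frac{1}{10}\sum_{\tau\in\{0.50,\,0.55,\,\dots,\,0.95\}}\mathrm{AP}_c(\tau)
    \]

Here p_c(r; τ) is the precision of class c at recall r under IoU threshold τ; Ultralytics YOLOv8 reports these two values as mAP50 and mAP50-95.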
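As a rough sketch of the mixed synthetic-and-real fine-tuning the experiments describe, the snippet below builds a combined YOLO training list and trains with the Ultralytics API; the directory layout, mixing ratio, and hyperparameters are assumptions for illustration, not the thesis's exact configuration.

    import random
    from pathlib import Path

    from ultralytics import YOLO

    # Hypothetical YOLO-format layout: real (CEPDOF-style) and generated
    # (Midjourney/DALL-E-style) images with matching label files.
    real = sorted(Path("datasets/real/images/train").glob("*.jpg"))
    synth = sorted(Path("datasets/synthetic/images/train").glob("*.jpg"))

    # Mix synthetic into real at an illustrative 1:2 ratio; the thesis tunes
    # this proportion experimentally rather than fixing it in advance.
    random.seed(0)
    mixed = real + random.sample(synth, k=min(len(synth), len(real) // 2))
    Path("datasets/mixed_train.txt").write_text("\n".join(str(p) for p in mixed))

    # data.yaml is assumed to point 'train' at mixed_train.txt and 'val' at a
    # held-out set of real fisheye images.
    model = YOLO("yolov8x.pt")            # COCO-pretrained checkpoint
    model.train(data="datasets/data.yaml", epochs=100, imgsz=640)
    metrics = model.val()                 # reports mAP50 and mAP50-95
    print(metrics.box.map50, metrics.box.map)

The real-to-synthetic proportion is the quantity Experiment 4 varies; progressively increasing the synthetic share, as the abstract describes, amounts to regenerating mixed_train.txt between training rounds.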
Description: Master's thesis
National Chengchi University
In-service Master's Program, Department of Computer Science
Student ID: 111971008
Source: http://thesis.lib.nccu.edu.tw/record/#G0111971008
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/152770
Identifier: G0111971008
Type: thesis
Format: application/pdf (8,810,925 bytes)

Table of Contents:
Chapter 1  Introduction
  1.1  Research Background and Motivation
  1.2  Research Objectives and Contributions
  1.3  Thesis Organization
Chapter 2  Related Work and Technical Background
  2.1  Object Detection Datasets for Overhead Fisheye Cameras
    2.1.1  Public Overhead Fisheye Datasets
    2.1.2  Data Augmentation
  2.2  Image Generation Models
    2.2.1  GAN-Based Generative Models
    2.2.2  Diffusion-Based Generative Models
    2.2.3  Domain Adaptation
  2.3  Fisheye Image Distortion Correction
    2.3.1  Camera-Parameter-Based Correction
    2.3.2  CNN-Based Correction
  2.4  Object Detection with Fisheye Lenses
    2.4.1  Detection on Raw Fisheye Images
    2.4.2  Distortion-Adaptive Detection
Chapter 3  Methodology
  3.1  Training Data
    3.1.1  Real Data: CEPDOF and MW-18Mar
    3.1.2  Synthetic Data: Midjourney and DALL·E Generated Images
  3.2  Correcting the Fisheye Dataset
    3.2.1  Distortion Correction for Overhead Fisheye Images
    3.2.2  Adjusting Bounding Boxes after Correction
  3.3  Model Architecture
    3.3.1  YOLOv8 Architecture
    3.3.2  YOLOv8x Pretrained Model
  3.4  Performance Evaluation Metrics
    3.4.1  Metrics for Object Detection Models
    3.4.2  Detection Metrics for YOLO Models
  3.5  Experimental Method
    3.5.1  Experimental Pipeline
    3.5.2  Key Strategies for Fine-Tuning with Generative AI Images
    3.5.3  Experimental Environment and Training Parameters
Chapter 4  Experiments and Analysis
  4.1  Experiment 1: Comparing Generative AI Tools on a Small Training Set
  4.2  Experiment 2: Training with Uncorrected Synthetic Data
    4.2.1  Experiment 2-1: Train on Synthetic Data, Validate on Synthetic Images
    4.2.2  Experiment 2-2: Train on Synthetic Data, Validate on Real Images
  4.3  Experiment 3: Training with Corrected Synthetic Data, Validating on Real Images
  4.4  Experiment 4: Training with Mixed Synthetic and Real Data, Validating on Real Images
  4.5  Analysis of Missed Detections
  4.6  Summary of Experimental Results
Chapter 5  Conclusions and Future Directions
  5.1  Conclusions
  5.2  Future Research Directions
References
Appendix
  Image-to-Image Tests with Midjourney
  Image-to-Image Tests with DALL·E 3