Title: Enhancing Image Deraining via a High-resolution Paired Real Rainy Dataset and VLM-based Data Refinement (original title: 基於高解析度真實雨天影像資料集與VLM數據優化提升影像除雨模型效能之方法)
Creator: Liang, Shih-Jui
Contributor: Peng, Yan-Tsung; Chen, Jun-Cheng (advisors); Liang, Shih-Jui (author)
Key Words: Real-world rain dataset; Rain removal model; Vision-Language Model; Data annotation
Date: 2025
Date Issued: 3-Mar-2025 14:03:49 (UTC+8)
Summary: Image deraining has recently garnered significant attention due to its critical role in applications such as autonomous driving and surveillance systems. Although many image deraining models have succeeded in removing rain and improving image clarity, they are predominantly trained on synthetic datasets. This reliance on synthetic data creates a performance gap when the models are applied to real-world scenes, because synthetic rain often fails to replicate actual rain conditions. Some real-world datasets do exist, but they are typically unrefined, with misaligned backgrounds between image pairs, which leads to suboptimal model performance. The central challenge is acquiring high-quality pairs of real-world rainy images with aligned backgrounds for effective model training. We present RealRain-AURA, a high-resolution, high-quality, and clearly categorized dataset of paired real rain images, aimed at improving the generalization and robustness of image deraining models on real-world data. In addition, we introduce Automated Understanding and Refinement Agents (AURA), which employ Vision-Language Models (VLMs) to refine deraining datasets by filtering out unsuitable training data and categorizing images by rain density (light, moderate, or heavy rain). This refinement and categorization framework improves dataset quality, thereby boosting the performance of rain removal models in real-world applications.
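The abstract describes AURA only at a high level: VLM agents first filter out image pairs unsuited to training, then label the surviving rainy images by rain density. As a rough illustration of what such a refinement loop could look like, the Python sketch below stubs out the VLM call; every name in it (`query_vlm`, `refine_pair`, the prompt wording, the label set) is a hypothetical placeholder, not the thesis's actual implementation.

```python
# Minimal sketch of a VLM-based dataset refinement loop in the spirit of AURA:
# filter unsuitable rainy/clean pairs, then label rain density. All names and
# prompts here are assumptions for illustration only.

from dataclasses import dataclass
from pathlib import Path

DENSITY_LABELS = ("light", "moderate", "heavy")  # assumed label set


@dataclass
class PairVerdict:
    keep: bool           # True if the pair is suitable for training
    density: str | None  # rain-density label for kept pairs


def query_vlm(prompt: str, images: list[Path]) -> str:
    """Placeholder: send `prompt` and `images` to a VLM, return its text reply."""
    raise NotImplementedError("wire up a real VLM client here")


def refine_pair(rainy: Path, clean: Path) -> PairVerdict:
    # Step 1 (filtering role): ask whether the pair shows the same aligned
    # scene and the clean image is genuinely rain-free.
    suitable = query_vlm(
        "Do these two photos show the same aligned scene, one rainy and one "
        "rain-free? Answer yes or no.",
        [rainy, clean],
    ).strip().lower()
    if not suitable.startswith("yes"):
        return PairVerdict(keep=False, density=None)

    # Step 2 (annotation role): categorize the rain density of the rainy image.
    reply = query_vlm(
        "Classify the rain in this photo as light, moderate, or heavy.",
        [rainy],
    ).strip().lower()
    density = next((d for d in DENSITY_LABELS if d in reply), None)
    return PairVerdict(keep=density is not None, density=density)
```

The thesis itself realizes this as a multi-agent annotation workflow (Section 3.2 in the table of contents below); the sketch compresses it into two sequential VLM queries purely for illustration.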
Description: Master's thesis, National Chengchi University, Department of Computer Science (111753216)
Source: http://thesis.lib.nccu.edu.tw/record/#G0111753216
Type: thesis
Other Identifier: G0111753216
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/155972
Table of Contents:
Abstract (Chinese and English); Contents; List of Figures; List of Tables
1 Introduction
  1.1 Motivation and Challenges
  1.2 Contributions
  1.3 Thesis Structure
2 Related Work
  2.1 Deraining Approaches
  2.2 Synthesizing Methods for Rainy Images
    2.2.1 Screen Blend Model
    2.2.2 Heavy Rain Model
    2.2.3 Comprehensive Rain Synthesis Model
  2.3 Real-world Methods
    2.3.1 Alignment of Video Frames for Clean Image Capture
    2.3.2 Simulated Rain Generation Using a Sprinkler
    2.3.3 Online Webcam Imagery
    2.3.4 Using Advanced Restoration Models
  2.4 Vision-Language Models
    2.4.1 Google Gemini
    2.4.2 LLaVA
  2.5 LLM Prompting Methods
    2.5.1 Step-by-Step
    2.5.2 Self-Consistency
    2.5.3 Chain-of-Thought
    2.5.4 Exchange-of-Thought
  2.6 Image Restoration Methods
    2.6.1 CNN-based Methods
    2.6.2 Transformer-based Methods
    2.6.3 Prompt Learning for Vision Tasks
3 Methodology
  3.1 Objectives for Rainy Image Collection
  3.2 Multi-agent Workflow for Rainy Image Dataset Annotation
  3.3 Workflow Overview
4 Experimental Settings
  4.1 Implementation Details
  4.2 Experimental Results
  4.3 Quantitative Results
  4.4 Ablation Studies
    4.4.1 Qualitative Results
5 More Qualitative Results
  5.1 Comparison of Attention across Model Variants
6 Conclusions
References
Format: application/pdf (12,009,982 bytes)