Rationality Evaluation and Semantic Analysis of Saliency Maps in Diffusion Models (擴散模型之顯著圖合理性評估及語義分析)

Title	Rationality Evaluation and Semantic Analysis of Saliency Maps in Diffusion Models (擴散模型之顯著圖合理性評估及語義分析)
Author	Lin, Da-Wei (林大維)
Contributors	Chi, Ming-Te (紀明德, advisor); Lin, Da-Wei (林大維, author)
Keywords	Diffusion Models; Saliency Maps; Text-to-Image Generation Models; Semantic Analysis
Date	2025
Upload time	2-Jun-2025 14:57:39 (UTC+8)
Abstract	In recent years, diffusion models have made major advances in image generation; Stable Diffusion in particular has pushed text-to-image synthesis to a new level. However, when a model relates natural language to the image it generates, feature entanglement can arise and reduce the plausibility of the result. This study adopts the Diffusion Attentive Attribution Map (DAAM) method, analyzing the saliency maps produced from cross-attention layers to examine which regions the model attends to for each prompt word and how that attention affects the generated image. We propose an automated rationality evaluation method that uses Segment Anything Model (SAM) segmentation to quantify the accuracy of saliency maps, and we compare the generalization ability of different pretrained Stable Diffusion models (such as v1.5, v2.1, and SDXL). In addition, dependency parsing and feature entanglement analysis are used to study how prompt wording influences image generation and to verify the scope over which adjectives and scene descriptions affect the output. Experimental results show that DAAM outperforms traditional gradient-based methods such as Grad-CAM and Grad-CAM++ in evaluating semantic relevance and reflects the correspondence between text and image more accurately. We also find that certain adjectives affect the entire scene rather than only the object they describe, indicating that Stable Diffusion still struggles with complex prompts. Future work will further refine DAAM and explore more precise semantic interpretation methods to improve the explainability and generation quality of diffusion models.
Description	Master's; National Chengchi University (國立政治大學); Department of Computer Science (資訊科學系); 111753161
Source	http://thesis.lib.nccu.edu.tw/record/#G0111753161
Type	thesis
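
The abstract describes quantifying saliency-map accuracy against segmentation masks, but this record does not state the exact metric. As a minimal sketch of one common way to do such a comparison, the example below thresholds a heat map and scores it against a binary mask with intersection-over-union (IoU). The choice of IoU, the 0.5 threshold, the function names `binarize` and `iou`, and the toy 3x3 arrays are all assumptions for illustration, not the thesis's method; in practice a DAAM heat map and a SAM mask would take their place.

```python
from typing import List, Sequence


def binarize(heatmap: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[bool]]:
    """Min-max normalize a heat map to [0, 1], then threshold it (assumed scheme)."""
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no localization signal; treat it as empty.
        return [[False for _ in row] for row in heatmap]
    return [[(v - lo) / span >= threshold for v in row] for row in heatmap]


def iou(pred: Sequence[Sequence[bool]], target: Sequence[Sequence[bool]]) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return inter / union if union else 0.0


# Toy stand-ins: a saliency heat map for one prompt word and an object mask.
heatmap = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
mask = [
    [True, True, False],
    [True, True, False],
    [False, False, False],
]

score = iou(binarize(heatmap), mask)
print(f"IoU = {score:.2f}")
```

A higher score means the thresholded saliency region overlaps the segmented object more closely; a score computed this way depends strongly on the threshold, so any real comparison would need to fix or sweep it.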

dc.contributor.advisor	紀明德	zh_TW
dc.contributor.advisor	Chi, Ming-Te	en_US
dc.contributor.author (Authors)	林大維	zh_TW
dc.contributor.author (Authors)	Lin, Da-Wei	en_US
dc.creator (Author)	林大維	zh_TW
dc.creator (Author)	Lin, Da-Wei	en_US
dc.date (Date)	2025	en_US
dc.date.accessioned	2-Jun-2025 14:57:39 (UTC+8)	-
dc.date.available	2-Jun-2025 14:57:39 (UTC+8)	-
dc.date.issued (Upload time)	2-Jun-2025 14:57:39 (UTC+8)	-
dc.identifier (Other Identifiers)	G0111753161	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/157243	-
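dc.description (Description)	碩士 (Master's)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	資訊科學系 (Department of Computer Science)	zh_TW
dc.description (Description)	111753161	zh_TW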
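dc.description.tableofcontents	Acknowledgments; Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction: 1.1 Research Motivation and Objectives, 1.2 Problem Description, 1.3 Thesis Organization; Chapter 2 Related Work: 2.1 Generative Models, 2.2 Common Visualization Methods, 2.3 Common Interpretability Metrics; Chapter 3 Research Methods and Architecture: 3.1 Subject Generator: Stable Diffusion, 3.2 Key Components of the Diffusion Model, 3.3 Design of Semantic Annotation, 3.4 Automated Rationality Evaluation Based on Annotation Segmentation, 3.5 Semantic Strength Computed from Annotations, 3.6 Comparison of DAAM Results Across Different Data Inputs; Chapter 4 Analysis Results: 4.1 Quantitative Metrics, 4.2 Observing Rationality and Differences in Corresponding Result Patterns, 4.3 Observations on Semantic Relevance, 4.4 Variability Under the Same Semantics, 4.5 Comparison with Commercially Trained Models, 4.6 Stability Discussion, 4.7 Limitations of DAAM; Chapter 5 Conclusion and Future Outlook: 5.1 Research Conclusions, 5.2 Future Work; References	zh_TW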
dc.format.extent	7874056 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0111753161	en_US
dc.subject (Keywords)	擴散模型 (Diffusion Models)	zh_TW
dc.subject (Keywords)	顯著圖 (Saliency Maps)	zh_TW
dc.subject (Keywords)	文字到圖像生成模型 (Text-to-Image Generation Models)	zh_TW
dc.subject (Keywords)	語義分析 (Semantic Analysis)	zh_TW
dc.subject (Keywords)	Diffusion Models	en_US
dc.subject (Keywords)	Saliency Maps	en_US
dc.subject (Keywords)	Text-to-Image Generation Models	en_US
dc.subject (Keywords)	Semantic Analysis	en_US
dc.title (Title)	擴散模型之顯著圖合理性評估及語義分析	zh_TW
dc.title (Title)	Rationality Evaluation and Semantic Analysis of Saliency Maps in Diffusion Models	en_US
dc.type (Type)	thesis	en_US
