Title: Stable Diffusion used for Portrait Layout Transformation: From Me to Meme (Chinese title: Stable Diffusion用於人像構圖轉換:從我到迷因)
Author: Li, Chun-An (李峻安)
Advisor: Chi, Ming-Te (紀明德)
Keywords: Stable Diffusion; Portrait Style; Meme Image; Composition
Date: 2025
Uploaded: 1-Sep-2025 16:56:43 (UTC+8)
Abstract: In internet culture, memes are rapidly spreading content, such as images, videos, or phrases, characterized by high imitability, variability, and community resonance. They often carry humor, satire, or topical elements, enabling users to connect quickly and create derivative versions. This study addresses face-detection failures in meme images caused by facial deformation. We propose a transformation method based on Stable Diffusion and image segmentation that smooths deformation while preserving the key identifying features of the original image, thereby improving face-detection accuracy and ensuring visually consistent outputs. Personalized image synthesis has advanced significantly with methods such as InstantID and LoRA, yet their real-world application is limited by unreliable face detection on memes. Fusing a portrait's style with a meme's composition also carries technical barriers: lengthy model fine-tuning and manual image screening or preprocessing. To address these challenges, this study introduces Me2Meme, a diffusion-model-based solution, and evaluates its effectiveness and practicality, offering new insights and applications for cross-disciplinary work in art and image processing.
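This record does not include implementation details, but the abstract's central step, an image-to-image denoising pass that smooths a meme's facial deformation enough for standard face detectors to succeed, can be illustrated. The following is a minimal sketch, assuming the diffusers and opencv-python packages; the checkpoint name, prompt, strength, and file names are illustrative placeholders, not the thesis' actual Me2Meme configuration, and the segmentation stage is omitted.

import numpy as np
import cv2
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Illustrative public SD 1.5 checkpoint; float16 requires a CUDA GPU.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

meme = Image.open("meme.png").convert("RGB").resize((512, 512))

# A low strength keeps the meme's composition and identity cues while
# the denoising pass smooths the exaggerated facial deformation.
restored = pipe(
    prompt="a clean portrait of a person, natural facial proportions",
    image=meme,
    strength=0.35,
    guidance_scale=7.5,
).images[0]
restored.save("restored.png")

# Rough sanity check: count faces a stock Haar cascade finds before and
# after smoothing; the warped original often yields zero detections.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
for name, img in [("original", meme), ("restored", restored)]:
    gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
    print(name, "faces:", len(detector.detectMultiScale(gray)))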
References:
[1] J. Yaniv, Y. Newman, and A. Shamir, "The face of art: Landmark detection and geometric style in portraits," ACM Transactions on Graphics, vol. 38, pp. 1–15, Jul. 2019.
[2] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," 2020. [Online]. Available: https://arxiv.org/abs/2006.11239
[3] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," 2022. [Online]. Available: https://arxiv.org/abs/2112.10752
[4] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation," 2023. [Online]. Available: https://arxiv.org/abs/2208.12242
[5] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, "Segment anything," 2023. [Online]. Available: https://arxiv.org/abs/2304.02643
[6] Z. Huang, K. C. K. Chan, Y. Jiang, and Z. Liu, "Collaborative diffusion for multi-modal face generation and editing," 2023. [Online]. Available: https://arxiv.org/abs/2304.10530
[7] X. Ju, A. Zeng, C. Zhao, J. Wang, L. Zhang, and Q. Xu, "HumanSD: A native skeleton-guided diffusion model for human image generation," 2023. [Online]. Available: https://arxiv.org/abs/2304.04269
[8] X. Liu, J. Ren, A. Siarohin, I. Skorokhodov, Y. Li, D. Lin, X. Liu, Z. Liu, and S. Tulyakov, "HyperHuman: Hyper-realistic human generation with latent structural diffusion," 2024. [Online]. Available: https://arxiv.org/abs/2310.08579
[9] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," 2022. [Online]. Available: https://arxiv.org/abs/2112.10752
[10] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," 2021. [Online]. Available: https://arxiv.org/abs/2106.09685
[11] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation," 2023. [Online]. Available: https://arxiv.org/abs/2208.12242
[12] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, "An image is worth one word: Personalizing text-to-image generation using textual inversion," 2022. [Online]. Available: https://arxiv.org/abs/2208.01618
[13] Y. Ren, X. Xia, Y. Lu, J. Zhang, J. Wu, P. Xie, X. Wang, and X. Xiao, "Hyper-SD: Trajectory segmented consistency model for efficient image synthesis," 2024. [Online]. Available: https://arxiv.org/abs/2404.13686
[14] R. Dawkins, The Selfish Gene. Oxford University Press, 1976.
[15] L. Shifman, Memes in Digital Culture. MIT Press, 2014.
[16] R. M. Milner, The World Made Meme: Public Conversations and Participatory Media. MIT Press, 2016.
[17] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning transferable visual models from natural language supervision," 2021. [Online]. Available: https://arxiv.org/abs/2103.00020
[18] A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, "GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models," 2021. [Online]. Available: https://arxiv.org/abs/2112.10741
[19] W. Peebles and S. Xie, "Scalable diffusion models with transformers," 2022. [Online]. Available: https://arxiv.org/abs/2212.09748
[20] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical text-conditional image generation with CLIP latents," 2022. [Online]. Available: https://arxiv.org/abs/2204.06125
[21] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. Gupta, B. Hechtman et al., "Photorealistic text-to-image diffusion models with deep language understanding," 2022. [Online]. Available: https://arxiv.org/abs/2205.11487
[22] D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, and R. Rombach, "SDXL: Improving latent diffusion models for high-resolution image synthesis," 2023. [Online]. Available: https://arxiv.org/abs/2307.01952
[23] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, "An image is worth one word: Personalizing text-to-image generation using textual inversion," 2022. [Online]. Available: https://arxiv.org/abs/2208.01618
[24] H. Ye, J. Zhang, S. Liu, X. Han, and W. Yang, "IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models," 2023. [Online]. Available: https://arxiv.org/abs/2308.06721
[25] L. Zhang, A. Rao, and M. Agrawala, "Adding conditional control to text-to-image diffusion models," 2023. [Online]. Available: https://arxiv.org/abs/2302.05543
[26] N. Kumari, B. Zhang, R. Zhang, E. Shechtman, and J.-Y. Zhu, "Multi-concept customization of text-to-image diffusion," 2022. [Online]. Available: https://arxiv.org/abs/2212.04488
[27] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," 2021. [Online]. Available: https://arxiv.org/abs/2106.09685
[28] G. Xiao, T. Yin, W. T. Freeman, F. Durand, and S. Han, "FastComposer: Tuning-free multi-subject image generation with localized attention," 2023. [Online]. Available: https://arxiv.org/abs/2305.10431
[29] Z. Li, M. Cao, X. Wang, Z. Qi, M.-M. Cheng, and Y. Shan, "PhotoMaker: Customizing realistic human photos via stacked ID embedding," 2023. [Online]. Available: https://arxiv.org/abs/2312.04461
[30] Q. Wang, X. Bai, H. Wang, Z. Qin, and A. Chen, "InstantID: Zero-shot identity-preserving generation in seconds," 2024. [Online]. Available: https://arxiv.org/abs/2401.07519
[31] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 4th ed. Pearson, 2018.
[32] D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach, 2nd ed. Pearson, 2011.
[33] C. Zhang and Z. Zhang, "Face detection with boosted Gaussian features," Pattern Recognition, vol. 43, no. 3, pp. 1025–1035, 2010.
[34] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, "Emerging properties in self-supervised vision transformers," 2021. [Online]. Available: https://arxiv.org/abs/2104.14294
[35] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4685–4694.
[36] N. Zhou, D. Jurgens, and D. Bamman, "Social meme-ing: Measuring linguistic variation in memes," 2023. [Online]. Available: https://arxiv.org/abs/2311.09130
[37] J. Huang, X. Dong, W. Song, H. Li, J. Zhou, Y. Cheng, S. Liao, L. Chen, Y. Yan, S. Liao, and X. Liang, "ConsistentID: Portrait generation with multimodal fine-grained identity preserving," 2024. [Online]. Available: https://arxiv.org/abs/2404.16771
[38] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 3730–3738.
[39] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," CoRR, vol. abs/1812.04948, 2018. [Online]. Available: http://arxiv.org/abs/1812.04948
[40] AUTOMATIC1111, "Stable Diffusion web UI," https://github.com/AUTOMATIC1111/stable-diffusion-webui, 2022.
[41] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," CoRR, vol. abs/1812.08008, 2018. [Online]. Available: http://arxiv.org/abs/1812.08008
[42] L. Zhang, A. Rao, and M. Agrawala, "Adding conditional control to text-to-image diffusion models," 2023. [Online]. Available: https://arxiv.org/abs/2302.05543
[43] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695.
Description: Master's thesis, National Chengchi University, Department of Computer Science, 111753222
Source: http://thesis.lib.nccu.edu.tw/record/#G0111753222
Type: thesis
Identifier: G0111753222
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/159411
Table of contents:
Chapter 1 Introduction
 1.1 Research Motivation and Objectives
 1.2 Problem Definition
 1.3 Expected Contributions
Chapter 2 Related Work
 2.1 Text-to-Image Diffusion Models
 2.2 Personalization in Diffusion Models
 2.3 Style Flattening
Chapter 3 Dataset
 3.1 Construction of the Comic-Style Meme Image Dataset
Chapter 4 Research Methods and Procedures
 4.1 Research Methods
 4.2 Evaluation Methods
  4.2.1 Quantitative Evaluation Metrics
Chapter 5 Experimental Results
 5.1 Quantitative Results and Analysis
 5.2 Qualitative Results and Analysis
Chapter 6 Conclusion and Future Work
References
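The outline's quantitative evaluation metrics are not spelled out in this record. Given the ArcFace entry in the reference list ([35]), one plausible identity-preservation measure is the cosine similarity between face embeddings of the source portrait and the generated meme. A minimal sketch with stand-in vectors follows; real embeddings would come from a face-recognition model such as ArcFace, which this sketch does not load.

import numpy as np

def identity_similarity(emb_source, emb_generated):
    # Cosine similarity between two face embeddings; values near 1.0
    # indicate the generated image preserves the source identity well.
    emb_source = emb_source / np.linalg.norm(emb_source)
    emb_generated = emb_generated / np.linalg.norm(emb_generated)
    return float(np.dot(emb_source, emb_generated))

# Stand-in 512-dimensional vectors (ArcFace-style models typically emit
# 512-dimensional embeddings); replace with real model outputs.
rng = np.random.default_rng(0)
a, b = rng.normal(size=512), rng.normal(size=512)
print(f"identity similarity: {identity_similarity(a, b):.3f}")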
Format: application/pdf, 8057019 bytes