Title: Stable Diffusion used for Portrait Layout Transformation: From Me to Meme (Chinese title: Stable Diffusion用於人像構圖轉換:從我到迷因)
Author: Li, Chun-An (李峻安)
Advisor: Chi, Ming-Te (紀明德)
Keywords: Stable Diffusion; Portrait Style; Meme Image; Composition
Date: 2025
Uploaded: 1-Sep-2025 16:56:43 (UTC+8)
Abstract: In internet culture, memes are rapidly spreading content, such as images, videos, or phrases, characterized by high imitability, variability, and community resonance. They often carry humor, satire, or topical elements, enabling users to connect quickly and create derivative versions. This study addresses face-detection failures in meme images caused by facial deformation. We propose a transformation method based on Stable Diffusion and image segmentation that smooths deformation while preserving the key identifying features of the original image, thereby improving face-detection accuracy and ensuring visually consistent outputs. Personalized image synthesis has advanced significantly with methods such as InstantID and LoRA, yet their real-world application is limited by unreliable face detection on memes. Fusing a portrait's style with a meme's composition also carries technical barriers: lengthy model fine-tuning and manual image screening or preprocessing. To address these challenges, this study introduces Me2Meme, a diffusion-model-based solution, and evaluates its effectiveness and practicality, offering new insights and applications for cross-disciplinary work in art and image processing.
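This record does not include implementation details, but the abstract's central step, an image-to-image denoising pass that smooths a meme's facial deformation enough for standard face detectors to succeed, can be illustrated. The following is a minimal sketch, assuming the diffusers and opencv-python packages; the checkpoint name, prompt, strength, and file names are illustrative placeholders, not the thesis' actual Me2Meme configuration, and the segmentation stage is omitted.

import numpy as np
import cv2
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Illustrative public SD 1.5 checkpoint; float16 requires a CUDA GPU.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

meme = Image.open("meme.png").convert("RGB").resize((512, 512))

# A low strength keeps the meme's composition and identity cues while
# the denoising pass smooths the exaggerated facial deformation.
restored = pipe(
    prompt="a clean portrait of a person, natural facial proportions",
    image=meme,
    strength=0.35,
    guidance_scale=7.5,
).images[0]
restored.save("restored.png")

# Rough sanity check: count faces a stock Haar cascade finds before and
# after smoothing; the warped original often yields zero detections.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
for name, img in [("original", meme), ("restored", restored)]:
    gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
    print(name, "faces:", len(detector.detectMultiScale(gray)))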
References:
[1] J. Yaniv, Y. Newman, and A. Shamir, "The face of art: Landmark detection and geometric style in portraits," ACM Transactions on Graphics, vol. 38, pp. 1–15, Jul. 2019.
[2] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," 2020. [Online]. Available: https://arxiv.org/abs/2006.11239
[3] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," 2022. [Online]. Available: https://arxiv.org/abs/2112.10752
[4] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation," 2023. [Online]. Available: https://arxiv.org/abs/2208.12242
[5] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, "Segment anything," 2023. [Online]. Available: https://arxiv.org/abs/2304.02643
[6] Z. Huang, K. C. K. Chan, Y. Jiang, and Z. Liu, "Collaborative diffusion for multi-modal face generation and editing," 2023. [Online]. Available: https://arxiv.org/abs/2304.10530
[7] X. Ju, A. Zeng, C. Zhao, J. Wang, L. Zhang, and Q. Xu, "HumanSD: A native skeleton-guided diffusion model for human image generation," 2023. [Online]. Available: https://arxiv.org/abs/2304.04269
[8] X. Liu, J. Ren, A. Siarohin, I. Skorokhodov, Y. Li, D. Lin, X. Liu, Z. Liu, and S. Tulyakov, "HyperHuman: Hyper-realistic human generation with latent structural diffusion," 2024. [Online]. Available: https://arxiv.org/abs/2310.08579
[9] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," 2022. [Online]. Available: https://arxiv.org/abs/2112.10752
[10] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," 2021. [Online]. Available: https://arxiv.org/abs/2106.09685
[11] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation," 2023. [Online]. Available: https://arxiv.org/abs/2208.12242
[12] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, "An image is worth one word: Personalizing text-to-image generation using textual inversion," 2022. [Online]. Available: https://arxiv.org/abs/2208.01618
[13] Y. Ren, X. Xia, Y. Lu, J. Zhang, J. Wu, P. Xie, X. Wang, and X. Xiao, "Hyper-SD: Trajectory segmented consistency model for efficient image synthesis," 2024. [Online]. Available: https://arxiv.org/abs/2404.13686
[14] R. Dawkins, The Selfish Gene. Oxford University Press, 1976.
[15] L. Shifman, Memes in Digital Culture. MIT Press, 2014.
[16] R. M. Milner, The World Made Meme: Public Conversations and Participatory Media. MIT Press, 2016.
[17] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning transferable visual models from natural language supervision," 2021. [Online]. Available: https://arxiv.org/abs/2103.00020
[18] A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, "GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models," 2021. [Online]. Available: https://arxiv.org/abs/2112.10741
[19] W. Peebles and S. Xie, "Scalable diffusion models with transformers," 2022. [Online]. Available: https://arxiv.org/abs/2212.09748
[20] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical text-conditional image generation with CLIP latents," 2022. [Online]. Available: https://arxiv.org/abs/2204.06125
[21] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. Gupta, B. Hechtman et al., "Photorealistic text-to-image diffusion models with deep language understanding," 2022. [Online]. Available: https://arxiv.org/abs/2205.11487
[22] D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, and R. Rombach, "SDXL: Improving latent diffusion models for high-resolution image synthesis," 2023. [Online]. Available: https://arxiv.org/abs/2307.01952
[23] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, "An image is worth one word: Personalizing text-to-image generation using textual inversion," 2022. [Online]. Available: https://arxiv.org/abs/2208.01618
[24] H. Ye, J. Zhang, S. Liu, X. Han, and W. Yang, "IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models," 2023. [Online]. Available: https://arxiv.org/abs/2308.06721
[25] L. Zhang, A. Rao, and M. Agrawala, "Adding conditional control to text-to-image diffusion models," 2023. [Online]. Available: https://arxiv.org/abs/2302.05543
[26] N. Kumari, B. Zhang, R. Zhang, E. Shechtman, and J.-Y. Zhu, "Multi-concept customization of text-to-image diffusion," 2022. [Online]. Available: https://arxiv.org/abs/2212.04488
[27] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," 2021. [Online]. Available: https://arxiv.org/abs/2106.09685
[28] G. Xiao, T. Yin, W. T. Freeman, F. Durand, and S. Han, "FastComposer: Tuning-free multi-subject image generation with localized attention," 2023. [Online]. Available: https://arxiv.org/abs/2305.10431
[29] Z. Li, M. Cao, X. Wang, Z. Qi, M.-M. Cheng, and Y. Shan, "PhotoMaker: Customizing realistic human photos via stacked ID embedding," 2023. [Online]. Available: https://arxiv.org/abs/2312.04461
[30] Q. Wang, X. Bai, H. Wang, Z. Qin, and A. Chen, "InstantID: Zero-shot identity-preserving generation in seconds," 2024. [Online]. Available: https://arxiv.org/abs/2401.07519
[31] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 4th ed. Pearson, 2018.
[32] D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach, 2nd ed. Pearson, 2011.
[33] C. Zhang and Z. Zhang, "Face detection with boosted Gaussian features," Pattern Recognition, vol. 43, no. 3, pp. 1025–1035, 2010.
[34] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, "Emerging properties in self-supervised vision transformers," 2021. [Online]. Available: https://arxiv.org/abs/2104.14294
[35] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4685–4694.
[36] N. Zhou, D. Jurgens, and D. Bamman, "Social meme-ing: Measuring linguistic variation in memes," 2023. [Online]. Available: https://arxiv.org/abs/2311.09130
[37] J. Huang, X. Dong, W. Song, H. Li, J. Zhou, Y. Cheng, S. Liao, L. Chen, Y. Yan, S. Liao, and X. Liang, "ConsistentID: Portrait generation with multimodal fine-grained identity preserving," 2024. [Online]. Available: https://arxiv.org/abs/2404.16771
[38] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 3730–3738.
[39] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," CoRR, vol. abs/1812.04948, 2018. [Online]. Available: http://arxiv.org/abs/1812.04948
[40] AUTOMATIC1111, "Stable Diffusion web UI," https://github.com/AUTOMATIC1111/stable-diffusion-webui, 2022.
[41] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," CoRR, vol. abs/1812.08008, 2018. [Online]. Available: http://arxiv.org/abs/1812.08008
[42] L. Zhang, A. Rao, and M. Agrawala, "Adding conditional control to text-to-image diffusion models," 2023. [Online]. Available: https://arxiv.org/abs/2302.05543
[43] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695.
Description: Master's thesis, National Chengchi University, Department of Computer Science, 111753222
Source: http://thesis.lib.nccu.edu.tw/record/#G0111753222
Type: thesis
Identifier: G0111753222
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/159411
Table of contents:
Chapter 1 Introduction
 1.1 Research Motivation and Objectives
 1.2 Problem Definition
 1.3 Expected Contributions
Chapter 2 Related Work
 2.1 Text-to-Image Diffusion Models
 2.2 Personalization in Diffusion Models
 2.3 Style Flattening
Chapter 3 Dataset
 3.1 Construction of the Comic-Style Meme Image Dataset
Chapter 4 Research Methods and Procedures
 4.1 Research Methods
 4.2 Evaluation Methods
  4.2.1 Quantitative Evaluation Metrics
Chapter 5 Experimental Results
 5.1 Quantitative Results and Analysis
 5.2 Qualitative Results and Analysis
Chapter 6 Conclusion and Future Work
References
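The outline's quantitative evaluation metrics are not spelled out in this record. Given the ArcFace entry in the reference list ([35]), one plausible identity-preservation measure is the cosine similarity between face embeddings of the source portrait and the generated meme. A minimal sketch with stand-in vectors follows; real embeddings would come from a face-recognition model such as ArcFace, which this sketch does not load.

import numpy as np

def identity_similarity(emb_source, emb_generated):
    # Cosine similarity between two face embeddings; values near 1.0
    # indicate the generated image preserves the source identity well.
    emb_source = emb_source / np.linalg.norm(emb_source)
    emb_generated = emb_generated / np.linalg.norm(emb_generated)
    return float(np.dot(emb_source, emb_generated))

# Stand-in 512-dimensional vectors (ArcFace-style models typically emit
# 512-dimensional embeddings); replace with real model outputs.
rng = np.random.default_rng(0)
a, b = rng.normal(size=512), rng.normal(size=512)
print(f"identity similarity: {identity_similarity(a, b):.3f}")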
Format: application/pdf, 8057019 bytes