
Title CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deep-fake Videos
Author(s) 汪新
Ahmad, Wasim;Peng, Yan Tsung;Chang, Yuan-Hao;Ganfure, Gaddisa Olani;Khan, Sarwar
Contributor 群智博五
Date 2025-04
Upload time 27-May-2025 11:09:35 (UTC+8)
Abstract Deep-fake videos, generated through AI face-swapping techniques, have garnered considerable attention due to their potential for impactful impersonation attacks. While existing research primarily distinguishes real from fake videos, attributing a deep-fake to its specific generation model or encoder is crucial for forensic investigation, enabling precise source tracing and tailored countermeasures. This approach not only enhances detection accuracy by leveraging unique model-specific artifacts but also provides insights essential for developing proactive defenses against evolving deep-fake techniques. Addressing this gap, this article investigates the model attribution problem for deep-fake videos using two datasets: Deepfakes from Different Models (DFDM) and GANGen-Detection, which comprise deep-fake videos and images generated by GAN models. We select only the fake images from the GANGen-Detection dataset to align it with the DFDM dataset, consistent with the goal of this study: model attribution rather than real/fake classification. This study formulates deep-fake model attribution as a multiclass classification task, introducing a novel Capsule-Spatial-Temporal (CapST) model that integrates a modified VGG19 (utilizing only the first 26 of its 52 layers) for feature extraction with Capsule Networks and a spatio-temporal attention mechanism. The Capsule module captures intricate feature hierarchies, enabling robust identification of deep-fake attributes, while a video-level fusion technique leverages temporal attention to process concatenated feature vectors and capture temporal dependencies in deep-fake videos. By aggregating insights across frames, our model achieves a comprehensive understanding of video content, resulting in more precise predictions.
Experimental results on the DFDM and GANGen-Detection datasets demonstrate the efficacy of CapST, achieving substantial improvements in accurately categorizing deep-fake videos over baseline models, all while demanding fewer computational resources.
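The video-level fusion step described in the abstract, in which temporal attention aggregates per-frame feature vectors into one video-level representation, can be sketched as follows. This is a minimal NumPy illustration of attention-weighted fusion, not the authors' implementation; the feature dimension, frame count, scoring vector, and function names are assumptions for demonstration only.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention_fusion(frame_features, w):
    """Fuse per-frame features into a single video-level vector.

    frame_features: (T, D) array, one D-dim feature vector per frame
    w: (D,) scoring vector (learned in practice; random here)
    """
    scores = frame_features @ w      # (T,) relevance score per frame
    alphas = softmax(scores)         # attention weights, summing to 1
    return alphas @ frame_features   # (D,) attention-weighted average

rng = np.random.default_rng(0)
T, D = 8, 16                         # e.g. 8 sampled frames, 16-dim features
feats = rng.normal(size=(T, D))
w = rng.normal(size=D)
video_vec = temporal_attention_fusion(feats, w)
print(video_vec.shape)               # (16,)
```

In the full model, the per-frame features would come from the truncated VGG19 plus capsule stages, and the fused vector would feed a multiclass classifier over the candidate generation models.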
Relation ACM Transactions on Multimedia Computing, Communications and Applications, Vol.21, No.4, pp.1-23
Type article
DOI https://doi.org/10.1145/3715138
URI https://nccur.lib.nccu.edu.tw/handle/140.119/157107