Academic Output - Proceedings


Title Lip Sync Matters: A Novel Multimodal Forgery Detector
Author 彭彥璁
Peng, Yan-Tsung; Shahzad, Sahibzada Adil; Hashmi, Ammarah; Khan, Sarwar; Tsao, Yu; Wang, Hsin-Min
Contributor Department of Computer Science
Date 2022-11
Uploaded 16-Feb-2024 15:36:53 (UTC+8)
Abstract Deepfake technology has advanced considerably, but it is a double-edged sword for the community. It can be used for beneficial purposes, such as restoring vintage content in old movies, or for nefarious ones, such as creating fake footage to manipulate the public or distributing non-consensual pornography. Much work has gone into combating its misuse, and fake footage can now be detected with good performance thanks to the availability of numerous public datasets and unimodal deep learning-based models. However, these methods are insufficient against multimodal manipulations in which both the visual and acoustic streams are forged. This work proposes a novel lip-reading-based multimodal Deepfake detection method called “Lip Sync Matters.” It targets high-level semantic features, exploiting the mismatch between the lip sequence extracted from the video and the synthetic lip sequence generated from the audio by the Wav2lip model to detect forged videos. Experimental results show that the proposed method outperforms several existing unimodal, ensemble, and multimodal methods on the publicly available multimodal FakeAVCeleb dataset. (A minimal code sketch of this comparison step follows the record below.)
Relation Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), IEEE
Type conference
DOI https://doi.org/10.23919/APSIPAASC55919.2022.9980296
URI https://nccur.lib.nccu.edu.tw/handle/140.119/149882
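
The abstract describes comparing lips cropped from the video against lips re-synthesized from the audio track by Wav2lip, flagging a video as forged when the two disagree. Below is a minimal, hypothetical Python sketch of that mismatch test. The lip extraction, the Wav2Lip generation step, and the lip encoder (reduced here to mean pooling) are all placeholder assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def lip_embedding(lip_frames: np.ndarray) -> np.ndarray:
    # Stand-in for a lip-reading encoder: flatten each (H, W) lip crop and
    # mean-pool over time, mapping a (T, H, W) sequence to one vector.
    return lip_frames.reshape(lip_frames.shape[0], -1).mean(axis=0)

def sync_mismatch(real_lips: np.ndarray, synth_lips: np.ndarray) -> float:
    # Cosine distance between embeddings of the real lip sequence (cropped
    # from the video) and the synthetic one (generated from the audio).
    a, b = lip_embedding(real_lips), lip_embedding(synth_lips)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return 1.0 - cos

def is_forged(real_lips: np.ndarray, synth_lips: np.ndarray,
              threshold: float = 0.5) -> bool:
    # A large audio-visual mismatch flags the video as forged. The fixed
    # threshold here is purely illustrative.
    return sync_mismatch(real_lips, synth_lips) > threshold

# Toy usage with random "lip sequences": 25 frames of 48x96 crops.
rng = np.random.default_rng(0)
real = rng.random((25, 48, 96))
synth = rng.random((25, 48, 96))
print(is_forged(real, synth))
```

In the actual method, both sequences would be embedded by a trained lip-reading network rather than mean pooling, so the distance reflects high-level semantic disagreement rather than raw pixel differences.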