Academic Output - Proceedings


Title Lip Sync Matters: A Novel Multimodal Forgery Detector
Author 彭彥璁
Peng, Yan-Tsung; Shahzad, Sahibzada Adil; Hashmi, Ammarah; Khan, Sarwar; Tsao, Yu; Wang, Hsin-Min
Contributor Department of Computer Science
Date 2022-11
Uploaded 16-Feb-2024 15:36:53 (UTC+8)
Abstract Deepfake technology has advanced considerably, but it is a double-edged sword for the community. It can be used for beneficial purposes, such as restoring vintage content in old movies, or for nefarious ones, such as creating fake footage to manipulate the public or distributing non-consensual pornography. Much work has gone into combating its misuse, and fake footage can now be detected with good performance thanks to the availability of numerous public datasets and unimodal deep learning-based models. However, these methods are insufficient against multimodal manipulations in which both the visual and acoustic streams are forged. This work proposes a novel lip-reading-based multimodal Deepfake detection method called “Lip Sync Matters.” It targets high-level semantic features, exploiting the mismatch between the lip sequence extracted from the video and the synthetic lip sequence generated from the audio by the Wav2lip model to detect forged videos. Experimental results show that the proposed method outperforms several existing unimodal, ensemble, and multimodal methods on the publicly available multimodal FakeAVCeleb dataset. (A minimal code sketch of this comparison step follows the record below.)
Relation Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), IEEE
Type conference
DOI https://doi.org/10.23919/APSIPAASC55919.2022.9980296
URI https://nccur.lib.nccu.edu.tw/handle/140.119/149882
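
The abstract describes comparing lips cropped from the video against lips re-synthesized from the audio track by Wav2lip, flagging a video as forged when the two disagree. Below is a minimal, hypothetical Python sketch of that mismatch test. The lip extraction, the Wav2Lip generation step, and the lip encoder (reduced here to mean pooling) are all placeholder assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def lip_embedding(lip_frames: np.ndarray) -> np.ndarray:
    # Stand-in for a lip-reading encoder: flatten each (H, W) lip crop and
    # mean-pool over time, mapping a (T, H, W) sequence to one vector.
    return lip_frames.reshape(lip_frames.shape[0], -1).mean(axis=0)

def sync_mismatch(real_lips: np.ndarray, synth_lips: np.ndarray) -> float:
    # Cosine distance between embeddings of the real lip sequence (cropped
    # from the video) and the synthetic one (generated from the audio).
    a, b = lip_embedding(real_lips), lip_embedding(synth_lips)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return 1.0 - cos

def is_forged(real_lips: np.ndarray, synth_lips: np.ndarray,
              threshold: float = 0.5) -> bool:
    # A large audio-visual mismatch flags the video as forged. The fixed
    # threshold here is purely illustrative.
    return sync_mismatch(real_lips, synth_lips) > threshold

# Toy usage with random "lip sequences": 25 frames of 48x96 crops.
rng = np.random.default_rng(0)
real = rng.random((25, 48, 96))
synth = rng.random((25, 48, 96))
print(is_forged(real, synth))
```

In the actual method, both sequences would be embedded by a trained lip-reading network rather than mean pooling, so the distance reflects high-level semantic disagreement rather than raw pixel differences.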