Audio-driven facial landmark generation in violin performance using 3DCNN network with self attention model | Publication | NCCU Academic Hub

Title	Audio-driven facial landmark generation in violin performance using 3DCNN network with self attention model
Creator	劉昭麟 Liu, Chao-Lin;Lin, Ting-Wei;Su, Li
Contributor	Department of Computer Science (資訊系)
Key Words	music to face generation; facial landmarks generation; music-face dataset; 3DCNN; self-attention
Date	2023-06
Date Issued	30-Nov-2023 11:26:28 (UTC+8)
Summary	In music performance, both auditory and visual elements are essential to an outstanding result. Recent research has focused on generating body movements or fingering from performance audio, but audio-driven face generation for music performance remains underexplored. In this paper, we compile a violin soundtrack and facial expression dataset (VSFE) for modeling facial expressions in violin performance. To our knowledge, this is the first dataset to map the relationship between violin performance audio and musicians’ facial expressions. We then propose a 3DCNN network with self-attention and residual blocks for audio-driven facial expression generation. In the experiments, we compare our method with three talking face generation baselines.
Relation	Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, pp.1-5
Type	conference
DOI	https://doi.org/10.1109/ICASSP49357.2023.10096358

dc.contributor	Department of Computer Science (資訊系)
dc.creator (author)	劉昭麟
dc.creator (author)	Liu, Chao-Lin;Lin, Ting-Wei;Su, Li
dc.date (date)	2023-06
dc.date.accessioned	30-Nov-2023 11:26:28 (UTC+8)
dc.date.available	30-Nov-2023 11:26:28 (UTC+8)
dc.date.issued (upload time)	30-Nov-2023 11:26:28 (UTC+8)
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/148299
dc.description.abstract (abstract)	In music performance, both auditory and visual elements are essential to an outstanding result. Recent research has focused on generating body movements or fingering from performance audio, but audio-driven face generation for music performance remains underexplored. In this paper, we compile a violin soundtrack and facial expression dataset (VSFE) for modeling facial expressions in violin performance. To our knowledge, this is the first dataset to map the relationship between violin performance audio and musicians’ facial expressions. We then propose a 3DCNN network with self-attention and residual blocks for audio-driven facial expression generation. In the experiments, we compare our method with three talking face generation baselines.
dc.format.extent	113 bytes
dc.format.mimetype	text/html
dc.relation (relation)	Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, pp.1-5
dc.subject (keywords)	music to face generation; facial landmarks generation; music-face dataset; 3DCNN; self-attention
dc.title (title)	Audio-driven facial landmark generation in violin performance using 3DCNN network with self attention model
dc.type (resource type)	conference
dc.identifier.doi (DOI)	10.1109/ICASSP49357.2023.10096358
dc.doi.uri (DOI)	https://doi.org/10.1109/ICASSP49357.2023.10096358