| dc.contributor | Department of Computer Science | |
| dc.creator (Author) | 廖文宏 | |
| dc.creator (Author) | Liao, Wen-Hung;Ou, Yen-Chun;Chen, Po-Han;Wu, Yi-Chieh | |
| dc.date (Date) | 2023-12 | |
| dc.date.accessioned | 7-Jan-2025 09:36:43 (UTC+8) | - |
| dc.date.available | 7-Jan-2025 09:36:43 (UTC+8) | - |
| dc.date.issued (Upload Time) | 7-Jan-2025 09:36:43 (UTC+8) | - |
| dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/155078 | - |
| dc.description.abstract (Abstract) | People use multiple languages in their daily lives across regions worldwide, which motivated us to investigate cross-lingual speaker recognition. In this work, we propose to collect recordings of Mandarin and Spanish, namely the Mandarin-Spanish-Speech Dataset (MSSD-40), to analyze the performance of various audio embeddings for cross-lingual speaker recognition tasks. All participants are fluent in Mandarin, but none of them has prior knowledge of the Spanish language. As such, they have been advised to adopt a parroting mode of Spanish speech production, wherein they simply repeat the sounds emanating from the loudspeaker. Using this approach, variations resulting from individual differences in language fluency can be reduced, enabling us to focus on the anatomical aspects of the speech production mechanism. Embeddings extracted from models pre-trained with a large number of audio segments have become effective solutions for coping with audio analysis tasks using small datasets. Preliminary experimental results using two collected multi-lingual datasets indicate that both the embedding method and the language employed affect the robustness of the speaker recognition task. Specifically, stable performance is observed when familiar languages are used. BEATs embedding generates the best outcome in all languages when no fine-tuning is exercised. | |
| dc.format.extent | 107 bytes | - |
| dc.format.mimetype | text/html | - |
| dc.relation (Relation) | Proceedings of the 25th International Symposium on Multimedia, IEEE Technical Committee on Multimedia (TCMC), pp.193-197 | |
| dc.subject (Keywords) | text-independent speaker recognition; cross-lingual dataset; deep learning; audio embedding; parroting mode | |
| dc.title (Title) | The Impact of Parroting Mode on Cross-Lingual Speaker Recognition | |
| dc.type (Type) | conference | |
| dc.identifier.doi (DOI) | 10.1109/ISM59092.2023.00035 | |
| dc.doi.uri (DOI) | https://doi.org/10.1109/ISM59092.2023.00035 | |