Title The Impact of Parroting Mode on Cross-Lingual Speaker Recognition
Author 廖文宏
Liao, Wen-Hung;Ou, Yen-Chun;Chen, Po-Han;Wu, Yi-Chieh
Contributor Department of Computer Science
Keywords text-independent speaker recognition; cross-lingual dataset; deep-learning; audio embedding; parroting mode
Date 2023-12
Upload time 7-Jan-2025 09:36:43 (UTC+8)
Abstract People use multiple languages in their daily lives across regions worldwide, which motivated us to investigate cross-lingual speaker recognition. In this work, we propose to collect recordings of Mandarin and Spanish, namely the Mandarin-Spanish-Speech Dataset (MSSD-40), to analyze the performance of various audio embeddings for cross-lingual speaker recognition tasks. All participants are fluent in Mandarin, but none of the participants have prior knowledge of the Spanish language. As such, they have been advised to adopt a parroting mode of Spanish speech production, wherein they simply repeat the sounds emanating from the loudspeaker. Using this approach, variations resulting from individual differences in language fluency can be reduced, enabling us to focus on the anatomical aspects of the speech production mechanism. Embeddings extracted from models pre-trained with a large number of audio segments have become effective solutions for coping with audio analysis tasks using small datasets. Preliminary experimental results using two collected multi-lingual datasets indicate that both embedding methods and the language employed will affect the robustness of the speaker recognition task. Specifically, stable performance is observed when familiar languages are used. BEATs embedding generates the best outcome in all languages when no fine-tuning is exercised.
Relation Proceedings of the 25th International Symposium on Multimedia, IEEE Technical Committee on Multimedia (TCMC), pp.193-197
Type conference
DOI https://doi.org/10.1109/ISM59092.2023.00035
dc.contributor Department of Computer Science
dc.creator (Author) 廖文宏
dc.creator (Author) Liao, Wen-Hung;Ou, Yen-Chun;Chen, Po-Han;Wu, Yi-Chieh
dc.date (Date) 2023-12
dc.date.accessioned 7-Jan-2025 09:36:43 (UTC+8)
dc.date.available 7-Jan-2025 09:36:43 (UTC+8)
dc.date.issued (Upload time) 7-Jan-2025 09:36:43 (UTC+8)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/155078
dc.format.extent 107 bytes
dc.format.mimetype text/html
dc.relation (Relation) Proceedings of the 25th International Symposium on Multimedia, IEEE Technical Committee on Multimedia (TCMC), pp.193-197
dc.subject (Keywords) text-independent speaker recognition; cross-lingual dataset; deep-learning; audio embedding; parroting mode
dc.title (Title) The Impact of Parroting Mode on Cross-Lingual Speaker Recognition
dc.type (Type) conference
dc.identifier.doi (DOI) 10.1109/ISM59092.2023.00035
dc.doi.uri (DOI) https://doi.org/10.1109/ISM59092.2023.00035