Citation Information

Title Toward Text-independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset
Authors 吳怡潔;廖文宏
Wu, Yi-Chieh;Liao, Wen-Hung
Contributor AI Center
Keywords Speaker recognition; Acoustic features; Text-independent speaker identification; Cross-lingual dataset
Date 2021-01
Upload time 22-Dec-2023 10:30:45 (UTC+8)
Abstract Over 40% of the world's population is bilingual. Existing speaker identification/verification systems, however, assume that the same language is used in both the enrollment and recognition stages. In this work, we investigate the feasibility of employing multilingual speech for biometric applications. We establish a dataset containing audio recorded in English, Mandarin and Taiwanese. Three acoustic features, namely i-vector, d-vector and x-vector, are evaluated for both speaker verification (SV) and speaker identification (SI) tasks. Preliminary experimental results indicate that x-vector achieves the best overall performance. Additionally, the model trained with hybrid data demonstrates the highest accuracy, at the cost of extra data collection effort. In SI tasks, we obtained over 91% cross-lingual accuracy in all models using 3-second audio. In SV tasks, the equal error rate (EER) in cross-lingual tests is at most 6.52%, observed for the model trained on the English corpus. The outcome suggests the feasibility of adopting cross-lingual speech in building text-independent speaker recognition systems.
Relation 2020 25th International Conference on Pattern Recognition, International Association for Pattern Recognition (IAPR)
Type conference
DOI https://doi.org/10.1109/ICPR48806.2021.9412170
dc.contributor AI Center
dc.creator (Authors) 吳怡潔;廖文宏
dc.creator (Authors) Wu, Yi-Chieh;Liao, Wen-Hung
dc.date (Date) 2021-01
dc.date.accessioned 22-Dec-2023 10:30:45 (UTC+8)
dc.date.available 22-Dec-2023 10:30:45 (UTC+8)
dc.date.issued (Upload time) 22-Dec-2023 10:30:45 (UTC+8)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/148844
dc.format.extent 110 bytes
dc.format.mimetype text/html
dc.relation (Relation) 2020 25th International Conference on Pattern Recognition, International Association for Pattern Recognition (IAPR)
dc.subject (Keywords) Speaker recognition; Acoustic features; Text-independent speaker identification; Cross-lingual dataset
dc.title (Title) Toward Text-independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset
dc.type (Type) conference
dc.identifier.doi (DOI) 10.1109/ICPR48806.2021.9412170
dc.doi.uri (DOI) https://doi.org/10.1109/ICPR48806.2021.9412170