Publications-Proceedings

Title  Toward Text-independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset
Authors  吳怡潔;廖文宏
Wu, Yi-Chieh;Liao, Wen-Hung
Contributor  AI Center
Keywords  Speaker recognition; Acoustic features; Text-independent speaker identification; Cross-lingual dataset
Date  2021-01
Uploaded  22-Dec-2023 10:30:45 (UTC+8)
Abstract  Over 40% of the world's population is bilingual. Existing speaker identification/verification systems, however, assume the same language for both the enrollment and recognition stages. In this work, we investigate the feasibility of employing multilingual speech for biometric applications. We establish a dataset containing audio recorded in English, Mandarin, and Taiwanese. Three acoustic features, namely i-vector, d-vector, and x-vector, have been evaluated for both speaker verification (SV) and speaker identification (SI) tasks. Preliminary experimental results indicate that the x-vector achieves the best overall performance. Additionally, the model trained with hybrid data demonstrates the highest accuracy, at the cost of extra data collection effort. In SI tasks, we obtained over 91% cross-lingual accuracy with all models using 3-second audio. In SV tasks, the EER across cross-lingual tests is at most 6.52%, observed on the model trained with the English corpus. These outcomes suggest the feasibility of adopting cross-lingual speech in building text-independent speaker recognition systems.
Relation  2020 25th International Conference on Pattern Recognition (ICPR), International Association for Pattern Recognition (IAPR)
Type  conference
DOI  https://doi.org/10.1109/ICPR48806.2021.9412170
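The SV results in the abstract are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a quick illustration only, here is a minimal sketch of how an EER can be computed from verification trial scores; it uses NumPy, synthetic scores, and a simple threshold sweep, and does not reflect the paper's actual models or data.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Compute the equal error rate (EER) of a verification system.

    Sweeps a decision threshold over all observed scores; the EER is
    taken at the threshold where the false acceptance rate (FAR) and
    false rejection rate (FRR) are closest.
    """
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    # FAR: fraction of impostor trials accepted at each threshold.
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    # FRR: fraction of genuine trials rejected at each threshold.
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy example with synthetic similarity scores (not the paper's data):
rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1000)   # same-speaker trial scores
impostor = rng.normal(0.3, 0.1, 1000)  # different-speaker trial scores
eer = equal_error_rate(genuine, impostor)
```

With well-separated score distributions such as these, the resulting EER is small; in practice the scores would come from comparing i-vector, d-vector, or x-vector embeddings of enrollment and test utterances.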
dc.contributor AI Center
dc.creator (Author) 吳怡潔;廖文宏
dc.creator (Author) Wu, Yi-Chieh;Liao, Wen-Hung
dc.date (Date) 2021-01
dc.date.accessioned 22-Dec-2023 10:30:45 (UTC+8)
dc.date.available 22-Dec-2023 10:30:45 (UTC+8)
dc.date.issued (Upload time) 22-Dec-2023 10:30:45 (UTC+8)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/148844
dc.description.abstract (Abstract) Over 40% of the world's population is bilingual. Existing speaker identification/verification systems, however, assume the same language for both the enrollment and recognition stages. In this work, we investigate the feasibility of employing multilingual speech for biometric applications. We establish a dataset containing audio recorded in English, Mandarin, and Taiwanese. Three acoustic features, namely i-vector, d-vector, and x-vector, have been evaluated for both speaker verification (SV) and speaker identification (SI) tasks. Preliminary experimental results indicate that the x-vector achieves the best overall performance. Additionally, the model trained with hybrid data demonstrates the highest accuracy, at the cost of extra data collection effort. In SI tasks, we obtained over 91% cross-lingual accuracy with all models using 3-second audio. In SV tasks, the EER across cross-lingual tests is at most 6.52%, observed on the model trained with the English corpus. These outcomes suggest the feasibility of adopting cross-lingual speech in building text-independent speaker recognition systems.
dc.format.extent 110 bytes
dc.format.mimetype text/html
dc.relation (Relation) 2020 25th International Conference on Pattern Recognition (ICPR), International Association for Pattern Recognition (IAPR)
dc.subject (Keywords) Speaker recognition; Acoustic features; Text-independent speaker identification; Cross-lingual dataset
dc.title (Title) Toward Text-independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset
dc.type (Type) conference
dc.identifier.doi (DOI) 10.1109/ICPR48806.2021.9412170
dc.doi.uri (DOI) https://doi.org/10.1109/ICPR48806.2021.9412170