Title Deep Learning-Based Restoration of Voice-Converted Audio for Speech and Speaker Recognition
Authors 廖文宏 (Liao, Wen-Hung); Huang, David
Contributor Department of Computer Science (資訊系)
Keywords Deep Learning; Speaker Recognition; Speech Recognition; Restoration of Transformed Audio
Date 2025-11
Uploaded 11-Feb-2026 09:11:07 (UTC+8)
Abstract Voice conversion alters acoustic features such as pitch, timbre, and rhythm, often degrading the performance of automatic speech and speaker recognition systems. This study explores deep learning–based restoration methods to recover intelligibility and speaker identity from voice-converted audio. We systematically compare generative models including DiscoGAN, CycleGAN, HiFi-GAN, and VITS-SVC, and further introduce a hybrid HiFi-GAN–VITS-SVC architecture. In addition, we evaluate Retrieval-based Voice Conversion (RVC) for its potential in reconstructing both speech quality and speaker characteristics. Experiments on the MET-40 dataset, assessed by character error rate (CER), Perceptual Evaluation of Speech Quality (PESQ), and Top-1/Top-5 speaker identification, show that while HiFi-GAN excels under mild distortions, RVC consistently achieves superior restoration across all conversion types. Importantly, restored audio often retains sufficient speaker identity to enable re-identification, raising privacy and security concerns. Our findings underscore the trade-off between recognition performance and user anonymity, and point toward the need for future research on privacy-preserving speech restoration.
Relation Pattern Recognition and Computer Vision: 8th Asian Conference on Pattern Recognition, ACPR 2025, IAPR, pp. 265-280
Type conference
DOI https://doi.org/10.1007/978-981-95-4398-4_19
URI https://nccur.lib.nccu.edu.tw/handle/140.119/161639
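
Usage note: the two objective metrics named in the abstract, CER and wide-band PESQ, can be computed with standard open-source tooling. The following is a minimal illustrative sketch, not code from the paper; it assumes the jiwer, pesq, and soundfile Python packages, 16 kHz mono WAV files, and hypothetical file names, none of which are specified in this record.

# Illustrative sketch (not from the paper): computing CER and PESQ
# for one restored utterance. Assumes the jiwer, pesq, and soundfile
# packages and 16 kHz mono WAV files; file names are hypothetical.
import soundfile as sf
from jiwer import cer
from pesq import pesq

# Character error rate between the reference transcript and an ASR
# hypothesis obtained from the restored audio.
reference = "the quick brown fox"
hypothesis = "the quick brown box"
print("CER:", cer(reference, hypothesis))

# Wide-band PESQ between the clean original and the restored signal.
ref_audio, sr = sf.read("clean.wav")      # hypothetical path
deg_audio, _ = sf.read("restored.wav")    # hypothetical path
print("PESQ:", pesq(sr, ref_audio, deg_audio, "wb"))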