| dc.contributor | Department of Computer Science | |
| dc.creator (Author) | 廖文宏 | |
| dc.creator (Author) | Liao, Wen-Hung;Huang, David | |
| dc.date (Date) | 2025-11 | |
| dc.date.accessioned | 11-Feb-2026 09:11:07 (UTC+8) | - |
| dc.date.available | 11-Feb-2026 09:11:07 (UTC+8) | - |
| dc.date.issued (Upload time) | 11-Feb-2026 09:11:07 (UTC+8) | - |
| dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/161639 | - |
| dc.description.abstract (Abstract) | Voice conversion alters acoustic features such as pitch, timbre, and rhythm, often degrading the performance of automatic speech and speaker recognition systems. This study explores deep learning–based restoration methods to recover intelligibility and speaker identity from voice-converted audio. We systematically compare generative models including DiscoGAN, CycleGAN, HiFi-GAN, and VITS-SVC, and further introduce a hybrid HiFi-GAN–VITS-SVC architecture. In addition, we evaluate Retrieval-based Voice Conversion (RVC) for its potential in reconstructing both speech quality and speaker characteristics. Experiments on the MET-40 dataset, assessed by character error rate (CER), Perceptual Evaluation of Speech Quality (PESQ), and Top-1/Top-5 speaker identification, show that while HiFi-GAN excels under mild distortions, RVC consistently achieves superior restoration across all conversion types. Importantly, restored audio often retains sufficient speaker identity to enable re-identification, raising privacy and security concerns. Our findings underscore the trade-off between recognition performance and user anonymity, and point toward the need for future research on privacy-preserving speech restoration. | |
| dc.format.extent | 108 bytes | - |
| dc.format.mimetype | text/html | - |
| dc.relation (Relation) | Pattern Recognition and Computer Vision: 8th Asian Conference on Pattern Recognition, ACPR 2025, IAPR, pp.265-280 | |
| dc.subject (Keywords) | Deep Learning; Speaker Recognition; Speech Recognition; Restoration of Transformed Audio | |
| dc.title (Title) | Deep Learning-Based Restoration of Voice-Converted Audio for Speech and Speaker Recognition | |
| dc.type (Resource type) | conference | |
| dc.identifier.doi (DOI) | 10.1007/978-981-95-4398-4_19 | |
| dc.doi.uri (DOI) | https://doi.org/10.1007/978-981-95-4398-4_19 | |