Title: Analysis of Voice Styles Using i-Vector Features (基於i-Vector特徵之聲音風格分析)
Author: Kao, Wen-Tsung (高文聰)
Advisor: Liao, Wen-Hung (廖文宏)
Keywords: Sound style; Machine learning; Pattern recognition; i-Vector; ALIZE
Date: 2018
Deposited: 29-Aug-2018 16:04:21 (UTC+8)
Abstract: Many adjectives are used to describe voice characteristics, yet it is challenging to define sound styles precisely with quantitative measures. This thesis tackles the sound-style classification problem using techniques designed for speaker recognition. Specifically, we employ the i-Vector, a feature widely adopted in speaker identification, together with a support vector machine (SVM) for style classification. To verify the reliability of i-Vectors for describing sound styles, we first conducted a series of experiments covering basic speaker recognition, minimum input duration, the effect of white noise on speaker verification, dependency on spoken content, sensitivity to different sampling rates, and the effect of voice actors speaking in different tones. After confirming the relevance of the features, we classified eight sound-style categories commonly perceived in daily life and analyzed the consistency of the results, showing that a speaker recognition system can indeed be used to identify sound styles effectively.
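The pipeline described in the abstract represents each utterance as an i-Vector and classifies it with a back-end model. A common baseline back-end in i-Vector speaker recognition is cosine-similarity scoring; the sketch below illustrates it on toy vectors. The 4-dimensional vectors here are invented for illustration only — real systems extract 400- to 600-dimensional i-Vectors from speech with a toolkit such as ALIZE, and the thesis itself uses an SVM rather than cosine scoring.

```python
import math

def cosine_score(v1, v2):
    """Cosine similarity between two i-Vectors: the usual baseline
    scoring back-end in i-Vector speaker recognition."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

# Toy 4-dimensional vectors, invented for illustration only.
enroll = [0.8, 0.1, -0.3, 0.2]       # enrolled target model
test_same = [0.7, 0.2, -0.25, 0.15]  # utterance close to the target
test_diff = [-0.5, 0.6, 0.4, -0.1]   # utterance far from the target

print(cosine_score(enroll, test_same) > cosine_score(enroll, test_diff))  # True
```

For style classification the thesis replaces this simple scorer with an SVM trained on labeled style samples; the i-Vector representation of the utterance is the same in either case.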
References:
[1] Heap, Michael. "Neuro-linguistic programming." Hypnosis: Current Clinical, Experimental and Forensic Practices (1988): 268-280.
[2] NIST, "Speaker Recognition", https://www.nist.gov/itl/iad/mig/speaker-recognition
[3] Tong, Rong, et al. "The IIR NIST 2006 speaker recognition system: Fusion of acoustic and tokenization features." 5th International Symposium on Chinese Spoken Language Processing (ISCSLP). 2006.
[4] Hasan, Md Rashidul, Mustafa Jamil, Md Golam Rabbani, and Md Saifur Rahman. "Speaker identification using mel frequency cepstral coefficients." 3rd International Conference on Electrical & Computer Engineering (ICECE). 2004.
[5] Reynolds, Douglas A., and Richard C. Rose. "Robust text-independent speaker identification using Gaussian mixture speaker models." IEEE Transactions on Speech and Audio Processing 3.1 (1995): 72-83.
[6] Reynolds, Douglas A., Thomas F. Quatieri, and Robert B. Dunn. "Speaker verification using adapted Gaussian mixture models." Digital Signal Processing 10.1-3 (2000): 19-41.
[7] Kenny, Patrick. "Joint factor analysis of speaker and session variability: Theory and algorithms." CRIM, Montreal, Technical Report CRIM-06/08-13 (2005).
[8] Dehak, Najim, et al. "Front-end factor analysis for speaker verification." IEEE Transactions on Audio, Speech, and Language Processing 19.4 (2011): 788-798.
[9] AlphaGo, https://deepmind.com/research/alphago/
[10] Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20.3 (1995): 273-297.
[11] Franc, Vojtech, Alexander Zien, and Bernhard Schölkopf. "Support vector machines as probabilistic models." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
[12] Dehak, Najim, et al. "Front-end factor analysis for speaker verification." IEEE Transactions on Audio, Speech, and Language Processing 19.4 (2011): 788-798.
[13] Kenny, Patrick. "Joint factor analysis of speaker and session variability: Theory and algorithms." CRIM, Montreal, Technical Report CRIM-06/08-13 (2005).
[14] Larcher, Anthony, et al. "I-vectors in the context of phonetically-constrained short utterances for speaker verification." 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2012.
[15] Chen, Chia-Ying. "Applying Factor Analysis and Identity Vectors to Speech Emotion Recognition." Master's thesis, National Sun Yat-sen University, 2016.
[16] Bonastre, Jean-François, Frédéric Wils, and Sylvain Meignier. "ALIZE, a free toolkit for speaker recognition." 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05). Vol. 1. IEEE, 2005.
[17] Larcher, Anthony, et al. "ALIZE 3.0: Open source toolkit for state-of-the-art speaker recognition." Interspeech. 2013.
[18] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: A library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.
[19] SoX, "Sound eXchange", http://sox.sourceforge.net
[20] ALIZÉ, http://alize.univ-avignon.fr/
[21] SPro, http://www.irisa.fr/metiss/guig/spro/
[22] Audacity, https://www.audacityteam.org/
[23] Haykin, Simon, and Zhe Chen. "The cocktail party problem." Neural Computation 17.9 (2005): 1875-1902.
[24] Hyvärinen, Aapo, Juha Karhunen, and Erkki Oja. Independent Component Analysis. Vol. 46. John Wiley & Sons, 2004.
[25] FFmpeg, https://www.ffmpeg.org/
[26] "Baby voice" (娃娃音), Wikipedia, https://zh.wikipedia.org/wiki/%E5%A8%83%E5%A8%83%E9%9F%B3
[27] YouTube, https://www.youtube.com/
[28] Philharmonic Radio Taipei (愛樂電台), https://www.e-classical.com.tw/index.html
[29] Police Broadcasting Service (警察廣播電台), https://www.pbs.gov.tw/cht/index.php
[30] Garcia-Romero, Daniel, and Carol Y. Espy-Wilson. "Analysis of i-vector length normalization in speaker recognition systems." Twelfth Annual Conference of the International Speech Communication Association (Interspeech). 2011.
[31] Baidu speech service (百度語音), http://fanyi.baidu.com/#auto/zh/
[32] Google speech service (Google語音), https://translate.google.com.tw/
Degree: Master's
Institution: National Chengchi University
Department: In-service Master's Program, Department of Computer Science
Student ID: 103971014
Source: http://thesis.lib.nccu.edu.tw/record/#G0103971014
Type: thesis
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/119801
Table of Contents:
Chapter 1  Introduction
  1.1  Motivation
  1.2  Thesis organization
Chapter 2  Background and Related Work
  2.1  Voice features
    2.1.1  Mel-frequency cepstral coefficients
  2.2  Speaker models
    2.2.1  Gaussian mixture model
    2.2.2  Universal background model
    2.2.3  Joint factor analysis
    2.2.4  i-Vector
  2.3  Machine learning
    2.3.1  Deep learning
    2.3.2  Support vector machine
  2.4  Summary
Chapter 3  Methodology
  3.1  Tools
    3.1.1  ALIZE Toolkit
    3.1.2  LIBSVM
  3.2  Preliminary experiments
    3.2.1  Data preprocessing
    3.2.2  Basic verification of i-Vector functionality
    3.2.3  Minimum data length test
    3.2.4  Effect of white noise on speaker recognition
    3.2.5  Discontinuous speech content test
    3.2.6  Audio sampling rate test
    3.2.7  Effect of voice actors using different tones on sound style
  3.3  Research framework
    3.3.1  Style definitions
    3.3.2  Data sources
  3.4  Goals
Chapter 4  Experiments and Analysis
  4.1  Training data collection
  4.2  Training data preprocessing
    4.2.1  i-Vector normalization
    4.2.2  SVM training and test results
    4.2.3  Analysis of misclassified samples
  4.3  Applications of sound style analysis
    4.3.1  Sound style recognition on telephone recordings
    4.3.2  Analysis of style predictions on telephone recordings
Chapter 5  Conclusion and Future Work
  5.1  Conclusion
  5.2  Future work
References
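Section 4.2.1 of the outline applies i-Vector normalization before SVM training. A standard form of this step is length normalization (Garcia-Romero and Espy-Wilson [30]): each i-Vector is scaled to unit Euclidean norm so that the back-end sees only its direction. The sketch below uses toy low-dimensional vectors, since the actual feature dimensions and data are not part of this record.

```python
import math

def length_normalize(ivec):
    """Length-normalize an i-Vector: scale it to unit Euclidean norm,
    as proposed by Garcia-Romero and Espy-Wilson (2011). A zero vector
    is returned unchanged to avoid division by zero."""
    norm = math.sqrt(sum(x * x for x in ivec))
    if norm == 0.0:
        return list(ivec)
    return [x / norm for x in ivec]

print(length_normalize([3.0, 4.0]))  # [0.6, 0.8]
```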
DOI: 10.6814/THE.NCCU.EMCS.007.2018.B02