Title 階層式的人聲分類與鼾聲聲學特性分析中的特徵篩選
Feature Selection in Hierarchical Classification of Human Sounds and Acoustic Analysis of Snoring Signals
Author 林裕凱
Contributors 廖文宏
林裕凱
Keywords Human sound classification
Acoustic feature selection
Date 2007
Upload time 19-Sep-2009 12:12:02 (UTC+8)
Abstract Human sounds can be roughly divided into two categories: speech and non-speech. Traditional audio scene analysis research emphasizes the classification of audio signals into human speech, music, and environmental sounds. This thesis takes a different perspective and focuses on the analysis of non-speech human sounds, namely laughs, screams, sneezes, and snores. Toward this goal, we investigate several commonly used acoustic features and use multivariate adaptive regression splines (MARS) and support vector machines (SVM) to select the features that are most representative for classifying non-speech human sounds. To evaluate the robustness of the selected features, we also perform extensive simulations to observe the effect of noise on classification accuracy.
The second part of this thesis is concerned with the analysis of snoring signals. We use an ordinary microphone as the snoring recorder and compare its sensitivity in detecting snores with that of a snoring microphone and a piezo sensor, both of which are often used in clinical settings. In addition, we separate simple snores from obstructive snores using two dissimilarity measures: KL divergence and earth mover's distance (EMD). We likewise add different levels of noise to the snoring signals to examine the robustness of the two measures. Both methods perform satisfactorily, although EMD gives slightly better results in most cases.
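The abstract names KL divergence and earth mover's distance (EMD) as the two measures used to compare snores. The record does not say how the thesis computes them, so the following is only a minimal sketch under stated assumptions: each snore is assumed to be summarized as a non-negative histogram (for example, energy per frequency band), the function names `normalize`, `kl_divergence`, and `emd_1d` are hypothetical, and the EMD shown is the simple one-dimensional case with unit distance between adjacent bins.

```python
import math


def normalize(hist, eps=1e-12):
    """Scale a non-negative histogram to sum to 1.

    A small floor (eps, an assumed smoothing constant) keeps every bin
    strictly positive so that the logarithm in KL divergence is defined.
    """
    floored = [max(h, 0.0) + eps for h in hist]
    total = sum(floored)
    return [h / total for h in floored]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats. Not symmetric."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(p), normalize(q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms of equal mass.

    With unit ground distance between adjacent bins, the EMD equals the
    sum of absolute differences of the cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(p), normalize(q)
    carried = 0.0  # mass that still has to be moved to later bins
    work = 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi
        work += abs(carried)
    return work
```

One design difference is visible even in this sketch: KL divergence compares bins independently and is asymmetric, whereas EMD accounts for how far mass must move between bins, so a small shift of spectral energy yields a small distance.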
Description Master's
National Chengchi University
Department of Computer Science
95753024
96
Source http://thesis.lib.nccu.edu.tw/record/#G0957530241
Type thesis
dc.contributor.advisor 廖文宏 zh_TW
dc.identifier (Other Identifiers) G0957530241 en_US
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/37122
dc.description.tableofcontents Table of Contents

Chapter 1 Introduction
1.1 Research Background and Objectives
1.2 Related Work
1.2.1 Human Sound Classification
1.2.2 Snoring Research
1.3 Thesis Organization
Chapter 2 Acoustic Feature Analysis
2.1 Fundamental Frequency
2.2 Spectral Centroid
2.3 Spectral Spread
2.4 Spectral Flatness
2.5 Entropy
2.6 Formant Frequency
2.7 Mel-Scale Frequency Cepstral Coefficients (MFCC)
Chapter 3 Classifiers
3.1 Multivariate Adaptive Regression Splines (MARS)
3.2 Support Vector Machine (SVM)
3.2.1 Linearly Separable Patterns
3.2.2 Nonlinearly Separable Patterns
Chapter 4 Human Sound Classification
4.1 Acoustic Feature Selection
4.2 Effect of Noise on Classification
Chapter 5 Snoring Research
5.1 Acoustic Features of Snoring
5.2 Snoring and Physiological Signals
5.3 Comparison of Snore Detection Instruments
5.3.1 Signal Endpoint Detection
5.3.2 Sound Signals and Vibration Signals
5.4 Clustering of Snores
5.4.1 KL Divergence
5.4.2 Effect of Noise on KL Divergence
5.4.3 Earth Mover's Distance
5.4.4 Effect of Noise on EMD
Chapter 6 Conclusion
References
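Several of the features listed under Chapter 2 (spectral centroid, spectral spread, spectral flatness, and entropy) have widely used textbook definitions over a power spectrum. The record does not give the exact formulas the thesis uses, so the sketch below follows the common definitions as an assumption; the function name `spectral_features`, its arguments, and the `eps` floor are hypothetical choices for illustration.

```python
import math


def spectral_features(freqs, power, eps=1e-12):
    """Compute four spectral descriptors from one frame's power spectrum.

    freqs: bin center frequencies (any consistent unit, e.g. Hz)
    power: non-negative power per bin, same length as freqs
    eps:   assumed floor applied inside the logarithm for flatness
    """
    if len(freqs) != len(power) or not freqs:
        raise ValueError("freqs and power must be non-empty and equal length")
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must contain some energy")

    # Centroid: power-weighted mean frequency.
    centroid = sum(f * p for f, p in zip(freqs, power)) / total

    # Spread: power-weighted standard deviation around the centroid.
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    spread = math.sqrt(variance)

    # Flatness: geometric mean over arithmetic mean (1 = flat, near 0 = peaky).
    log_mean = sum(math.log(max(p, eps)) for p in power) / len(power)
    flatness = math.exp(log_mean) / (total / len(power))

    # Entropy (bits) of the spectrum treated as a probability distribution.
    probs = [p / total for p in power]
    entropy = -sum(q * math.log2(q) for q in probs if q > 0)

    return {
        "centroid": centroid,
        "spread": spread,
        "flatness": flatness,
        "entropy": entropy,
    }
```

A flat spectrum gives flatness close to 1 and maximal entropy, while a single dominant peak drives both toward 0, which is why such features are plausible candidates for separating noise-like sounds from tonal ones.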
dc.format.extent 50691 bytes
dc.format.extent 57881 bytes
dc.format.extent 66029 bytes
dc.format.extent 156355 bytes
dc.format.extent 469195 bytes
dc.format.extent 2381063 bytes
dc.format.extent 506758 bytes
dc.format.extent 327701 bytes
dc.format.extent 2799148 bytes
dc.format.extent 172245 bytes
dc.format.extent 136730 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US