Title The Study of Mandopop Music Recommendation By Word Vectors and Semantic Analysis
應用詞向量及語意分析探討華語歌曲推薦之研究
Author Teng, Chieh-Hsin (鄧絜心)
Contributors 鄭宇庭 (advisor)
Teng, Chieh-Hsin (鄧絜心)
Keywords Natural language processing
Text analysis
Mandopop song lyrics
TF-IDF
Word2vec
BERT
Date 2021
Upload time 4-Aug-2021 16:37:50 (UTC+8)
Abstract Music is an artistic expression of human emotion: listeners often turn to a melody or lyric with a particular mood to represent how they feel or the situation they are in. This study examines how similar Mandarin pop (Mandopop) lyrics are in content and mood, with the aim of improving the automatically recommended playlists of today's most popular music services. Lyrics are treated as text and converted into vectors, and the similarity between those vectors is used as the basis for recommending songs. Using a Python web crawler, the study collected the lyrics of 13,212 Mandopop songs from KKBOX and the Mojim lyrics website (mojim.com). Two natural language processing models, Word2vec and BERT, convert the lyrics into vectors, and cosine similarity then measures how close the lyrics of any two songs are. The results were validated through focus group interviews and a questionnaire survey. In users' subjective judgment, the recommendations produced with the BERT model were more accurate than those produced with the Word2vec model and closer to users' preferences. BERT's AUC was also higher than Word2vec's, indicating that BERT performed better overall. The author hopes these results can help companies in the music industry design playlist recommendation algorithms that more accurately meet users' needs.
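The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes each song already has a fixed-length lyric vector (from Word2vec or BERT); the song titles, the three-dimensional toy vectors, and the helper names `cosine_similarity` and `recommend` are hypothetical and are not taken from the thesis.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector (e.g. empty lyrics) as dissimilar
    return dot / (norm_u * norm_v)

def recommend(query, vectors, top_k=3):
    """Return the top_k song titles whose vectors are closest to the query song."""
    scored = [
        (cosine_similarity(vectors[query], vec), title)
        for title, vec in vectors.items()
        if title != query  # never recommend the query song itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]

# Hypothetical toy vectors; real ones would have hundreds of dimensions.
toy_vectors = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [0.1, 0.9, 0.1],
}
top_two = recommend("song_a", toy_vectors, top_k=2)
```

Here `song_b` points in almost the same direction as `song_a`, so it ranks first. Cosine similarity ignores vector length, which is one reason it is a common choice when lyrics differ greatly in word count.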
Description Master's thesis
National Chengchi University
Graduate Institute of Business Administration (MBA Program)
108363073
Source http://thesis.lib.nccu.edu.tw/record/#G0108363073
Type thesis
dc.contributor.advisor 鄭宇庭
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/136730
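dc.description.tableofcontents Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Objectives
Section 3 Research Process
Chapter 2 Literature Review
Section 1 TF-IDF
Section 2 Word2vec Word Vectors
Section 3 BERT
Chapter 3 Research Methods and Experimental Design
Section 1 Data Collection and Preprocessing
I. Dynamic Web Crawling: Python Selenium Package
II. Data Preprocessing
III. Word Segmentation Tool: pywordseg
Section 2 Text Vector Computation
I. Method 1: Word2Vec & TF-IDF
II. Method 2: BERT
Section 3 Similarity Analysis
Section 4 Experimental Design and Evaluation
I. Focus Group Interviews
II. Questionnaire Design
Chapter 4 Experimental Results and Discussion
Section 1 Model Results
Section 2 Experimental Results and Evaluation
I. Model Performance Evaluation
II. Questionnaire Analysis
Chapter 5 Conclusions and Recommendations
Section 1 Research Conclusions
Section 2 Suggestions for Future Work
Appendix
References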
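The outline lists "Word2Vec & TF-IDF" as the first text-vector method but this record does not say how the two are combined. One common approach, shown here purely as an assumption and not as the thesis's confirmed procedure, is to average a song's word vectors weighted by TF-IDF. The toy tokens, the two-dimensional word vectors, the `log(N/df) + 1` IDF variant, and the helper names `idf_table` and `lyric_vector` are all hypothetical.

```python
import math
from collections import Counter

def idf_table(docs):
    """Inverse document frequency per token, using the log(N/df) + 1 variant."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}

def lyric_vector(tokens, word_vectors, idf):
    """TF-IDF-weighted average of the word vectors of one tokenized lyric."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in Counter(tokens).items():
        if tok not in word_vectors or tok not in idf:
            continue  # skip out-of-vocabulary tokens
        weight = (count / len(tokens)) * idf[tok]  # term frequency * IDF
        for i, x in enumerate(word_vectors[tok]):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0:
        return total  # no known tokens: return the zero vector
    return [x / weight_sum for x in total]

# Toy data; real input would be segmented Chinese lyrics and trained Word2vec vectors.
docs = [["love", "rain"], ["love", "night"]]
word_vectors = {"love": [1.0, 0.0], "rain": [0.0, 1.0], "night": [1.0, 1.0]}
idf = idf_table(docs)
vec = lyric_vector(docs[0], word_vectors, idf)
```

In this toy corpus "love" appears in both lyrics and "rain" in only one, so "rain" receives the larger weight and pulls the first lyric's vector toward its own direction.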
dc.format.extent 5522464 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU202100892