Publications-Theses
Title: Emotion Analysis Enhanced with Sentiment Lexicons Based on Deep Learning (基於深度學習以情感辭典增強情緒分析)
Author: 張禎尹
Advisor: 邱淑怡 (Chiu, Shu-I)
Keywords:
emotion analysis
multi-label data imbalance
synonym replacement
NRC Emotion Lexicon (EmoLex)
Bidirectional Long Short-Term Memory (BiLSTM)
Masked Language Model (MLM)
Date: 2024
Uploaded: 4-Sep-2024 14:59:43 (UTC+8)

Abstract:
This study combines a Bidirectional Long Short-Term Memory (BiLSTM) network with the NRC Emotion Lexicon (EmoLex) in a model named EmoBiLSTM, aiming to improve the accuracy of emotion recognition in Taiwanese social media texts written during the COVID-19 pandemic. As the pandemic spread, people's lives and mental health were significantly affected, and timely, accurate insight into public emotional change is valuable for formulating public health policy. Existing emotion analysis techniques, however, still fall short in accuracy and adaptability, particularly when the data suffer from multi-label imbalance. To address this imbalance, the study applies two data augmentation strategies: synonym replacement, which creates new training samples by substituting some words in a sentence and thereby enlarges the minority classes, and a Masked Language Model (MLM), which is trained to predict randomly masked words and thus learns word context and sentence structure to improve text generation and augmentation. The classifier combines a CNN, which extracts local text features, with a BiLSTM, which captures global contextual information, and incorporates EmoLex so that the model can identify and process emotional cues more accurately. Model parameters are tuned on the training set, and performance is evaluated with accuracy, recall, and F1-score. The results show that synonym replacement combined with EmoLex and the BiLSTM model performs well on all metrics and offers a clear advantage in handling multi-label data imbalance, demonstrating that combining deep learning techniques with an emotion lexicon is effective for emotion analysis of social media texts.
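The abstract describes synonym replacement as one of the two augmentation strategies used to enlarge minority emotion classes. This record does not include the thesis's implementation, so the following is only a minimal sketch of the idea, assuming the jieba segmenter for Chinese tokenization and a small illustrative synonym dictionary; the function names, replacement probability, and dictionary entries are assumptions, not the author's code.

```python
import random

import jieba  # Chinese word segmentation (assumed choice of segmenter)

# Toy synonym dictionary; a real system would draw similar words from a
# lexical resource or embedding neighbours. Entries here are illustrative only.
SYNONYMS = {
    "開心": ["高興", "愉快"],
    "難過": ["傷心", "沮喪"],
    "害怕": ["恐懼", "擔心"],
}


def synonym_replace(sentence: str, replace_prob: float = 0.3) -> str:
    """Create one augmented sample by swapping some words for listed synonyms."""
    tokens = jieba.lcut(sentence)
    augmented = [
        random.choice(SYNONYMS[tok]) if tok in SYNONYMS and random.random() < replace_prob else tok
        for tok in tokens
    ]
    return "".join(augmented)


def augment_minority_class(samples: list[str], target_size: int) -> list[str]:
    """Keep generating augmented copies until a minority label reaches target_size."""
    augmented = list(samples)
    while len(augmented) < target_size:
        augmented.append(synonym_replace(random.choice(samples)))
    return augmented


if __name__ == "__main__":
    print(synonym_replace("今天考試考差了，我很難過也很害怕"))
```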
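The second augmentation strategy described in the abstract uses a Masked Language Model to propose context-appropriate words for masked positions. The specific pretrained model is not named in this record, so the sketch below assumes the Hugging Face transformers fill-mask pipeline with the public bert-base-chinese checkpoint; the single-character masking scheme and parameters are illustrative.

```python
import random

from transformers import pipeline  # pip install transformers torch

# Assumed checkpoint; any Chinese masked-language model could be substituted.
fill_mask = pipeline("fill-mask", model="bert-base-chinese")
MASK = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT-style models


def mlm_augment(sentence: str, top_k: int = 3) -> list[str]:
    """Mask one character and let the MLM propose replacements, returning
    up to top_k augmented variants of the sentence."""
    if len(sentence) < 2:
        return [sentence]
    pos = random.randrange(len(sentence))
    masked = sentence[:pos] + MASK + sentence[pos + 1:]
    predictions = fill_mask(masked, top_k=top_k)
    # Each prediction carries the reconstructed sentence under "sequence";
    # the Chinese BERT tokenizer inserts spaces between characters, so strip them.
    return [p["sequence"].replace(" ", "") for p in predictions]


if __name__ == "__main__":
    for variant in mlm_augment("疫情期間大家都很焦慮"):
        print(variant)
```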
Description: Master's thesis
National Chengchi University
Department of Computer Science
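Returning to the model summarized in the abstract: it combines a CNN branch for local features with a BiLSTM for global context and fuses EmoLex-derived emotion features before a multi-label output layer. The actual architecture and hyperparameters are not given in this record, so the Keras sketch below only illustrates that general layout; the vocabulary size, sequence length, layer sizes, and number of emotion labels are all assumptions.

```python
from tensorflow.keras import Model, layers

# Assumed dimensions; the real values belong to the thesis, not this record.
VOCAB_SIZE = 20000   # tokenizer vocabulary size
MAX_LEN = 100        # padded sequence length
EMBED_DIM = 128
LEXICON_DIM = 8      # e.g. per-emotion EmoLex word counts for a document
NUM_LABELS = 8       # number of emotion labels (multi-label output)

# Token-sequence branch: CNN for local n-gram features, BiLSTM for global context.
tokens = layers.Input(shape=(MAX_LEN,), name="token_ids")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
x = layers.Conv1D(filters=128, kernel_size=3, padding="same", activation="relu")(x)
x = layers.Bidirectional(layers.LSTM(64))(x)

# Lexicon branch: a fixed-length vector of EmoLex emotion features per document.
lexicon = layers.Input(shape=(LEXICON_DIM,), name="emolex_features")

# Fuse both branches and emit independent sigmoid scores, one per emotion label.
merged = layers.concatenate([x, lexicon])
merged = layers.Dense(64, activation="relu")(merged)
outputs = layers.Dense(NUM_LABELS, activation="sigmoid")(merged)

model = Model(inputs=[tokens, lexicon], outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The accuracy, recall, and F1-score reported in the abstract would, for a multi-label output like this, typically be computed per label and then averaged (for example with micro or macro averaging); the averaging scheme actually used in the thesis is not stated in this record.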
Student ID: 111753136
Source: http://thesis.lib.nccu.edu.tw/record/#G0111753136
Type: thesis
Identifier: G0111753136
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/153378
Format: application/pdf (4425172 bytes)

Table of Contents:
1 Introduction
1.1 Background
1.2 Research Motivation
1.3 Research Questions
1.4 Research Limitations
1.5 Research Objectives
1.6 Contributions
1.7 Thesis Organization
2 Related Work
2.1 LSTM and BiLSTM
2.2 Feature Extraction
2.3 Hybrid CNN-BiLSTM Models for Text Classification
2.4 Combining Deep Learning and Sentiment Lexicons for Emotion Classification
2.5 Challenges in Emotion Analysis
3 Methodology
3.1 Dataset
3.2 Model Design
3.3 Evaluation Methods
4 Experimental Setup and Results
4.1 Experimental Settings and Parameters
4.2 BiLSTM Baseline for Emotion Classification
4.3 Mitigation Strategies (Task 1 and Task 2)
4.4 Joint Analysis of EmoLex and BiLSTM
4.5 Overall Comparison
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References