Academic Output: Theses
Title: 運用深度學習方法於中文謠言辨識之比較 (Comparison of Applying Deep Learning Methods on Misinformation Detection)
Author: Chiu, Ching-Ya (邱靖雅)
Advisor: 鄭宇庭
Keywords: Natural Language Processing (NLP); Chinese text classification; BERT; CKIP-BERT; RoBERTa; online rumor (misinformation) identification
Date: 2022
Uploaded: 1 July 2022, 16:57:48 (UTC+8)

Abstract: With the rise of new media and the rapid growth of the Internet, information circulates faster than ever, but social media also spreads large volumes of online rumors laced with false information. The general public, and older adults in particular, struggle to judge whether a rumor is true, and during the COVID-19 pandemic believing false rumors can have harmful consequences. Many official and civic fact-checking platforms, such as the Taiwan Centers for Disease Control clarification section, Cofacts, and the Taiwan FactCheck Center, publish verification results online for the public to consult. Purely manual verification, however, consumes substantial labor and time, and debunking cannot keep pace with how quickly rumors are forwarded between chat groups. This study therefore takes the Cofacts open dataset as its Chinese corpus and fine-tunes the Google BERT, CKIP-BERT, and RoBERTa pretrained models to classify online rumors as real or fake. On the model evaluation indicators, all three models reach an average accuracy of 85%, correctly judging the veracity of 85% of the messages, with RoBERTa classifying best. This shows that the classification performance of the Google BERT, CKIP-BERT, and RoBERTa pretrained models is effective on the online-rumor dataset collected for this study.
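The fine-tuning workflow the abstract describes maps directly onto the Hugging Face transformers API. Below is a minimal sketch, assuming the public checkpoints bert-base-chinese, ckiplab/bert-base-chinese, and hfl/chinese-roberta-wwm-ext as stand-ins for the Google BERT, CKIP-BERT, and RoBERTa models (the record does not name the exact checkpoints), and a labeled CSV export of Cofacts messages with `text` and `label` columns; the hyperparameters are illustrative, not the thesis's settings.

```python
# Sketch: fine-tune a pretrained Chinese encoder for binary rumor classification.
# Checkpoint names, the CSV schema ("text", "label" with 0 = real, 1 = fake),
# and the hyperparameters are assumptions, not taken from the thesis.
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CHECKPOINTS = {
    "google-bert": "bert-base-chinese",
    "ckip-bert": "ckiplab/bert-base-chinese",
    "roberta": "hfl/chinese-roberta-wwm-ext",
}

def fine_tune(checkpoint, csv_path="cofacts_labeled.csv"):
    # Split the labeled corpus once, with a fixed seed, so all three encoders
    # are compared on the same held-out messages.
    ds = Dataset.from_pandas(pd.read_csv(csv_path))
    ds = ds.train_test_split(test_size=0.2, seed=42)

    tok = AutoTokenizer.from_pretrained(checkpoint)
    # Truncate to the encoder's 512-token limit; forwarded messages can be long.
    ds = ds.map(lambda batch: tok(batch["text"], truncation=True, max_length=512),
                batched=True)

    # A two-label classification head on top of the pretrained encoder.
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                               num_labels=2)
    args = TrainingArguments(output_dir="out-" + checkpoint.replace("/", "-"),
                             num_train_epochs=3,
                             per_device_train_batch_size=16,
                             learning_rate=2e-5)
    trainer = Trainer(model=model, args=args, tokenizer=tok,
                      train_dataset=ds["train"], eval_dataset=ds["test"])
    trainer.train()
    return trainer

# Fine-tune all three encoders under identical settings for a fair comparison:
# trainers = {name: fine_tune(ckpt) for name, ckpt in CHECKPOINTS.items()}
```

Holding the training recipe fixed across the three checkpoints keeps the comparison about the pretrained weights themselves, which matches the head-to-head framing of the thesis title.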
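The "model evaluation indicators" the abstract cites can then be computed with a standard metric pass over the held-out split. The sketch below assumes the usual binary-classification set (accuracy, precision, recall, F1), since the record does not list the exact indicators used.

```python
# Sketch: score a fine-tuned classifier on the held-out split.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(trainer, test_dataset):
    pred = trainer.predict(test_dataset)           # logits plus true labels
    y_pred = np.argmax(pred.predictions, axis=-1)  # argmax over the two classes
    y_true = pred.label_ids
    prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                       average="binary")
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": prec, "recall": rec, "f1": f1}

# e.g. evaluate(trainers["roberta"], trainers["roberta"].eval_dataset)
# An average accuracy around 0.85 across the three models, with RoBERTa on top,
# would reproduce the result reported in the abstract.
```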
References:

Chinese-language references
1. 王鈞威(2021)。基於整合RoBERTa與CRF模型之中文文法錯誤診斷系統。朝陽科技大學。
2. 吳晨皓(2020)。BERT與GPT-2分別應用於刑事案件之罪名分類及判決書生成。國立高雄科技大學。
3. 呂明聲(2020)。基於深度學習之謠言檢測法:以食安謠言為例。國立中央大學。
4. 邱彥誠(2020)。應用人工智慧於股市新聞與情感分析預測股價走勢。國立臺北大學。
5. 胡林辳(2019)。植基於深度學習假新聞人工智慧偵測:台灣與美國真實資料實作。國立臺北大學。
6. 夏鶴芸(2020)。應用深度學習與自然語言處理新技術預測股票走勢-以台積電為例。國立臺北大學。
7. 翁嘉嫻(2020)。基於預訓練語言模型之中文虛假評論偵測。國立中興大學。
8. 黃若蓁(2020)。運用BERT模型對中文消費者評價之基於屬性的情緒分析。國立成功大學。
9. 黃慧宜、周倩(2019)。國中學生面對網路謠言之回應行為初探:以Facebook謠言訊息為例。教育科學研究期刊,64(1),149-180。
10. 黃獻霆(2021)。應用RoBERTa-wwm預訓練模型與集成學習以增強機器閱讀理解之表現。國立臺灣大學。
11. 蔡楨永、龍希文、林家安(2019)。高齡者面對網路謠言困境之探討。國際數位媒體設計學刊,11(1),53-59。
12. 鍾士慕(2019)。深度學習技術在中文輿情分析之應用:以BERT演算法為例。元智大學。
13. 蘇文群(2021)。真的假的?!BERT你怎麼說?。國立臺中教育大學。
14. 蘇志昇(2021)。結合Google BERT語義特徵於LSTM遞迴神經網路建模之美食店家評論情緒分析。亞洲大學。

English-language references
1. Baevski, A., Edunov, S., Liu, Y., Zettlemoyer, L., & Auli, M. (2019). Cloze-driven pretraining of self-attention networks. arXiv preprint arXiv:1903.07785.
2. Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
3. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
4. Chowdhary, K. (2020). Natural language processing. Fundamentals of Artificial Intelligence, 603-649.
5. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
6. Galassi, A., Lippi, M., & Torroni, P. (2020). Attention in natural language processing. IEEE Transactions on Neural Networks and Learning Systems, 32(10), 4291-4308.
7. Gillioz, A., Casas, J., Mugellini, E., & Abou Khaled, O. (2020, September). Overview of the Transformer-based models for NLP tasks. In 2020 15th Conference on Computer Science and Information Systems (FedCSIS) (pp. 179-183). IEEE.
8. Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261-266.
9. Jusoh, S., & Al-Fawareh, H. M. (2007, November). Natural language interface for online sales systems. In 2007 International Conference on Intelligent and Advanced Systems (pp. 224-228). IEEE.
10. Jusoh, S., & Alfawareh, H. M. (2012). Techniques, applications and challenging issue in text mining. International Journal of Computer Science Issues (IJCSI), 9(6), 431.
11. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
12. Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
13. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
14. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
15. Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
16. Socher, R., Bengio, Y., & Manning, C. D. (2012). Deep learning for NLP (without magic). In Tutorial Abstracts of ACL 2012 (p. 5).
17. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27.
18. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
19. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., ... & Rush, A. M. (2020, October). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 38-45).
20. Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., ... & Dean, J. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
21. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., & Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 32.

Description: Master's thesis
Institution: National Chengchi University, Department of Statistics
Student ID: 109354013
Source: http://thesis.lib.nccu.edu.tw/record/#G0109354013
Type: thesis
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/140752
DOI: 10.6814/NCCU202200690
Table of Contents:
Chapter 1. Introduction: Research Background and Motivation; Research Objectives; Research Procedure
Chapter 2. Literature Review: Natural Language Processing; Attention Mechanism; The Transformer Model; Google BERT; RoBERTa; Review of Chinese Text Classification Research
Chapter 3. Research Method: Research Framework; Data Collection and Preprocessing; Analysis Models
Chapter 4. Empirical Analysis: Data Filtering Procedure; Model Performance Analysis
Chapter 5. Conclusions and Suggestions: Conclusions; Future Research Directions and Suggestions
References