Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/147747
Title: 基於自監督學習之生成語言模型序列文本知識更新
Sequential Text-based Knowledge Update with Self-Supervised Learning for Generative Language Models
Author: 宋浩茹 (Sung, Hao-Ru)
Contributors: 李蔡彥 (Li, Tsai-Yen)
黃瀚萱 (Huang, Hen-Hsen)
宋浩茹 (Sung, Hao-Ru)
Keywords: 自然語言生成 (Natural Language Generation)
時間知識建模 (Temporal Knowledge Modeling)
摘要更新 (Update Summarization)
自監督學習 (Self-Supervision)
Date: 2023
Upload Date: 3-Oct-2023
Abstract: This work proposes a new natural language processing (NLP) task that addresses multi-round, sequential text-based knowledge update. It introduces a hybrid learning architecture and a novel self-supervised training strategy designed to let generative language models consolidate and update knowledge as effectively as humans do, an ability with significant implications for how language models learn and understand. To validate the strategy, a new dataset was created for evaluation. Experimental results show that the proposed approach outperforms both existing models and GPT-3.5-Turbo. The proposed task and model framework can substantially improve the automation of knowledge organization, making text-based knowledge an increasingly crucial resource with which large language models (LLMs) can perform a wide range of tasks for humans.
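The abstract describes the task only at a high level, so below is a minimal, illustrative sketch of what a multi-round, sequential knowledge-update loop could look like with an off-the-shelf seq2seq model. It is not the thesis' hybrid architecture or its self-supervised training strategy: the BART checkpoint, the update_summary helper, the naive concatenation of the previous summary with the newly arrived document, and the generation settings are all assumptions made purely for illustration.

# Illustrative sketch only: a generic seq2seq update loop, not the thesis' actual model.
from transformers import BartForConditionalGeneration, BartTokenizer

MODEL_NAME = "facebook/bart-base"  # placeholder checkpoint (assumption)
tokenizer = BartTokenizer.from_pretrained(MODEL_NAME)
model = BartForConditionalGeneration.from_pretrained(MODEL_NAME)

def update_summary(previous_summary: str, new_document: str) -> str:
    """One update round: condition generation on the prior summary plus the incoming text."""
    source = previous_summary + " </s> " + new_document  # naive fusion scheme (assumption)
    inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=1024)
    output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Multi-round usage: the output of round t becomes the "previous summary" of round t+1.
summary = "Initial background summary of an unfolding event."
for document in ["First follow-up report ...", "Second follow-up report ..."]:
    summary = update_summary(summary, document)
print(summary)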
Description: Master's thesis (碩士)
National Chengchi University (國立政治大學)
Department of Computer Science (資訊科學系)
Student ID: 110753124
Source: http://thesis.lib.nccu.edu.tw/record/#G0110753124
Type: thesis
Appears in Collections: 學位論文 (Theses)

Files in This Item:
File: 312401.pdf | Size: 3.32 MB | Format: Adobe PDF