Academic Output - Theses
Title: Transformer 應用於中文文章摘要 (Using Transformer for Chinese Article Summarization)
Author: Lin, Yi-Hsun (林奕勳)
Advisor: Tsai, Yen-Lung (蔡炎龍)
Keywords: Transformer; BERT; GPT-2; Chinese article summarization; extractive summarization; abstractive summarization; deep learning
Date: 2022
Uploaded: 1-Aug-2022 18:13:06 (UTC+8)

Abstract:
Since its publication, the Transformer has set a new milestone in Natural Language Processing (NLP), and many models built on it have performed outstandingly across NLP tasks. Most of these powerful models rely on enormous numbers of parameters, and most are developed primarily for English, so training an equally strong native Chinese model is difficult. In the absence of such models, we use existing resources and pretrained models to build a Chinese article summarizer: BERT and GPT-2, together with the Chinese pretrained models released by the Chinese Knowledge and Information Processing (CKIP) group at Academia Sinica, Taiwan, trained on news data. First, BERT produces an extractive summary of the original article, shortening it while retaining the important information; then GPT-2 generates an abstractive summary from the extracted sentences, removing duplicated information and smoothing the wording. In our experiments this pipeline produced decent Chinese article summaries, showing that the method is effective.

References:
[1] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[3] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
[4] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information, 2016.
[5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[7] Kunihiko Fukushima. Neural network model for a mechanism of pattern recognition unaffected by shift in position-neocognitron. IEICE Technical Report, A, 62(10):658–665, 1979.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[9] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[10] Anil K Jain, Jianchang Mao, and K Moidin Mohiuddin. Artificial neural networks: A tutorial. Computer, 29(3):31–44, 1996.
[11] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[12] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[13] Moshe Leshno, Vladimir Ya Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861–867, 1993.
[14] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461, 2019.
[15] Yang Liu and Mirella Lapata. Text summarization with pretrained encoders. arXiv preprint arXiv:1908.08345, 2019.
[16] Rada Mihalcea and Paul Tarau. TextRank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411, 2004.
[17] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[18] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
[19] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
[20] Jeffrey Pennington, Richard Socher, and Christopher D Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[21] Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations, 2018.
[22] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training, 2018.
[23] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
[24] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J Liu, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
[25] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[26] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, January 2015.
[27] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[28] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
[29] Ronald J Williams and David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989.
[30] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.

Degree: Master's (碩士)
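The first stage of the pipeline described in the abstract selects the most informative sentences with BERT (BERTSUM); that extractive idea is closely related to TextRank, which the thesis also covers. The sketch below is not the thesis's method: it is a minimal, self-contained illustration of centrality-based sentence extraction, with bag-of-words cosine similarity standing in for learned BERT sentence representations.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def extractive_summary(sentences: list[str], k: int = 2) -> list[str]:
    # Score each sentence by its total similarity to the other sentences
    # (a one-shot stand-in for TextRank's iterative PageRank scoring),
    # then keep the top-k sentences in their original document order.
    vecs = [Counter(s.lower().split()) for s in sentences]
    scores = [
        sum(cosine(vecs[i], vecs[j]) for j in range(len(vecs)) if j != i)
        for i in range(len(vecs))
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In the thesis itself this scoring is done by a fine-tuned BERT encoder over Chinese news text (tokenized with CKIP models rather than whitespace splitting, which this toy sketch assumes for simplicity).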
Institution: National Chengchi University (國立政治大學)
Department: Department of Applied Mathematics (應用數學系)
Student ID: 109751004
Source: http://thesis.lib.nccu.edu.tw/record/#G0109751004
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/141182
DOI: 10.6814/NCCU202200797
Type: thesis

Table of Contents:
1 Introduction
2 Deep Learning
  2.1 Neurons and Neural Networks
  2.2 Activation Function
  2.3 Loss Function
  2.4 Gradient Descent Method
3 Word Embeddings
  3.1 Word2Vec
  3.2 GloVe
  3.3 FastText
4 Transformer
  4.1 Embeddings
  4.2 Encoder
  4.3 Decoder
5 Contextualized Word Embeddings
  5.1 ELMo
  5.2 BERT
  5.3 GPT-2
6 Summarization
  6.1 Two Methods of Summarization
  6.2 TextRank
  6.3 BERTSUM
7 Experiments
  7.1 Data Preparation
  7.2 Extractive Summarization with BERTSUM
  7.3 Abstractive Summarization with GPT-2
  7.4 Result
8 Conclusion
Bibliography
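The second stage of the pipeline uses GPT-2 to rewrite the extracted sentences, removing duplicated information and smoothing the wording. Running a Chinese GPT-2 is out of scope for this record, but the de-duplication part of that stage can be illustrated with a model-free analogue: greedy redundancy filtering in the style of maximal marginal relevance. The Jaccard measure and the 0.5 threshold below are illustrative assumptions, not values from the thesis.

```python
def jaccard(a: set, b: set) -> float:
    # Jaccard overlap between two token sets; 0 when both are empty.
    return len(a & b) / len(a | b) if a | b else 0.0

def drop_redundant(sentences: list[str], max_overlap: float = 0.5) -> list[str]:
    # Greedily keep each sentence only if it does not overlap too much
    # with any sentence already kept -- a crude stand-in for the
    # de-duplication the thesis delegates to GPT-2 rewriting.
    kept: list[str] = []
    kept_tokens: list[set] = []
    for s in sentences:
        toks = set(s.lower().split())
        if all(jaccard(toks, t) <= max_overlap for t in kept_tokens):
            kept.append(s)
            kept_tokens.append(toks)
    return kept
```

Unlike this filter, the GPT-2 stage can also merge and rephrase the surviving sentences, which is what makes the final summary abstractive rather than merely extractive.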