Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/136971
Title: A Teacher-Student Approach to Cross-domain Transfer Learning with Multi-level Attention
Author: Tang, Ying-Jhe
Contributors: Huang, Hen-Hsen
Tang, Ying-Jhe
Keywords: Natural language processing
Domain adaptation
Multi-task learning
Attention mechanism
Date: 2021
Date Uploaded: 2-Sep-2021
Abstract: This work addresses the cross-domain transfer problem: training a machine-learning model on data from one domain and applying it to data from other domains. The difficulty lies in the differences between the source and target domains. For example, the adjective "fast" is positive when describing a sports car but negative when describing a battery. Although models trained on labeled data can achieve very good performance, in many cases there is not enough labeled data for training. This work therefore builds a model that handles both cross-domain transfer and the scarcity of labeled data.
The architecture is a multi-task learning framework with three parts: supervised learning, a teacher-student cross-domain attention model, and a relevance detection task. The supervised component learns from labeled data. In the teacher-student model, the teacher provides pseudo-labeled data for training the student, and instance-level and domain-level attention help select the pseudo-labeled instances suitable for training the student. The relevance detection task detects the relation between a sentence and the subject it describes.
The approach is evaluated on sentiment classification of product reviews and on stance classification of online public opinions about artists and nuclear power. Experimental results show that the proposed method achieves the best performance on both the sentiment and stance classification tasks.
The lack of training data is a challenging issue when applying NLP models to a new domain. Previous work on cross-domain transfer learning aims to exploit information from the source domains to make predictions for the target domain. To reduce the noise from out-of-domain data and improve the model's generalization ability, this work proposes a novel teacher-student approach with multi-task learning that transfers information from source domains to the target domain with sophisticated weights determined by an attention mechanism at both the instance level and the domain level. The generalization ability is further enhanced by unsupervised data augmentation. We also introduce a subject detection task for co-training the main model. Our approach is evaluated not only on the widely adopted English dataset of Amazon product reviews, but also on Chinese datasets including product reviews, artist reviews, and public opinions on nuclear power. Experimental results show that our approach outperforms state-of-the-art models.
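The core idea described above, weighting the teacher's pseudo-labeled instances by instance-level and domain-level attention before training the student, can be illustrated with a minimal sketch. This is not the thesis's actual implementation; the function name, the softmax-based attention, and the cross-entropy formulation are all simplifying assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def weighted_pseudo_label_loss(teacher_probs, student_probs,
                               inst_scores, dom_scores, domain_ids):
    """Hypothetical student loss: cross-entropy against the teacher's
    pseudo-labels, weighted by instance- and domain-level attention."""
    inst_w = softmax(inst_scores)   # one attention weight per instance
    dom_w = softmax(dom_scores)     # one attention weight per source domain
    # Combine the two levels and renormalize so the weights sum to 1.
    w = [iw * dom_w[d] for iw, d in zip(inst_w, domain_ids)]
    total = sum(w)
    w = [x / total for x in w]
    # Cross-entropy between teacher pseudo-labels and student predictions.
    losses = [-sum(t * math.log(s + 1e-9) for t, s in zip(tp, sp))
              for tp, sp in zip(teacher_probs, student_probs)]
    return sum(wi * li for wi, li in zip(w, losses))
```

In this sketch, pseudo-labeled instances the attention deems unreliable (low `inst_scores`) or drawn from a dissimilar source domain (low `dom_scores`) contribute less to the student's loss, which is the filtering role the abstract attributes to the two attention levels.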
Description: Master's thesis
National Chengchi University
Department of Computer Science
108753207
Source: http://thesis.lib.nccu.edu.tw/record/#G0108753207
Type: thesis
Appears in Collections: Theses and Dissertations

Files in This Item:
320701.pdf (1.56 MB, Adobe PDF)