Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/136970
Title: Generative Model with Label-Augmentation for Zero-Shot Intent Detection
Author: Li, Yu-Siang
Contributors: Huang, Hen-Hsen; Li, Yu-Siang
Keywords: Intent Detection
Zero-Shot Learning
Few-Shot Learning
Data Augmentation
Generative Model
Date: 2021
Upload date: 2-Sep-2021
Abstract: Intent detection is one of the sub-tasks of a task-oriented dialogue system: it identifies the user's actual intent from an input utterance so that downstream applications can act on it. Most intent detection systems in commercial use today target a specific industry or domain; in practice, however, applications face challenges such as domain shift and the scarcity of labeled data.

This work focuses on zero-shot cross-domain intent detection, where training instances in the target domain are unavailable and the intent set of the target domain may differ greatly from that of the source domain. For this challenging scenario, we propose a novel approach that performs inference as a generation task, further combined with label augmentation to enhance performance.

In the training stage, the main-task model learns from augmented label data, which enables it to predict diverse intent classes. For label augmentation, we explore back-translation, synonym-set expansion, and their combination, as sketched below.
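As an illustration of the two augmentation strategies, here is a minimal sketch, not the thesis's actual pipeline: it assumes Helsinki-NLP MarianMT checkpoints (with German as an arbitrary pivot language) for round-trip translation and NLTK's WordNet for synonym sets, and the helper names are hypothetical.

```python
# Minimal sketch of back-translation and synonym-set label augmentation.
# Assumed dependencies: transformers, sentencepiece, nltk (with the "wordnet"
# corpus downloaded via nltk.download("wordnet")).
from transformers import pipeline
from nltk.corpus import wordnet

# Round-trip translation: English -> German -> English (pivot is illustrative).
en_to_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

def back_translate(text: str) -> str:
    """Paraphrase a labeled utterance or intent label by translating it out and back."""
    pivot = en_to_de(text)[0]["translation_text"]
    return de_to_en(pivot)[0]["translation_text"]

def synonym_variants(text: str) -> set:
    """Generate variants by swapping each word for its WordNet synonyms."""
    variants = set()
    for word in text.split():
        for synset in wordnet.synsets(word):
            for lemma in synset.lemmas():
                synonym = lemma.name().replace("_", " ")
                if synonym != word:
                    variants.add(text.replace(word, synonym))
    return variants

print(back_translate("book a table at a restaurant"))
print(sorted(synonym_variants("play music"))[:5])
```

Combining the two strategies would simply mean feeding back-translated text through the synonym expansion as well.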
In the inference stage, our generative inference methodology enables zero-shot prediction for a new target domain. The inference model is an ensemble that combines the cosine similarity of BERT-based sentence vectors, the STSB sentence similarity produced by T5, and a lexical sentence similarity, so that intent classes predicted in the source domain can be mapped accurately onto the intent classes of the target domain. The weighting of this combination is explored intensively (see the sketch following the abstract).

Our approach is evaluated on four datasets commonly used for intent detection: SNIPS, ATIS, MultiWOZ, and SGD. We take each dataset in turn as the source domain and treat the others as target domains, experimenting with different combinations of inference and augmentation methods. Experimental results show the effectiveness of our approach. The challenging issues of zero-shot cross-domain intent detection are further analyzed and discussed.
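To make the ensemble concrete, here is a minimal sketch under assumptions the abstract does not fix: sentence-transformers with an illustrative checkpoint for the BERT sentence vectors, the public t5-base checkpoint queried with its text-to-text "stsb" prompt, and Python's difflib (an implementation of Gestalt pattern matching) for the lexical similarity. The ensemble weights are placeholders, not the tuned values from the thesis.

```python
# Minimal sketch of the weighted similarity ensemble for mapping a predicted
# source-domain intent onto the closest target-domain intent.
# Assumed dependencies: sentence-transformers, transformers, sentencepiece.
from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util
from transformers import T5ForConditionalGeneration, T5Tokenizer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint
t5_tok = T5Tokenizer.from_pretrained("t5-base")
t5 = T5ForConditionalGeneration.from_pretrained("t5-base")

def bert_cosine(a: str, b: str) -> float:
    """Cosine similarity between the two sentence vectors."""
    va, vb = encoder.encode([a, b], convert_to_tensor=True)
    return util.cos_sim(va, vb).item()

def t5_stsb(a: str, b: str) -> float:
    """Similarity via T5's text-to-text STSB task (0..5, rescaled to 0..1)."""
    prompt = f"stsb sentence1: {a} sentence2: {b}"
    ids = t5_tok(prompt, return_tensors="pt").input_ids
    out = t5_tok.decode(t5.generate(ids, max_new_tokens=4)[0],
                        skip_special_tokens=True)
    try:
        return float(out) / 5.0
    except ValueError:  # model generated something non-numeric
        return 0.0

def lexical(a: str, b: str) -> float:
    """Lexical similarity via Gestalt pattern matching (difflib)."""
    return SequenceMatcher(None, a, b).ratio()

def ensemble(a: str, b: str, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted combination of the three signals; weights are placeholders."""
    w1, w2, w3 = weights
    return w1 * bert_cosine(a, b) + w2 * t5_stsb(a, b) + w3 * lexical(a, b)

def infer_target_intent(source_intent: str, target_intents: list) -> str:
    """Pick the target-domain intent most similar to the predicted source intent."""
    return max(target_intents, key=lambda t: ensemble(source_intent, t))
```

SequenceMatcher.ratio() is the standard-library implementation of the Gestalt pattern-matching measure, which makes it a natural stand-in for the lexical, token-level similarity signal in the ensemble.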
Description: Master's thesis
National Chengchi University
Department of Computer Science
108753206
Source: http://thesis.lib.nccu.edu.tw/record/#G0108753206
Type: thesis
Appears in Collections: Theses

Files in This Item:
File: 320601.pdf
Size: 8.15 MB
Format: Adobe PDF
