NCCU Library: Theses
Title: 語言模型潛在空間中的語義概念注入之探討 (Semantic Concept Injection in the Latent Space of Language Models)
Author: Lo, Yung-Fu (羅永富)
Advisor: Hsiao, Shun-Wen (蕭舜文)
Keywords: NLP; latent space; WordNet; Low-Rank Adaptation (LoRA); semantic concept injection
Date: 2025
Uploaded: 1-Sep-2025 15:04:27 (UTC+8)

Abstract
Recent advances in transformer-based language models have revolutionized natural language processing by generating high-dimensional embeddings that capture complex semantic relationships. However, these embeddings often lack alignment with intuitive human conceptual structures. This study proposes a principled framework for semantic concept injection in the latent space of language models by leveraging external structured knowledge from WordNet. We employ parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA) to inject ontological semantic constraints, enforcing the cosine similarity of embeddings to align with WordNet's hierarchical Wu-Palmer similarity metric. Two injection strategies are explored: Unified LoRA for Composite Semantic Concepts (ULCSC) and Disentangled LoRA for Individual Semantic Concepts (DLISC). Experimental evaluations across several semantically related downstream tasks, such as concept classification, multi-label concept classification, and zero-shot concept classification, show substantial performance improvements. In a 13-class concept classification task, our DLISC method achieves perfect scores (Accuracy: 1.0000, F1 Macro: 1.0000, F1 Weighted: 1.0000), outperforming the baseline BERT model (Accuracy: 0.7778, F1 Macro: 0.5051, F1 Weighted: 0.7390). Additionally, in multi-label concept and zero-shot concept classification scenarios, our methods consistently outperform baseline models, emphasizing their effectiveness in handling polysemy and enhancing semantic meanings. Results show enhanced interpretability and semantic meanings in the latent space while maintaining parameter efficiency, validating the effectiveness of semantic concept injection using external knowledge bases for improved NLP applications.

References

Abeysiriwardana, M., & Sumanathilaka, D. (2024). A survey on lexical ambiguity detection and word sense disambiguation. In Proceedings of the 20th IEEE International Colloquium on Signal Processing and its Applications (CSPA) (pp. 1–6).
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146. https://doi.org/10.1162/tacl_a_00051
Bystrov, D. (2024). Information retrieval multi-agent system established on the metaphysics lexical database. In Information Systems and Technological Advances for Sustainable Development (pp. 1–6). Springer. https://doi.org/10.1007/978-3-031-75329-9_1
Chandrasekaran, D., & Mago, V. (2021). Evolution of semantic similarity: A survey. ACM Computing Surveys, 54(2). https://doi.org/10.1145/3440755
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
Guarino, N., Oberle, D., & Staab, S. (2009). What is an ontology? In S. Staab & R. Studer (Eds.), Handbook on Ontologies (pp. 1–17). Springer. https://doi.org/10.1007/978-3-540-92673-3_0
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-efficient transfer learning for NLP. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning (Vol. 97, pp. 2790–2799). PMLR. https://proceedings.mlr.press/v97/houlsby19a.html
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2022). LoRA: Low-rank adaptation of large language models. In Proceedings of the 10th International Conference on Learning Representations (ICLR 2022). OpenReview. https://openreview.net/forum?id=nZeVKeeFYf9
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. In M.-F. Moens, X. Huang, L. Specia, & S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 3045–3059). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.243
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2020). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In D. Jurafsky, J. Chai, N. Schluter, & J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871–7880). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.703
Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. In C. Zong, F. Xia, W. Li, & R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 4582–4597). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.353
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2020). RoBERTa: A robustly optimized BERT pretraining approach. In Proceedings of the 8th International Conference on Learning Representations (ICLR 2020). OpenReview. https://openreview.net/forum?id=SyxS0T4tvS
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41. https://doi.org/10.1145/219717.219748
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–244. https://doi.org/10.1093/ijl/3.4.235
Munroe, R. (2010, May). Color name survey results. https://blog.xkcd.com/2010/05/03/color-survey-results/
Navigli, R., & Ponzetto, S. P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, 217–250. https://doi.org/10.1016/j.artint.2012.07.001
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In A. Moschitti, B. Pang, & W. Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In M. Walker, H. Ji, & A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 2227–2237). Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training (Technical report). OpenAI. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In K. Inui, J. Jiang, V. Ng, & X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3982–3992). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410
Sinha, K., Jia, R., Hupkes, D., Pineau, J., Williams, A., & Kiela, D. (2021). Masked language modeling and the distributional hypothesis: Order word matters pre-training for little. In M.-F. Moens, X. Huang, L. Specia, & S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 2888–2913). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.230
Song, Z., Yan, B., Liu, Y., Fang, M., Li, M., Yan, R., & Chen, X. (2025). Injecting domain-specific knowledge into large language models: A comprehensive survey. arXiv preprint arXiv:2502.10708.
Speer, R., Chin, J., & Havasi, C. (2017). ConceptNet 5.5: An open multilingual graph of general knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (pp. 4444–4451). AAAI Press. https://doi.org/10.1609/aaai.v31i1.11164
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86), 2579–2605. http://jmlr.org/papers/v9/vandermaaten08a.html
Wu, Z., & Palmer, M. (1994). Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (pp. 133–138). Association for Computational Linguistics. https://doi.org/10.3115/981732.981751

Degree: Master's thesis
National Chengchi University
Department of Management Information Systems
Student ID: 112356021
Source: http://thesis.lib.nccu.edu.tw/record/#G0112356021
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/159092
Type: thesis
Format: application/pdf (3,772,596 bytes)

Table of Contents
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Representation Learning
2.1.1 Word2Vec
2.1.2 GloVe
2.1.3 FastText
2.1.4 ELMo
2.1.5 BERT
2.1.6 RoBERTa
2.1.7 GPT
2.2 Ontologies and Semantic Hierarchies
2.3 Knowledge Base Injection
2.4 Parameter-Efficient Fine-Tuning (PEFT)
3 Proposed Method
3.1 Overview
3.1.1 Concept-Based Word Pair Similarity Calculation
3.1.2 Semantic Concept Injection
3.1.3 Loss Function
3.2 Concept-Based Word Pair Similarity Calculation
3.2.1 Extracting Words Under a Specified Concept
3.2.2 Generating Word Pairs and Computing WUP Similarity
3.3 Semantic Concept Injection
3.3.1 Unified LoRA for Composite Semantic Concepts (ULCSC)
3.3.2 Disentangled LoRA for Individual Semantic Concepts (DLISC)
3.4 Loss Function
4 Experiments
4.1 Data Set
4.1.1 Datasets for Semantic Space Analysis
4.1.2 Datasets for Downstream Task Evaluation
4.2 Evaluation Metrics
4.2.1 Semantic Space Structure Evaluation
4.2.2 Downstream Task Evaluation
4.3 Semantic Space Analysis
4.4 Concept Classification
4.5 Multi-Label Concept Classification
4.6 Zero-Shot Concept Classification
5 Conclusion
References
A LoRA Hyperparameter Ablation Study
B Training Data Sampling
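The abstract aligns the cosine similarity of embeddings with WordNet's hierarchical Wu-Palmer (WUP) metric. As a rough illustration of that metric only (a hypothetical toy IS-A hierarchy, not WordNet itself or the thesis's code), WUP similarity is twice the depth of two concepts' least common subsumer divided by the sum of their depths:

```python
# Toy sketch of Wu-Palmer similarity: wup(a, b) = 2 * depth(LCS) / (depth(a) + depth(b)).
# The hierarchy below is illustrative; the thesis draws scores from WordNet.

# parent links of a small IS-A hierarchy rooted at "entity"
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "artifact": "entity",
    "car": "artifact",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth of a node; the root counts as depth 1."""
    return len(path_to_root(node))

def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walk b upward; first hit is the deepest shared node
        if node in ancestors_a:
            return node
    return None

def wup_similarity(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

print(wup_similarity("dog", "cat"))  # siblings under "animal": 2*2/(3+3)
print(wup_similarity("dog", "car"))  # only "entity" is shared: 2*1/(3+3)
```

In practice these scores would come from WordNet synsets (e.g. NLTK's wordnet corpus exposes a `wup_similarity` method); the toy hierarchy above only demonstrates the formula the embeddings are aligned against.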
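Chapter 3 of the contents pairs a LoRA-based injection step with a loss function. A minimal sketch, under two assumptions not taken from the thesis: the loss is a squared error between the cosine similarity of two adapted embeddings and their WUP target, and LoRA adds a trainable low-rank product B·A to a frozen weight W (all names and dimensions here are illustrative):

```python
import math

# Illustrative sketch: a LoRA-style low-rank update y = W x + B (A x),
# and an alignment loss pushing cosine(e1, e2) toward a WUP similarity target.

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x):
    """Adapted embedding: frozen base W x plus rank-r update B (A x), r << d."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d_ for b, d_ in zip(base, delta)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def alignment_loss(e1, e2, wup_target):
    """Squared error between embedding cosine similarity and the WUP target."""
    return (cosine(e1, e2) - wup_target) ** 2

# d = 3, r = 1 toy example
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen base weight
A = [[0.1, 0.1, 0.1]]                                     # trainable, 1 x 3
B = [[0.2], [0.0], [0.0]]                                 # trainable, 3 x 1
x1, x2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

e1 = lora_forward(W, A, B, x1)
e2 = lora_forward(W, A, B, x2)
print("alignment loss vs. WUP target 2/3:", alignment_loss(e1, e2, 2 / 3))
```

The design point this sketch mirrors is the parameter-efficiency claim in the abstract: with rank r much smaller than the embedding dimension d, only the r x d and d x r factors are trained while the base model stays frozen.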
