Title Semantic Concept Injection in the Latent Space of Language Models
Author Lo, Yung-Fu (羅永富)
Advisor Hsiao, Shun-Wen (蕭舜文)
Keywords NLP
Latent space
WordNet
Low-Rank Adaptation (LoRA)
Semantic concept injection
Date 2025
Date uploaded 1-Sep-2025 15:04:27 (UTC+8)
Abstract Recent advances in transformer-based language models have revolutionized natural language processing by generating high-dimensional embeddings that capture complex semantic relationships. However, these embeddings often lack alignment with intuitive human conceptual structures. This study proposes a principled framework for semantic concept injection in the latent space of language models by leveraging external structured knowledge from WordNet. We employ parameter-efficient fine-tuning with Low-Rank Adaptation (LoRA) to inject ontological semantic constraints, aligning the cosine similarity of embeddings with WordNet's hierarchical Wu-Palmer similarity metric. Two injection strategies are explored: Unified LoRA for Composite Semantic Concepts (ULCSC) and Disentangled LoRA for Individual Semantic Concepts (DLISC). Experimental evaluations across several semantically related downstream tasks, such as concept classification, multi-label concept classification, and zero-shot concept classification, show substantial performance improvements. In a 13-class concept classification task, our DLISC method achieves perfect scores (Accuracy: 1.0000, F1 Macro: 1.0000, F1 Weighted: 1.0000), outperforming the baseline BERT model (Accuracy: 0.7778, F1 Macro: 0.5051, F1 Weighted: 0.7390). Additionally, in multi-label and zero-shot concept classification scenarios, our methods consistently outperform baseline models, underscoring their effectiveness in handling polysemy and enriching semantic meaning. The results show enhanced interpretability and semantic structure in the latent space while maintaining parameter efficiency, validating the effectiveness of semantic concept injection from external knowledge bases for improved NLP applications.
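The alignment objective summarized in the abstract can be illustrated with a small sketch: pairwise Wu-Palmer (WUP) similarities over a concept hierarchy serve as targets that the cosine similarities of embeddings are trained to match. The toy taxonomy, the 2-d embedding values, and all helper names below are hypothetical stand-ins for illustration only, not the thesis's actual data or code; the thesis itself uses WordNet and LoRA-adapted transformer embeddings.

```python
# Illustrative sketch (hypothetical taxonomy and embeddings): Wu-Palmer
# similarity over a tiny hypernym tree, used as the target for a
# cosine-alignment loss on embeddings.

import math

PARENT = {  # child -> hypernym; "entity" is the root
    "animal": "entity", "artifact": "entity",
    "dog": "animal", "cat": "animal", "car": "artifact",
}

def path_to_root(node):
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path  # [node, ..., "entity"]

def depth(node):
    return len(path_to_root(node))  # root has depth 1

def lcs(a, b):
    # Least common subsumer: the deepest shared ancestor.
    ancestors = set(path_to_root(a))
    return next(n for n in path_to_root(b) if n in ancestors)

def wup(a, b):
    # WUP(a, b) = 2 * depth(LCS(a, b)) / (depth(a) + depth(b))
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Hypothetical 2-d embeddings standing in for transformer outputs.
EMB = {"dog": (1.0, 0.2), "cat": (0.9, 0.3), "car": (0.1, 1.0)}

def alignment_loss(pairs):
    # Mean squared error between cosine similarity and the WUP target;
    # in the thesis, a signal of this kind is minimized by training
    # LoRA adapters rather than the full model.
    return sum((cosine(EMB[a], EMB[b]) - wup(a, b)) ** 2
               for a, b in pairs) / len(pairs)

pairs = [("dog", "cat"), ("dog", "car"), ("cat", "car")]
print(round(wup("dog", "cat"), 4))  # 0.6667: siblings share "animal"
print(round(wup("dog", "car"), 4))  # 0.3333: only the root is shared
print(round(alignment_loss(pairs), 4))  # scalar cosine-vs-WUP MSE
```

Because WUP rewards concepts whose least common subsumer sits deep in the hierarchy, minimizing this loss pulls embeddings of taxonomically close words (dog, cat) together while pushing distant ones (dog, car) apart.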
References
Abeysiriwardana, M., & Sumanathilaka, D. (2024). A survey on lexical ambiguity detection and word sense disambiguation. In Proceedings of the 20th IEEE International Colloquium on Signal Processing and its Applications (CSPA) (pp. 1–6).
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146. https://doi.org/10.1162/tacl_a_00051
Bystrov, D. (2024). Information retrieval multi-agent system established on the metaphysics lexical database. In Information Systems and Technological Advances for Sustainable Development (pp. 1–6). Springer. https://doi.org/10.1007/978-3-031-75329-9_1
Chandrasekaran, D., & Mago, V. (2021). Evolution of semantic similarity—a survey. ACM Computing Surveys, 54(2). https://doi.org/10.1145/3440755
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
Guarino, N., Oberle, D., & Staab, S. (2009). What is an ontology? In S. Staab & R. Studer (Eds.), Handbook on Ontologies (pp. 1–17). Springer. https://doi.org/10.1007/978-3-540-92673-3_0
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-efficient transfer learning for NLP. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning (pp. 2790–2799, Vol. 97). PMLR. https://proceedings.mlr.press/v97/houlsby19a.html
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2022). LoRA: Low-rank adaptation of large language models. In Proceedings of the 10th International Conference on Learning Representations (ICLR 2022). OpenReview. https://openreview.net/forum?id=nZeVKeeFYf9
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. In M.-F. Moens, X. Huang, L. Specia, & S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 3045–3059). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.243
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2020). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In D. Jurafsky, J. Chai, N. Schluter, & J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871–7880). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.703
Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. In C. Zong, F. Xia, W. Li, & R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 4582–4597). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.353
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2020). RoBERTa: A robustly optimized BERT pretraining approach. In Proceedings of the 8th International Conference on Learning Representations (ICLR 2020). OpenReview. https://openreview.net/forum?id=SyxS0T4tvS
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41. https://doi.org/10.1145/219717.219748
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–244. https://doi.org/10.1093/ijl/3.4.235
Munroe, R. (2010, May). Color name survey results. https://blog.xkcd.com/2010/05/03/color-survey-results/
Navigli, R., & Ponzetto, S. P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, 217–250. https://doi.org/10.1016/j.artint.2012.07.001
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In A. Moschitti, B. Pang, & W. Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In M. Walker, H. Ji, & A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 2227–2237). Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training (Technical report). OpenAI. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In K. Inui, J. Jiang, V. Ng, & X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3982–3992). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410
Sinha, K., Jia, R., Hupkes, D., Pineau, J., Williams, A., & Kiela, D. (2021). Masked language modeling and the distributional hypothesis: Order word matters pre-training for little. In M.-F. Moens, X. Huang, L. Specia, & S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 2888–2913). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.230
Song, Z., Yan, B., Liu, Y., Fang, M., Li, M., Yan, R., & Chen, X. (2025). Injecting domain-specific knowledge into large language models: A comprehensive survey. arXiv preprint arXiv:2502.10708.
Speer, R., Chin, J., & Havasi, C. (2017). ConceptNet 5.5: An open multilingual graph of general knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (pp. 4444–4451). AAAI Press. https://doi.org/10.1609/aaai.v31i1.11164
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86), 2579–2605. http://jmlr.org/papers/v9/vandermaaten08a.html
Wu, Z., & Palmer, M. (1994). Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (pp. 133–138). Association for Computational Linguistics. https://doi.org/10.3115/981732.981751
Description Master's thesis
National Chengchi University
Department of Management Information Systems
112356021
Source http://thesis.lib.nccu.edu.tw/record/#G0112356021
Data type thesis
URI https://nccur.lib.nccu.edu.tw/handle/140.119/159092
Table of Contents
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
  2.1 Representation Learning
    2.1.1 Word2Vec
    2.1.2 GloVe
    2.1.3 FastText
    2.1.4 ELMo
    2.1.5 BERT
    2.1.6 RoBERTa
    2.1.7 GPT
  2.2 Ontologies and Semantic Hierarchies
  2.3 Knowledge Base Injection
  2.4 Parameter-Efficient Fine-Tuning (PEFT)
3 Proposed Method
  3.1 Overview
    3.1.1 Concept-Based Word Pair Similarity Calculation
    3.1.2 Semantic Concept Injection
    3.1.3 Loss Function
  3.2 Concept-Based Word Pair Similarity Calculation
    3.2.1 Extracting Words Under a Specified Concept
    3.2.2 Generating Word Pairs and Computing WUP Similarity
  3.3 Semantic Concept Injection
    3.3.1 Unified LoRA for Composite Semantic Concepts (ULCSC)
    3.3.2 Disentangled LoRA for Individual Semantic Concepts (DLISC)
  3.4 Loss Function
4 Experiments
  4.1 Data Set
    4.1.1 Datasets for Semantic Space Analysis
    4.1.2 Datasets for Downstream Task Evaluation
  4.2 Evaluation Metrics
    4.2.1 Semantic Space Structure Evaluation
    4.2.2 Downstream Task Evaluation
  4.3 Semantic Space Analysis
  4.4 Concept Classification
  4.5 Multi-Label Concept Classification
  4.6 Zero-Shot Concept Classification
5 Conclusion
References
A LoRA Hyperparameter Ablation Study
B Training Data Sampling
Format 3772596 bytes, application/pdf