NCCU Library: Publications (Theses)
Title: 結合規則式評分與分群方法之大型語言模型語意風險與合規性評估
Title (English): Semantic risk and compliance evaluation on LLM responses using rule-based scoring and clustering
Author: Chen, Hui-Ying (陳卉縈)
Advisor: Yu, Fang (郁方)
Keywords: Large Language Models; PyRIT; GHSOM; Ethical compliance; Safety evaluation; Adversarial prompts; Jailbreaking
Date: 2025
Uploaded: 4-Aug-2025 14:28:07 (UTC+8)

Abstract
Large Language Models (LLMs) have advanced natural language processing (NLP) applications but remain vulnerable to ethical misalignment and adversarial prompts. This study proposes a dual-layer evaluation framework that integrates rule-based scoring using the Python Risk Identification Tool (PyRIT) with clustering via the Growing Hierarchical Self-Organizing Map (GHSOM). LLM outputs are categorized into Vulgar, Blunt, Deceptive, and Eloquent behaviors based on compliance and semantic risks. The framework also enables cluster-level feature identification and false-positive detection. Across 2,925 responses spanning 10 scenarios and 12 jailbreak scripts, Gemini generated the highest number of Vulgar outputs (119), followed by Perplexity (70) and DeepSeek (59), while Claude and ChatGPT were more ethically aligned. Testing 170 high-risk prompts on API-based versus quantized local models revealed that API models remain susceptible to adversarial inputs, whereas quantized models exhibited lower attack success rates, likely due to reduced comprehension rather than stronger alignment safeguards. These findings underscore the value of layered evaluation frameworks for improving the safety and interpretability of LLMs.

References
AI, D. (2024a). DeepSeek-R1-Distill-Llama-8B [Accessed: 2025-05].
AI, M. (2024b). Meta-Llama-3.1-8B-Instruct [Accessed: 2025-05].
Anthropic. (2023). Claude [Model version: Claude 3.5 Haiku]. https://www.anthropic.com/claude
DeepSeek-AI, Liu, A., Feng, B., Wang, B., Wang, B., Liu, B., et al. (2024). DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model. https://arxiv.org/abs/2405.04434
Deng, G., Liu, Y., Li, Y., Wang, K., Zhang, Y., Li, Z., Wang, H., Zhang, T., & Liu, Y. (2024). MasterKey: Automated jailbreaking of large language model chatbots. Proceedings of the 2024 Network and Distributed System Security Symposium. https://doi.org/10.14722/ndss.2024.24188
Dittenbach, M., Merkl, D., & Rauber, A. (2001). Hierarchical clustering of document archives with the growing hierarchical self-organizing map. Proceedings of the International Conference on Artificial Neural Networks (ICANN), 486–491. https://doi.org/10.1007/3-540-44668-0_70
Gehman, S., Gururangan, S., Sap, M., Choi, Y., & Smith, N. A. (2020). RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In T. Cohn, Y. He, & Y. Liu (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 3356–3369). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.301
Google. (2024). Gemini [Model version: Gemini 2.0 Flash-Lite]. https://gemini.google.com/app
Guo, Z., Jin, R., Liu, C., Huang, Y., Shi, D., Supryadi, Yu, L., Liu, Y., Li, J., Xiong, B., & Xiong, D. (2023). Evaluating large language models: A comprehensive survey. https://arxiv.org/abs/2310.19736
Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D., & Steinhardt, J. (2021). Aligning AI with shared human values. International Conference on Learning Representations. https://openreview.net/forum?id=dNy_RKzJacY
Huang, Y., Zhang, Q., Yu, P. S., & Sun, L. (2023). TrustGPT: A benchmark for trustworthy and responsible large language models. https://arxiv.org/abs/2306.11507
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464–1480. https://doi.org/10.1109/5.58325
Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M.-W., Dai, A., Uszkoreit, J., Le, Q., & Petrov, S. (2019). Natural Questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics, 7, 453–466. https://doi.org/10.1162/tacl_a_00276
Lees, A., Tran, V. Q., Tay, Y., Sorensen, J., Gupta, J., Metzler, D., & Vasserman, L. (2022). A new generation of Perspective API: Efficient multilingual character-level transformers. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3197–3207. https://doi.org/10.1145/3534678.3539147
Liu, Y., Deng, G., Xu, Z., Li, Y., Zheng, Y., Zhang, Y., Zhao, L., Zhang, T., Wang, K., & Liu, Y. (2024). Jailbreaking ChatGPT via prompt engineering: An empirical study.
Munoz, G. D. L., Minnich, A. J., Lutz, R., Lundeen, R., Dheekonda, R. S. R., Chikanov, N., Jagdagdorj, B.-E., Pouliot, M., Chawla, S., Maxwell, W., Bullwinkel, B., Pratt, K., de Gruyter, J., Siska, C., Bryan, P., Westerhoff, T., Kawaguchi, C., Seifert, C., Kumar, R. S. S., & Zunger, Y. (2024). PyRIT: A framework for security risk identification and red teaming in generative AI system. https://arxiv.org/abs/2410.02828
Nangia, N., Vania, C., Bhalerao, R., & Bowman, S. R. (2020). CrowS-Pairs: A challenge dataset for measuring social biases in masked language models. In B. Webber, T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1953–1967). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.154
OpenAI. (2023). ChatGPT [Model version: GPT-4o mini]. https://openai.com/chatgpt
Patil, S. G., Zhang, T., Wang, X., & Gonzalez, J. E. (2023). Gorilla: Large language model connected with massive APIs. https://arxiv.org/abs/2305.15334
Perplexity. (2023). Perplexity AI [Model version: Sonar]. https://www.perplexity.ai
Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. https://arxiv.org/abs/1606.05250
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. https://arxiv.org/abs/1908.10084
Rudinger, R., Naradowsky, J., Leonard, B., & Van Durme, B. (2018). Gender bias in coreference resolution. https://arxiv.org/abs/1804.09301
Su, J., Kempe, J., & Ullrich, K. (2024). Mission impossible: A statistical perspective on jailbreaking LLMs. https://arxiv.org/abs/2408.01420
Talmor, A., Herzig, J., Lourie, N., & Berant, J. (2019). CommonsenseQA: A question answering challenge targeting commonsense knowledge. https://arxiv.org/abs/1811.00937
Tang, H., Li, H., Liu, J., Hong, Y., Wu, H., & Wang, H. (2021). DuReader_robust: A Chinese dataset towards evaluating robustness and generalization of machine reading comprehension in real-world applications. https://arxiv.org/abs/2004.11142
Wen, S.-J., Chang, J.-M., & Yu, F. (2024). scGHSOM: Hierarchical clustering and visualization of single-cell and CRISPR data using growing hierarchical SOM. https://arxiv.org/abs/2407.16984
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A dataset for diverse, explainable multi-hop question answering. https://arxiv.org/abs/1809.09600
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K.-W. (2018). Gender bias in coreference resolution: Evaluation and debiasing methods. In M. Walker, H. Ji, & A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) (pp. 15–20). Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2003
Zhao, Y., Zhao, C., Nan, L., Qi, Z., Zhang, W., Tang, X., Mi, B., & Radev, D. (2023). RobuT: A systematic study of table QA robustness against human-annotated adversarial perturbations. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 6064–6081). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.334
Zhu, K., Wang, J., Zhou, J., Wang, Z., Chen, H., Wang, Y., Yang, L., Ye, W., Zhang, Y., Gong, N. Z., & Xie, X. (2024). PromptRobust: Towards evaluating the robustness of large language models on adversarial prompts. https://arxiv.org/abs/2306.04528
Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., & Fredrikson, M. (2023). Universal and transferable adversarial attacks on aligned language models. https://arxiv.org/abs/2307.15043

Description: Master's thesis
National Chengchi University
Department of Management Information Systems
Student ID: 112356043
Source: http://thesis.lib.nccu.edu.tw/record/#G0112356043
Type: thesis
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/158581
Format: application/pdf (2,226,221 bytes)

Table of Contents
摘要 (Chinese Abstract); Abstract; Contents; List of Figures; List of Tables
1 Introduction
2 Related Work
  2.1 Knowledge and Capability Evaluation
  2.2 Alignment Evaluation
  2.3 Safety Evaluation
  2.4 Limitations of Existing Approaches
3 Methodology
  3.1 Prompt Generation
    3.1.1 Contextual Prompts
    3.1.2 Jailbreak Prompts
    3.1.3 External Prompt Baseline from AdvBench
  3.2 Response Collection
  3.3 Semantic Embedding Conversion
  3.4 Scoring and Classification with PyRIT
    3.4.1 Binary Compliance Classification
    3.4.2 Likert-Scale Compliance Scoring
    3.4.3 Categorical Compliance Assessment
    3.4.4 Objective Success Evaluation
  3.5 Clustering Analysis with GHSOM
    3.5.1 False Positive Detection
    3.5.2 Feature Identification
  3.6 Integration of PyRIT and GHSOM
4 Evaluation
  4.1 PyRIT Scoring Analysis
    4.1.1 Binary Compliance Classification
    4.1.2 Likert-Scale Compliance Scoring
    4.1.3 Categorical Compliance Assessment
    4.1.4 Objective Success Evaluation
  4.2 GHSOM Clustering Analysis
    4.2.1 False Positive Detection
    4.2.2 Feature Identification
  4.3 Semantic Risk Quadrant Analysis
    4.3.1 Vulgar Responses
    4.3.2 Blunt Responses
    4.3.3 Deceptive Responses
    4.3.4 Eloquent Responses
  4.4 Backtracking Analysis of Adversarial Responses
  4.5 Transferability Evaluation Across Advanced and Quantized Models
  4.6 Comparison with AdvBench Prompts
5 Conclusion
References
Appendix A: Representative Examples (A.1 Vulgar Response; A.2 Blunt Response; A.3 Deceptive Response; A.4 Eloquent Response)
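The four tone-behavior categories in the abstract arise from two axes: a rule-based compliance verdict (PyRIT scorers) and a semantic/tone risk signal (GHSOM cluster features). As a minimal sketch of how such a two-axis labeling could work; the `classify_response` helper, the 0.5 tone-risk threshold, and the exact axis-to-label mapping below are illustrative assumptions, not the thesis's actual scoring rules:

```python
# Illustrative two-axis labeling. In the thesis, the compliance verdict comes
# from PyRIT rule-based scorers and the tone/semantic risk from GHSOM cluster
# features; here both inputs are simplified placeholders.

def classify_response(compliant: bool, tone_risk: float,
                      threshold: float = 0.5) -> str:
    """Map a compliance verdict and a tone-risk score to a behavior type."""
    if not compliant:
        # Non-compliant content: an offensive tone reads as Vulgar,
        # a polished tone as potentially Deceptive.
        return "Vulgar" if tone_risk >= threshold else "Deceptive"
    # Compliant content: an offensive tone is Blunt, a polished tone Eloquent.
    return "Blunt" if tone_risk >= threshold else "Eloquent"

if __name__ == "__main__":
    samples = [
        (False, 0.9),  # non-compliant, harsh tone
        (False, 0.2),  # non-compliant, fluent tone
        (True, 0.8),   # compliant, harsh tone
        (True, 0.1),   # compliant, fluent tone
    ]
    for compliant, risk in samples:
        print(compliant, risk, "->", classify_response(compliant, risk))
```

The point of the quadrant view is that the two signals are independent: a response can pass rule-based compliance yet still carry tone risk (Blunt), or evade keyword-style rules while remaining semantically harmful (Deceptive), which is what the clustering layer is meant to surface.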
