NCCU Library — Thesis Record
Title: 應用自動化提示工程與RAG機制於問答系統之優化 (Application of Automated Prompt Engineering and the RAG Mechanism to the Optimization of Question Answering Systems)
Author: Yeh, Bo-Hao (葉柏皓)
Advisor: Chen, Kung (陳恭)
Keywords: Generative AI; RAG; Automated Prompt Engineering; PE2; BERT Score
Date: 2025
Uploaded: 4-Aug-2025 14:27:45 (UTC+8)

Abstract: Generative AI has risen rapidly in recent years, and question-answering (QA) systems built on RAG (Retrieval-Augmented Generation) have drawn broad attention across industries. As the core of such systems, the large language model (LLM) directly determines answer quality and thus overall system performance. Traditionally, improving an LLM through fine-tuning demands substantial hardware resources and specialized expertise, which hinders adoption. This study therefore takes the automated prompt engineering method PE2 (Prompt Engineering a Prompt Engineer) as its base framework, adapts it to the target application scenario, and integrates it into a generative-AI RAG QA system. By automatically adjusting and optimizing the query, the method improves LLM answer quality without any additional fine-tuning, lowering both the resource cost and the technical barrier of building the system. Experimental results show that the proposed method effectively improves answer quality and the semantic-relevance metric (BERT Score). In addition, this study designs an objective standard for evaluating queries, replacing the previous reliance on subjective human judgment and improving the consistency and reliability of prompt evaluation. Finally, the study outlines future directions focused on further strengthening the robustness and accuracy of generative-AI QA systems so that they can handle more diverse and complex application scenarios.

References:
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
Chase, H. (2022). LangChain [Software]. https://github.com/langchain-ai/langchain
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., ... & Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.
Gupta, S., Ranjan, R., & Singh, S. N. (2024). A comprehensive survey of retrieval-augmented generation (RAG): Evolution, current landscape and future directions. arXiv preprint arXiv:2410.12837.
Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M. (2020). Retrieval augmented language model pre-training. In International Conference on Machine Learning (pp. 3929-3938). PMLR.
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., ... & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.
Karpukhin, V., Oguz, B., Min, S., Lewis, P. S., Wu, L., Edunov, S., ... & Yih, W. T. (2020). Dense passage retrieval for open-domain question answering. In EMNLP (pp. 6769-6781).
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
PromptEngineering.org. (2024). What is prompt engineering? Retrieved May 10, 2025, from https://promptengineering.org/what-is-prompt-engineering/
Pryzant, R., Iter, D., Li, J., Lee, Y. T., Zhu, C., & Zeng, M. (2023). Automatic prompt optimization with "gradient descent" and beam search. arXiv preprint arXiv:2305.03495.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1-67.
Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927.
Schulhoff, S., Ilie, M., Balepur, N., Kahadze, K., Liu, A., Si, C., ... & Resnik, P. (2024). The prompt report: A systematic survey of prompting techniques. arXiv preprint arXiv:2406.06608.
Suzgun, M., Scales, N., Schärli, N., Gehrmann, S., Tay, Y., Chung, H. W., ... & Wei, J. (2022). Challenging BIG-Bench tasks and whether chain-of-thought can solve them. arXiv preprint arXiv:2210.09261.
Vatsal, S., & Dubey, H. (2024). A survey of prompt engineering methods in large language models for different NLP tasks. arXiv preprint arXiv:2407.12994.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., ... & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824-24837.
Ye, Q., Axmed, M., Pryzant, R., & Khani, F. (2023). Prompt engineering a prompt engineer. arXiv preprint arXiv:2311.05661.
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J. R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223.
Zheng, L., Chiang, W. L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., ... & Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36, 46595-46623.

Degree: Master's (碩士)
Institution: National Chengchi University (國立政治大學)
Department: Department of Management Information Systems (資訊管理學系)
Student ID: 112356038
Source: http://thesis.lib.nccu.edu.tw/record/#G0112356038
Type: thesis
Identifier: G0112356038
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/158579
Format: application/pdf, 3,671,215 bytes

Table of Contents:
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Thesis Structure
Chapter 2 Literature Review
  2.1 Retrieval-Augmented Generation
    2.1.1 Technical Background and Development
    2.1.2 Operating Mechanism and System Architecture
    2.1.3 Application Scenarios and Potential
  2.2 Prompt Engineering
    2.2.1 Background and Definition
    2.2.2 Core Principles and Techniques
    2.2.3 Typical Application Scenarios
    2.2.4 Challenges
  2.3 Automated Prompt Engineering
    2.3.1 Automatic Prompt Engineer
    2.3.2 Automatic Prompt Optimization
    2.3.3 Prompt Engineering a Prompt Engineer
    2.3.4 Comparison of Automated Prompt Engineering Frameworks
Chapter 3 Research Method
  3.1 Exploration and Validation of the PE2 Method
    3.1.1 Test Data
    3.1.2 Model Selection
    3.1.3 PE2 Workflow
    3.1.4 Validation Results
  3.2 Integrating PE2 with the RAG QA System
    3.2.1 Design Philosophy
    3.2.2 System Architecture
    3.2.3 Knowledge Base Design
    3.2.4 RAG Method Design
    3.2.5 Design of the PE2-RAG Integration
Chapter 4 System Implementation
  4.1 Experimental Environment and Development Tools
  4.2 Code Design
    4.2.1 Knowledge Base Construction
    4.2.2 RAG Implementation
    4.2.3 PE2-RAG Implementation
  4.3 System Evaluation and Limitations
    4.3.1 Evaluation
    4.3.2 Limitations
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References
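The table of contents lists knowledge base construction (Section 4.2.1) as the first implementation step of the RAG pipeline. The thesis's actual code is not reproduced in this record; as a minimal illustrative sketch, a knowledge base is typically built by splitting source documents into overlapping chunks before embedding and indexing them (the function name and parameters here are hypothetical, not taken from the thesis):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows for embedding/indexing.

    Overlap keeps sentences that straddle a boundary retrievable from
    at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

In a real system each chunk would then be embedded and stored in a vector database; chunk size and overlap are tuning choices, not fixed values.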
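The abstract describes automatically adjusting and optimizing the query so that retrieval and answer quality improve without fine-tuning the LLM. The following sketch shows the general shape of such a PE2-style loop under stated assumptions: `rewrite_fn` and `answer_fn` are stubs standing in for real LLM calls, the retriever is a toy word-overlap ranker, and the scorer is a word-level F1 proxy rather than the thesis's actual evaluation standard.

```python
def retrieve(query, knowledge_base, top_k=2):
    """Toy retriever: rank chunks by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def score_answer(answer, reference):
    """Toy proxy for a semantic-similarity score (word-level F1)."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    if not a or not r:
        return 0.0
    p = len(a & r) / len(a)
    rec = len(a & r) / len(r)
    return 0.0 if p + rec == 0 else 2 * p * rec / (p + rec)

def optimize_query(query, knowledge_base, reference, rewrite_fn, answer_fn, steps=3):
    """Iteratively rewrite the query, keeping the best-scoring variant.

    rewrite_fn(query) -> candidate query   (an LLM call in practice)
    answer_fn(query, contexts) -> answer   (an LLM call in practice)
    """
    best_q = query
    best_s = score_answer(answer_fn(query, retrieve(query, knowledge_base)), reference)
    for _ in range(steps):
        candidate = rewrite_fn(best_q)
        answer = answer_fn(candidate, retrieve(candidate, knowledge_base))
        s = score_answer(answer, reference)
        if s > best_s:
            best_q, best_s = candidate, s
    return best_q, best_s
```

The design point this illustrates is the one the abstract makes: all improvement happens in the query/prompt space, so the underlying model stays frozen and no fine-tuning hardware is required.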
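The abstract reports improvements on BERT Score, a semantic-relevance metric that matches each token of the candidate answer to its most similar token in the reference via contextual-embedding cosine similarity, then combines the matches into precision, recall, and F1. A small NumPy illustration of that matching scheme (random matrices stand in for real BERT token embeddings; the production metric uses the `bert-score` library):

```python
import numpy as np

def bertscore_f1(cand_emb, ref_emb):
    """BERTScore-style F1 from token embedding matrices.

    cand_emb: (m, d) candidate token embeddings
    ref_emb:  (n, d) reference token embeddings
    Each token is greedily matched to its most similar counterpart.
    """
    # Normalize rows so dot products are cosine similarities.
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T                       # (m, n) cosine similarity matrix
    precision = sim.max(axis=1).mean()  # best match per candidate token
    recall = sim.max(axis=0).mean()     # best match per reference token
    return 2 * precision * recall / (precision + recall)
```

Identical candidate and reference embeddings yield an F1 of 1.0, which is why the metric rewards semantic overlap rather than exact string match.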
