NCCU Library — Thesis Record
Title: 應用自動化提示工程與RAG機制於問答系統之優化 (Application of Automated Prompt Engineering and the RAG Mechanism to the Optimization of Question Answering Systems)
Author: Yeh, Bo-Hao (葉柏皓)
Advisor: Chen, Kung (陳恭)
Keywords: Generative AI; RAG; Automated Prompt Engineering; PE2; BERT Score
Date: 2025
Uploaded: 4-Aug-2025 14:27:45 (UTC+8)

Abstract: Generative AI has risen rapidly in recent years, and question-answering (QA) systems built on RAG (Retrieval-Augmented Generation) have drawn broad attention across industries. As the core of such systems, the large language model (LLM) directly determines answer quality and thus overall system performance. Traditionally, improving an LLM through fine-tuning demands substantial hardware resources and specialized expertise, which hinders adoption. This study therefore takes the automated prompt engineering method PE2 (Prompt Engineering a Prompt Engineer) as its base framework, adapts it to the target application scenario, and integrates it into a generative-AI RAG QA system. By automatically adjusting and optimizing the query, the method improves LLM answer quality without any additional fine-tuning, lowering both the resource cost and the technical barrier of building the system. Experimental results show that the proposed method effectively improves answer quality and the semantic-relevance metric (BERT Score). In addition, this study designs an objective standard for evaluating queries, replacing the previous reliance on subjective human judgment and improving the consistency and reliability of prompt evaluation. Finally, the study outlines future directions focused on further strengthening the robustness and accuracy of generative-AI QA systems so that they can handle more diverse and complex application scenarios.

References:
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
Chase, H. (2022). LangChain [Software]. https://github.com/langchain-ai/langchain
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., ... & Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.
Gupta, S., Ranjan, R., & Singh, S. N. (2024). A comprehensive survey of retrieval-augmented generation (RAG): Evolution, current landscape and future directions. arXiv preprint arXiv:2410.12837.
Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M. (2020). Retrieval augmented language model pre-training. In International Conference on Machine Learning (pp. 3929-3938). PMLR.
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., ... & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.
Karpukhin, V., Oguz, B., Min, S., Lewis, P. S., Wu, L., Edunov, S., ... & Yih, W. T. (2020). Dense passage retrieval for open-domain question answering. In EMNLP (pp. 6769-6781).
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
PromptEngineering.org. (2024). What is prompt engineering? Retrieved May 10, 2025, from https://promptengineering.org/what-is-prompt-engineering/
Pryzant, R., Iter, D., Li, J., Lee, Y. T., Zhu, C., & Zeng, M. (2023). Automatic prompt optimization with "gradient descent" and beam search. arXiv preprint arXiv:2305.03495.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1-67.
Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927.
Schulhoff, S., Ilie, M., Balepur, N., Kahadze, K., Liu, A., Si, C., ... & Resnik, P. (2024). The prompt report: A systematic survey of prompting techniques. arXiv preprint arXiv:2406.06608.
Suzgun, M., Scales, N., Schärli, N., Gehrmann, S., Tay, Y., Chung, H. W., ... & Wei, J. (2022). Challenging BIG-Bench tasks and whether chain-of-thought can solve them. arXiv preprint arXiv:2210.09261.
Vatsal, S., & Dubey, H. (2024). A survey of prompt engineering methods in large language models for different NLP tasks. arXiv preprint arXiv:2407.12994.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., ... & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824-24837.
Ye, Q., Axmed, M., Pryzant, R., & Khani, F. (2023). Prompt engineering a prompt engineer. arXiv preprint arXiv:2311.05661.
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J. R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223.
Zheng, L., Chiang, W. L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., ... & Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36, 46595-46623.

Degree: Master's (碩士)
Institution: National Chengchi University (國立政治大學)
Department: Department of Management Information Systems (資訊管理學系)
Student ID: 112356038
Source: http://thesis.lib.nccu.edu.tw/record/#G0112356038
Type: thesis
Identifier: G0112356038
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/158579
Format: application/pdf, 3,671,215 bytes

Table of Contents:
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Thesis Structure
Chapter 2 Literature Review
  2.1 Retrieval-Augmented Generation
    2.1.1 Technical Background and Development
    2.1.2 Operating Mechanism and System Architecture
    2.1.3 Application Scenarios and Potential
  2.2 Prompt Engineering
    2.2.1 Background and Definition
    2.2.2 Core Principles and Techniques
    2.2.3 Typical Application Scenarios
    2.2.4 Challenges
  2.3 Automated Prompt Engineering
    2.3.1 Automatic Prompt Engineer
    2.3.2 Automatic Prompt Optimization
    2.3.3 Prompt Engineering a Prompt Engineer
    2.3.4 Comparison of Automated Prompt Engineering Frameworks
Chapter 3 Research Method
  3.1 Exploration and Validation of the PE2 Method
    3.1.1 Test Data
    3.1.2 Model Selection
    3.1.3 PE2 Workflow
    3.1.4 Validation Results
  3.2 Integrating PE2 with the RAG QA System
    3.2.1 Design Philosophy
    3.2.2 System Architecture
    3.2.3 Knowledge Base Design
    3.2.4 RAG Method Design
    3.2.5 Design of the PE2-RAG Integration
Chapter 4 System Implementation
  4.1 Experimental Environment and Development Tools
  4.2 Code Design
    4.2.1 Knowledge Base Construction
    4.2.2 RAG Implementation
    4.2.3 PE2-RAG Implementation
  4.3 System Evaluation and Limitations
    4.3.1 Evaluation
    4.3.2 Limitations
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References
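The table of contents lists knowledge base construction (Section 4.2.1) as the first implementation step of the RAG pipeline. The thesis's actual code is not reproduced in this record; as a minimal illustrative sketch, a knowledge base is typically built by splitting source documents into overlapping chunks before embedding and indexing them (the function name and parameters here are hypothetical, not taken from the thesis):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows for embedding/indexing.

    Overlap keeps sentences that straddle a boundary retrievable from
    at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

In a real system each chunk would then be embedded and stored in a vector database; chunk size and overlap are tuning choices, not fixed values.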
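The abstract describes automatically adjusting and optimizing the query so that retrieval and answer quality improve without fine-tuning the LLM. The following sketch shows the general shape of such a PE2-style loop under stated assumptions: `rewrite_fn` and `answer_fn` are stubs standing in for real LLM calls, the retriever is a toy word-overlap ranker, and the scorer is a word-level F1 proxy rather than the thesis's actual evaluation standard.

```python
def retrieve(query, knowledge_base, top_k=2):
    """Toy retriever: rank chunks by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def score_answer(answer, reference):
    """Toy proxy for a semantic-similarity score (word-level F1)."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    if not a or not r:
        return 0.0
    p = len(a & r) / len(a)
    rec = len(a & r) / len(r)
    return 0.0 if p + rec == 0 else 2 * p * rec / (p + rec)

def optimize_query(query, knowledge_base, reference, rewrite_fn, answer_fn, steps=3):
    """Iteratively rewrite the query, keeping the best-scoring variant.

    rewrite_fn(query) -> candidate query   (an LLM call in practice)
    answer_fn(query, contexts) -> answer   (an LLM call in practice)
    """
    best_q = query
    best_s = score_answer(answer_fn(query, retrieve(query, knowledge_base)), reference)
    for _ in range(steps):
        candidate = rewrite_fn(best_q)
        answer = answer_fn(candidate, retrieve(candidate, knowledge_base))
        s = score_answer(answer, reference)
        if s > best_s:
            best_q, best_s = candidate, s
    return best_q, best_s
```

The design point this illustrates is the one the abstract makes: all improvement happens in the query/prompt space, so the underlying model stays frozen and no fine-tuning hardware is required.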
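The abstract reports improvements on BERT Score, a semantic-relevance metric that matches each token of the candidate answer to its most similar token in the reference via contextual-embedding cosine similarity, then combines the matches into precision, recall, and F1. A small NumPy illustration of that matching scheme (random matrices stand in for real BERT token embeddings; the production metric uses the `bert-score` library):

```python
import numpy as np

def bertscore_f1(cand_emb, ref_emb):
    """BERTScore-style F1 from token embedding matrices.

    cand_emb: (m, d) candidate token embeddings
    ref_emb:  (n, d) reference token embeddings
    Each token is greedily matched to its most similar counterpart.
    """
    # Normalize rows so dot products are cosine similarities.
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T                       # (m, n) cosine similarity matrix
    precision = sim.max(axis=1).mean()  # best match per candidate token
    recall = sim.max(axis=0).mean()     # best match per reference token
    return 2 * precision * recall / (precision + recall)
```

Identical candidate and reference embeddings yield an F1 of 1.0, which is why the metric rewards semantic overlap rather than exact string match.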
