Title Optimizing RAG Workflow for Industrial News QA Systems (產業新聞問答系統之RAG工作流程最佳化研究)
Author Fu, Kwo-Shu (傅國書)
Contributors Chiu, Shu-I (邱淑怡)
Fu, Kwo-Shu (傅國書)
Keywords Retrieval-Augmented Generation (檢索增強生成)
Industrial news question answering (產業新聞問答)
News media (新聞媒體)
Workflow optimization (工作流程最佳化)
Large Language Models (大型語言模型)
Information retrieval (資訊檢索)
Date 2026
Uploaded 2-Feb-2026 12:33:43 (UTC+8)
Abstract This study examines the application of Retrieval-Augmented Generation (RAG) to industrial news question-answering (QA) systems and the optimization of the RAG workflow. Unlike generation that relies solely on the internal knowledge of a Large Language Model (LLM), RAG dynamically retrieves external news content as evidence for generation, which reduces the risk of hallucination and improves the verifiability and credibility of answers. In real-world news media settings, however, user queries often combine explicit temporal constraints with a need to integrate information across topics. The retrieval stage is therefore prone to interference from news content that is semantically similar but temporally misaligned or contextually mismatched, which degrades the correctness of the generated answers.

To address these challenges, this study takes the complete RAG workflow as its object of analysis and systematically compares the performance impact of five modules (chunking strategy, retrieval, reranking, repacking, and summarization) in industrial news QA scenarios, using a real-world technology industry news corpus as the experimental basis. For evaluation, it adopts several generation quality metrics proposed by RAGAS and combines them through a standardization method so that comparisons across workflow configurations are consistent and interpretable.

The experimental results show that an appropriate chunking configuration combined with a high-performing reranking mechanism mitigates the effect of retrieval bias on generation quality in news QA systems, and significantly improves the correctness and credibility of answers without excessively increasing system complexity. Overall, the study verifies the practical benefit of each RAG module in the industrial news domain and proposes workflow design recommendations that balance generation quality with practical feasibility, offering an empirical reference for news media organizations introducing generative QA services.
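The abstract mentions combining several RAGAS metrics through a standardization step before comparing workflow configurations. The record does not say which method the thesis uses, so the following is only a minimal sketch under one plausible assumption: min-max normalization of each metric across configurations, followed by an unweighted mean. The configuration names and all numbers are made up for illustration; `faithfulness` and `answer_relevancy` are RAGAS metric names, but whether the thesis uses these particular ones is also an assumption.

```python
from statistics import mean


def min_max_normalize(values):
    """Rescale a list of raw scores to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def aggregate(scores):
    """Map {config: {metric: raw score}} to {config: mean normalized score}."""
    configs = list(scores)
    metrics = list(next(iter(scores.values())))
    normalized = {c: [] for c in configs}
    for m in metrics:
        # Normalize each metric across configurations so metrics share a scale.
        column = min_max_normalize([scores[c][m] for c in configs])
        for c, v in zip(configs, column):
            normalized[c].append(v)
    return {c: mean(vs) for c, vs in normalized.items()}


# Hypothetical configurations and made-up scores, not results from the thesis.
example_scores = {
    "baseline": {"faithfulness": 0.5, "answer_relevancy": 0.25},
    "with_reranker": {"faithfulness": 1.0, "answer_relevancy": 0.75},
    "with_summary": {"faithfulness": 0.75, "answer_relevancy": 0.5},
}
ranking = aggregate(example_scores)
best_config = max(ranking, key=ranking.get)
```

Normalizing per metric before averaging keeps a metric with a wide raw range from dominating the combined score, which is one way to make configurations comparable on a single scale.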
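Description Master's degree
National Chengchi University (國立政治大學)
In-service Master's Program, Department of Computer Science (資訊科學系碩士在職專班)
112971025
Source http://thesis.lib.nccu.edu.tw/record/#G0112971025
Type thesis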
dc.contributor.advisor Chiu, Shu-I (邱淑怡)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/161444
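dc.description.tableofcontents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Contributions
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 RAG Workflow
2.2 Vector Database Management Systems
2.3 Chunking and Embedding Strategies
2.4 Retrieval Mechanisms
2.5 Reranking Models
2.6 Summarization
2.7 Evaluation Methods
2.8 News Question-Answering Systems
Chapter 3 Research Framework and Validation Design
3.1 Experimental Design
3.2 Dataset and Content Characteristics
3.3 Chunking and Embedding Design
3.4 Retrieval Stage: Hybrid Search with HyDE
3.5 Reranking Stage: BGE-Reranker-Base
3.6 Repacking Stage: Reverse
3.7 Summarization Stage: Recomp
3.8 Evaluation Stage: RAGAS
Chapter 4 Experimental Analysis
4.1 Experimental Objectives and Results
4.2 Chunking Strategy Performance Analysis
4.3 Retrieval Mechanism Performance Analysis
4.4 Reranking Module Performance Analysis
4.5 Repacking Stage Performance Analysis
4.6 Summarization Performance Analysis
4.7 Comprehensive Analysis
Chapter 5 Conclusion and Future Work
5.1 Main Findings
5.2 Recommendations for News QA Systems
5.3 Limitations and Future Work
References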
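The outline names a hybrid search stage and a "Reverse" repacking stage but this record gives no implementation details. As a rough illustration only, the sketch below assumes that hybrid search blends a sparse (keyword) score and a dense (embedding) score with a weight `alpha`, and that "Reverse" repacking orders the selected passages by ascending relevance so the most relevant passage sits last, closest to the question in the prompt. The function names, the `alpha` parameter, the document ids, and the scores are all hypothetical, and the scores are assumed to be already scaled to [0, 1].

```python
def hybrid_scores(sparse, dense, alpha=0.5):
    """Blend per-document sparse and dense scores; alpha weights the dense side.

    A document missing from one retriever contributes 0.0 for that side.
    """
    doc_ids = set(sparse) | set(dense)
    return {
        d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
        for d in doc_ids
    }


def top_k(scores, k):
    """Return the k document ids with the highest scores, best first."""
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:k]


def repack_reverse(ranked_ids):
    """Reorder best-first ids into ascending relevance (most relevant last)."""
    return list(reversed(ranked_ids))


# Made-up scores for four hypothetical news passages.
sparse = {"a": 1.0, "b": 0.5, "c": 0.0}
dense = {"a": 0.25, "b": 1.0, "d": 0.25}

ranked = top_k(hybrid_scores(sparse, dense, alpha=0.5), k=3)
context_order = repack_reverse(ranked)
```

In a full pipeline a reranker would rescore `ranked` before repacking; that step is left out here because it depends on a model that this sketch does not load.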
dc.format.extent 1865504 bytes
dc.format.mimetype application/pdf