Title 基於 LLM 的無監督多顆粒度重排序用於長文本檢索
Unsupervised Multi-granularity LLM-based Reranking for Long Text Retrieval
Author 吳家瑋
Wu, Chia-Wei
Contributors 李蔡彥; 黃瀚萱
Li, Tsai-Yen; Huang, Hen-Hsen
吳家瑋
Wu, Chia-Wei
Keywords 資訊檢索
大型語言模型
查詢重寫
文本壓縮
長文本
無監督式文本重新排序
Information Retrieval
Large Language Model
Query Rewriting
Text Compression
Long Text
Unsupervised Text Reranking
Date 2024
Upload time 5-Aug-2024 12:45:39 (UTC+8)
Abstract 本研究提出Rate and Rank GPT(RRGPT),以提高文本重排序的效能與效率,並解決使用大型語言模型進行文檔檢索任務時遇到的長文本挑戰。RRGPT是一種新穎的資訊檢索方法,利用大型語言模型輔助資訊檢索系統中的子任務:查詢重寫任務和無監督式文本重新排序任務。在查詢重寫任務中,本研究將大型語言模型產生的關鍵術語堆疊起來,以擴充原始查詢。在無監督文本重新排序任務中,本研究提出混合式文本重新排序演算法,透過多顆粒度和低成本的方式,依相關度重新排序文本列表。對於長文本問題,本研究採用文本壓縮法從長文本中提取關鍵訊息,以確保文本符合大型語言模型的輸入長度限制。最後,本研究使用DL19和DL20的資料集驗證RRGPT在文檔檢索任務和段落檢索任務的表現。結果表明,RRGPT能更好地依相關度重排序文本列表,並且解決長文本問題。
This research proposes Rate and Rank GPT (RRGPT) to enhance the effectiveness and efficiency of text reranking and to address the challenges associated with long texts in document retrieval tasks using Large Language Models (LLMs). RRGPT is a novel information retrieval method that utilizes LLMs to improve subtasks within the information retrieval system: query rewriting and unsupervised text reranking. For the query rewriting task, this research stacks key terms generated by LLMs to expand the original query. For the unsupervised text reranking task, this research proposes a hybrid, multi-granularity text reranking algorithm that reorders a list of texts by relevance with higher accuracy and lower cost than traditional methods. For the long text issue, this research uses a text compression strategy to extract crucial information from long texts, ensuring that the texts comply with the input length constraints of LLMs. Finally, this research empirically validates the effectiveness and efficiency of RRGPT on the DL19 and DL20 datasets for document retrieval and passage retrieval tasks. The empirical results demonstrate that RRGPT improves the effectiveness and efficiency of text reranking and addresses the long text issue.
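The pipeline the abstract describes (expand the query with LLM-generated terms, compress long texts to fit the model's input limit, then rerank candidates by a pointwise relevance rating) can be sketched roughly as below. This is a minimal illustration, not the thesis's implementation: the function names are hypothetical, and a toy word-overlap scorer stands in for the LLM rating prompt.

```python
def expand_query(query: str, generated_terms: list[str]) -> str:
    """Stack LLM-generated key terms onto the original query."""
    return query + " " + " ".join(generated_terms)

def compress(text: str, max_tokens: int) -> str:
    """Placeholder for text compression: keep only the first max_tokens
    words so the text fits the LLM's input length limit."""
    return " ".join(text.split()[:max_tokens])

def rerank(query: str, texts: list[str], rate) -> list[str]:
    """Rate-then-rank: score each text for relevance, sort descending."""
    return sorted(texts, key=lambda t: rate(query, t), reverse=True)

def overlap_rate(query: str, text: str) -> float:
    """Toy rating function (word overlap), standing in for an LLM prompt."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / max(len(q), 1)

if __name__ == "__main__":
    q = expand_query("long text retrieval", ["reranking", "LLM"])
    docs = ["a note on cooking", "LLM reranking for retrieval of long text"]
    ranked = rerank(q, [compress(d, 64) for d in docs], overlap_rate)
    print(ranked[0])  # → LLM reranking for retrieval of long text
```

In the actual system, per the abstract, the rating step is performed by an LLM at multiple granularities, which is what makes the reranking unsupervised (no labeled training data is needed).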
Description Master's thesis
National Chengchi University
Department of Computer Science
111753141
Source http://thesis.lib.nccu.edu.tw/record/#G0111753141
Type thesis
dc.contributor.advisor 李蔡彥<br>黃瀚萱zh_TW
dc.contributor.advisor Li, Tsai-Yen<br>Huang, Hen-Hsenen_US
dc.contributor.author (Authors) 吳家瑋zh_TW
dc.contributor.author (Authors) Wu, Chia-Weien_US
dc.creator (作者) 吳家瑋zh_TW
dc.creator (作者) Wu, Chia-Weien_US
dc.date (日期) 2024en_US
dc.date.accessioned 5-Aug-2024 12:45:39 (UTC+8)-
dc.date.available 5-Aug-2024 12:45:39 (UTC+8)-
dc.date.issued (上傳時間) 5-Aug-2024 12:45:39 (UTC+8)-
dc.identifier (Other Identifiers) G0111753141en_US
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/152571-
dc.description (描述) 碩士zh_TW
dc.description (描述) 國立政治大學zh_TW
dc.description (描述) 資訊科學系zh_TW
dc.description (描述) 111753141zh_TW
dc.description.abstract (摘要) 本研究提出Rate and Rank GPT(RRGPT),以提高文本重排序的效能與效率,並解決使用大型語言模型進行文檔檢索任務時遇到的長文本挑戰。RRGPT是一種新穎的資訊檢索方法,利用大型語言模型輔助資訊檢索系統中的子任務:查詢重寫任務和無監督式文本重新排序任務。在查詢重寫任務中,本研究將大型語言模型產生的關鍵術語堆疊起來,以擴充原始查詢。在無監督文本重新排序任務中,本研究提出混合式文本重新排序演算法,透過多顆粒度和低成本的方式,依相關度重新排序文本列表。對於長文本問題,本研究採用文本壓縮法從長文本中提取關鍵訊息,以確保文本符合大型語言模型的輸入長度限制。最後,本研究使用DL19和DL20的資料集驗證RRGPT在文檔檢索任務和段落檢索任務的表現。結果表明,RRGPT能更好地依相關度重排序文本列表,並且解決長文本問題。zh_TW
dc.description.abstract (摘要) This research proposes Rate and Rank GPT (RRGPT) to enhance the effectiveness and efficiency of text reranking and to address the challenges associated with long texts in document retrieval tasks using Large Language Models (LLMs). RRGPT is a novel information retrieval method that utilizes LLMs to improve subtasks within the information retrieval system: query rewriting and unsupervised text reranking. For the query rewriting task, this research stacks key terms generated by LLMs to expand the original query. For the unsupervised text reranking task, this research proposes a hybrid, multi-granularity text reranking algorithm that reorders a list of texts by relevance with higher accuracy and lower cost than traditional methods. For the long text issue, this research uses a text compression strategy to extract crucial information from long texts, ensuring that the texts comply with the input length constraints of LLMs. Finally, this research empirically validates the effectiveness and efficiency of RRGPT on the DL19 and DL20 datasets for document retrieval and passage retrieval tasks. The empirical results demonstrate that RRGPT improves the effectiveness and efficiency of text reranking and addresses the long text issue.en_US
dc.description.tableofcontents 第一章 緒論 1 第一節 研究動機 1 第二節 研究目的 3 第三節 預期貢獻 4 第四節 論文架構 4 第二章 文獻探討 6 第一節 自然語言處理 6 第二節 資訊檢索 7 第三節 預訓練語言模型 8 第四節 查詢重寫研究 10 第五節 無監督式文本重新排序研究 13 第三章 研究問題暨研究設計 17 第一節 研究問題定義 17 第二節 模型架構 18 第三節 Pyserini 介紹 20 第四節 查詢重寫任務設計 22 第五節 文本壓縮任務設計 24 第六節 混合式文本重新排序演算法 26 第七節 無監督式文本重新排序任務設計 28 第四章 實驗結果 31 第一節 資料集和評估工具介紹 31 第二節 評估指標介紹 34 第三節 段落檢索任務之結果 36 第四節 文檔檢索任務之結果 37 第五節 查詢重寫任務之實驗結果 38 第六節 文本壓縮任務之實驗結果 40 第七節 無監督式文本重新排序任務之實驗結果 42 第五章 結論與未來展望 44 第一節 研究結論 44 第二節 應用場景 46 第三節 研究限制與未來展望 47 參考文獻 49zh_TW
dc.format.extent 1621380 bytes-
dc.format.mimetype application/pdf-
dc.source.uri (資料來源) http://thesis.lib.nccu.edu.tw/record/#G0111753141en_US
dc.subject (關鍵詞) 資訊檢索zh_TW
dc.subject (關鍵詞) 大型語言模型zh_TW
dc.subject (關鍵詞) 查詢重寫zh_TW
dc.subject (關鍵詞) 文本壓縮zh_TW
dc.subject (關鍵詞) 長文本zh_TW
dc.subject (關鍵詞) 無監督式文本重新排序zh_TW
dc.subject (關鍵詞) Information Retrievalen_US
dc.subject (關鍵詞) Large Language Modelen_US
dc.subject (關鍵詞) Query Rewritingen_US
dc.subject (關鍵詞) Text Compressionen_US
dc.subject (關鍵詞) Long Texten_US
dc.subject (關鍵詞) Unsupervised Text Rerankingen_US
dc.title (題名) 基於 LLM 的無監督多顆粒度重排序用於長文本檢索zh_TW
dc.title (題名) Unsupervised Multi-granularity LLM-based Reranking for Long Text Retrievalen_US
dc.type (資料類型) thesisen_US