Title: 基於大型語言模型之提示詞策略於改進文件檢索效能 (Prompting Large Language Models for Improving Document Retrieval Performance)
Author: 蕭喬宇 (Hsiao, Chiao-Yu)
Contributors: 蔡銘峰 (Tsai, Ming-Feng), advisor; 蕭喬宇 (Hsiao, Chiao-Yu), author
Keywords: Information Retrieval; LLM (large language model); Reordering; List-wise Reordering
Date: 2024
Uploaded: 4 September 2024 15:02:11 (UTC+8)
Abstract:
This study explores how to use large language models (LLMs) for listwise reordering to improve document retrieval performance. Document retrieval typically consists of two stages: in the first, a retriever fetches relevant documents from a large corpus; in the second, a generator produces a response based on the retrieved documents. Reordering refines the ranking of the initially retrieved documents so that the generator receives the most relevant and valuable ones, which improves the accuracy and relevance of the generated responses. Applying LLMs to reordering has recently become a clear trend. With their strong language understanding and generation capabilities, LLMs can better capture the semantic relevance between documents and more accurately identify the documents related to a query.

Despite this potential, LLM-based reordering still faces practical limitations, such as hallucination and uncertainty, which can lead to unreasonable or inaccurate orderings. To address these issues, this study adopts a listwise reordering method centered on LLMs. Specifically, we use the query and a list of candidate passages as the prompt and ask the model to return the passages ordered by relevance. To further improve reordering accuracy, we explore several prompting strategies, including rewriting the prefix prompt, adding index tags, and in-context learning. In addition, to work within the prompt length limits of LLMs, we propose document preprocessing methods, including passage text processing and conversion, keyword extraction, and passage summarization, which shorten documents while preserving as much of their information as possible.

Systematic experiments verify that these strategies improve document retrieval performance, especially for retrievers with a weak baseline, where performance can more than double. In summary, this study offers practical methods and insights for improving the quality of document reordering in retrieval, which improves system performance and makes retrieval techniques more effective and reliable in practical applications.

Description: Master's thesis
National Chengchi University
Department of Computer Science
111753203
Source: http://thesis.lib.nccu.edu.tw/record/#G0111753203
Type: thesis
Identifier: G0111753203
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/153391
Format: application/pdf, 2207092 bytes
Table of contents:
Chapter 1 Introduction
1.1 Foreword
Chapter 2 Related Work
2.1 Information Retrieval (IR)
2.1.1 Sparse Retrieval
2.1.2 Dense Retrieval
2.2 Reordering
2.2.1 Cross-Encoder
2.2.2 LLM for Reordering
Chapter 3 Methodology
3.1 Research Framework
3.2 Retriever
3.3 LLM Reorder
3.4 Prompt Design and Document Preprocessing
3.4.1 Prompt Design Strategies
3.4.2 Document Preprocessing Strategies
Chapter 4 Experimental Results and Discussion
4.1 Datasets
4.2 Evaluation Metrics
4.3 Experimental Settings
4.3.1 Pre-trained Language Models
4.3.2 Reordering Models
4.4 Analysis and Discussion of Experimental Results
4.4.1 Prompt Design Strategies
4.4.2 Document Preprocessing Strategies
4.4.3 Comparison of Reordering Models
4.4.4 Case Study
Chapter 5 Conclusion
References
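Illustrative sketch: the listwise reordering setup summarized in the abstract (a query plus index-tagged passages as the prompt, with passages shortened to fit the prompt length limit) might look roughly like the following. This is a minimal sketch and not the thesis's implementation. The prompt wording, the `[2] > [1]` answer format, the word-count truncation used as a stand-in for preprocessing, and all function names are assumptions made for illustration. No model is called; the response string is supplied by hand.

```python
import re
from typing import List


def truncate(passage: str, max_words: int = 40) -> str:
    """Crude stand-in for preprocessing (assumption): keep the first max_words words."""
    return " ".join(passage.split()[:max_words])


def build_listwise_prompt(query: str, passages: List[str], max_words: int = 40) -> str:
    """Build a prompt in which each passage carries an index tag such as [1], [2], ..."""
    lines = [
        f"Rank the following {len(passages)} passages by relevance to the query.",
        f"Query: {query}",
    ]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {truncate(passage, max_words)}")
    lines.append("Answer with the index tags only, most relevant first, e.g. [2] > [1].")
    return "\n".join(lines)


def parse_ranking(response: str, num_passages: int) -> List[int]:
    """Convert a model response into a full permutation of 0-based passage positions.

    Out-of-range and repeated indices (possible hallucinations) are dropped;
    passages the model left out are appended in their original retrieval order.
    """
    order: List[int] = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    missing = [i for i in range(num_passages) if i not in order]
    order.extend(missing)
    return order


def reorder(passages: List[str], response: str) -> List[str]:
    """Apply the parsed ranking to the original passage list."""
    return [passages[i] for i in parse_ranking(response, len(passages))]


if __name__ == "__main__":
    docs = ["BM25 is a sparse retriever.", "Cats sleep a lot.", "Dense retrieval uses embeddings."]
    print(build_listwise_prompt("what is dense retrieval", docs))
    # Hand-written response standing in for a model's answer.
    print(reorder(docs, "[3] > [1] > [2]"))
```

Falling back to the original retrieval order for any passage the model omits keeps the output a valid permutation even when the response is malformed, which is one simple way to contain the hallucination risk the abstract mentions.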