Publications-Theses

Title 基於輕量化微調方法之進階檢索模型於改進文件檢索效能
Lightweight Fine-Tuning Dense Retrieval Models for Enhancing Document Retrieval Performance
Author 王奕凱
Wang, I-Kai
Contributors 蔡銘峰 (advisor)
王奕凱
Wang, I-Kai
Keywords 資訊檢索 (Information Retrieval)
大語言模型 (Large Language Model)
參數高效微調 (Parameter-Efficient Fine-Tuning)
低秩適應 (Low-Rank Adaptation)
Information Retrieval
LoRA
LLM
PEFT
Date 2024
Upload Date 4-Sep-2024 15:01:34 (UTC+8)
Abstract 資訊檢索(IR)是一項從大規模文本集合中找到與用戶查詢相關資訊的任務。隨著大型預訓練語言模型(PLM)的發展達到新的高度,密集檢索技術應運而生:透過將查詢句和文本輸入大型語言模型,編碼成密集向量進行關聯度計算,此項技術能處理語言的多樣性和複雜性。然而,大型預訓練語言模型的訓練資源需求高,因此參數高效率微調(PEFT)如適配器、LoRA(低秩適應)等技術相繼提出,旨在減少微調參數量並保持性能。然而研究指出,此類方法在資訊檢索任務中效果有限,訓練參數過少會影響梯度下降方向,導致模型性能下降。本研究利用LoRA的靈活性,在不增加額外訓練參數的情況下,以LoRA矩陣再加權文句的向量,增進訓練效果,設計一個更加通用的模型架構,並與其他較先進的LoRA技術結合,以應對PEFT方法在資訊檢索任務中的挑戰。
Information retrieval (IR) is the task of finding information related to user queries from large text collections. With the development of large pre-trained language models (PLMs) reaching new heights, dense retrieval techniques have emerged. These techniques encode query sentences and texts into dense vectors using large language models and compute relevance scores between them, which allows them to handle linguistic diversity and complexity. However, training large pre-trained language models requires substantial resources. Consequently, parameter-efficient fine-tuning (PEFT) techniques, such as adapters and LoRA (Low-Rank Adaptation), have been proposed to reduce the number of fine-tuning parameters while maintaining performance. Nonetheless, studies indicate that these methods have limited effectiveness in IR tasks, as too few training parameters can affect the direction of gradient descent, leading to degraded model performance. This study leverages the flexibility of LoRA to enhance training effectiveness without adding extra training parameters. By re-weighting sentence vectors with the LoRA matrices, we design a more versatile model architecture, which is then combined with other advanced LoRA techniques to address the challenges of PEFT methods in IR tasks.
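To make the two building blocks mentioned in the abstract concrete, the sketch below shows (a) a linear layer with a frozen base weight plus a trainable low-rank LoRA update B·A, and (b) bi-encoder dense-retrieval scoring by cosine similarity between query and document vectors. It is a minimal illustration under assumed settings (PyTorch, hidden size 768, rank 8); the class and function names are ours, and the thesis's specific scheme for re-weighting sentence vectors with the LoRA matrices is not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LoRALinear(nn.Module):
        """Frozen base weight W0 plus a trainable low-rank update B @ A (standard LoRA)."""
        def __init__(self, d_in, d_out, r=8, alpha=16):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02, requires_grad=False)  # frozen W0
            self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable low-rank factor A
            self.B = nn.Parameter(torch.zeros(d_out, r))        # trainable low-rank factor B, zero-initialized
            self.scale = alpha / r

        def forward(self, x):
            # y = x W0^T + scale * x A^T B^T, i.e. the effective weight is W0 + scale * (B @ A)
            return F.linear(x, self.weight) + self.scale * F.linear(F.linear(x, self.A), self.B)

    def relevance_scores(query_vec, doc_vecs):
        """Dense-retrieval relevance: cosine similarity between a query vector and document vectors."""
        q = F.normalize(query_vec, dim=-1)
        d = F.normalize(doc_vecs, dim=-1)
        return d @ q

    # Toy usage: random tensors stand in for pooled sentence embeddings from a frozen encoder.
    proj = LoRALinear(768, 768)
    query_emb = proj(torch.randn(768))
    doc_embs = proj(torch.randn(5, 768))
    print(relevance_scores(query_emb, doc_embs))  # one score per document

In real use, the query and document embeddings would come from a pre-trained bi-encoder, and only A and B (a small fraction of the model's parameters) would receive gradients during fine-tuning.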
References
[1] A. Aghajanyan, L. Zettlemoyer, and S. Gupta. Intrinsic dimensionality explains the effectiveness of language model fine-tuning, 2020.
[2] V. Boteva, D. Gholipour, A. Sokolov, and S. Riezler. A full-text learning to rank dataset for medical information retrieval. In N. Ferro, F. Crestani, M.-F. Moens, J. Mothe, F. Silvestri, G. M. Di Nunzio, C. Hauff, and G. Silvello, editors, Advances in Information Retrieval, pages 716–722, Cham, 2016. Springer International Publishing.
[3] J. Chen, S. Xiao, P. Zhang, K. Luo, D. Lian, and Z. Liu. BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. arXiv preprint arXiv:2402.03216, 2024.
[4] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer. QLoRA: Efficient fine-tuning of quantized LLMs. Advances in Neural Information Processing Systems, 36, 2024.
[5] J. Frankle and M. Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks, 2019.
[6] S. Hayou, N. Ghosh, and B. Yu. LoRA+: Efficient low rank adaptation of large models. arXiv preprint arXiv:2402.12354, 2024.
[7] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pages 2790–2799. PMLR, 2019.
[8] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
[9] N. Hyeon-Woo, M. Ye-Bin, and T.-H. Oh. FedPara: Low-rank Hadamard product for communication-efficient federated learning, 2023.
[10] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[11] D. J. Kopiczko, T. Blankevoort, and Y. M. Asano. VeRA: Vector-based random matrix adaptation, 2024.
[12] T. Kwiatkowski, J. Palomaki, O. Redfield, M. Collins, A. Parikh, C. Alberti, D. Epstein, I. Polosukhin, M. Kelcey, J. Devlin, K. Lee, K. N. Toutanova, L. Jones, M.-W. Chang, A. Dai, J. Uszkoreit, Q. Le, and S. Petrov. Natural Questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics, 2019.
[13] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
[14] X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190, 2021.
[15] Y. Li, Y. Yu, C. Liang, P. He, N. Karampatziakis, W. Chen, and T. Zhao. LoftQ: LoRA-fine-tuning-aware quantization for large language models. arXiv preprint arXiv:2310.08659, 2023.
[16] S.-Y. Liu, C.-Y. Wang, H. Yin, P. Molchanov, Y.-C. F. Wang, K.-T. Cheng, and M.-H. Chen. DoRA: Weight-decomposed low-rank adaptation. arXiv preprint arXiv:2402.09353, 2024.
[17] X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, and J. Tang. GPT understands, too. AI Open, 2023.
[18] K. Lu, A. Grover, P. Abbeel, and I. Mordatch. Pretrained transformers as universal computation engines, 2021.
[19] X. Ma, J. Guo, R. Zhang, Y. Fan, and X. Cheng. Scattered or connected? An optimized parameter-efficient tuning approach for information retrieval. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 1471–1480, 2022.
[20] M. Maia, S. Handschuh, A. Freitas, B. Davis, R. McDermott, M. Zarrouk, and A. Balahur. WWW'18 open challenge: Financial opinion mining and question answering. In Companion Proceedings of The Web Conference 2018, WWW '18, pages 1941–1942, Republic and Canton of Geneva, CHE, 2018. International World Wide Web Conferences Steering Committee.
[21] N. Muennighoff, H. Su, L. Wang, N. Yang, F. Wei, T. Yu, A. Singh, and D. Kiela. Generative representational instruction tuning. arXiv preprint arXiv:2402.09906, 2024.
[22] J. Pfeiffer, A. Kamath, A. Rücklé, K. Cho, and I. Gurevych. AdapterFusion: Non-destructive task composition for transfer learning. arXiv preprint arXiv:2005.00247, 2020.
[23] S.-A. Rebuffi, H. Bilen, and A. Vedaldi. Learning multiple visual domains with residual adapters. Advances in Neural Information Processing Systems, 30, 2017.
[24] N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019.
[25] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, M. Gatford, et al. Okapi at TREC-3. NIST Special Publication SP, 109:109, 1995.
[26] T. Salimans and D. P. Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks, 2016.
[27] V. Sanh, L. Debut, J. Chaumond, and T. Wolf. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter, 2020.
[28] N. Thakur, N. Reimers, J. Daxenberger, and I. Gurevych. Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks. arXiv preprint arXiv:2010.08240, 2020.
[29] N. Thakur, N. Reimers, A. Rücklé, A. Srivastava, and I. Gurevych. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. arXiv preprint arXiv:2104.08663, 2021.
[30] D. Wadden, S. Lin, K. Lo, L. L. Wang, M. van Zuylen, A. Cohan, and H. Hajishirzi. Fact or fiction: Verifying scientific claims. arXiv preprint arXiv:2004.14974, 2020.
[31] L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder, and F. Wei. Text embeddings by weakly-supervised contrastive pre-training. arXiv preprint arXiv:2212.03533, 2022.
[32] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In NIPS, 2017.
[33] L. Xu, H. Xie, S.-Z. J. Qin, X. Tao, and F. L. Wang. Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. arXiv preprint arXiv:2312.12148, 2023.
[34] S.-Y. Yeh, Y.-G. Hsieh, Z. Gao, B. B. W. Yang, G. Oh, and Y. Gong. Navigating text-to-image customization: From LyCORIS fine-tuning to model evaluation, 2024.
[35] Q. Zhang, M. Chen, A. Bukharin, P. He, Y. Cheng, W. Chen, and T. Zhao. Adaptive budget allocation for parameter-efficient fine-tuning. In The Eleventh International Conference on Learning Representations, 2023.
Description Master's thesis
National Chengchi University (國立政治大學)
Department of Computer Science (資訊科學系)
111753169
Source http://thesis.lib.nccu.edu.tw/record/#G0111753169
URI https://nccur.lib.nccu.edu.tw/handle/140.119/153388
Type thesis
Table of Contents
Chapter 1  Introduction
  1.1  Preface
  1.2  Problem Definition
Chapter 2  Related Work
  2.1  Pre-trained Language Models (PLM)
  2.2  Information Retrieval
    2.2.1  Sparse Retrieval
    2.2.2  Dense Retrieval
  2.3  LoRA (Low-Rank Adaptation)
    2.3.1  LoRA+
    2.3.2  DoRA
    2.3.3  VeRA
Chapter 3  Methodology
  3.1  Bi-Encoder Training
  3.2  LoRA
    3.2.1  Single Matrix Re-Weight (SMRW)
    3.2.2  Multiple Matrix Re-Weight (MMRW)
    3.2.3  LoRA+
    3.2.4  Combination with DoRA
    3.2.5  Combination with VeRA
Chapter 4  Experimental Results and Discussion
  4.1  Datasets
  4.2  Evaluation Metrics
    4.2.1  NDCG@K
    4.2.2  Recall@K
  4.3  Experimental Results
  4.4  Effect of Enhancing Different LoRA Variants
  4.5  Selection of Training and Re-Weighting Matrices
  4.6  Impact of the Number of Trainable Parameters on Performance
  4.7  Time and Space
Chapter 5  Conclusion
  5.1  Conclusion
References
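Chapter 4 evaluates retrieval quality with NDCG@K and Recall@K. As a reference point, these are the standard IR definitions (not quoted from the thesis), where R is the set of relevant documents for a query and rel_i is the relevance label of the document ranked at position i:

\[
\mathrm{Recall@}K = \frac{|R \cap \text{top-}K|}{|R|},
\qquad
\mathrm{DCG@}K = \sum_{i=1}^{K} \frac{2^{rel_i} - 1}{\log_2(i+1)},
\qquad
\mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K},
\]

with IDCG@K the DCG@K of the ideal (perfectly sorted) ranking.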