FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research

mb5dc7e150492dd 2024-09-07 15:27:51

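© Copyright belongs to the author. This is an original work by 51CTO blog author mb5dc7e150492dd; contact the author for permission to reprint. Unauthorized reproduction may result in legal action.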

Contents

Introduction

RAG
Motivation
Contributions

FlashRAG

Component Module
Pipeline Module
Datasets and Corpus
Evaluation

Experimental Result and Discussion

Experimental Setup
Methods and Main results
Impact of Retrieval on RAG

Take away
References

Paper: https://arxiv.org/abs/2405.13576
Repo: https://github.com/RUC-NLPIR/FlashRAG

Introduction

RAG

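Retrieval-augmented generation (RAG) is a robust way to mitigate hallucination in large language models (LLMs): the model answers with the help of documents retrieved for the query.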

Motivation

Many works are not open-source, or their open-source code has fixed settings, which makes it difficult to adapt them to new data or new components.
The datasets and retrieval corpora used vary from work to work, and the resources are scattered.
RAG systems are complex and involve multiple steps such as indexing, retrieval, and generation, so researchers often have to implement many parts of the system themselves. Existing RAG toolkits such as LangChain and LlamaIndex are typically large and cumbersome, which hinders researchers from implementing customized processes, and they do not address the issues above.

Contributions

FlashRAG is an open-source library designed to let researchers easily reproduce existing RAG methods and develop their own RAG algorithms.

FlashRAG

Component Module

The Component Module encompasses five main components: Judger, Retriever, Reranker, Refiner, and Generator.

Judger functions as a preliminary component that assesses whether a query necessitates retrieval.
Retriever:

Sparse retrieval: BM25
Dense retrieval: BERT-based embedding models such as DPR, E5, and BGE
Vector database: FAISS
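
To make the sparse-retrieval option concrete, here is a minimal sketch of BM25 scoring in plain Python. It is not FlashRAG's implementation: the whitespace tokenizer, the function names `bm25_scores` and `retrieve`, the parameter defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with an Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant) that is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def retrieve(query, docs, top_k=2):
    """Return the top_k (document, score) pairs, best first."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(docs[i], scores[i]) for i in ranked[:top_k]]

docs = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "The Eiffel Tower is in Paris",
]
top = retrieve("capital of France", docs, top_k=2)
```

A dense retriever would replace the term statistics with embedding similarity, but the interface (query in, ranked passages out) stays the same.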

Reranker aims at refining the order of results returned by the retriever to enhance retrieval accuracy.

Cross-Encoder models, such as the bge-reranker and jina-reranker.
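
The reranking step can be sketched as follows. A cross-encoder scores each (query, document) pair jointly; since loading a real model is out of scope here, the sketch takes the scorer as a parameter and uses a toy word-overlap function, `overlap_score`, as a stand-in. Both function names are hypothetical and not part of any library.

```python
from typing import Callable, List, Tuple

def rerank(
    query: str,
    docs: List[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Re-order retrieved documents by a pairwise relevance score."""
    scored = [(doc, score_pair(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words found in doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

candidates = [
    "The Eiffel Tower is in Paris",
    "Paris is the capital of France",
]
reranked = rerank("capital of France", candidates, overlap_score, top_k=1)
```

In practice `score_pair` would wrap a model such as a bge-reranker; only the scoring function changes, not the sorting logic.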

Refiner refines the input text for generators to reduce token usage and reduce noise from retrieved documents, improving the final RAG responses.

The Extractive Refiner employs an embedding model to extract semantic units, like sentences or phrases, from the retrieved text that hold higher semantic similarity with the query.
The Abstractive Refiner utilizes a seq2seq model to directly summarize the retrieved text.
Perplexity-based refiners: the LLMLingua Refiner and the Selective-Context Refiner.
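
The extractive idea, keeping only the sentences most similar to the query, can be sketched as below. This is an illustration under stated assumptions, not the toolkit's code: `embed` is a bag-of-words stand-in for a real embedding model, the sentence splitter is a simple regex, and the names `extractive_refine` and `keep` are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def extractive_refine(query, passages, keep=2):
    """Keep the `keep` sentences most similar to the query, in original order."""
    sentences = [
        s.strip()
        for p in passages
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]
    q_vec = embed(query)
    ranked = sorted(sentences, key=lambda s: cosine(q_vec, embed(s)), reverse=True)
    kept = set(ranked[:keep])
    return " ".join(s for s in sentences if s in kept)

passages = [
    "The Eiffel Tower is in Paris. It was completed in 1889. "
    "Many tourists visit every year."
]
refined = extractive_refine("When was the Eiffel Tower completed?", passages, keep=2)
```

The refined text is shorter than the retrieved passage, which is the point of the refiner: fewer tokens and less noise reach the generator.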

Generator

vLLM and FastChat
Transformers
OpenAI
Encoder-Decoder models

Pipeline Module

Sequential Pipeline implements a linear execution path for the query, formally represented as query -> retriever -> post-retrieval (reranker, refiner) -> generator.
Branching Pipeline executes multiple paths in parallel for a single query (often one path per retrieved document) and merges the results from all paths to form the ultimate output.

REPLUG: The REPLUG pipeline processes each retrieved document in parallel and combines the generation probabilities from all documents to produce the final answer.
SuRe: The SuRe pipeline generates a candidate answer from each retrieved document and then ranks all candidate answers.
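
As a rough illustration of how a branching pipeline can merge per-document outputs, the sketch below combines next-token distributions in the style of REPLUG. The text above only says that the probabilities are combined; weighting each document by a softmax over its retrieval score is an assumption here, and the function names and toy numbers are made up for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_next_token(per_doc_probs, retrieval_scores):
    """Weighted average of per-document next-token distributions.

    per_doc_probs: one {token: probability} dict per retrieved document,
        i.e. the generator's distribution when given that document alone.
    retrieval_scores: one retrieval score per document (assumed weighting).
    """
    weights = softmax(retrieval_scores)
    combined = {}
    for weight, probs in zip(weights, per_doc_probs):
        for token, p in probs.items():
            combined[token] = combined.get(token, 0.0) + weight * p
    return combined

# Toy numbers, for illustration only.
per_doc_probs = [
    {"Paris": 0.9, "Lyon": 0.1},
    {"Paris": 0.6, "Lyon": 0.4},
    {"Paris": 0.2, "Lyon": 0.8},
]
retrieval_scores = [2.0, 1.0, 0.0]
combined = ensemble_next_token(per_doc_probs, retrieval_scores)
best = max(combined, key=combined.get)
```

Each path sees only one document, so the per-document generations can run in parallel before the merge.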

Conditional Pipeline utilizes a judger to direct the query into different execution paths based on the judgement outcome.
Loop Pipeline involves complex interactions between retrieval and generation processes, often encompassing multiple cycles of retrieval and generation.

Iterative
Self-Ask
Self-RAG
FLARE
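
The control flow of the sequential, conditional, and loop pipelines can be sketched with plain functions. This is a minimal sketch, not FlashRAG's API: every name below (`sequential_pipeline`, `toy_retriever`, and so on) is hypothetical, the components are toy stubs, and the loop variant is only loosely modeled on iterative retrieval, where the previous answer is appended to the next retrieval query.

```python
CORPUS = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
]

def toy_retriever(query):
    """Toy stub: return documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [d for d in CORPUS if words & set(d.lower().split())]

def toy_generator(query, docs):
    """Toy stub: report how many documents the generator was given."""
    return f"answer from {len(docs)} docs"

def toy_judger(query):
    """Toy stub: decide whether the query needs retrieval."""
    return "capital" in query.lower()

def sequential_pipeline(query, retriever, generator, post_retrieval=None):
    """query -> retriever -> optional post-retrieval step -> generator."""
    docs = retriever(query)
    if post_retrieval is not None:
        docs = post_retrieval(query, docs)
    return generator(query, docs)

def conditional_pipeline(query, judger, retriever, generator):
    """Route the query through retrieval only if the judger says so."""
    if judger(query):
        return sequential_pipeline(query, retriever, generator)
    return generator(query, [])

def loop_pipeline(query, retriever, generator, max_iters=3):
    """Alternate retrieval and generation until no new documents appear."""
    docs, answer, current_query = [], "", query
    for _ in range(max_iters):
        new_docs = [d for d in retriever(current_query) if d not in docs]
        if not new_docs:
            break
        docs.extend(new_docs)
        answer = generator(query, docs)
        # Feed the draft answer back into the next retrieval query.
        current_query = f"{query} {answer}"
    return answer

seq = sequential_pipeline("capital of France", toy_retriever, toy_generator)
trimmed = sequential_pipeline(
    "capital of France", toy_retriever, toy_generator,
    post_retrieval=lambda q, docs: docs[:1],
)
direct = conditional_pipeline("hello there", toy_judger, toy_retriever, toy_generator)
looped = loop_pipeline("capital of France", toy_retriever, toy_generator)
```

Because each pipeline only depends on the callables it is given, swapping a retriever or generator does not require touching the control flow, which is the modularity the toolkit aims for.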

Datasets and Corpus

Datasets: 32 pre-processed benchmark datasets
Corpus: Wikipedia passages and MS MARCO passages

Evaluation

Retrieval-aspect metrics: recall@k, precision@k, F1@k, and mean average precision (MAP)
Generation-aspect metrics: token-level F1 score, exact match, accuracy, BLEU, and ROUGE-L
Custom evaluation metrics are also supported.
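
For reference, here is a minimal sketch of a few of these metrics. The answer normalization follows the common SQuAD-style convention (lowercase, strip punctuation and articles), and precision@k and recall@k use the standard definitions over labeled relevant document ids; both are assumptions, and the toolkit's own definitions may differ in detail.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, golds):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in golds))

def token_f1(prediction, gold):
    """Token-level F1 between a prediction and one gold answer."""
    pred = normalize(prediction).split()
    ref = normalize(gold).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Precision@k and recall@k for one query."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

em = exact_match("The Paris.", ["paris"])
f1 = token_f1("Paris France", "Paris")
p_at_2, r_at_2 = precision_recall_at_k(["d1", "d2", "d3"], {"d1", "d3", "d9"}, k=2)
```

A custom metric would follow the same shape: a function from predictions and references to a number, averaged over the dataset.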

Experimental Result and Discussion

Experimental Setup

Experiment Component	Description
Generator Model	LLAMA3-8B-instruct
Retriever Model	E5-base-v2
Retrieval Corpus	Wikipedia data from December 2018
Max Input Length for Generator Model	4096
Documents Retrieved per Query	5
Default Prompt	Answer the question based on the given document. Only give me the answer and do not output any other words. The following are given documents:{retrieval documents}
Experimental Environment	8 NVIDIA A100 GPUs
Datasets	Natural Questions (NQ), TriviaQA, HotpotQA, 2WikiMultihopQA, PopQA, WebQuestions
Evaluation Metrics	Exact match for NQ, TriviaQA, WebQuestions; Token-level F1 for HotpotQA, 2WikiMultihopQA, PopQA
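
The default prompt and the five-documents-per-query setting can be put together as in the sketch below. The prompt text is copied from the table; how the documents are numbered and joined, and where the question is appended, are assumptions, and `build_prompt` is a hypothetical helper rather than a FlashRAG function.

```python
DEFAULT_PROMPT = (
    "Answer the question based on the given document. "
    "Only give me the answer and do not output any other words. "
    "The following are given documents:{retrieval documents}"
)

def build_prompt(question, docs, top_k=5):
    """Fill the default prompt with the top_k retrieved documents."""
    # Assumed formatting: one numbered document per line.
    formatted = "\n".join(
        f"Doc {i}: {doc}" for i, doc in enumerate(docs[:top_k], start=1)
    )
    # The placeholder contains a space, so use replace() rather than str.format().
    system = DEFAULT_PROMPT.replace("{retrieval documents}", "\n" + formatted)
    return f"{system}\nQuestion: {question}"

prompt = build_prompt("Who wrote Hamlet?", [f"passage {i}" for i in range(7)])
```

With seven candidate passages, only the first five reach the prompt, matching the "Documents Retrieved per Query" setting above.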

Methods and Main results

Methods are categorized by the RAG component they primarily focus on optimizing:

retriever
refiner
generator and its related decoding methods
judger
entire RAG flow, including multiple retrievals and generation processes.

Impact of Retrieval on RAG

Existing research often employs a fixed retriever and a fixed number of retrieved documents.

[Figure: effect of the retriever and the number of retrieved documents on RAG performance]

Main Findings:

Figure left:

Overall performance is best when the number of retrieved documents is 3 or 5.
Both too many and too few retrieved documents lead to a significant decrease in performance, with a drop of up to 40%; this holds for both dense and sparse retrieval methods.
When the number of retrieved documents is large, the results of the three retrievers of different quality converge. For the top-1 results, by contrast, there is a substantial gap between the dense methods (E5, BGE) and BM25, indicating that the fewer documents are retrieved, the greater the impact of retriever quality on the final result.

Figure right:

On most datasets, using the top-3 or top-5 retrieved results yields the best performance, suggesting that this may be a good balance between the quality of the retrieved documents and noise.

Take away

FlashRAG is a Python toolkit for the reproduction and development of Retrieval-Augmented Generation (RAG) research. The toolkit includes 32 pre-processed benchmark RAG datasets and 14 state-of-the-art RAG algorithms.

References

Retrieval-Augmented Generation for Large Language Models: A Survey (https://arxiv.org/pdf/2312.10997)
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research (https://arxiv.org/abs/2405.13576)
