Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) is a technique for overcoming limitations of large language models, which struggle with long-form content, factual accuracy, and context-awareness.

Spring AI supports RAG through a modular architecture: you can build custom RAG flows yourself or use the out-of-the-box RAG flows provided through the Advisor API.

Learn more about Retrieval Augmented Generation in the concepts section.

Advisors

Spring AI provides out-of-the-box support for common RAG flows using the Advisor API.

To use the QuestionAnswerAdvisor or RetrievalAugmentationAdvisor, you need to add the spring-ai-advisors-vector-store dependency to your project:

<dependency>
   <groupId>org.springframework.ai</groupId>
   <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>

QuestionAnswerAdvisor

A vector database stores data that the AI model is unaware of. When a user question is sent to the AI model, a QuestionAnswerAdvisor queries the vector database for documents related to the user question.

The response from the vector database is appended to the user text to provide context for the AI model to generate a response.

Assuming you have already loaded data into a VectorStore, you can perform Retrieval Augmented Generation (RAG) by providing an instance of QuestionAnswerAdvisor to the ChatClient.

ChatResponse response = ChatClient.builder(chatModel)
        .build().prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .user(userText)
        .call()
        .chatResponse();

In this example, the QuestionAnswerAdvisor performs a similarity search over all documents in the vector database. To restrict the types of documents that are searched, the SearchRequest accepts a SQL-like filter expression that is portable across all VectorStore implementations.

This filter expression can be configured when creating the QuestionAnswerAdvisor and hence will always apply to all ChatClient requests, or it can be provided at runtime per request.

The following example creates a QuestionAnswerAdvisor with a similarity threshold of 0.8 that returns the top 6 results.

var qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder().similarityThreshold(0.8d).topK(6).build())
        .build();

Dynamic Filter Expressions

Update the SearchRequest filter expression at runtime using the FILTER_EXPRESSION advisor context parameter:

ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder().build())
        .build())
    .build();

// Update filter expression at runtime
String content = chatClient.prompt()
    .user("Please answer my question XYZ")
    .advisors(a -> a.param(QuestionAnswerAdvisor.FILTER_EXPRESSION, "type == 'Spring'"))
    .call()
    .content();

The FILTER_EXPRESSION parameter allows you to dynamically filter the search results based on the provided expression.

Custom Template

The QuestionAnswerAdvisor uses a default template to augment the user question with the retrieved documents. You can customize this behavior by providing your own PromptTemplate object via the .promptTemplate() builder method.

The PromptTemplate provided here customizes how the advisor merges retrieved context with the user query. This is distinct from configuring a TemplateRenderer on the ChatClient itself (using .templateRenderer()), which affects the rendering of the initial user/system prompt content before the advisor runs. See ChatClient Prompt Templates for more details on client-level template rendering.

The custom PromptTemplate can use any TemplateRenderer implementation (by default, it uses StPromptTemplate based on the StringTemplate engine). The important requirement is that the template must contain the following two placeholders:

  • a query placeholder to receive the user question.

  • a question_answer_context placeholder to receive the retrieved context.

PromptTemplate customPromptTemplate = PromptTemplate.builder()
    .renderer(StTemplateRenderer.builder().startDelimiterToken('<').endDelimiterToken('>').build())
    .template("""
            <query>

            Context information is below.

            ---------------------
            <question_answer_context>
            ---------------------

            Given the context information and no prior knowledge, answer the query.

            Follow these rules:

            1. If the answer is not in the context, just say that you don't know.
            2. Avoid statements like "Based on the context..." or "The provided information...".
            """)
    .build();

String question = "Where does the adventure of Anacletus and Birba take place?";

QuestionAnswerAdvisor qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
    .promptTemplate(customPromptTemplate)
    .build();

String response = ChatClient.builder(chatModel).build()
    .prompt(question)
    .advisors(qaAdvisor)
    .call()
    .content();
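
To make the role of the two required placeholders concrete, here is a minimal plain-Java sketch of placeholder substitution with `<` and `>` delimiters. It is not Spring AI's TemplateRenderer; the class `PlaceholderSketch` and its `render` method are hypothetical names used only for illustration, and the naive string replacement is a simplification.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is not Spring AI's TemplateRenderer. */
final class PlaceholderSketch {

    /** Placeholder names the text above says a custom template must contain. */
    static final List<String> REQUIRED = List.of("query", "question_answer_context");

    /**
     * Checks that both required placeholders are present, then replaces every
     * <name> token with the matching value from the variables map.
     * Simplification: values that themselves contain tokens are not escaped.
     */
    static String render(String template, Map<String, String> variables) {
        for (String name : REQUIRED) {
            String token = "<" + name + ">";
            if (!template.contains(token)) {
                throw new IllegalArgumentException("Template is missing placeholder " + token);
            }
        }
        String result = template;
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            result = result.replace("<" + entry.getKey() + ">", entry.getValue());
        }
        return result;
    }
}
```

In this sketch a template that omits either placeholder is rejected up front, which mirrors the requirement stated above without claiming how the library itself validates templates.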

The QuestionAnswerAdvisor.Builder.userTextAdvise() method is deprecated in favor of using .promptTemplate() for more flexible customization.

RetrievalAugmentationAdvisor

Spring AI includes a library of RAG modules that you can use to build your own RAG flows. The RetrievalAugmentationAdvisor is an Advisor providing an out-of-the-box implementation for the most common RAG flows, based on a modular architecture.

Sequential RAG Flows

Naive RAG

Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50)
                .vectorStore(vectorStore)
                .build())
        .build();

String answer = chatClient.prompt()
        .advisors(retrievalAugmentationAdvisor)
        .user(question)
        .call()
        .content();

By default, the RetrievalAugmentationAdvisor does not allow the retrieved context to be empty. When that happens, it instructs the model not to answer the user query. You can allow empty context as follows.

Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50)
                .vectorStore(vectorStore)
                .build())
        .queryAugmenter(ContextualQueryAugmenter.builder()
                .allowEmptyContext(true)
                .build())
        .build();

String answer = chatClient.prompt()
        .advisors(retrievalAugmentationAdvisor)
        .user(question)
        .call()
        .content();

The VectorStoreDocumentRetriever accepts a FilterExpression to filter the search results based on metadata. You can provide one when instantiating the VectorStoreDocumentRetriever or at runtime per request, using the FILTER_EXPRESSION advisor context parameter.

Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50)
                .vectorStore(vectorStore)
                .build())
        .build();

String answer = chatClient.prompt()
        .advisors(retrievalAugmentationAdvisor)
        .advisors(a -> a.param(VectorStoreDocumentRetriever.FILTER_EXPRESSION, "type == 'Spring'"))
        .user(question)
        .call()
        .content();

See VectorStoreDocumentRetriever for more information.

Advanced RAG

Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
        .queryTransformers(RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder.build().mutate())
                .build())
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50)
                .vectorStore(vectorStore)
                .build())
        .build();

String answer = chatClient.prompt()
        .advisors(retrievalAugmentationAdvisor)
        .user(question)
        .call()
        .content();

You can also use the DocumentPostProcessor API to post-process the retrieved documents before passing them to the model. For example, you can use such an interface to perform re-ranking of the retrieved documents based on their relevance to the query, remove irrelevant or redundant documents, or compress the content of each document to reduce noise and redundancy.

Modules

Spring AI implements a Modular RAG architecture inspired by the concept of modularity detailed in the paper "Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks".

Pre-Retrieval

Pre-Retrieval modules are responsible for processing the user query to achieve the best possible retrieval results.

Query Transformation

A component for transforming the input query to make it more effective for retrieval tasks, addressing challenges such as poorly formed queries, ambiguous terms, complex vocabulary, or unsupported languages.

When using a QueryTransformer, it’s recommended to configure the ChatClient.Builder with a low temperature (e.g., 0.0) to ensure more deterministic and accurate results, improving retrieval quality. The default temperature for most chat models is typically too high for optimal query transformation, leading to reduced retrieval effectiveness.

CompressionQueryTransformer

A CompressionQueryTransformer uses a large language model to compress a conversation history and a follow-up query into a standalone query that captures the essence of the conversation.

This transformer is useful when the conversation history is long and the follow-up query is related to the conversation context.

Query query = Query.builder()
        .text("And what is its second largest city?")
        .history(new UserMessage("What is the capital of Denmark?"),
                new AssistantMessage("Copenhagen is the capital of Denmark."))
        .build();

QueryTransformer queryTransformer = CompressionQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder)
        .build();

Query transformedQuery = queryTransformer.transform(query);

The prompt used by this component can be customized via the promptTemplate() method available in the builder.

RewriteQueryTransformer

A RewriteQueryTransformer uses a large language model to rewrite a user query to provide better results when querying a target system, such as a vector store or a web search engine.

This transformer is useful when the user query is verbose, ambiguous, or contains irrelevant information that may affect the quality of the search results.

Query query = new Query("I'm studying machine learning. What is an LLM?");

QueryTransformer queryTransformer = RewriteQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder)
        .build();

Query transformedQuery = queryTransformer.transform(query);

The prompt used by this component can be customized via the promptTemplate() method available in the builder.

TranslationQueryTransformer

A TranslationQueryTransformer uses a large language model to translate a query to a target language that is supported by the embedding model used to generate the document embeddings. If the query is already in the target language, it is returned unchanged. If the language of the query is unknown, it is also returned unchanged.

This transformer is useful when the embedding model is trained on a specific language and the user query is in a different language.

Query query = new Query("Hvad er Danmarks hovedstad?");

QueryTransformer queryTransformer = TranslationQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder)
        .targetLanguage("english")
        .build();

Query transformedQuery = queryTransformer.transform(query);

The prompt used by this component can be customized via the promptTemplate() method available in the builder.

Query Expansion

A component for expanding the input query into a list of queries, addressing challenges such as poorly formed queries by providing alternative query formulations, or by breaking down complex problems into simpler sub-queries.

MultiQueryExpander

A MultiQueryExpander uses a large language model to expand a query into multiple semantically diverse variations to capture different perspectives, useful for retrieving additional contextual information and increasing the chances of finding relevant results.

MultiQueryExpander queryExpander = MultiQueryExpander.builder()
    .chatClientBuilder(chatClientBuilder)
    .numberOfQueries(3)
    .build();
List<Query> queries = queryExpander.expand(new Query("How to run a Spring Boot app?"));

By default, the MultiQueryExpander includes the original query in the list of expanded queries. You can disable this behavior via the includeOriginal method in the builder.

MultiQueryExpander queryExpander = MultiQueryExpander.builder()
    .chatClientBuilder(chatClientBuilder)
    .includeOriginal(false)
    .build();
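
The effect of the includeOriginal flag can be pictured with a small plain-Java sketch. It does not call a model and is not the MultiQueryExpander implementation: the class `ExpansionSketch`, the method `assemble`, and the choice to place the original query first are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of assembling the final list of expanded queries. */
final class ExpansionSketch {

    /**
     * Combines model-generated variants with the original query.
     * Assumption: when included, the original query is placed first.
     */
    static List<String> assemble(String original, List<String> variants, boolean includeOriginal) {
        List<String> queries = new ArrayList<>();
        if (includeOriginal) {
            queries.add(original);
        }
        queries.addAll(variants);
        return List.copyOf(queries);
    }
}
```

With three variants, the sketch yields four queries when the original is included and three when it is not.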

The prompt used by this component can be customized via the promptTemplate() method available in the builder.

Retrieval

Retrieval modules are responsible for querying data systems, such as a vector store, and retrieving the most relevant documents.

A component responsible for retrieving Documents from an underlying data source, such as a search engine, a vector store, a database, or a knowledge graph.

VectorStoreDocumentRetriever

A VectorStoreDocumentRetriever retrieves documents from a vector store that are semantically similar to the input query. It supports filtering based on metadata, similarity threshold, and top-k results.

DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
    .vectorStore(vectorStore)
    .similarityThreshold(0.73)
    .topK(5)
    .filterExpression(new FilterExpressionBuilder()
        .eq("genre", "fairytale")
        .build())
    .build();
List<Document> documents = retriever.retrieve(new Query("What is the main character of the story?"));
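
As a rough mental model of what the similarity threshold and top-k settings do to a set of scored candidates, consider the plain-Java sketch below. It is not how any particular vector store is implemented; `SimilaritySearchSketch`, `ScoredDoc`, and `select` are hypothetical names, and treating the threshold as inclusive is an assumption.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of threshold and top-k selection over scored documents. */
final class SimilaritySearchSketch {

    /** Stand-in for a retrieved document with a similarity score in [0, 1]. */
    record ScoredDoc(String id, double score) {}

    /**
     * Keeps documents whose score is at least the threshold (assumed inclusive),
     * orders them from most to least similar, and returns at most topK of them.
     */
    static List<ScoredDoc> select(List<ScoredDoc> candidates, double similarityThreshold, int topK) {
        return candidates.stream()
                .filter(doc -> doc.score() >= similarityThreshold)
                .sorted(Comparator.comparingDouble(ScoredDoc::score).reversed())
                .limit(topK)
                .toList();
    }
}
```

A higher threshold trades recall for precision, while topK bounds how much context is later passed to the model.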

The filter expression can be static or dynamic. For dynamic filter expressions, you can pass a Supplier.

DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
    .vectorStore(vectorStore)
    .filterExpression(() -> new FilterExpressionBuilder()
        .eq("tenant", TenantContextHolder.getTenantIdentifier())
        .build())
    .build();
List<Document> documents = retriever.retrieve(new Query("What are the KPIs for the next semester?"));

You can also provide a request-specific filter expression via the Query API, using the FILTER_EXPRESSION parameter. If both the request-specific and the retriever-specific filter expressions are provided, the request-specific filter expression takes precedence.

Query query = Query.builder()
    .text("Who is Anacletus?")
    .context(Map.of(VectorStoreDocumentRetriever.FILTER_EXPRESSION, "location == 'Whispering Woods'"))
    .build();
List<Document> retrievedDocuments = documentRetriever.retrieve(query);
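
The precedence rule described above can be sketched in plain Java as follows. This is only an illustration of the rule, not the retriever's code: the class `FilterPrecedenceSketch`, the method `resolve`, and the context key string are hypothetical, and filter expressions are represented as plain strings for simplicity.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical illustration of request-level filters overriding retriever-level filters. */
final class FilterPrecedenceSketch {

    /** Assumed context key; the real constant's value may differ. */
    static final String FILTER_EXPRESSION = "filter_expression";

    /**
     * Returns the request-specific filter when the query context carries one,
     * otherwise falls back to the retriever-level supplier (which may return null).
     */
    static Optional<String> resolve(Map<String, Object> requestContext, Supplier<String> retrieverFilter) {
        Object requestFilter = requestContext.get(FILTER_EXPRESSION);
        if (requestFilter instanceof String text && !text.isBlank()) {
            return Optional.of(text);
        }
        return Optional.ofNullable(retrieverFilter.get());
    }
}
```

Using a Supplier for the retriever-level filter keeps it dynamic, so a value such as a tenant identifier is read at retrieval time rather than at construction time.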

Document Join

A component for combining documents retrieved based on multiple queries and from multiple data sources into a single collection of documents. As part of the joining process, it can also handle duplicate documents and reciprocal ranking strategies.
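
The text does not say which reciprocal ranking strategy is meant. One widely used option is Reciprocal Rank Fusion (RRF), where each document receives a score of 1 / (k + rank) from every ranked list it appears in and the scores are summed. The plain-Java sketch below illustrates that technique only; `ReciprocalRankFusionSketch` and `fuse` are hypothetical names, documents are reduced to string ids, and the constant k is a free parameter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of Reciprocal Rank Fusion over several ranked id lists. */
final class ReciprocalRankFusionSketch {

    /**
     * Sums 1 / (k + rank) for each id across all rankings (rank starts at 1)
     * and returns the ids ordered by descending fused score.
     * Ties keep the order in which ids were first seen.
     */
    static List<String> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankedLists) {
            for (int index = 0; index < ranking.size(); index++) {
                double contribution = 1.0 / (k + index + 1);
                scores.merge(ranking.get(index), contribution, Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

Because RRF uses only ranks, it can combine results from sources whose raw scores are not comparable.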

ConcatenationDocumentJoiner

A ConcatenationDocumentJoiner combines documents retrieved based on multiple queries and from multiple data sources by concatenating them into a single collection of documents. In case of duplicate documents, the first occurrence is kept. The score of each document is kept as is.

Map<Query, List<List<Document>>> documentsForQuery = ...
DocumentJoiner documentJoiner = new ConcatenationDocumentJoiner();
List<Document> documents = documentJoiner.join(documentsForQuery);
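
The joining behavior described above (concatenate, keep the first occurrence of a duplicate, leave scores untouched) can be sketched in plain Java. This is an illustration rather than the ConcatenationDocumentJoiner source: `ConcatenationJoinSketch` and `Doc` are hypothetical, queries are reduced to string keys, and identifying duplicates by document id is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of concatenating per-query, per-source results. */
final class ConcatenationJoinSketch {

    /** Stand-in for a retrieved document. */
    record Doc(String id, String text, double score) {}

    /**
     * Flattens all result lists in iteration order. When the same id appears
     * more than once, only the first occurrence is kept, with its original score.
     */
    static List<Doc> join(Map<String, List<List<Doc>>> documentsForQuery) {
        Map<String, Doc> firstById = new LinkedHashMap<>();
        for (List<List<Doc>> resultsPerSource : documentsForQuery.values()) {
            for (List<Doc> results : resultsPerSource) {
                for (Doc doc : results) {
                    firstById.putIfAbsent(doc.id(), doc);
                }
            }
        }
        return new ArrayList<>(firstById.values());
    }
}
```

Note that the outcome depends on the iteration order of the input map, so an ordered map is the safer choice when the first occurrence matters.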

Post-Retrieval

Post-Retrieval modules are responsible for processing the retrieved documents to achieve the best possible generation results.

Document Post-Processing

A component for post-processing retrieved documents based on a query, addressing challenges such as lost-in-the-middle, context length restrictions from the model, and the need to reduce noise and redundancy in the retrieved information.

For example, it could rank documents based on their relevance to the query, remove irrelevant or redundant documents, or compress the content of each document to reduce noise and redundancy.
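
A minimal plain-Java sketch of such a post-processing step is shown below. It does not implement the DocumentPostProcessor interface; `PostProcessingSketch`, `Doc`, and the parameters `minScore` and `maxDocs` are hypothetical, and treating documents with identical normalized text as redundant is a deliberately simple assumption.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of filtering, de-duplicating, and ranking retrieved documents. */
final class PostProcessingSketch {

    /** Stand-in for a retrieved document with a relevance score. */
    record Doc(String id, String text, double score) {}

    /**
     * Drops documents scoring below minScore, keeps the first document for each
     * distinct normalized text, orders the rest by descending score, and caps
     * the result at maxDocs to respect a context length budget.
     */
    static List<Doc> process(List<Doc> retrieved, double minScore, int maxDocs) {
        Map<String, Doc> uniqueByText = new LinkedHashMap<>();
        for (Doc doc : retrieved) {
            if (doc.score() >= minScore) {
                String key = doc.text().strip().toLowerCase(Locale.ROOT);
                uniqueByText.putIfAbsent(key, doc);
            }
        }
        return uniqueByText.values().stream()
                .sorted(Comparator.comparingDouble(Doc::score).reversed())
                .limit(maxDocs)
                .toList();
    }
}
```

A production step would more likely score relevance with a dedicated re-ranking model; the sketch only shows where filtering, de-duplication, and truncation fit.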

Generation

Generation modules are responsible for generating the final response based on the user query and retrieved documents.

Query Augmentation

A component for augmenting an input query with additional data, useful to provide a large language model with the necessary context to answer the user query.

ContextualQueryAugmenter

The ContextualQueryAugmenter augments the user query with contextual data from the content of the provided documents.

QueryAugmenter queryAugmenter = ContextualQueryAugmenter.builder().build();

By default, the ContextualQueryAugmenter does not allow the retrieved context to be empty. When that happens, it instructs the model not to answer the user query.

You can enable the allowEmptyContext option to allow the model to generate a response even when the retrieved context is empty.

QueryAugmenter queryAugmenter = ContextualQueryAugmenter.builder()
        .allowEmptyContext(true)
        .build();
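
The decision the allowEmptyContext option controls can be pictured with the plain-Java sketch below. It is not the ContextualQueryAugmenter implementation: `EmptyContextSketch`, `augment`, and both prompt strings are hypothetical, and passing the query through unchanged when empty context is allowed is an assumption.

```java
import java.util.List;

/** Hypothetical illustration of how an augmenter might treat an empty context. */
final class EmptyContextSketch {

    /** Assumed wording; the real default prompt may differ. */
    static final String EMPTY_CONTEXT_PROMPT =
            "The user query is outside your knowledge base. Politely say that you cannot answer it.";

    /**
     * With documents present, prepends them to the query as context.
     * With no documents, either passes the query through unchanged
     * (allowEmptyContext = true) or returns an instruction not to answer.
     */
    static String augment(String query, List<String> documents, boolean allowEmptyContext) {
        if (documents.isEmpty()) {
            return allowEmptyContext ? query : EMPTY_CONTEXT_PROMPT;
        }
        return "Context:\n" + String.join("\n", documents) + "\n\nQuery: " + query;
    }
}
```

Allowing empty context lets the model fall back on its own knowledge, which may be acceptable for general questions but weakens the grounding that RAG is meant to provide.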

The prompts used by this component can be customized via the promptTemplate() and emptyContextPromptTemplate() methods available in the builder.