Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a technique for overcoming the limitations of large language models, which struggle with long-form content, factual accuracy, and context-awareness.
Spring AI supports RAG through a modular architecture that lets you build custom RAG flows yourself or use out-of-the-box RAG flows through the Advisor API.
Learn more about Retrieval Augmented Generation in the concepts section.
Advisors
Spring AI provides out-of-the-box support for common RAG flows using the Advisor API.
To use the QuestionAnswerAdvisor or RetrievalAugmentationAdvisor, you need to add the spring-ai-advisors-vector-store dependency to your project:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>
QuestionAnswerAdvisor
A vector database stores data that the AI model is unaware of. When a user question is sent to the AI model, a QuestionAnswerAdvisor queries the vector database for documents related to the user question.
The documents returned by the vector database are appended to the user text, giving the AI model the context it needs to generate a response.
Assuming you have already loaded data into a VectorStore, you can perform Retrieval Augmented Generation (RAG) by providing an instance of QuestionAnswerAdvisor to the ChatClient.
ChatResponse response = ChatClient.builder(chatModel)
    .build().prompt()
    .advisors(new QuestionAnswerAdvisor(vectorStore))
    .user(userText)
    .call()
    .chatResponse();
In this example, the QuestionAnswerAdvisor performs a similarity search over all documents in the vector database. To restrict the types of documents that are searched, the SearchRequest takes an SQL-like filter expression that is portable across all VectorStore implementations.
This filter expression can be configured when creating the QuestionAnswerAdvisor, in which case it applies to all ChatClient requests, or it can be provided at runtime per request.
Here is how to create an instance of QuestionAnswerAdvisor with a similarity threshold of 0.8 that returns the top 6 results.
var qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
    .searchRequest(SearchRequest.builder().similarityThreshold(0.8d).topK(6).build())
    .build();
Dynamic Filter Expressions
Update the SearchRequest filter expression at runtime using the FILTER_EXPRESSION advisor context parameter:
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder().build())
        .build())
    .build();

// Update filter expression at runtime
String content = chatClient.prompt()
    .user("Please answer my question XYZ")
    .advisors(a -> a.param(QuestionAnswerAdvisor.FILTER_EXPRESSION, "type == 'Spring'"))
    .call()
    .content();
The FILTER_EXPRESSION parameter allows you to dynamically filter the search results based on the provided expression.
Custom Template
The QuestionAnswerAdvisor uses a default template to augment the user question with the retrieved documents. You can customize this behavior by providing your own PromptTemplate object via the .promptTemplate() builder method.
The custom PromptTemplate can use any TemplateRenderer implementation (by default, it uses StPromptTemplate based on the StringTemplate engine). The important requirement is that the template must contain the following two placeholders:
- a query placeholder to receive the user question.
- a question_answer_context placeholder to receive the retrieved context.
PromptTemplate customPromptTemplate = PromptTemplate.builder()
    .renderer(StTemplateRenderer.builder().startDelimiterToken('<').endDelimiterToken('>').build())
    .template("""
            <query>
            Context information is below.
            ---------------------
            <question_answer_context>
            ---------------------
            Given the context information and no prior knowledge, answer the query.
            Follow these rules:
            1. If the answer is not in the context, just say that you don't know.
            2. Avoid statements like "Based on the context..." or "The provided information...".
            """)
    .build();

String question = "Where does the adventure of Anacletus and Birba take place?";

QuestionAnswerAdvisor qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
    .promptTemplate(customPromptTemplate)
    .build();

String response = ChatClient.builder(chatModel).build()
    .prompt(question)
    .advisors(qaAdvisor)
    .call()
    .content();
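To make the role of the two placeholders concrete, the following is a minimal plain-Java sketch of the substitution step. It does not use Spring AI's TemplateRenderer or the StringTemplate engine: the class PlaceholderTemplateSketch and its render method are hypothetical stand-ins that only mimic the replacement with String.replace. Failing fast when a placeholder is missing is a choice made for this sketch, not a documented behavior of the advisor.

```java
import java.util.List;

// Hypothetical stand-in for the rendering step; not Spring AI's implementation.
final class PlaceholderTemplateSketch {

    static final String QUERY = "<query>";
    static final String CONTEXT = "<question_answer_context>";

    // Checks that both placeholders are present, then substitutes them.
    static String render(String template, String query, List<String> documents) {
        for (String placeholder : List.of(QUERY, CONTEXT)) {
            if (!template.contains(placeholder)) {
                throw new IllegalArgumentException("Template is missing " + placeholder);
            }
        }
        // Retrieved documents are joined into one context string, one per line.
        String context = String.join("\n", documents);
        return template.replace(QUERY, query).replace(CONTEXT, context);
    }

    public static void main(String[] args) {
        String template = "<query>\nContext information is below.\n<question_answer_context>\n";
        System.out.println(render(template, "Who is Anacletus?",
                List.of("Anacletus lives in the Whispering Woods.")));
    }
}
```

Without the query placeholder the user question would never reach the model, and without question_answer_context the retrieved documents would be dropped, which is why both are required.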
RetrievalAugmentationAdvisor
Spring AI includes a library of RAG modules that you can use to build your own RAG flows. The RetrievalAugmentationAdvisor is an Advisor that provides an out-of-the-box implementation of the most common RAG flows, based on a modular architecture.
Sequential RAG Flows
Naive RAG
Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
    .documentRetriever(VectorStoreDocumentRetriever.builder()
        .similarityThreshold(0.50)
        .vectorStore(vectorStore)
        .build())
    .build();

String answer = chatClient.prompt()
    .advisors(retrievalAugmentationAdvisor)
    .user(question)
    .call()
    .content();
By default, the RetrievalAugmentationAdvisor does not allow the retrieved context to be empty. When that happens, it instructs the model not to answer the user query. You can allow empty context as follows.
Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
    .documentRetriever(VectorStoreDocumentRetriever.builder()
        .similarityThreshold(0.50)
        .vectorStore(vectorStore)
        .build())
    .queryAugmenter(ContextualQueryAugmenter.builder()
        .allowEmptyContext(true)
        .build())
    .build();

String answer = chatClient.prompt()
    .advisors(retrievalAugmentationAdvisor)
    .user(question)
    .call()
    .content();
The VectorStoreDocumentRetriever accepts a FilterExpression to filter the search results based on metadata. You can provide one when instantiating the VectorStoreDocumentRetriever, or at runtime per request using the FILTER_EXPRESSION advisor context parameter.
Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
    .documentRetriever(VectorStoreDocumentRetriever.builder()
        .similarityThreshold(0.50)
        .vectorStore(vectorStore)
        .build())
    .build();

String answer = chatClient.prompt()
    .advisors(retrievalAugmentationAdvisor)
    .advisors(a -> a.param(VectorStoreDocumentRetriever.FILTER_EXPRESSION, "type == 'Spring'"))
    .user(question)
    .call()
    .content();
See VectorStoreDocumentRetriever for more information.
Advanced RAG
Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
    .queryTransformers(RewriteQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder.build().mutate())
        .build())
    .documentRetriever(VectorStoreDocumentRetriever.builder()
        .similarityThreshold(0.50)
        .vectorStore(vectorStore)
        .build())
    .build();

String answer = chatClient.prompt()
    .advisors(retrievalAugmentationAdvisor)
    .user(question)
    .call()
    .content();
You can also use the DocumentPostProcessor API to post-process the retrieved documents before passing them to the model. For example, you can use this interface to re-rank the retrieved documents by their relevance to the query, remove irrelevant or redundant documents, or compress the content of each document to reduce noise and redundancy.
Modules
Spring AI implements a Modular RAG architecture inspired by the concept of modularity detailed in the paper "Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks".
Pre-Retrieval
Pre-Retrieval modules are responsible for processing the user query to achieve the best possible retrieval results.
Query Transformation
A component for transforming the input query to make it more effective for retrieval tasks, addressing challenges such as poorly formed queries, ambiguous terms, complex vocabulary, or unsupported languages.
When using a QueryTransformer, it's recommended to configure the ChatClient.Builder with a low temperature (e.g., 0.0) to ensure more deterministic and accurate results, improving retrieval quality. The default temperature for most chat models is typically too high for optimal query transformation, leading to reduced retrieval effectiveness.
CompressionQueryTransformer
A CompressionQueryTransformer uses a large language model to compress a conversation history and a follow-up query into a standalone query that captures the essence of the conversation.
This transformer is useful when the conversation history is long and the follow-up query is related to the conversation context.
Query query = Query.builder()
    .text("And what is its second largest city?")
    .history(new UserMessage("What is the capital of Denmark?"),
        new AssistantMessage("Copenhagen is the capital of Denmark."))
    .build();

QueryTransformer queryTransformer = CompressionQueryTransformer.builder()
    .chatClientBuilder(chatClientBuilder)
    .build();

Query transformedQuery = queryTransformer.transform(query);
The prompt used by this component can be customized via the promptTemplate() method available in the builder.
RewriteQueryTransformer
A RewriteQueryTransformer uses a large language model to rewrite a user query so that it yields better results when querying a target system, such as a vector store or a web search engine.
This transformer is useful when the user query is verbose, ambiguous, or contains irrelevant information that may affect the quality of the search results.
Query query = new Query("I'm studying machine learning. What is an LLM?");

QueryTransformer queryTransformer = RewriteQueryTransformer.builder()
    .chatClientBuilder(chatClientBuilder)
    .build();

Query transformedQuery = queryTransformer.transform(query);
The prompt used by this component can be customized via the promptTemplate() method available in the builder.
TranslationQueryTransformer
A TranslationQueryTransformer uses a large language model to translate a query to a target language that is supported by the embedding model used to generate the document embeddings. If the query is already in the target language, it is returned unchanged. If the language of the query is unknown, it is also returned unchanged.
This transformer is useful when the embedding model is trained on a specific language and the user query is in a different language.
Query query = new Query("Hvad er Danmarks hovedstad?");

QueryTransformer queryTransformer = TranslationQueryTransformer.builder()
    .chatClientBuilder(chatClientBuilder)
    .targetLanguage("english")
    .build();

Query transformedQuery = queryTransformer.transform(query);
The prompt used by this component can be customized via the promptTemplate() method available in the builder.
Query Expansion
A component for expanding the input query into a list of queries, addressing challenges such as poorly formed queries by providing alternative query formulations, or by breaking down complex problems into simpler sub-queries.
MultiQueryExpander
A MultiQueryExpander uses a large language model to expand a query into multiple semantically diverse variations that capture different perspectives. This helps retrieve additional contextual information and increases the chances of finding relevant results.
MultiQueryExpander queryExpander = MultiQueryExpander.builder()
    .chatClientBuilder(chatClientBuilder)
    .numberOfQueries(3)
    .build();

List<Query> queries = queryExpander.expand(new Query("How to run a Spring Boot app?"));
By default, the MultiQueryExpander includes the original query in the list of expanded queries. You can disable this behavior via the includeOriginal method in the builder.
MultiQueryExpander queryExpander = MultiQueryExpander.builder()
    .chatClientBuilder(chatClientBuilder)
    .includeOriginal(false)
    .build();
The prompt used by this component can be customized via the promptTemplate() method available in the builder.
Retrieval
Retrieval modules are responsible for querying data systems, such as a vector store, and retrieving the most relevant documents.
Document Search
A component responsible for retrieving Documents from an underlying data source, such as a search engine, a vector store, a database, or a knowledge graph.
VectorStoreDocumentRetriever
A VectorStoreDocumentRetriever retrieves documents from a vector store that are semantically similar to the input query. It supports filtering based on metadata, similarity threshold, and top-k results.
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
    .vectorStore(vectorStore)
    .similarityThreshold(0.73)
    .topK(5)
    .filterExpression(new FilterExpressionBuilder()
        .eq("genre", "fairytale")
        .build())
    .build();

List<Document> documents = retriever.retrieve(new Query("What is the main character of the story?"));
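As a rough illustration of how a similarity threshold and a top-k limit narrow a result set, here is a plain-Java sketch. It is not the retriever's implementation: the ThresholdTopKSketch class and the ScoredDoc record are hypothetical, and the scores are hard-coded where a real vector store would compute them from embeddings. The sketch assumes that a document whose score equals the threshold is kept.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins; a real vector store computes the scores itself.
final class ThresholdTopKSketch {

    record ScoredDoc(String id, double score) {}

    // Keeps documents scoring at least `threshold`, best first, at most `topK` of them.
    static List<ScoredDoc> select(List<ScoredDoc> candidates, double threshold, int topK) {
        return candidates.stream()
                .filter(doc -> doc.score() >= threshold)
                .sorted(Comparator.comparingDouble(ScoredDoc::score).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        List<ScoredDoc> candidates = List.of(
                new ScoredDoc("a", 0.90), new ScoredDoc("b", 0.50),
                new ScoredDoc("c", 0.80), new ScoredDoc("d", 0.75));
        System.out.println(select(candidates, 0.73, 2));
    }
}
```

The threshold removes weak matches regardless of how many remain, while top-k caps the result size regardless of how strong the matches are, so the two settings complement each other.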
The filter expression can be static or dynamic. For dynamic filter expressions, you can pass a Supplier.
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
    .vectorStore(vectorStore)
    .filterExpression(() -> new FilterExpressionBuilder()
        .eq("tenant", TenantContextHolder.getTenantIdentifier())
        .build())
    .build();

List<Document> documents = retriever.retrieve(new Query("What are the KPIs for the next semester?"));
You can also provide a request-specific filter expression via the Query API, using the FILTER_EXPRESSION parameter. If both a request-specific and a retriever-specific filter expression are provided, the request-specific filter expression takes precedence.
Query query = Query.builder()
    .text("Who is Anacletus?")
    .context(Map.of(VectorStoreDocumentRetriever.FILTER_EXPRESSION, "location == 'Whispering Woods'"))
    .build();

List<Document> retrievedDocuments = documentRetriever.retrieve(query);
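The precedence rule between request-specific and retriever-specific filters can be sketched in plain Java as follows. This is an illustration only, not the retriever's code: filters are modeled as plain strings, and FilterPrecedenceSketch, its resolve method, and the FILTER_KEY constant are hypothetical names standing in for the real FILTER_EXPRESSION context parameter.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for the precedence rule; filters are plain strings here.
final class FilterPrecedenceSketch {

    // Hypothetical context key playing the role of FILTER_EXPRESSION.
    static final String FILTER_KEY = "filter_expression";

    // The request-specific filter wins; otherwise fall back to the retriever's supplier.
    static Optional<String> resolve(Map<String, Object> requestContext,
                                    Supplier<String> retrieverFilter) {
        Object requestFilter = requestContext.get(FILTER_KEY);
        if (requestFilter instanceof String text && !text.isBlank()) {
            return Optional.of(text);
        }
        // The supplier is evaluated per call, which is what makes the filter dynamic.
        return Optional.ofNullable(retrieverFilter == null ? null : retrieverFilter.get());
    }

    public static void main(String[] args) {
        Supplier<String> tenantFilter = () -> "tenant == 'acme'";
        System.out.println(resolve(Map.of(FILTER_KEY, "location == 'Whispering Woods'"), tenantFilter));
        System.out.println(resolve(Map.of(), tenantFilter));
    }
}
```

Because the supplier is only invoked when no request-specific filter is present, a per-request override skips the retriever-level filter entirely in this sketch rather than combining the two.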
Document Join
A component for combining documents retrieved based on multiple queries and from multiple data sources into a single collection of documents. As part of the joining process, it can also handle duplicate documents and reciprocal ranking strategies.
ConcatenationDocumentJoiner
A ConcatenationDocumentJoiner combines documents retrieved based on multiple queries and from multiple data sources by concatenating them into a single collection of documents. In case of duplicate documents, the first occurrence is kept. The score of each document is kept as is.
Map<Query, List<List<Document>>> documentsForQuery = ...
DocumentJoiner documentJoiner = new ConcatenationDocumentJoiner();
List<Document> documents = documentJoiner.join(documentsForQuery);
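The joining behavior described above (concatenate, keep the first occurrence of a duplicate, leave scores untouched) can be sketched in plain Java. This is not the ConcatenationDocumentJoiner source: the Doc record and the ConcatenationJoinSketch class are hypothetical, queries are represented as strings, and treating two documents as duplicates when they share an id is an assumption of this sketch. The result is returned in encounter order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types; not the ConcatenationDocumentJoiner source.
final class ConcatenationJoinSketch {

    record Doc(String id, String text, double score) {}

    // Flattens query -> data source -> documents, keeping the first copy of each id.
    static List<Doc> join(Map<String, List<List<Doc>>> documentsForQuery) {
        Map<String, Doc> firstSeen = new LinkedHashMap<>();
        for (List<List<Doc>> perSource : documentsForQuery.values()) {
            for (List<Doc> documents : perSource) {
                for (Doc doc : documents) {
                    // putIfAbsent ignores later duplicates and never alters the score.
                    firstSeen.putIfAbsent(doc.id(), doc);
                }
            }
        }
        return new ArrayList<>(firstSeen.values());
    }
}
```

A LinkedHashMap is used so that insertion order is preserved, which is what makes "first occurrence" well defined here.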
Post-Retrieval
Post-Retrieval modules are responsible for processing the retrieved documents to achieve the best possible generation results.
Document Post-Processing
A component for post-processing retrieved documents based on a query, addressing challenges such as lost-in-the-middle, context length restrictions from the model, and the need to reduce noise and redundancy in the retrieved information.
For example, it could rank documents based on their relevance to the query, remove irrelevant or redundant documents, or compress the content of each document to reduce noise and redundancy.
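As a toy illustration of such a post-processing step, the plain-Java sketch below removes duplicate and irrelevant documents and re-ranks the rest. It does not implement Spring AI's DocumentPostProcessor interface: PostProcessingSketch is a hypothetical class, documents are plain strings, and relevance is approximated by counting shared terms, where a production re-ranker would more likely rely on a dedicated model.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in; term overlap is a crude proxy for relevance.
final class PostProcessingSketch {

    // Counts how many distinct query terms occur in the document text.
    static long overlap(String query, String document) {
        Set<String> documentTerms = terms(document);
        return terms(query).stream().filter(documentTerms::contains).count();
    }

    // Removes duplicates and documents sharing no term with the query,
    // then orders the remainder by descending overlap.
    static List<String> process(String query, List<String> documents) {
        return documents.stream()
                .distinct()
                .filter(doc -> overlap(query, doc) > 0)
                .sorted(Comparator.comparingLong((String doc) -> overlap(query, doc)).reversed())
                .toList();
    }

    // Lower-cases the text and splits it on non-word characters.
    private static Set<String> terms(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(term -> !term.isBlank())
                .collect(Collectors.toSet());
    }
}
```

Placing the most relevant documents first is one simple way to mitigate the lost-in-the-middle effect mentioned above, and dropping zero-overlap documents reduces how much context the model has to read.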
Generation
Generation modules are responsible for generating the final response based on the user query and retrieved documents.
Query Augmentation
A component for augmenting an input query with additional data, useful for providing a large language model with the context it needs to answer the user query.
ContextualQueryAugmenter
The ContextualQueryAugmenter augments the user query with contextual data from the content of the provided documents.
QueryAugmenter queryAugmenter = ContextualQueryAugmenter.builder().build();
By default, the ContextualQueryAugmenter does not allow the retrieved context to be empty. When that happens, it instructs the model not to answer the user query.
You can enable the allowEmptyContext option to allow the model to generate a response even when the retrieved context is empty.
QueryAugmenter queryAugmenter = ContextualQueryAugmenter.builder()
    .allowEmptyContext(true)
    .build();
The prompts used by this component can be customized via the promptTemplate() and emptyContextPromptTemplate() methods available in the builder.
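The effect of allowEmptyContext can be pictured with the following plain-Java sketch. It is not the ContextualQueryAugmenter implementation: EmptyContextSketch and its augment method are hypothetical, the prompt wording is invented for illustration, and passing the query through unchanged when empty context is allowed is an assumption of this sketch.

```java
import java.util.List;

// Hypothetical stand-in; the prompt wording below is invented for illustration.
final class EmptyContextSketch {

    static String augment(String query, List<String> documents, boolean allowEmptyContext) {
        if (documents.isEmpty()) {
            // With allowEmptyContext the query passes through unchanged (assumed here);
            // otherwise the model is instructed not to answer.
            return allowEmptyContext
                    ? query
                    : "The query is outside the knowledge base. Politely decline to answer.";
        }
        // With documents available, their content is placed ahead of the query.
        return "Context:\n" + String.join("\n", documents) + "\n\nQuery: " + query;
    }

    public static void main(String[] args) {
        System.out.println(augment("Who is Anacletus?", List.of(), false));
        System.out.println(augment("Who is Anacletus?", List.of(), true));
    }
}
```

In this picture, the two branches for an empty document list correspond to the prompts that emptyContextPromptTemplate() and allowEmptyContext control, while the final branch corresponds to promptTemplate().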