AI Concepts
This section describes core concepts that Spring AI uses. We recommend reading it closely to understand the ideas behind how Spring AI is implemented.
Models
AI models are algorithms designed to process and generate information, often mimicking human cognitive functions. By learning patterns and insights from large datasets, these models can produce predictions, text, images, or other outputs, enhancing various applications across industries.
There are many different types of AI models, each suited for a specific use case. While ChatGPT and its generative AI capabilities have captivated users through text input and output, many models and companies offer diverse inputs and outputs. Before ChatGPT, many people were fascinated by text-to-image generation models such as Midjourney and Stable Diffusion.
The following table categorizes several models based on their input and output types:

Spring AI currently supports models that process input and output as language, image, and audio. The last row in the previous table, which accepts text as input and outputs numbers, is more commonly known as embedding text and represents the internal data structures used in an AI model. Spring AI has support for embeddings to enable more advanced use cases.
What sets models like GPT apart is their pre-trained nature, as indicated by the "P" in ChatGPT: Chat Generative Pre-trained Transformer. This pre-training feature transforms AI into a general developer tool that does not require an extensive machine learning or model training background.
Prompts
Prompts serve as the foundation for the language-based inputs that guide an AI model to produce specific outputs. For those familiar with ChatGPT, a prompt might seem like merely the text entered into a dialog box that is sent to the API. However, it encompasses much more than that. In many AI models, the text for the prompt is not just a simple string.
ChatGPT’s API has multiple text inputs within a prompt, with each text input being assigned a role. For example, there is the system role, which tells the model how to behave and sets the context for the interaction. There is also the user role, which is typically the input from the user.
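The role-based structure can be sketched in plain Java. The Message record and role strings below are illustrative stand-ins, not Spring AI's actual message types:

```java
import java.util.List;

public class PromptRoles {
    // A minimal message abstraction: each entry pairs a role with its text.
    record Message(String role, String content) {}

    // Assemble a multi-message prompt the way ChatGPT-style APIs expect it:
    // a system message sets the context, then the user message carries the question.
    static List<Message> buildPrompt(String userQuestion) {
        return List.of(
            new Message("system", "You are a helpful assistant that answers concisely."),
            new Message("user", userQuestion)
        );
    }

    public static void main(String[] args) {
        buildPrompt("What is Spring AI?").forEach(m ->
            System.out.println(m.role() + ": " + m.content()));
    }
}
```

The system message stays fixed across a conversation while each user message varies, which is why the two are kept as separate inputs rather than concatenated into one string.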
Crafting effective prompts is both an art and a science. ChatGPT was designed for human conversations. This is quite a departure from using something like SQL to "ask a question". One must communicate with the AI model akin to conversing with another person.
Such is the importance of this interaction style that the term "Prompt Engineering" has emerged as its own discipline. There is a burgeoning collection of techniques that improve the effectiveness of prompts. Investing time in crafting a prompt can drastically improve the resulting output.
Sharing prompts has become a communal practice, and there is active academic research being done on this subject. As an example of how counter-intuitive it can be to create an effective prompt (for example, contrasting with SQL), a recent research paper found that one of the most effective prompts you can use starts with the phrase, “Take a deep breath and work on this step by step.” That should give you an indication of why language is so important. We do not yet fully understand how to make the most effective use of previous iterations of this technology, such as ChatGPT 3.5, let alone new versions that are being developed.
Prompt Templates
Creating effective prompts involves establishing the context of the request and substituting parts of the request with values specific to the user’s input.
This process uses traditional text-based template engines for prompt creation and management. Spring AI employs the OSS library StringTemplate for this purpose.
For instance, consider this simple prompt template:
Tell me a {adjective} joke about {content}.
In Spring AI, prompt templates can be likened to the "View" in Spring MVC architecture. A model object, typically a java.util.Map, is provided to populate placeholders within the template. The "rendered" string becomes the content of the prompt supplied to the AI model.
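A minimal sketch of this rendering step, using plain String replacement instead of the actual StringTemplate engine (the class and method names here are hypothetical):

```java
import java.util.Map;

public class PromptTemplateSketch {
    // Replace each {placeholder} in the template with its value from the model map,
    // mirroring how a view template is filled from a model object in Spring MVC.
    static String render(String template, Map<String, String> model) {
        String rendered = template;
        for (Map.Entry<String, String> entry : model.entrySet()) {
            rendered = rendered.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return rendered;
    }

    public static void main(String[] args) {
        String prompt = render("Tell me a {adjective} joke about {content}.",
                Map.of("adjective", "funny", "content", "cows"));
        System.out.println(prompt); // Tell me a funny joke about cows.
    }
}
```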
There is considerable variability in the specific data format of the prompt sent to the model. Initially starting as simple strings, prompts have evolved to include multiple messages, where each string in each message represents a distinct role for the model.
Embeddings
Embeddings are numerical representations of text, images, or videos that capture relationships between inputs.
Embeddings work by converting text, images, and videos into arrays of floating point numbers, called vectors. These vectors are designed to capture the meaning of the text, images, and videos. The length of the embedding array is called the vector’s dimensionality.
By calculating the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the objects used to generate the embedding vectors.

As a Java developer exploring AI, it’s not necessary to comprehend the intricate mathematical theories or the specific implementations behind these vector representations. A basic understanding of their role and function within AI systems suffices, particularly when you’re integrating AI functionalities into your applications.
Embeddings are particularly relevant in practical applications like the Retrieval Augmented Generation (RAG) pattern. They enable the representation of data as points in a semantic space, which is akin to the 2-D space of Euclidean geometry, but in higher dimensions. This means just like how points on a plane in Euclidean geometry can be close or far based on their coordinates, in a semantic space, the proximity of points reflects the similarity in meaning. Sentences about similar topics are positioned closer in this multi-dimensional space, much like points lying close to each other on a graph. This proximity aids in tasks like text classification, semantic search, and even product recommendations, as it allows the AI to discern and group related concepts based on their "location" in this expanded semantic landscape.
You can think of this semantic space as a vector.
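The numerical distance mentioned above is commonly measured with cosine similarity. A self-contained sketch follows; the three-dimensional vectors are made-up toy embeddings, whereas real embedding vectors have hundreds or thousands of dimensions:

```java
public class EmbeddingSimilarity {
    // Cosine similarity: the dot product of the vectors divided by the
    // product of their magnitudes. Values near 1.0 indicate that the
    // embedded inputs are close in the semantic space.
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] cat = {0.9f, 0.1f, 0.0f};    // hypothetical embedding of "cat"
        float[] kitten = {0.8f, 0.2f, 0.1f}; // hypothetical embedding of "kitten"
        float[] car = {0.0f, 0.1f, 0.9f};    // hypothetical embedding of "car"
        // Related words end up closer together than unrelated ones.
        System.out.println(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, car)); // true
    }
}
```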
Tokens
Tokens serve as the building blocks of how an AI model works. On input, models convert words to tokens. On output, they convert tokens back to words.
In English, one token roughly corresponds to 75% of a word. For reference, Shakespeare’s complete works, totaling around 900,000 words, translate to approximately 1.2 million tokens.

Perhaps more important is that Tokens = Money. In the context of hosted AI models, your charges are determined by the number of tokens used. Both input and output contribute to the overall token count.
Also, models are subject to token limits, which restrict the amount of text processed in a single API call. This threshold is often referred to as the "context window". The model does not process any text that exceeds this limit.
For instance, ChatGPT3 has a 4K token limit, while GPT4 offers varying options, such as 8K, 16K, and 32K. Anthropic’s Claude AI model features a 100K token limit, and Meta’s recent research yielded a 1M token limit model.
To summarize the collected works of Shakespeare with GPT4, you need to devise software engineering strategies to chop up the data and present it within the model’s context window limits. The Spring AI project helps you with this task.
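One such strategy can be sketched as greedy sentence packing under a token budget, using the rough one-token-per-0.75-words ratio mentioned above as an estimator. The heuristics and names here are illustrative, not Spring AI's actual chunking implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenBudgetChunker {
    // Rough heuristic from the text above: one token ~ 0.75 English words,
    // so a text's token count is approximately wordCount / 0.75.
    static int estimateTokens(String text) {
        int words = text.trim().isEmpty() ? 0 : text.trim().split("\\s+").length;
        return (int) Math.ceil(words / 0.75);
    }

    // Greedily pack whole sentences into chunks that stay within the token budget,
    // so each chunk fits inside the model's context window.
    static List<String> chunk(String text, int maxTokens) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : text.split("(?<=\\.)\\s+")) {
            if (current.length() > 0 && estimateTokens(current + " " + sentence) > maxTokens) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) current.append(" ");
            current.append(sentence);
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }

    public static void main(String[] args) {
        chunk("One two three. Four five six. Seven eight nine.", 8)
                .forEach(System.out::println);
    }
}
```

A real pipeline would use the model's actual tokenizer rather than a word-count heuristic, but the shape of the solution, split then feed pieces sequentially, is the same.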
Structured Output
The output of AI models traditionally arrives as a java.lang.String, even if you ask for the reply to be in JSON. It may be correct JSON, but it is not a JSON data structure. It is just a string. Also, asking "for JSON" as part of the prompt is not 100% accurate.
This intricacy has led to the emergence of a specialized field involving the creation of prompts to yield the intended output, followed by converting the resulting simple string into a usable data structure for application integration.

Structured output conversion employs meticulously crafted prompts, often necessitating multiple interactions with the model to achieve the desired formatting.
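The string-to-data-structure step can be illustrated with a deliberately naive sketch. Production code would use a real JSON library plus validation and retries; the Actor record, the field names, and the reply text here are all hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StructuredOutputSketch {
    // The target data structure for the model's reply.
    record Actor(String name, int yearBorn) {}

    // Naive conversion of a JSON-looking model reply into an Actor.
    // Regex extraction stands in for a proper JSON parser, since the
    // model's string is not guaranteed to be well-formed JSON at all.
    static Actor parseActor(String modelReply) {
        Matcher name = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"").matcher(modelReply);
        Matcher year = Pattern.compile("\"yearBorn\"\\s*:\\s*(\\d+)").matcher(modelReply);
        if (!name.find() || !year.find()) {
            throw new IllegalArgumentException("Reply did not match the expected format");
        }
        return new Actor(name.group(1), Integer.parseInt(year.group(1)));
    }

    public static void main(String[] args) {
        String reply = "{\"name\": \"Tom Hanks\", \"yearBorn\": 1956}"; // hypothetical model output
        Actor actor = parseActor(reply);
        System.out.println(actor.name() + ", born " + actor.yearBorn());
    }
}
```

The failure branch is the important part: because the model may ignore the formatting instructions, conversion code must detect the mismatch and typically re-prompts the model, which is why the process often takes multiple interactions.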
Bringing Your Data & APIs to the AI Model
How can you equip the AI model with information on which it has not been trained?
Note that the GPT 3.5/4.0 dataset extends only until September 2021. Consequently, the model says that it does not know the answer to questions that require knowledge beyond that date. An interesting bit of trivia is that this dataset is around 650GB.
Three techniques exist for customizing the AI model to incorporate your data:
- Fine Tuning: This traditional machine learning technique involves tailoring the model and changing its internal weighting. However, it is a challenging process for machine learning experts and extremely resource-intensive for models like GPT due to their size. Additionally, some models might not offer this option.
- Prompt Stuffing: A more practical alternative involves embedding your data within the prompt provided to the model. Given a model’s token limits, techniques are required to present relevant data within the model’s context window. This approach is colloquially referred to as “stuffing the prompt.” The Spring AI library helps you implement solutions based on the “stuffing the prompt” technique, otherwise known as Retrieval Augmented Generation (RAG).

- Tool Calling: This technique allows registering tools (user-defined services) that connect the large language models to the APIs of external systems. Spring AI greatly simplifies the code you need to write to support tool calling.
Retrieval Augmented Generation
A technique termed Retrieval Augmented Generation (RAG) has emerged to address the challenge of incorporating relevant data into prompts for accurate AI model responses.
The approach involves a batch processing style programming model, where the job reads unstructured data from your documents, transforms it, and then writes it into a vector database. At a high level, this is an ETL (Extract, Transform and Load) pipeline. The vector database is used in the retrieval part of the RAG technique.
As part of loading the unstructured data into the vector database, one of the most important transformations is to split the original document into smaller pieces. The procedure of splitting the original document into smaller pieces has two important steps:
- Split the document into parts while preserving the semantic boundaries of the content. For example, for a document with paragraphs and tables, one should avoid splitting the document in the middle of a paragraph or table. For code, avoid splitting the code in the middle of a method’s implementation.
- Split the document’s parts further into pieces whose size is a small percentage of the AI model’s token limit.
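The first step, splitting at semantic boundaries, can be sketched by treating blank lines (paragraph breaks) as the only legal split points. This is an illustrative simplification, not Spring AI's actual document splitter:

```java
import java.util.ArrayList;
import java.util.List;

public class SemanticSplitter {
    // Split on blank lines so every piece is a whole paragraph,
    // never a fragment cut in the middle of a paragraph or table.
    static List<String> splitByParagraph(String document) {
        List<String> pieces = new ArrayList<>();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            if (!paragraph.isBlank()) {
                pieces.add(paragraph.strip());
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        String doc = "First paragraph about embeddings.\n\nSecond paragraph about tokens.";
        splitByParagraph(doc).forEach(System.out::println);
    }
}
```

Each resulting paragraph would then go through the second step, further subdivision until each piece is well under the model's token limit.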
The next phase in RAG is processing user input. When a user’s question is to be answered by an AI model, the question and all the “similar” document pieces are placed into the prompt that is sent to the AI model. This is the reason to use a vector database. It is very good at finding similar content.
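The "stuffing" step described above can be sketched as simple prompt assembly; the instruction wording and class name are illustrative:

```java
import java.util.List;

public class PromptStuffingSketch {
    // Place the retrieved document pieces into the prompt, ahead of the user's
    // question, so the model can answer from data it was never trained on.
    static String stuffPrompt(String question, List<String> similarDocuments) {
        StringBuilder prompt = new StringBuilder(
                "Answer the question using only the context below.\n\nContext:\n");
        for (String doc : similarDocuments) {
            prompt.append("- ").append(doc).append("\n");
        }
        prompt.append("\nQuestion: ").append(question);
        return prompt.toString();
    }

    public static void main(String[] args) {
        System.out.println(stuffPrompt("Who won the 2023 tournament?",
                List.of("Team A won the 2023 tournament.", "The final score was 3-1.")));
    }
}
```

In a real RAG flow, the `similarDocuments` list is the result of a vector database similarity search against the embedded user question.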

- The ETL Pipeline provides further information about orchestrating the flow of extracting data from data sources and storing it in a structured vector store, ensuring data is in the optimal format for retrieval when passing it to the AI model.
- The ChatClient - RAG documentation explains how to use the QuestionAnswerAdvisor to enable the RAG capability in your application.
Tool Calling
Large Language Models (LLMs) are frozen after training, leading to stale knowledge, and they are unable to access or modify external data.
The Tool Calling mechanism addresses these shortcomings. It allows you to register your own services as tools to connect the large language models to the APIs of external systems. These systems can provide LLMs with real-time data and perform data processing actions on their behalf.
Spring AI greatly simplifies the code you need to write to support tool invocation. It handles the tool invocation conversation for you. You can provide your tool as a @Tool-annotated method and provide it in your prompt options to make it available to the model. Additionally, you can define and reference multiple tools in a single prompt.

- When we want to make a tool available to the model, we include its definition in the chat request. Each tool definition comprises a name, a description, and the schema of the input parameters.
- When the model decides to call a tool, it sends a response with the tool name and the input parameters modeled after the defined schema.
- The application is responsible for using the tool name to identify and execute the tool with the provided input parameters.
- The result of the tool call is processed by the application.
- The application sends the tool call result back to the model.
- The model generates the final response using the tool call result as additional context.
Follow the Tool Calling documentation for further information on how to use this feature with different AI models.
Evaluating AI responses
Effectively evaluating the output of an AI system in response to user requests is essential to ensuring the accuracy and usefulness of the final application. Several emerging techniques enable the use of the pre-trained model itself for this purpose.
This evaluation process involves analyzing whether the generated response aligns with the user’s intent and the context of the query. Metrics such as relevance, coherence, and factual correctness are used to gauge the quality of the AI-generated response.
One approach involves presenting both the user’s request and the AI model’s response to the model, querying whether the response aligns with the provided data.
Furthermore, leveraging the information stored in the vector database as supplementary data can enhance the evaluation process, aiding in the determination of response relevance.
The Spring AI project provides an Evaluator API which currently gives access to basic strategies to evaluate model responses. Follow the Evaluation Testing documentation for further information.