Transformers (ONNX) Embeddings

The TransformersEmbeddingClient is an EmbeddingClient implementation that locally computes sentence embeddings using a selected sentence transformer.

It uses pre-trained transformer models, serialized into the Open Neural Network Exchange (ONNX) format.

The Deep Java Library and the Microsoft ONNX Java Runtime libraries are used to run the ONNX models and compute the embeddings in Java.

Serialize the Tokenizer and the Transformer Model

To run things in Java, we need to serialize the tokenizer and the transformer model into ONNX format.

Serialize with optimum-cli

One quick way to achieve this is to use the optimum-cli command-line tool.

The following snippet prepares a Python virtual environment, installs the required packages, and serializes (i.e. exports) the specified model using optimum-cli:

python3 -m venv venv
source ./venv/bin/activate
(venv) pip install --upgrade pip
(venv) pip install optimum onnx onnxruntime
(venv) optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 onnx-output-folder

The snippet exports the sentence-transformers/all-MiniLM-L6-v2 transformer into the onnx-output-folder folder. The latter contains the tokenizer.json and model.onnx files used by the embedding client.

In place of all-MiniLM-L6-v2 you can pick any Hugging Face transformer identifier or provide a direct file path.

Using the ONNX Transformers models

Add the spring-ai-transformers project to your Maven dependencies:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-transformers</artifactId>
</dependency>
Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Then create a new TransformersEmbeddingClient instance and use the setTokenizerResource(tokenizerJsonUri) and setModelResource(modelOnnxUri) methods to set the URIs of the exported tokenizer.json and model.onnx files. (The classpath:, file:, and https: URI schemes are supported.)

If the model is not explicitly set, TransformersEmbeddingClient defaults to sentence-transformers/all-MiniLM-L6-v2:

| Dimensions | Avg. performance | Speed | Size |
| --- | --- | --- | --- |
| 384 | 58.80 | 14200 sentences/sec | 80MB |

The following snippet illustrates how to use the TransformersEmbeddingClient manually:

TransformersEmbeddingClient embeddingClient = new TransformersEmbeddingClient();

// (optional) defaults to classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json
embeddingClient.setTokenizerResource("classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json");

// (optional) defaults to classpath:/onnx/all-MiniLM-L6-v2/model.onnx
embeddingClient.setModelResource("classpath:/onnx/all-MiniLM-L6-v2/model.onnx");

// (optional) defaults to ${java.io.tmpdir}/spring-ai-onnx-model
// Only the http/https resources are cached by default.
embeddingClient.setResourceCacheDirectory("/tmp/onnx-zoo");

// (optional) Set the tokenizer padding if you see an errors like:
// "ai.onnxruntime.OrtException: Supplied array is ragged, ..."
embeddingClient.setTokenizerOptions(Map.of("padding", "true"));

embeddingClient.afterPropertiesSet();

List<List<Double>> embeddings = embeddingClient.embed(List.of("Hello world", "World is big"));

Note that when created manually, you must call afterPropertiesSet() after setting the properties and before using the client.

The first embed() call downloads the large ONNX model and caches it on the local file system, so the first call might take longer than usual. Use the #setResourceCacheDirectory(&lt;path&gt;) method to set the local folder where the ONNX models are stored. The default cache folder is ${java.io.tmpdir}/spring-ai-onnx-model.
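The embeddings returned by embed() are plain List&lt;Double&gt; vectors, which are typically compared by cosine similarity. A minimal, self-contained sketch in plain Java (independent of Spring AI; the class and method names are illustrative):

```java
import java.util.List;

public class CosineSimilarity {

    // Cosine similarity between two embedding vectors of equal length:
    // dot(a, b) / (|a| * |b|). Values near 1.0 indicate similar texts.
    static double cosine(List<Double> a, List<Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            dot += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors stand in for the 384-dimensional
        // embeddings produced by all-MiniLM-L6-v2.
        List<Double> v1 = List.of(1.0, 0.0, 1.0);
        List<Double> v2 = List.of(1.0, 0.0, 1.0);
        List<Double> v3 = List.of(0.0, 1.0, 0.0);
        System.out.println(cosine(v1, v2)); // identical vectors -> 1.0
        System.out.println(cosine(v1, v3)); // orthogonal vectors -> 0.0
    }
}
```

In practice you would pass the two lists returned by embeddingClient.embed(...) to such a helper, for example to rank documents against a query.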

It is more convenient (and preferred) to create the TransformersEmbeddingClient as a bean. Then you don't have to call afterPropertiesSet() manually.

@Bean
public EmbeddingClient embeddingClient() {
   return new TransformersEmbeddingClient();
}

Transformers Embedding Spring Boot Starter

You can bootstrap and autowire the TransformersEmbeddingClient with the following Spring Boot starter:

<dependency>
   <groupId>org.springframework.ai</groupId>
   <artifactId>spring-ai-transformers-spring-boot-starter</artifactId>
</dependency>
Refer to the Dependency Management section to add the Spring AI BOM to your build file.

To configure it, use the spring.ai.embedding.transformer.* properties.

For example, add this to your application.properties file to configure the client with the intfloat/e5-small-v2 text embedding model:

spring.ai.embedding.transformer.onnx.modelUri=https://huggingface.co/intfloat/e5-small-v2/resolve/main/model.onnx
spring.ai.embedding.transformer.tokenizer.uri=https://huggingface.co/intfloat/e5-small-v2/raw/main/tokenizer.json

The complete list of supported properties is:

| Property | Description | Default |
| --- | --- | --- |
| spring.ai.embedding.transformer.enabled | Enable the Transformer Embedding client. | true |
| spring.ai.embedding.transformer.tokenizer.uri | URI of a pre-trained HuggingFaceTokenizer created by the ONNX engine (e.g. tokenizer.json). | onnx/all-MiniLM-L6-v2/tokenizer.json |
| spring.ai.embedding.transformer.tokenizer.options | HuggingFaceTokenizer options, such as addSpecialTokens, modelMaxLength, truncation, padding, maxLength, stride, and padToMultipleOf. Leave empty to fall back to the defaults. | empty |
| spring.ai.embedding.transformer.cache.enabled | Enable remote Resource caching. | true |
| spring.ai.embedding.transformer.cache.directory | Directory path to cache remote resources, such as the ONNX models. | ${java.io.tmpdir}/spring-ai-onnx-model |
| spring.ai.embedding.transformer.onnx.modelUri | Existing, pre-trained ONNX model. | onnx/all-MiniLM-L6-v2/model.onnx |
| spring.ai.embedding.transformer.onnx.gpuDeviceId | The GPU device ID to execute on. Applicable only if >= 0; ignored otherwise. | -1 |
| spring.ai.embedding.transformer.metadataMode | Specifies which parts of the Document's content and metadata are used for computing the embeddings. | NONE |
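For example, an application.properties fragment combining several of the properties above might look like this (the cache directory path and the choice of tokenizer options are illustrative, not defaults):

```properties
# Cache remotely downloaded ONNX resources in a custom directory
spring.ai.embedding.transformer.cache.enabled=true
spring.ai.embedding.transformer.cache.directory=/opt/models/onnx-cache

# Tokenizer options: truncate over-long inputs and pad batch entries to equal length
spring.ai.embedding.transformer.tokenizer.options.truncation=true
spring.ai.embedding.transformer.tokenizer.options.padding=true
```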

If you see an error like Caused by: ai.onnxruntime.OrtException: Supplied array is ragged,.., you also need to enable tokenizer padding in application.properties, as follows:

spring.ai.embedding.transformer.tokenizer.options.padding=true