Transformers (ONNX) Embeddings

The TransformersEmbeddingModel is an EmbeddingModel implementation that locally computes sentence embeddings using a selected sentence transformer.

You can use any HuggingFace Embedding model.

It uses pre-trained transformer models, serialized into the Open Neural Network Exchange (ONNX) format.

The Deep Java Library and the Microsoft ONNX Java Runtime libraries are used to run the ONNX models and compute the embeddings in Java.

Prerequisites

To run things in Java, we need to serialize the Tokenizer and the Transformer Model into ONNX format.

Serialize with optimum-cli - One quick way to achieve this is to use the optimum-cli command line tool. The following snippet prepares a Python virtual environment, installs the required packages, and serializes (i.e., exports) the specified model using optimum-cli:

python3 -m venv venv
source ./venv/bin/activate
(venv) pip install --upgrade pip
(venv) pip install optimum onnx onnxruntime sentence-transformers
(venv) optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 onnx-output-folder

The snippet exports the sentence-transformers/all-MiniLM-L6-v2 transformer into the onnx-output-folder folder. The latter includes the tokenizer.json and model.onnx files used by the embedding model.

In place of all-MiniLM-L6-v2 you can pick any Hugging Face transformer identifier or provide a direct file path.

Auto-configuration

There has been a significant change in the Spring AI auto-configuration, starter modules' artifact names. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the ONNX Transformer Embedding Model. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-transformers</artifactId>
</dependency>

or to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-transformers'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file. Refer to the Artifact Repositories section to add these repositories to your build system.

To configure it, use the spring.ai.embedding.transformer.* properties.

For example, add this to your application.properties file to configure the client with the intfloat/e5-small-v2 text embedding model:

spring.ai.embedding.transformer.onnx.modelUri=https://huggingface.co/intfloat/e5-small-v2/resolve/main/model.onnx
spring.ai.embedding.transformer.tokenizer.uri=https://huggingface.co/intfloat/e5-small-v2/raw/main/tokenizer.json

The complete list of supported properties is:

Embedding Properties

Enabling and disabling of the embedding auto-configurations are now configured via top level properties with the prefix spring.ai.model.embedding.

To enable, set spring.ai.model.embedding=transformers (it is enabled by default).

To disable, set spring.ai.model.embedding=none (or any value that doesn’t match transformers).

This change is done to allow configuration of multiple models.
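For instance, in application.properties the switch looks like this (shown here as an illustrative fragment):

```
# transformers is the default; set to none (or any non-matching value) to disable
spring.ai.model.embedding=transformers
```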

spring.ai.embedding.transformer.enabled (Removed and no longer valid)
Enable the Transformer Embedding model. Default: true

spring.ai.model.embedding
Enable the Transformer Embedding model. Default: transformers

spring.ai.embedding.transformer.tokenizer.uri
URI of a pre-trained HuggingFaceTokenizer created by the ONNX engine (e.g. tokenizer.json). Default: onnx/all-MiniLM-L6-v2/tokenizer.json

spring.ai.embedding.transformer.tokenizer.options
HuggingFaceTokenizer options such as addSpecialTokens, modelMaxLength, truncation, padding, maxLength, stride, padToMultipleOf. Leave empty to fall back to the defaults. Default: empty

spring.ai.embedding.transformer.cache.enabled
Enable remote Resource caching. Default: true

spring.ai.embedding.transformer.cache.directory
Directory path to cache remote resources, such as the ONNX models. Default: ${java.io.tmpdir}/spring-ai-onnx-model

spring.ai.embedding.transformer.onnx.modelUri
Existing, pre-trained ONNX model. Default: onnx/all-MiniLM-L6-v2/model.onnx

spring.ai.embedding.transformer.onnx.modelOutputName
The ONNX model’s output node name, used for the embedding calculation. Default: last_hidden_state

spring.ai.embedding.transformer.onnx.gpuDeviceId
The GPU device ID to execute on. Only applicable if >= 0; ignored otherwise. (Requires the additional onnxruntime_gpu dependency.) Default: -1

spring.ai.embedding.transformer.metadataMode
Specifies which parts of the Document’s content and metadata are used for computing the embeddings. Default: NONE
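The tokenizer options above are set through nested configuration properties. A hypothetical fragment (the maxLength value here is only an illustration; pick one that matches your model):

```
spring.ai.embedding.transformer.tokenizer.options.padding=true
spring.ai.embedding.transformer.tokenizer.options.maxLength=512
```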

Errors and special cases

If you see an error like Caused by: ai.onnxruntime.OrtException: Supplied array is ragged,.., you also need to enable the tokenizer padding in application.properties as follows:

spring.ai.embedding.transformer.tokenizer.options.padding=true

If you get an error like The generative output names don’t contain expected: last_hidden_state. Consider one of the available model outputs: token_embeddings, …​., you need to set the model output name to the correct value for your model. Consider the names listed in the error message. For example:

spring.ai.embedding.transformer.onnx.modelOutputName=token_embeddings
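The output name matters because outputs like last_hidden_state and token_embeddings are per-token tensors, which must be pooled into a single sentence vector. A minimal mean-pooling sketch for illustration (this is plain Java, not Spring AI's internal implementation; the class and method names are hypothetical):

```java
import java.util.Arrays;

public class MeanPooling {

    // Averages per-token hidden states [tokens][dim] into one sentence vector [dim].
    static float[] meanPool(float[][] tokenEmbeddings) {
        int dim = tokenEmbeddings[0].length;
        float[] pooled = new float[dim];
        for (float[] token : tokenEmbeddings) {
            for (int i = 0; i < dim; i++) {
                pooled[i] += token[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            pooled[i] /= tokenEmbeddings.length;
        }
        return pooled;
    }

    public static void main(String[] args) {
        // Two tokens with 2-dimensional hidden states, averaged element-wise.
        float[][] tokens = { {1f, 2f}, {3f, 4f} };
        System.out.println(Arrays.toString(meanPool(tokens))); // [2.0, 3.0]
    }
}
```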

If you get an error like ai.onnxruntime.OrtException: Error code - ORT_FAIL - message: Deserialize tensor onnx::MatMul_10319 failed.GetFileLength for ./model.onnx_data failed:Invalid fd was supplied: -1, that means that your model is larger than 2GB and is serialized into two files: model.onnx and model.onnx_data.

The model.onnx_data file is called External Data and is expected to be in the same directory as model.onnx.

Currently the only workaround is to copy the large model.onnx_data into the folder from which you run your Boot application.

If you get an error like ai.onnxruntime.OrtException: Error code - ORT_EP_FAIL - message: Failed to find CUDA shared provider, that means that you are using the GPU parameter spring.ai.embedding.transformer.onnx.gpuDeviceId, but the onnxruntime_gpu dependency is missing:

<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime_gpu</artifactId>
</dependency>

Please select the appropriate onnxruntime_gpu version based on your CUDA version (see the ONNX Java Runtime documentation).

Manual Configuration

If you are not using Spring Boot, you can manually configure the Onnx Transformers Embedding Model. For this, add the spring-ai-transformers dependency to your project’s Maven pom.xml file:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-transformers</artifactId>
</dependency>
Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Then create a new TransformersEmbeddingModel instance and use the setTokenizerResource(tokenizerJsonUri) and setModelResource(modelOnnxUri) methods to set the URIs of the exported tokenizer.json and model.onnx files. (The classpath:, file:, and https: URI schemes are supported.)

If the model is not explicitly set, TransformersEmbeddingModel defaults to sentence-transformers/all-MiniLM-L6-v2:

Dimensions: 384
Avg. performance: 58.80
Speed: 14200 sentences/sec
Size: 80MB

The following snippet illustrates how to use the TransformersEmbeddingModel manually:

TransformersEmbeddingModel embeddingModel = new TransformersEmbeddingModel();

// (optional) defaults to classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json
embeddingModel.setTokenizerResource("classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json");

// (optional) defaults to classpath:/onnx/all-MiniLM-L6-v2/model.onnx
embeddingModel.setModelResource("classpath:/onnx/all-MiniLM-L6-v2/model.onnx");

// (optional) defaults to ${java.io.tmpdir}/spring-ai-onnx-model
// Only http/https resources are cached by default.
embeddingModel.setResourceCacheDirectory("/tmp/onnx-zoo");

// (optional) Set the tokenizer padding if you see an error like:
// "ai.onnxruntime.OrtException: Supplied array is ragged, ..."
embeddingModel.setTokenizerOptions(Map.of("padding", "true"));

embeddingModel.afterPropertiesSet();

List<float[]> embeddings = embeddingModel.embed(List.of("Hello world", "World is big"));

If you create an instance of TransformersEmbeddingModel manually, you must call the afterPropertiesSet() method after setting the properties and before using the client.

The first embed() call downloads the large ONNX model and caches it on the local file system. Therefore, the first call might take longer than usual. Use the #setResourceCacheDirectory(<path>) method to set the local folder where the ONNX models are stored. The default cache folder is ${java.io.tmpdir}/spring-ai-onnx-model.
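Once you have the embedding vectors, a common next step is to compare them, typically by cosine similarity. A small, self-contained sketch (plain Java, independent of Spring AI; the class name is hypothetical):

```java
public class CosineSimilarity {

    // Cosine similarity between two equal-length embedding vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 0f};
        float[] v2 = {1f, 0f};
        float[] v3 = {0f, 1f};
        System.out.println(cosine(v1, v2)); // 1.0 (identical direction)
        System.out.println(cosine(v1, v3)); // 0.0 (orthogonal)
    }
}
```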

It is more convenient (and preferred) to create the TransformersEmbeddingModel as a Bean. This way you don’t have to call afterPropertiesSet() manually.

@Bean
public EmbeddingModel embeddingModel() {
   return new TransformersEmbeddingModel();
}