PostgresML Embeddings

Spring AI supports the PostgresML text embedding models.

Embeddings are a numeric representation of text. They represent words and sentences as vectors, that is, arrays of numbers. Embeddings can be used to find similar pieces of text by comparing the numeric vectors with a distance measure, or they can serve as input features for other machine learning models, since most algorithms can't use text directly.
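
As a rough illustration of comparing vectors with a distance measure, the sketch below computes cosine similarity between two float arrays. It is plain Java and not part of Spring AI: the class name CosineSimilarityExample, the method cosineSimilarity, and the three-dimensional vectors are all made up for this example, and real embeddings have many more dimensions.

```java
// Hypothetical helper, not a Spring AI class: cosine similarity between two embedding vectors.
final class CosineSimilarityExample {

    // Returns a value in [-1, 1]; values closer to 1 mean the vectors point in a more similar direction.
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for the embeddings of two texts.
        float[] first = {0.9f, 0.1f, 0.3f};
        float[] second = {0.8f, 0.2f, 0.4f};
        System.out.println("cosine similarity = " + cosineSimilarity(first, second));
    }
}
```

Cosine similarity ignores vector length and compares direction only, which is one common choice for text embeddings; Euclidean distance or the inner product are alternatives.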

Many pre-trained LLMs can be used to generate embeddings from text within PostgresML. You can browse the models available on Hugging Face to find the one that best fits your use case.

Add Repositories and BOM

Spring AI artifacts are published in Maven Central and Spring Snapshot repositories. Refer to the Artifact Repositories section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.

Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the PostgresML Embedding Model. To enable it, add the following dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-postgresml-embedding</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-postgresml-embedding'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Use the spring.ai.postgresml.embedding.options.* properties to configure your PostgresMlEmbeddingModel.

Embedding Properties

Enabling and disabling the embedding auto-configurations is now done via top-level properties with the prefix spring.ai.model.embedding.

To enable, set spring.ai.model.embedding=postgresml (it is enabled by default).

To disable, set spring.ai.model.embedding=none (or any value that doesn't match postgresml).

This change allows multiple models to be configured.

The prefix spring.ai.postgresml.embedding is the property prefix that configures the EmbeddingModel implementation for PostgresML embeddings.

| Property | Description | Default |
| --- | --- | --- |
| spring.ai.postgresml.embedding.enabled (Removed and no longer valid) | Enable PostgresML embedding model. | true |
| spring.ai.model.embedding | Enable PostgresML embedding model. | postgresml |
| spring.ai.postgresml.embedding.create-extension | Execute the SQL 'CREATE EXTENSION IF NOT EXISTS pgml' to enable the extension. | false |
| spring.ai.postgresml.embedding.options.transformer | The Hugging Face transformer model to use for the embedding. | distilbert-base-uncased |
| spring.ai.postgresml.embedding.options.kwargs | Additional transformer-specific options. | empty map |
| spring.ai.postgresml.embedding.options.vectorType | PostgresML vector type to use for the embedding. Two options are supported: PG_ARRAY and PG_VECTOR. | PG_ARRAY |
| spring.ai.postgresml.embedding.options.metadataMode | Document metadata aggregation mode. | EMBED |
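
To give a feel for what the two vector types mean on the client side: a PostgreSQL array can be read through the JDBC driver as a SQL array, whereas a pgvector value, when read as text, has the form [0.1,0.2,0.3]. The sketch below parses that text form into a float array. It is plain Java under stated assumptions: the class PgVectorTextExample and the method parseVector are hypothetical names, and this document does not show how Spring AI itself maps the result, so treat it as an illustration only.

```java
// Hypothetical helper, not a Spring AI class: parses pgvector's text form into a float array.
final class PgVectorTextExample {

    // Accepts text such as "[0.1,0.2,0.3]" and returns the numbers it contains.
    static float[] parseVector(String text) {
        String trimmed = text.trim();
        if (trimmed.length() < 2 || trimmed.charAt(0) != '['
                || trimmed.charAt(trimmed.length() - 1) != ']') {
            throw new IllegalArgumentException("Expected a value of the form [x,y,...]: " + text);
        }
        // Drop the surrounding brackets and split on commas.
        String body = trimmed.substring(1, trimmed.length() - 1).trim();
        if (body.isEmpty()) {
            return new float[0];
        }
        String[] parts = body.split(",");
        float[] result = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Float.parseFloat(parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        float[] vector = parseVector("[0.5,-1,2.25]");
        System.out.println("dimensions = " + vector.length);
    }
}
```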

All properties prefixed with spring.ai.postgresml.embedding.options can be overridden at runtime by adding request-specific Runtime Options to the EmbeddingRequest call.

Runtime Options

Use PostgresMlEmbeddingOptions.java to configure the PostgresMlEmbeddingModel with options such as the model to use.

At startup, you can pass a PostgresMlEmbeddingOptions to the PostgresMlEmbeddingModel constructor to configure the default options used for all embedding requests.

At runtime, you can override the default options by passing a PostgresMlEmbeddingOptions in your EmbeddingRequest.

For example, to override the default model name for a specific request:

EmbeddingResponse embeddingResponse = embeddingModel.call(
    new EmbeddingRequest(List.of("Hello World", "World is big and salvation is near"),
            PostgresMlEmbeddingOptions.builder()
                .transformer("intfloat/e5-small")
                .vectorType(VectorType.PG_ARRAY)
                .kwargs(Map.of("device", "gpu"))
                .build()));
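
The override behaviour described above can be pictured as a merge in which any value set on the request wins over the startup default. The sketch below shows that idea in plain Java. It does not use Spring AI: the Options record and the merge method are hypothetical stand-ins, and whether the real implementation merges kwargs key by key or replaces the whole map is not stated here, so the sketch simply replaces it.

```java
import java.util.Map;

// Hypothetical illustration, not Spring AI code: request-level options override defaults.
final class OptionsOverrideExample {

    // Stand-in for an options object; a null field means "not set on this request".
    record Options(String transformer, String vectorType, Map<String, Object> kwargs) {
    }

    // Returns the effective options: each request value that is set replaces the default.
    static Options merge(Options defaults, Options request) {
        if (request == null) {
            return defaults;
        }
        return new Options(
                request.transformer() != null ? request.transformer() : defaults.transformer(),
                request.vectorType() != null ? request.vectorType() : defaults.vectorType(),
                request.kwargs() != null ? request.kwargs() : defaults.kwargs());
    }

    public static void main(String[] args) {
        Options defaults = new Options("distilbert-base-uncased", "PG_ARRAY", Map.of("device", "cpu"));
        // Only the transformer is set for this request; the rest falls back to the defaults.
        Options request = new Options("intfloat/e5-small", null, null);
        System.out.println(merge(defaults, request));
    }
}
```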

Sample Controller

The auto-configuration creates an EmbeddingModel implementation that you can inject into your classes. The properties below configure the model, and the simple @RestController class that follows uses the injected EmbeddingModel implementation.

spring.ai.postgresml.embedding.options.transformer=distilbert-base-uncased
spring.ai.postgresml.embedding.options.vectorType=PG_ARRAY
spring.ai.postgresml.embedding.options.metadataMode=EMBED
spring.ai.postgresml.embedding.options.kwargs.device=cpu

@RestController
public class EmbeddingController {

    private final EmbeddingModel embeddingModel;

    @Autowired
    public EmbeddingController(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    @GetMapping("/ai/embedding")
    public Map<String, EmbeddingResponse> embed(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        EmbeddingResponse embeddingResponse = this.embeddingModel.embedForResponse(List.of(message));
        return Map.of("embedding", embeddingResponse);
    }
}

Manual configuration

Instead of using the Spring Boot auto-configuration, you can create the PostgresMlEmbeddingModel manually. To do so, add the spring-ai-postgresml dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-postgresml</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-postgresml'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create a PostgresMlEmbeddingModel instance and use it to compute the embeddings for two input texts:

var jdbcTemplate = new JdbcTemplate(dataSource); // your PostgresML data source

PostgresMlEmbeddingModel embeddingModel = new PostgresMlEmbeddingModel(jdbcTemplate,
        PostgresMlEmbeddingOptions.builder()
            .transformer("distilbert-base-uncased") // Hugging Face transformer model name.
            .vectorType(VectorType.PG_VECTOR) // vector type in PostgreSQL.
            .kwargs(Map.of("device", "cpu")) // optional arguments.
            .metadataMode(MetadataMode.EMBED) // Document metadata mode.
            .build());

embeddingModel.afterPropertiesSet(); // initialize the jdbc template and database.

EmbeddingResponse embeddingResponse = embeddingModel
        .embedForResponse(List.of("Hello World", "World is big and salvation is near"));

When you create the model manually, you must call afterPropertiesSet() after setting the properties and before using the client. It is more convenient (and preferred) to create the PostgresMlEmbeddingModel as a @Bean. Then you don't have to call afterPropertiesSet() manually:

@Bean
public EmbeddingModel embeddingModel(JdbcTemplate jdbcTemplate) {
    return new PostgresMlEmbeddingModel(jdbcTemplate,
        PostgresMlEmbeddingOptions.builder()
             ....
            .build());
}