Weaviate

This section walks you through setting up the Weaviate VectorStore to store document embeddings and perform similarity searches.

Weaviate is an open-source vector database that stores data objects together with vector embeddings from the ML model of your choice and scales to billions of data objects. It provides tools to store document embeddings, content, and metadata, and to search those embeddings, optionally with metadata filtering.

Prerequisites

  • If required, an API key for the EmbeddingModel to generate the embeddings stored by the WeaviateVectorStore.

Dependencies

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Refer to the upgrade notes for more information.

Add the Weaviate Vector Store dependency to your project:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-weaviate-store</artifactId>
</dependency>

or to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-weaviate-store'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Configuration

To connect to Weaviate and use the WeaviateVectorStore, you need to provide access details for your instance. Configuration can be provided via Spring Boot’s application.properties:

spring.ai.vectorstore.weaviate.host=<host_of_your_weaviate_instance>
spring.ai.vectorstore.weaviate.scheme=<http_or_https>
spring.ai.vectorstore.weaviate.api-key=<your_api_key>
# API key if needed, e.g. OpenAI
spring.ai.openai.api-key=<api-key>

If you prefer to use environment variables for sensitive information like API keys, you have multiple options:

Option 1: Using Property Placeholders (${...})

You can use custom environment variable names and reference them in your application configuration:

# In application.yml
spring:
  ai:
    vectorstore:
      weaviate:
        host: ${WEAVIATE_HOST}
        scheme: ${WEAVIATE_SCHEME}
        api-key: ${WEAVIATE_API_KEY}
    openai:
      api-key: ${OPENAI_API_KEY}

# In your environment or .env file
export WEAVIATE_HOST=<host_of_your_weaviate_instance>
export WEAVIATE_SCHEME=<http_or_https>
export WEAVIATE_API_KEY=<your_api_key>
export OPENAI_API_KEY=<api-key>

Option 2: Accessing Environment Variables Programmatically

Alternatively, you can access environment variables in your Java code:

String weaviateApiKey = System.getenv("WEAVIATE_API_KEY");
String openAiApiKey = System.getenv("OPENAI_API_KEY");

If you manage your environment variables in a shell script, source it before starting your application, for example: source <your_script_name>.sh
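When reading environment variables in code, it can help to fail fast on missing values instead of passing null to a client. The following is a minimal sketch of that idea using only the JDK. The class name EnvConfig and its helper methods are hypothetical and not part of Spring AI; the fallback values mirror the documented defaults for host and scheme.

```java
import java.util.Map;

// Hypothetical helper, not part of Spring AI: resolves settings from environment variables.
final class EnvConfig {

    // Returns the value of a required variable, or fails with a clear message if it is missing or blank.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    // Returns the value of an optional variable, or the given fallback if it is missing or blank.
    static String optional(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        return (value == null || value.isBlank()) ? fallback : value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        String host = optional(env, "WEAVIATE_HOST", "localhost:8080");
        String scheme = optional(env, "WEAVIATE_SCHEME", "http");
        System.out.println("Connecting to " + scheme + "://" + host);
    }
}
```

Passing the environment as a Map keeps the helper easy to test: production code hands in System.getenv(), while tests hand in a fixed map.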

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the Weaviate Vector Store. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-weaviate</artifactId>
</dependency>

or to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-weaviate'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

See the WeaviateVectorStore properties section for the vector store's default values and configuration options.

Refer to the Artifact Repositories section to add Maven Central and/or Snapshot Repositories to your build file.

Additionally, you will need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.

Here is an example of the required bean:

@Bean
public EmbeddingModel embeddingModel() {
    // Retrieve API key from a secure source or environment variable
    String apiKey = System.getenv("OPENAI_API_KEY");

    // Can be any other EmbeddingModel implementation
    return new OpenAiEmbeddingModel(OpenAiApi.builder().apiKey(apiKey).build());
}

Now you can autowire the WeaviateVectorStore as a vector store in your application.

Manual Configuration

Instead of using Spring Boot auto-configuration, you can manually configure the WeaviateVectorStore using the builder pattern:

@Bean
public WeaviateClient weaviateClient() {
    return new WeaviateClient(new Config("http", "localhost:8080"));
}

@Bean
public VectorStore vectorStore(WeaviateClient weaviateClient, EmbeddingModel embeddingModel) {
    return WeaviateVectorStore.builder(weaviateClient, embeddingModel)
        .objectClass("CustomClass")                    // Optional: defaults to "SpringAiWeaviate"
        .consistencyLevel(ConsistentLevel.QUORUM)      // Optional: defaults to ConsistentLevel.ONE
        .filterMetadataFields(List.of(                 // Optional: fields that can be used in filters
            MetadataField.text("country"),
            MetadataField.number("year")))
        .build();
}

Metadata filtering

You can also use the generic, portable metadata filters with the Weaviate store.

For example, you can use either the text expression language:

vectorStore.similaritySearch(
    SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression("country in ['UK', 'NL'] && year >= 2020").build());

or programmatically using the Filter.Expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .similarityThreshold(SIMILARITY_THRESHOLD)
    .filterExpression(b.and(
        b.in("country", "UK", "NL"),
        b.gte("year", 2020)).build()).build());

These portable filter expressions are automatically converted into Weaviate's proprietary where filters.

For example, this portable filter expression:

country in ['UK', 'NL'] && year >= 2020

is converted into the proprietary Weaviate GraphQL filter format:

operator: And
operands:
    [{
        operator: Or
        operands:
            [{
                path: ["meta_country"]
                operator: Equal
                valueText: "UK"
            },
            {
                path: ["meta_country"]
                operator: Equal
                valueText: "NL"
            }]
    },
    {
        path: ["meta_year"]
        operator: GreaterThanEqual
        valueNumber: 2020
    }]
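To make the shape of that conversion concrete, here is a small, self-contained sketch that rebuilds the same structure as a string. It is not Spring AI's actual converter: the class WhereFilterSketch and its methods are hypothetical, and the only details taken from the example above are the meta_ path prefix, the expansion of "in" into an Or of Equal operands, and the operator names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: mimics the structure of the converted filter, not the real converter.
final class WhereFilterSketch {

    // Prefix that appears on metadata paths in the converted filter.
    static final String META_PREFIX = "meta_";

    // Renders a single equality comparison on a text metadata field.
    static String eq(String field, String value) {
        return "{path: [\"" + META_PREFIX + field + "\"], operator: Equal, valueText: \"" + value + "\"}";
    }

    // Renders a greater-than-or-equal comparison on a numeric metadata field.
    static String gte(String field, long value) {
        return "{path: [\"" + META_PREFIX + field + "\"], operator: GreaterThanEqual, valueNumber: " + value + "}";
    }

    // Expands "field in [v1, v2, ...]" into an Or over one Equal operand per value.
    static String in(String field, List<String> values) {
        String operands = values.stream()
            .map(v -> eq(field, v))
            .collect(Collectors.joining(", "));
        return "{operator: Or, operands: [" + operands + "]}";
    }

    // Combines two rendered filters with And.
    static String and(String left, String right) {
        return "{operator: And, operands: [" + left + ", " + right + "]}";
    }

    public static void main(String[] args) {
        // country in ['UK', 'NL'] && year >= 2020
        System.out.println(and(in("country", List.of("UK", "NL")), gte("year", 2020)));
    }
}
```

The sketch shows why an "in" expression with many values produces a correspondingly long Or clause in the generated filter.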

Run Weaviate in Docker

To quickly get started with a local Weaviate instance, you can run it in Docker:

docker run -it --rm --name weaviate \
    -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
    -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
    -e QUERY_DEFAULTS_LIMIT=25 \
    -e DEFAULT_VECTORIZER_MODULE=none \
    -e CLUSTER_HOSTNAME=node1 \
    -p 8080:8080 \
    semitechnologies/weaviate:1.22.4

This starts a Weaviate instance accessible at http://localhost:8080.
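Before pointing the application at the container, you may want to confirm that it is accepting requests. The sketch below polls Weaviate's readiness endpoint (/v1/.well-known/ready) with the JDK HTTP client. The class name WeaviateReadiness and the two-second timeouts are illustrative choices, and the host and port are assumed to match the docker run command above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: checks whether a Weaviate instance reports itself as ready.
final class WeaviateReadiness {

    // Builds the readiness URI from the same scheme and host values used in the configuration.
    static URI readinessUri(String scheme, String host) {
        return URI.create(scheme + "://" + host + "/v1/.well-known/ready");
    }

    // Returns true only if the readiness endpoint answers with HTTP 200.
    static boolean isReady(String scheme, String host) {
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
        HttpRequest request = HttpRequest.newBuilder(readinessUri(scheme, host))
            .timeout(Duration.ofSeconds(2))
            .GET()
            .build();
        try {
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused or timed out: treat the instance as not ready.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Weaviate ready: " + isReady("http", "localhost:8080"));
    }
}
```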

WeaviateVectorStore properties

You can use the following properties in your Spring Boot configuration to customize the Weaviate vector store.

| Property | Description | Default value |
|---|---|---|
| spring.ai.vectorstore.weaviate.host | The host of the Weaviate server | localhost:8080 |
| spring.ai.vectorstore.weaviate.scheme | Connection scheme | http |
| spring.ai.vectorstore.weaviate.api-key | The API key for authentication | - |
| spring.ai.vectorstore.weaviate.object-class | The class name for storing documents | SpringAiWeaviate |
| spring.ai.vectorstore.weaviate.consistency-level | Desired tradeoff between consistency and speed | ConsistentLevel.ONE |
| spring.ai.vectorstore.weaviate.filter-field | Configures metadata fields that can be used in filters. Format: spring.ai.vectorstore.weaviate.filter-field.<field-name>=<field-type> | - |
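The filter-field format is easiest to see with concrete keys. The sketch below uses plain java.util.Properties to show how keys of the form spring.ai.vectorstore.weaviate.filter-field.<field-name>=<field-type> map to field names and types. It does not reproduce Spring Boot's property binding; the class FilterFieldProperties is hypothetical, and the type names TEXT and NUMBER are assumptions chosen to mirror the MetadataField.text and MetadataField.number factory methods.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Illustrative only: shows how filter-field keys decompose into field name and field type.
final class FilterFieldProperties {

    static final String PREFIX = "spring.ai.vectorstore.weaviate.filter-field.";

    // Collects every "<PREFIX><field-name>=<field-type>" entry into a map sorted by field name.
    static Map<String, String> filterFields(Properties props) {
        Map<String, String> fields = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                fields.put(key.substring(PREFIX.length()), props.getProperty(key).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PREFIX + "country", "TEXT");   // assumed type name
        props.setProperty(PREFIX + "year", "NUMBER");    // assumed type name
        props.setProperty("spring.ai.vectorstore.weaviate.host", "localhost:8080");
        System.out.println(filterFields(props)); // prints {country=TEXT, year=NUMBER}
    }
}
```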

Accessing the Native Client

The Weaviate Vector Store implementation provides access to the underlying native Weaviate client (WeaviateClient) through the getNativeClient() method:

WeaviateVectorStore vectorStore = context.getBean(WeaviateVectorStore.class);
Optional<WeaviateClient> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    WeaviateClient client = nativeClient.get();
    // Use the native client for Weaviate-specific operations
}

The native client gives you access to Weaviate-specific features and operations that might not be exposed through the VectorStore interface.