Weaviate
This section walks you through setting up the Weaviate VectorStore to store document embeddings and perform similarity searches.
Weaviate is an open-source vector database that lets you store data objects and vector embeddings from your favorite ML models and scale seamlessly to billions of data objects. It provides tools to store document embeddings, content, and metadata, and to search through those embeddings, including metadata filtering.
Prerequisites
- A running Weaviate instance. The following options are available:
  - Weaviate Cloud Service (requires account creation and API key)
  - A local Docker container (see Run Weaviate in Docker)
- If required, an API key for the EmbeddingModel to generate the embeddings stored by the WeaviateVectorStore.
Dependencies
There has been a significant change in the artifact names of the Spring AI auto-configuration and starter modules. Please refer to the upgrade notes for more information.
Add the Weaviate Vector Store dependency to your project:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-weaviate-store</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-weaviate-store'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Configuration
To connect to Weaviate and use the WeaviateVectorStore, you need to provide access details for your instance.
Configuration can be provided via Spring Boot's application.properties:
spring.ai.vectorstore.weaviate.host=<host_of_your_weaviate_instance>
spring.ai.vectorstore.weaviate.scheme=<http_or_https>
spring.ai.vectorstore.weaviate.api-key=<your_api_key>
# API key if needed, e.g. OpenAI
spring.ai.openai.api-key=<api-key>
If you prefer to use environment variables for sensitive information like API keys, you have multiple options:
Option 1: Using Property Placeholders
You can use custom environment variable names and reference them in your application configuration:
# In application.yml
spring:
  ai:
    vectorstore:
      weaviate:
        host: ${WEAVIATE_HOST}
        scheme: ${WEAVIATE_SCHEME}
        api-key: ${WEAVIATE_API_KEY}
    openai:
      api-key: ${OPENAI_API_KEY}
# In your environment or .env file
export WEAVIATE_HOST=<host_of_your_weaviate_instance>
export WEAVIATE_SCHEME=<http_or_https>
export WEAVIATE_API_KEY=<your_api_key>
export OPENAI_API_KEY=<api-key>
Option 2: Accessing Environment Variables Programmatically
Alternatively, you can access environment variables in your Java code:
String weaviateApiKey = System.getenv("WEAVIATE_API_KEY");
String openAiApiKey = System.getenv("OPENAI_API_KEY");
If you choose to create a shell script to manage your environment variables, be sure to run it before starting your application by "sourcing" the file.
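When reading environment variables in code, it can help to fail fast if a required value is missing rather than passing null on to a client. The following is a minimal sketch under that assumption; the EnvConfig class and its methods are hypothetical helpers, not part of Spring AI. The lookup is injected as a function so the helper can be exercised without changing the real environment.

```java
import java.util.function.Function;

// Hypothetical helper, not part of Spring AI: validates settings read from the environment.
final class EnvConfig {

    private final Function<String, String> lookup;

    EnvConfig(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    // Production use: read from the real process environment.
    static EnvConfig fromSystem() {
        return new EnvConfig(System::getenv);
    }

    // Returns the value, or throws if the variable is unset or blank.
    String require(String name) {
        String value = lookup.apply(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    // Returns the value, or the fallback if the variable is unset or blank.
    String getOrDefault(String name, String fallback) {
        String value = lookup.apply(name);
        return (value == null || value.isBlank()) ? fallback : value;
    }
}
```

For example, EnvConfig.fromSystem().require("WEAVIATE_API_KEY") returns the key or stops startup with a clear message, while getOrDefault("WEAVIATE_SCHEME", "http") falls back to a default for optional settings.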
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the Weaviate Vector Store.
To enable it, add the following dependency to your project's Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-weaviate</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-weaviate'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
See the WeaviateVectorStore properties section for the list of configuration parameters, their default values, and the available configuration options.
Refer to the Artifact Repositories section to add Maven Central and/or Snapshot Repositories to your build file.
Additionally, you will need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.
Here is an example of the required bean:
@Bean
public EmbeddingModel embeddingModel() {
    // Retrieve API key from a secure source or environment variable
    String apiKey = System.getenv("OPENAI_API_KEY");
    // Can be any other EmbeddingModel implementation
    return new OpenAiEmbeddingModel(OpenAiApi.builder().apiKey(apiKey).build());
}
Now you can auto-wire the WeaviateVectorStore as a vector store in your application.
Manual Configuration
Instead of using Spring Boot auto-configuration, you can manually configure the WeaviateVectorStore using the builder pattern:
@Bean
public WeaviateClient weaviateClient() {
    return new WeaviateClient(new Config("http", "localhost:8080"));
}

@Bean
public VectorStore vectorStore(WeaviateClient weaviateClient, EmbeddingModel embeddingModel) {
    return WeaviateVectorStore.builder(weaviateClient, embeddingModel)
        .objectClass("CustomClass")                // Optional: defaults to "SpringAiWeaviate"
        .consistencyLevel(ConsistentLevel.QUORUM)  // Optional: defaults to ConsistentLevel.ONE
        .filterMetadataFields(List.of(             // Optional: fields that can be used in filters
            MetadataField.text("country"),
            MetadataField.number("year")))
        .build();
}
Metadata filtering
You can also use the generic, portable metadata filters with the Weaviate store.
For example, you can use either the text expression language:
vectorStore.similaritySearch(
    SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression("country in ['UK', 'NL'] && year >= 2020").build());
or programmatically using the Filter.Expression DSL:
FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .similarityThreshold(SIMILARITY_THRESHOLD)
    .filterExpression(b.and(
        b.in("country", "UK", "NL"),
        b.gte("year", 2020)).build()).build());
These (portable) filter expressions are automatically converted into the proprietary Weaviate where filters.
For example, this portable filter expression:
country in ['UK', 'NL'] && year >= 2020
is converted into the proprietary Weaviate GraphQL filter format:
operator: And
operands:
    [{
        operator: Or
        operands:
            [{
                path: ["meta_country"]
                operator: Equal
                valueText: "UK"
            },
            {
                path: ["meta_country"]
                operator: Equal
                valueText: "NL"
            }]
    },
    {
        path: ["meta_year"]
        operator: GreaterThanEqual
        valueNumber: 2020
    }]
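To make the shape of that conversion concrete, here is a small illustrative sketch. It is not Spring AI's actual converter; the WhereFilterSketch class and its methods are hypothetical and only mirror what the example above shows: an "in" list becomes an Or over one Equal operand per value, ">=" becomes GreaterThanEqual, and metadata keys carry the "meta_" prefix.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: mimics the shape of the conversion, not Spring AI's real converter.
final class WhereFilterSketch {

    // Metadata keys appear with a "meta_" prefix in the example output above.
    static String path(String key) {
        return "path: [\"meta_" + key + "\"]";
    }

    // key == 'value'  ->  { path, operator: Equal, valueText }
    static String equalText(String key, String value) {
        return "{ " + path(key) + " operator: Equal valueText: \"" + value + "\" }";
    }

    // key in ['v1', 'v2', ...]  ->  Or over one Equal operand per value
    static String inText(String key, List<String> values) {
        String operands = values.stream()
                .map(v -> equalText(key, v))
                .collect(Collectors.joining(", "));
        return "{ operator: Or operands: [" + operands + "] }";
    }

    // key >= number  ->  { path, operator: GreaterThanEqual, valueNumber }
    static String gteNumber(String key, long value) {
        return "{ " + path(key) + " operator: GreaterThanEqual valueNumber: " + value + " }";
    }

    // left && right  ->  And over both operands
    static String and(String left, String right) {
        return "operator: And operands: [" + left + ", " + right + "]";
    }

    public static void main(String[] args) {
        // country in ['UK', 'NL'] && year >= 2020
        System.out.println(and(inText("country", List.of("UK", "NL")), gteNumber("year", 2020)));
    }
}
```

The sketch builds plain strings for readability; the real store works with the Weaviate client's filter objects rather than text.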
Run Weaviate in Docker
To quickly get started with a local Weaviate instance, you can run it in Docker:
docker run -it --rm --name weaviate \
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
-e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
-e QUERY_DEFAULTS_LIMIT=25 \
-e DEFAULT_VECTORIZER_MODULE=none \
-e CLUSTER_HOSTNAME=node1 \
-p 8080:8080 \
semitechnologies/weaviate:1.22.4
This starts a Weaviate instance accessible at http://localhost:8080.
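Before pointing your application at the container, you may want to confirm that it is accepting requests. The sketch below is one way to do that with the JDK's HTTP client, assuming the instance exposes Weaviate's /v1/.well-known/ready readiness endpoint on the mapped port. The WeaviateReadiness class is a hypothetical helper, not part of Spring AI or the Weaviate client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: probes a Weaviate instance's readiness endpoint.
final class WeaviateReadiness {

    // Builds the readiness URL from the same scheme/host values used in the configuration.
    static URI readinessUri(String scheme, String host) {
        return URI.create(scheme + "://" + host + "/v1/.well-known/ready");
    }

    // Returns true if the instance answers the readiness probe with HTTP 200.
    static boolean isReady(String scheme, String host) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(readinessUri(scheme, host))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // not reachable (yet)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Weaviate ready: " + isReady("http", "localhost:8080"));
    }
}
```

A startup script or integration test could poll isReady in a short loop until the container has finished booting.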
WeaviateVectorStore properties
You can use the following properties in your Spring Boot configuration to customize the Weaviate vector store.
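Property | Description | Default value |
---|---|---|
spring.ai.vectorstore.weaviate.host | The host of the Weaviate server | localhost:8080 |
spring.ai.vectorstore.weaviate.scheme | Connection scheme | http |
spring.ai.vectorstore.weaviate.api-key | The API key for authentication | |
spring.ai.vectorstore.weaviate.object-class | The class name for storing documents | SpringAiWeaviate |
spring.ai.vectorstore.weaviate.consistency-level | Desired tradeoff between consistency and speed | ConsistentLevel.ONE |
spring.ai.vectorstore.weaviate.filter-field | Configures metadata fields that can be used in filters. Format: spring.ai.vectorstore.weaviate.filter-field.<field-name>=<field-type> | |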
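The filter-field format above encodes one metadata field per property key. As a rough illustration of how such keys map to field-name/field-type pairs, the following sketch parses them with java.util.Properties. The FilterFieldProperties class is hypothetical, and the field types "text" and "number" are assumed values chosen to match the MetadataField.text and MetadataField.number calls in the manual configuration example; Spring Boot performs the real binding for you.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration of the filter-field key format; Spring Boot does the real binding.
final class FilterFieldProperties {

    static final String PREFIX = "spring.ai.vectorstore.weaviate.filter-field.";

    // Collects <field-name> -> <field-type> pairs, sorted by field name.
    static Map<String, String> filterFields(Properties props) {
        Map<String, String> fields = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                fields.put(key.substring(PREFIX.length()), props.getProperty(key).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Assumed example values: field types "text" and "number".
        props.load(new StringReader(String.join("\n",
                "spring.ai.vectorstore.weaviate.host=localhost:8080",
                PREFIX + "country=text",
                PREFIX + "year=number")));
        System.out.println(filterFields(props)); // prints {country=text, year=number}
    }
}
```

Only keys under the filter-field prefix are collected, so unrelated settings such as the host are ignored.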
Accessing the Native Client
The Weaviate Vector Store implementation provides access to the underlying native Weaviate client (WeaviateClient) through the getNativeClient() method:
WeaviateVectorStore vectorStore = context.getBean(WeaviateVectorStore.class);
Optional<WeaviateClient> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    WeaviateClient client = nativeClient.get();
    // Use the native client for Weaviate-specific operations
}
The native client gives you access to Weaviate-specific features and operations that might not be exposed through the VectorStore interface.