Neo4j

This section walks you through setting up Neo4jVectorStore to store document embeddings and perform similarity searches.

Neo4j is an open-source NoSQL graph database. It is a fully transactional database (ACID) that stores data structured as graphs consisting of nodes, connected by relationships. Inspired by the structure of the real world, it allows for high query performance on complex data while remaining intuitive and simple for the developer.

Neo4j’s Vector Search allows users to query vector embeddings from large datasets. An embedding is a numerical representation of a data object, such as a text, an image, an audio clip, or a document. Embeddings can be stored on node properties and queried with the db.index.vector.queryNodes() procedure. These indexes are powered by Lucene, which uses a Hierarchical Navigable Small World (HNSW) graph to perform k approximate nearest neighbors (k-ANN) queries over the vector fields.
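The following plain-Java sketch makes the k-ANN idea concrete. It runs a brute-force (exact) k-nearest-neighbor search by cosine similarity. This is not how Neo4j or Lucene work internally. The sketch scores every stored vector exactly, and that exact result is what an HNSW index approximates while visiting only part of its graph. The ExactKnn class and its Entry and Hit records are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical illustration: exact k-nearest-neighbour search by cosine
 * similarity. An HNSW index approximates this result without a full scan.
 */
final class ExactKnn {

    /** A stored vector and its identifier (hypothetical type). */
    record Entry(String id, float[] vector) {}

    /** A search hit: the entry id and its cosine similarity to the query. */
    record Hit(String id, double score) {}

    /** Cosine similarity; assumes equal-length, non-zero vectors. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every entry against the query and returns the k best, highest first. */
    static List<Hit> topK(List<Entry> entries, float[] query, int k) {
        List<Hit> hits = new ArrayList<>();
        for (Entry e : entries) {
            hits.add(new Hit(e.id(), cosine(e.vector(), query)));
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return hits.subList(0, Math.min(k, hits.size()));
    }
}
```

This scan is linear in the number of stored vectors, so it slows down as the dataset grows. HNSW avoids the full scan, at the cost of occasionally missing a true nearest neighbor.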

Prerequisites

  • If required, an API key for the EmbeddingModel to generate the embeddings stored by the Neo4jVectorStore.

Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the Neo4j Vector Store. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-neo4j</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-neo4j'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

See the Configuration Properties list for the vector store to learn about the default values and configuration options.

Refer to the Artifact Repositories section to add Maven Central and/or Snapshot Repositories to your build file.

The vector store implementation can initialize the required schema for you, but you must opt in. Either call initializeSchema(true) when building the Neo4jVectorStore, or set …initialize-schema=true in the application.properties file.

This is a breaking change! In earlier versions of Spring AI, this schema initialization happened by default.

Additionally, you will need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.

Now you can auto-wire the Neo4jVectorStore as a vector store in your application.

import java.util.List;
import java.util.Map;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;

@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to Neo4j
vectorStore.add(documents);

// Retrieve up to 5 documents similar to the query "Spring"
List<Document> results = vectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());

Configuration Properties

To connect to Neo4j and use the Neo4jVectorStore, you need to provide access details for your instance. A simple configuration can be provided via Spring Boot’s application.yml:

spring:
  neo4j:
    uri: <neo4j instance URI>
    authentication:
      username: <neo4j username>
      password: <neo4j password>
  ai:
    vectorstore:
      neo4j:
        initialize-schema: true
        database-name: neo4j
        index-name: custom-index
        embedding-dimension: 1536
        distance-type: cosine

The Spring Boot properties starting with spring.neo4j.* are used to configure the Neo4j client:

Property                               Description                                Default Value
spring.neo4j.uri                       URI for connecting to the Neo4j instance   neo4j://localhost:7687
spring.neo4j.authentication.username   Username for authentication with Neo4j     neo4j
spring.neo4j.authentication.password   Password for authentication with Neo4j     -

Properties starting with spring.ai.vectorstore.neo4j.* are used to configure the Neo4jVectorStore:

Property                                          Description                                  Default Value
spring.ai.vectorstore.neo4j.initialize-schema     Whether to initialize the required schema    false
spring.ai.vectorstore.neo4j.database-name         The name of the Neo4j database to use        neo4j
spring.ai.vectorstore.neo4j.index-name            The name of the index to store the vectors   spring-ai-document-index
spring.ai.vectorstore.neo4j.embedding-dimension   The number of dimensions in the vector       1536
spring.ai.vectorstore.neo4j.distance-type         The distance function to use                 cosine
spring.ai.vectorstore.neo4j.label                 The label used for document nodes            Document
spring.ai.vectorstore.neo4j.embedding-property    The property name used to store embeddings   embedding
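The index name, label, embedding property, dimension, and distance properties together describe a Neo4j vector index. The hypothetical helper below shows how those values could map to an index definition. It assumes a Neo4j 5.x version that supports the CREATE VECTOR INDEX syntax. The VectorIndexDdl name is illustrative only, and the statement Spring AI actually issues when initialize-schema is enabled may differ.

```java
import java.util.Locale;

/**
 * Hypothetical helper: renders a Neo4j vector index definition from the
 * values carried by the spring.ai.vectorstore.neo4j.* properties.
 * Illustration only; not the statement Spring AI is guaranteed to run.
 */
final class VectorIndexDdl {

    static String createIndex(String indexName, String label, String embeddingProperty,
                              int dimensions, String distanceType) {
        if (dimensions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        // Backticks allow names such as "spring-ai-document-index" that contain hyphens.
        return String.format(Locale.ROOT,
                "CREATE VECTOR INDEX `%s` IF NOT EXISTS FOR (n:`%s`) ON n.`%s` "
                        + "OPTIONS {indexConfig: {`vector.dimensions`: %d, "
                        + "`vector.similarity_function`: '%s'}}",
                indexName, label, embeddingProperty, dimensions,
                // Neo4j names the functions 'cosine' and 'euclidean'
                distanceType.toLowerCase(Locale.ROOT));
    }
}
```

For example, createIndex("spring-ai-document-index", "Document", "embedding", 1536, "cosine") builds a definition matching the defaults listed above. Whatever the exact statement, the embedding-dimension value must match the dimension your EmbeddingModel produces.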

The following distance functions are available:

  • cosine - Default, suitable for most use cases. Measures cosine similarity between vectors.

  • euclidean - Euclidean distance between vectors. Lower values indicate higher similarity.
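The plain-Java sketch below shows how the two distance functions differ. Neo4j computes them inside the index, so the VectorDistances class is only a hypothetical illustration, not part of Spring AI or Neo4j. Higher cosine values mean closer vectors, while lower Euclidean values mean closer vectors.

```java
/**
 * Hypothetical illustration of the two available distance functions.
 * Both assume vectors of the same dimension.
 */
final class VectorDistances {

    /** Cosine similarity in [-1, 1]; higher means more similar. Ignores vector length. */
    static double cosineSimilarity(float[] a, float[] b) {
        checkSameLength(a, b);
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Euclidean (straight-line) distance; lower means more similar. */
    static double euclideanDistance(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
    }
}
```

Because cosine similarity ignores vector length, [1, 0] and [2, 0] score 1.0 under cosine yet are 1.0 apart under Euclidean distance. For embeddings normalized to unit length, the squared Euclidean distance equals 2 minus twice the cosine similarity, so both functions rank neighbors identically.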

Manual Configuration

Instead of using the Spring Boot auto-configuration, you can configure the Neo4j vector store manually. To do so, add the spring-ai-neo4j-store dependency to your project:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-neo4j-store</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-neo4j-store'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Create a Neo4j Driver bean. Read the Neo4j Documentation for more in-depth information about the configuration of a custom driver.

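import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.springframework.context.annotation.Bean;

@Bean
public Driver driver() {
    return GraphDatabase.driver("neo4j://<host>:<bolt-port>",
            AuthTokens.basic("<username>", "<password>"));
}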

Then create the Neo4jVectorStore bean using the builder pattern:

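import org.neo4j.driver.Driver;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.embedding.TokenCountBatchingStrategy;
import org.springframework.ai.openai.OpenAiEmbeddingModel;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.neo4j.Neo4jVectorStore;
import org.springframework.ai.vectorstore.neo4j.Neo4jVectorStore.Neo4jDistanceType;
import org.springframework.context.annotation.Bean;

@Bean
public VectorStore vectorStore(Driver driver, EmbeddingModel embeddingModel) {
    return Neo4jVectorStore.builder(driver, embeddingModel)
        .databaseName("neo4j")                              // Optional: defaults to "neo4j"
        .distanceType(Neo4jDistanceType.COSINE)             // Optional: defaults to COSINE
        .embeddingDimension(1536)                           // Optional: defaults to 1536
        .label("Document")                                  // Optional: defaults to "Document"
        .embeddingProperty("embedding")                     // Optional: defaults to "embedding"
        .indexName("custom-index")                          // Optional: defaults to "spring-ai-document-index"
        .initializeSchema(true)                             // Optional: defaults to false
        .batchingStrategy(new TokenCountBatchingStrategy()) // Optional: defaults to TokenCountBatchingStrategy
        .build();
}

// This can be any EmbeddingModel implementation
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(OpenAiApi.builder().apiKey(System.getenv("OPENAI_API_KEY")).build());
}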

Metadata Filtering

You can also use the generic, portable metadata filters with the Neo4j store.

For example, you can use either the text expression language:

vectorStore.similaritySearch(
    SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression("author in ['john', 'jill'] && 'article_type' == 'blog'").build());

or programmatically using the Filter.Expression DSL:

import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .similarityThreshold(SIMILARITY_THRESHOLD)
    .filterExpression(b.and(
        b.in("author", "john", "jill"),
        b.eq("article_type", "blog")).build()).build());

Those (portable) filter expressions get automatically converted into the proprietary Neo4j WHERE filter expressions.

For example, this portable filter expression:

author in ['john', 'jill'] && 'article_type' == 'blog'

is converted into the proprietary Neo4j filter format:

node.`metadata.author` IN ["john","jill"] AND node.`metadata.'article_type'` = "blog"
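The hypothetical translator below sketches how such a conversion works for the IN and equality operators. It is heavily simplified and is not Spring AI's converter. As the converted expression above suggests, it assumes metadata keys are stored as node properties named metadata.<key>. It supports only string values, does no escaping, and emits plain keys, so it yields node.`metadata.article_type` rather than the quoted form shown above. Production code should pass values as query parameters instead.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical, simplified translator from two portable filter operators
 * (IN and ==) to Neo4j WHERE fragments. String values only, no escaping.
 */
final class Neo4jFilterSketch {

    // Assumed storage layout: metadata key "author" lives in node property "metadata.author".
    private static String property(String key) {
        return "node.`metadata." + key + "`";
    }

    private static String quote(String value) {
        return "\"" + value + "\"";
    }

    /** key IN [v1, v2, ...] */
    static String in(String key, List<String> values) {
        return property(key) + " IN "
                + values.stream()
                        .map(Neo4jFilterSketch::quote)
                        .collect(Collectors.joining(",", "[", "]"));
    }

    /** key == value */
    static String eq(String key, String value) {
        return property(key) + " = " + quote(value);
    }

    /** left && right */
    static String and(String left, String right) {
        return left + " AND " + right;
    }
}
```

Composing the helpers mirrors the builder calls in the previous example. For instance, and(in("author", List.of("john", "jill")), eq("article_type", "blog")) produces a WHERE fragment of the same shape as the converted expression.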

Accessing the Native Client

The Neo4j Vector Store implementation provides access to the underlying native Neo4j client (Driver) through the getNativeClient() method:

Neo4jVectorStore vectorStore = context.getBean(Neo4jVectorStore.class);
Optional<Driver> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    Driver driver = nativeClient.get();
    // Use the native client for Neo4j-specific operations
}

The native client gives you access to Neo4j-specific features and operations that might not be exposed through the VectorStore interface.