Hugging Face Chat

Hugging Face Text Generation Inference (TGI) is a specialized deployment solution for serving Large Language Models (LLMs) in the cloud, making them accessible via an API. TGI provides optimized performance for text generation tasks through features like continuous batching, token streaming, and efficient memory management.

Text Generation Inference requires models to be compatible with its architecture-specific optimizations. While many popular LLMs are supported, not all models on Hugging Face Hub can be deployed using TGI. If you need to deploy other types of models, consider using standard Hugging Face Inference Endpoints instead.

For a complete and up-to-date list of supported models and architectures, see the Text Generation Inference supported models documentation.

Prerequisites

You will need to create an Inference Endpoint on Hugging Face and create an API token to access the endpoint. Further details can be found here.
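Before wiring anything into Spring, it can help to check that the token and endpoint work together. The following is a minimal sketch, assuming the endpoint runs TGI and exposes its /generate route with a JSON body of the form {"inputs": ..., "parameters": {...}}. The class name TgiRequestSketch, its method names, and the max_new_tokens value of 50 are illustrative choices, not part of Spring AI.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class TgiRequestSketch {

    // Builds the JSON payload for TGI's /generate route.
    // Only backslashes and double quotes are escaped here; a real client
    // would use a JSON library to handle every special character.
    static String generateBody(String prompt) {
        String escaped = prompt.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"inputs\":\"" + escaped + "\",\"parameters\":{\"max_new_tokens\":50}}";
    }

    // Builds (but does not send) an authenticated POST request.
    static HttpRequest buildGenerateRequest(String endpointUrl, String apiKey, String prompt) {
        // Drop a trailing slash so the path is not doubled.
        String base = endpointUrl.endsWith("/")
                ? endpointUrl.substring(0, endpointUrl.length() - 1)
                : endpointUrl;
        return HttpRequest.newBuilder(URI.create(base + "/generate"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(generateBody(prompt)))
                .build();
    }
}
```

The request can then be sent with java.net.http.HttpClient, for example HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). If the assumptions hold, the response body should be a JSON object containing a generated_text field.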

The Spring AI project defines two configuration properties:

spring.ai.huggingface.chat.api-key: Set this to the value of the API token obtained from Hugging Face.
spring.ai.huggingface.chat.url: Set this to the inference endpoint URL obtained when provisioning your model in Hugging Face.

You can find your inference endpoint URL on the Inference Endpoint’s UI here.

You can set these configuration properties in your application.properties file:

spring.ai.huggingface.chat.api-key=<your-huggingface-api-key>
spring.ai.huggingface.chat.url=<your-inference-endpoint-url>

For enhanced security when handling sensitive information like API keys, you can use property placeholders (the ${...} syntax) to reference environment variables instead of hard-coding the values:

# In application.yml
spring:
  ai:
    huggingface:
      chat:
        api-key: ${HUGGINGFACE_API_KEY}
        url: ${HUGGINGFACE_ENDPOINT_URL}

# In your environment or .env file
export HUGGINGFACE_API_KEY=<your-huggingface-api-key>
export HUGGINGFACE_ENDPOINT_URL=<your-inference-endpoint-url>

You can also read these values programmatically in your application code, for example from environment variables:

// Retrieve API key and endpoint URL from secure sources or environment variables
String apiKey = System.getenv("HUGGINGFACE_API_KEY");
String endpointUrl = System.getenv("HUGGINGFACE_ENDPOINT_URL");
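System.getenv returns null when a variable is not set, so a missing value would otherwise only surface later as a failed request. The sketch below shows one way to fail fast at startup. The class HuggingfaceSettings and its nested Settings record are hypothetical helpers written for this illustration, not Spring AI types; only the two variable names come from the text above.

```java
import java.util.Map;

public class HuggingfaceSettings {

    // Hypothetical holder for the two values the chat model needs.
    record Settings(String apiKey, String endpointUrl) {}

    // Reads both values from the given environment map and rejects missing or blank entries.
    static Settings fromEnvironment(Map<String, String> env) {
        return new Settings(
                require(env, "HUGGINGFACE_API_KEY"),
                require(env, "HUGGINGFACE_ENDPOINT_URL"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value.trim();
    }
}
```

In an application this would be called as HuggingfaceSettings.fromEnvironment(System.getenv()). Taking the map as a parameter keeps the lookup easy to exercise in tests without touching the real environment.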

Add Repositories and BOM

Spring AI artifacts are published in Maven Central and Spring Snapshot repositories. Refer to the Artifact Repositories section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.

Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the Hugging Face Chat Client. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-huggingface</artifactId>
</dependency>

Or add it to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-huggingface'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Chat Properties

Enabling and disabling of the chat auto-configurations are now configured via top level properties with the prefix spring.ai.model.chat.

To enable, set spring.ai.model.chat=huggingface (enabled by default).

To disable, set spring.ai.model.chat=none (or any value that does not match huggingface).

This change allows multiple models to be configured.

The prefix spring.ai.huggingface is the property prefix that lets you configure the chat model implementation for Hugging Face.

Property | Description | Default
--- | --- | ---
spring.ai.huggingface.chat.api-key | API key to authenticate with the Inference Endpoint. | -
spring.ai.huggingface.chat.url | URL of the Inference Endpoint to connect to. | -
spring.ai.huggingface.chat.enabled (Removed and no longer valid) | Enable Hugging Face chat model. | true
spring.ai.model.chat | Enable Hugging Face chat model. | huggingface

Sample Controller (Auto-configuration)

Create a new Spring Boot project and add the spring-ai-starter-model-huggingface to your pom (or gradle) dependencies.

Add an application.properties file, under the src/main/resources directory, to enable and configure the Hugging Face chat model:

spring.ai.huggingface.chat.api-key=YOUR_API_KEY
spring.ai.huggingface.chat.url=YOUR_INFERENCE_ENDPOINT_URL

Replace the api-key and url with your Hugging Face values.

This will create a HuggingfaceChatModel implementation that you can inject into your class. Here is an example of a simple @RestController class that uses the chat model for text generation.

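import java.util.Map;

import org.springframework.ai.huggingface.HuggingfaceChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final HuggingfaceChatModel chatModel;

    @Autowired
    public ChatController(HuggingfaceChatModel chatModel) {
        this.chatModel = chatModel;
    }

    // Returns the model's reply to the given message as a single-entry JSON object.
    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }
}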
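To exercise the /ai/generate mapping from a client, the message has to be URL-encoded in the query string. The sketch below uses only the JDK's HTTP client. It assumes the application runs locally on Spring Boot's default port 8080; the class GenerateEndpointClient and its method names are illustrative.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class GenerateEndpointClient {

    // Builds the request URI; the message is form-encoded so spaces and punctuation survive.
    static URI generateUri(String baseUrl, String message) {
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/ai/generate?message=" + encoded);
    }

    // Sends a GET request and returns the raw JSON response body.
    static String callGenerate(String baseUrl, String message) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(generateUri(baseUrl, message)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling GenerateEndpointClient.callGenerate("http://localhost:8080", "Tell me a joke") should return a JSON object with a single generation field, matching the Map returned by the controller.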

Manual Configuration

The HuggingfaceChatModel implements the ChatModel interface and uses a low-level API client to connect to the Hugging Face Inference Endpoints.

Add the spring-ai-huggingface dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-huggingface</artifactId>
</dependency>

Or add it to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-huggingface'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create a HuggingfaceChatModel and use it for text generation:

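// apiKey and endpointUrl hold the API token and Inference Endpoint URL (for example, read from environment variables)
HuggingfaceChatModel chatModel = new HuggingfaceChatModel(apiKey, endpointUrl);

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

// getText() applies to Spring AI 1.0; earlier milestone releases exposed getContent() instead
System.out.println(response.getResult().getOutput().getText());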