Ollama Chat

With Ollama you can run various Large Language Models (LLMs) locally and generate text from them. Spring AI supports the Ollama chat completion capabilities with the OllamaChatModel API.

Ollama also offers an OpenAI API-compatible endpoint. The OpenAI API Compatibility section explains how to use the Spring AI OpenAI client to connect to an Ollama server.

Prerequisites

You first need access to an Ollama instance. There are a few options, including the following:

You can pull the models you want to use in your application from the Ollama model library:

ollama pull <model-name>

You can also pull any of the thousands of free GGUF Hugging Face models:

ollama pull hf.co/<username>/<model-repository>

Alternatively, you can enable the option to automatically download any needed model, as described in the Auto-pulling Models section.

Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the Ollama chat integration. To enable it, add the following dependency to your project’s Maven pom.xml or Gradle build.gradle build file:

Maven:

<dependency>
   <groupId>org.springframework.ai</groupId>
   <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Gradle:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-ollama'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Base Properties

The prefix spring.ai.ollama is the property prefix to configure the connection to Ollama.

Property | Description | Default
spring.ai.ollama.base-url | Base URL where the Ollama API server is running. | http://localhost:11434

Here are the properties for initializing the Ollama integration and auto-pulling models.

Property | Description | Default
spring.ai.ollama.init.pull-model-strategy | Whether to pull models at startup-time and how. | never
spring.ai.ollama.init.timeout | How long to wait for a model to be pulled. | 5m
spring.ai.ollama.init.max-retries | Maximum number of retries for the model pull operation. | 0
spring.ai.ollama.init.chat.include | Include this type of models in the initialization task. | true
spring.ai.ollama.init.chat.additional-models | Additional models to initialize besides the ones configured via default properties. | []

Chat Properties

Enabling and disabling of the chat auto-configurations is now configured via top-level properties with the prefix spring.ai.model.chat.

To enable it, set spring.ai.model.chat=ollama (it is enabled by default).

To disable it, set spring.ai.model.chat=none (or any value that doesn’t match ollama).

This change allows multiple models to be configured.

The prefix spring.ai.ollama.chat.options is the property prefix that configures the Ollama chat model. It includes the Ollama request (advanced) parameters such as the model, keep-alive, and format as well as the Ollama model options properties.

Here are the advanced request parameters for the Ollama chat model:

Property | Description | Default
spring.ai.ollama.chat.enabled (Removed and no longer valid) | Enable Ollama chat model. | true
spring.ai.model.chat | Enable Ollama chat model. | ollama
spring.ai.ollama.chat.options.model | The name of the supported model to use. | mistral
spring.ai.ollama.chat.options.format | The format to return a response in. Currently, the only accepted value is json | -
spring.ai.ollama.chat.options.keep_alive | Controls how long the model will stay loaded into memory following the request | 5m

The remaining options properties are based on the Ollama Valid Parameters and Values and Ollama Types. The default values are based on the Ollama Types Defaults.

Property | Description | Default
spring.ai.ollama.chat.options.numa | Whether to use NUMA. | false
spring.ai.ollama.chat.options.num-ctx | Sets the size of the context window used to generate the next token. | 2048
spring.ai.ollama.chat.options.num-batch | Prompt processing maximum batch size. | 512
spring.ai.ollama.chat.options.num-gpu | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support, 0 to disable. -1 here indicates that NumGPU should be set dynamically. | -1
spring.ai.ollama.chat.options.main-gpu | When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. | 0
spring.ai.ollama.chat.options.low-vram | - | false
spring.ai.ollama.chat.options.f16-kv | - | true
spring.ai.ollama.chat.options.logits-all | Return logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true. | -
spring.ai.ollama.chat.options.vocab-only | Load only the vocabulary, not the weights. | -
spring.ai.ollama.chat.options.use-mmap | By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you’re not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all. | null
spring.ai.ollama.chat.options.use-mlock | Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM. | false
spring.ai.ollama.chat.options.num-thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide | 0
spring.ai.ollama.chat.options.num-keep | - | 4
spring.ai.ollama.chat.options.seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. | -1
spring.ai.ollama.chat.options.num-predict | Maximum number of tokens to predict when generating text. (-1 = infinite generation, -2 = fill context) | -1
spring.ai.ollama.chat.options.top-k | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. | 40
spring.ai.ollama.chat.options.top-p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. | 0.9
spring.ai.ollama.chat.options.min-p | Alternative to top_p that aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. | 0.0
spring.ai.ollama.chat.options.tfs-z | Tail-free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. | 1.0
spring.ai.ollama.chat.options.typical-p | - | 1.0
spring.ai.ollama.chat.options.repeat-last-n | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | 64
spring.ai.ollama.chat.options.temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. | 0.8
spring.ai.ollama.chat.options.repeat-penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. | 1.1
spring.ai.ollama.chat.options.presence-penalty | - | 0.0
spring.ai.ollama.chat.options.frequency-penalty | - | 0.0
spring.ai.ollama.chat.options.mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | 0
spring.ai.ollama.chat.options.mirostat-tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. | 5.0
spring.ai.ollama.chat.options.mirostat-eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. | 0.1
spring.ai.ollama.chat.options.penalize-newline | - | true
spring.ai.ollama.chat.options.stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | -
spring.ai.ollama.chat.options.functions | List of functions, identified by their names, to enable for function calling in a single prompt request. Functions with those names must exist in the functionCallbacks registry. | -
spring.ai.ollama.chat.options.proxy-tool-calls | If true, Spring AI will not handle the function calls internally, but will proxy them to the client. It is then the client’s responsibility to handle the function calls, dispatch them to the appropriate function, and return the results. If false (the default), Spring AI will handle the function calls internally. Applicable only for chat models with function calling support. | false

All properties prefixed with spring.ai.ollama.chat.options can be overridden at runtime by adding request-specific Runtime Options to the Prompt call.
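
The min-p rule described in the options table can be illustrated with a small, self-contained sketch. It is not Ollama's sampler, only the arithmetic from the description: a token survives when its probability is at least min-p times the probability of the most likely token. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mirrors the min-p description, not Ollama's actual sampler.
final class MinPSketch {

    // Keeps the probabilities that are at least minP * (highest probability).
    static List<Double> filter(List<Double> probabilities, double minP) {
        double max = 0.0;
        for (double p : probabilities) {
            max = Math.max(max, p);
        }
        double threshold = minP * max;
        List<Double> kept = new ArrayList<>();
        for (double p : probabilities) {
            if (p >= threshold) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // With minP = 0.05 and a top probability of 0.9, the cutoff is about 0.045.
        System.out.println(filter(List.of(0.9, 0.05, 0.04, 0.01), 0.05));
    }
}
```

With the default of 0.0 the threshold is zero, so no token is filtered by this rule.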

Runtime Options

The OllamaOptions.java class provides model configurations, such as the model to use, the temperature, etc.

On start-up, the default options can be configured with the OllamaChatModel(api, options) constructor or the spring.ai.ollama.chat.options.* properties.

At run-time, you can override the default options by adding new, request-specific options to the Prompt call. For example, to override the default model and temperature for a specific request:

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_1)
            .temperature(0.4)
            .build()
    ));

In addition to the model specific OllamaOptions you can use a portable ChatOptions instance, created with ChatOptionsBuilder#builder().
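
The precedence described in this section (request-specific options win over start-up defaults) can be pictured with plain maps. This is a conceptual sketch only: it does not use or reproduce Spring AI's merge code, and the class name and option keys are chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual illustration of "request options override defaults"; not Spring AI's code.
final class OptionOverrideSketch {

    // Starts from the defaults and lets every request-specific entry replace them.
    static Map<String, Object> merge(Map<String, Object> defaults, Map<String, Object> perRequest) {
        Map<String, Object> effective = new LinkedHashMap<>(defaults);
        effective.putAll(perRequest); // request-specific values win
        return effective;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = Map.of("model", "mistral", "temperature", 0.8);
        Map<String, Object> perRequest = Map.of("temperature", 0.4);
        System.out.println(merge(defaults, perRequest));
    }
}
```

Options that the request does not mention, such as the model in this example, keep their default values.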

Auto-pulling Models

Spring AI Ollama can automatically pull models when they are not available in your Ollama instance. This feature is particularly useful for development and testing as well as for deploying your applications to new environments.

You can also pull, by name, any of the thousands of free GGUF Hugging Face models.

There are three strategies for pulling models:

  • always (defined in PullModelStrategy.ALWAYS): Always pull the model, even if it’s already available. Useful to ensure you’re using the latest version of the model.

  • when_missing (defined in PullModelStrategy.WHEN_MISSING): Only pull the model if it’s not already available. This may result in using an older version of the model.

  • never (defined in PullModelStrategy.NEVER): Never pull the model automatically.
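
The three strategies boil down to a small decision, sketched below. The enum and method are hypothetical stand-ins written for this illustration; they mirror the wording of the list above and are not Spring AI's PullModelStrategy implementation.

```java
// Hypothetical sketch of the three strategies; names mirror the docs, logic is illustrative.
final class PullStrategySketch {

    enum Strategy { ALWAYS, WHEN_MISSING, NEVER }

    // Decides whether a pull would be attempted for a model at startup.
    static boolean shouldPull(Strategy strategy, boolean modelAlreadyAvailable) {
        switch (strategy) {
            case ALWAYS:
                return true;
            case WHEN_MISSING:
                return !modelAlreadyAvailable;
            default: // NEVER
                return false;
        }
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": available=" + shouldPull(s, true)
                    + ", missing=" + shouldPull(s, false));
        }
    }
}
```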

Due to potential delays while downloading models, automatic pulling is not recommended for production environments. Instead, consider assessing and pre-downloading the necessary models in advance.

All models defined via configuration properties and default options can be automatically pulled at startup time. You can configure the pull strategy, timeout, and maximum number of retries using configuration properties:

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: always
        timeout: 60s
        max-retries: 1

The application will not complete its initialization until all specified models are available in Ollama. Depending on the model size and internet connection speed, this may significantly slow down your application’s startup time.

You can initialize additional models at startup, which is useful for models used dynamically at runtime:

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: always
        chat:
          additional-models:
            - llama3.2
            - qwen2.5

If you want to apply the pulling strategy only to specific types of models, you can exclude chat models from the initialization task:

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: always
        chat:
          include: false

This configuration will apply the pulling strategy to all models except chat models.

Function Calling

You can register custom Java functions with the OllamaChatModel and have the Ollama model intelligently choose to output a JSON object containing arguments to call one or many of the registered functions. This is a powerful technique to connect the LLM capabilities with external tools and APIs. Read more about Tool Calling.

You need Ollama 0.2.8 or newer to use the function calling capabilities and Ollama 0.4.6 or newer to use them in streaming mode.
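
A minimal sketch of checking these version requirements follows. It assumes you already have the server's version as a plain dotted string (for example, taken from the output of ollama --version); the class and method names are hypothetical, and pre-release suffixes such as "-rc1" are deliberately not handled.

```java
// Hypothetical helper: compares dotted numeric versions such as "0.4.6".
final class OllamaVersionSketch {

    // Returns a negative, zero, or positive value, like Integer.compare.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean supportsFunctionCalling(String version) {
        return compare(version, "0.2.8") >= 0;
    }

    static boolean supportsStreamingFunctionCalling(String version) {
        return compare(version, "0.4.6") >= 0;
    }

    public static void main(String[] args) {
        String version = "0.3.14"; // example value
        System.out.println(supportsFunctionCalling(version));
        System.out.println(supportsStreamingFunctionCalling(version));
    }
}
```

The comparison is numeric per segment, so "0.10.0" is treated as newer than "0.4.6", which a plain string comparison would get wrong.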

Multimodal

Multimodality refers to a model’s ability to simultaneously understand and process information from various sources, including text, images, audio, and other data formats.

Some of the models available in Ollama with multimodality support are LLaVA and BakLLaVA (see the full list). For further details, refer to the LLaVA: Large Language and Vision Assistant.

The Ollama Message API provides an "images" parameter to incorporate a list of base64-encoded images with the message.
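
To make the "images" parameter concrete, the sketch below base64-encodes raw bytes with the JDK and places them in a minimal message object. It is a hand-rolled illustration with hypothetical names, not the Spring AI client; the JSON is built by string concatenation, so it assumes the text content needs no JSON escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: shows the shape of a message carrying one base64-encoded image.
final class ImageMessageSketch {

    // Encodes raw image bytes as a base64 string.
    static String encodeImage(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Builds a minimal user message; content is assumed to contain no characters that need escaping.
    static String userMessageJson(String content, byte[] imageBytes) {
        return "{\"role\":\"user\",\"content\":\"" + content
                + "\",\"images\":[\"" + encodeImage(imageBytes) + "\"]}";
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for real image data read from a file.
        byte[] fakeImage = "hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(userMessageJson("Describe this picture", fakeImage));
    }
}
```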

Spring AI’s Message interface facilitates multimodal AI models by introducing the Media type. This type encompasses data and details regarding media attachments in messages, utilizing Spring’s org.springframework.util.MimeType and an org.springframework.core.io.Resource for the raw media data.

Below is a straightforward code example excerpted from OllamaChatModelMultimodalIT.java, illustrating the fusion of user text with an image.

var imageResource = new ClassPathResource("/multimodal.test.png");

var userMessage = new UserMessage("Explain what do you see on this picture?",
        new Media(MimeTypeUtils.IMAGE_PNG, imageResource));

ChatResponse response = chatModel.call(new Prompt(userMessage,
        OllamaOptions.builder().model(OllamaModel.LLAVA).build()));

The example shows a model taking the multimodal.test.png image as input:

[Image: multimodal.test.png]

along with the text message "Explain what do you see on this picture?", and generating a response like this:

The image shows a small metal basket filled with ripe bananas and red apples. The basket is placed on a surface,
which appears to be a table or countertop, as there's a hint of what seems like a kitchen cabinet or drawer in
the background. There's also a gold-colored ring visible behind the basket, which could indicate that this
photo was taken in an area with metallic decorations or fixtures. The overall setting suggests a home environment
where fruits are being displayed, possibly for convenience or aesthetic purposes.

Structured Outputs

Ollama provides custom Structured Outputs APIs that ensure your model generates responses conforming strictly to your provided JSON Schema. In addition to the existing Spring AI model-agnostic Structured Output Converter, these APIs offer enhanced control and precision.

Configuration

Spring AI allows you to configure your response format programmatically using the OllamaOptions builder.

Using the Chat Options Builder

You can set the response format programmatically with the OllamaOptions builder as shown below:

String jsonSchema = """
        {
            "type": "object",
            "properties": {
                "steps": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "explanation": { "type": "string" },
                            "output": { "type": "string" }
                        },
                        "required": ["explanation", "output"],
                        "additionalProperties": false
                    }
                },
                "final_answer": { "type": "string" }
            },
            "required": ["steps", "final_answer"],
            "additionalProperties": false
        }
        """;

Prompt prompt = new Prompt("how can I solve 8x + 7 = -23",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_2.getName())
            .format(new ObjectMapper().readValue(jsonSchema, Map.class))
            .build());

ChatResponse response = ollamaChatModel.call(prompt);

Integrating with BeanOutputConverter Utilities

You can leverage existing BeanOutputConverter utilities to automatically generate the JSON Schema from your domain objects and later convert the structured response into domain-specific instances:

record MathReasoning(
    @JsonProperty(required = true, value = "steps") Steps steps,
    @JsonProperty(required = true, value = "final_answer") String finalAnswer) {

    record Steps(
        @JsonProperty(required = true, value = "items") Items[] items) {

        record Items(
            @JsonProperty(required = true, value = "explanation") String explanation,
            @JsonProperty(required = true, value = "output") String output) {
        }
    }
}

var outputConverter = new BeanOutputConverter<>(MathReasoning.class);

Prompt prompt = new Prompt("how can I solve 8x + 7 = -23",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_2.getName())
            .format(outputConverter.getJsonSchemaMap())
            .build());

ChatResponse response = ollamaChatModel.call(prompt);
String content = response.getResult().getOutput().getText();

MathReasoning mathReasoning = outputConverter.convert(content);

Ensure you use the @JsonProperty(required = true, …) annotation to generate a schema that accurately marks fields as required. Although this is optional for JSON Schema, it is recommended for the structured response to function correctly.

OpenAI API Compatibility

Ollama is OpenAI API-compatible and you can use the Spring AI OpenAI client to talk to Ollama and use tools. For this, you need to configure the OpenAI base URL to your Ollama instance: spring.ai.openai.chat.base-url=http://localhost:11434 and select one of the provided Ollama models: spring.ai.openai.chat.options.model=mistral.

[Figure: Spring AI Ollama over OpenAI]

Check the OllamaWithOpenAiChatModelIT.java tests for examples of using Ollama over Spring AI OpenAI.
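
For orientation, the sketch below builds (but does not send) the kind of HTTP request that an OpenAI-style client would issue against Ollama's OpenAI-compatible endpoint, using only the JDK's java.net.http types. The class and method names are hypothetical, the JSON body is assembled by plain concatenation and assumes the inputs need no escaping, and sending the request would require a running Ollama server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: builds a chat completion request for the OpenAI-compatible endpoint.
final class OpenAiCompatSketch {

    // baseUrl is the Ollama server URL, e.g. http://localhost:11434
    static HttpRequest chatRequest(String baseUrl, String model, String userText) {
        String body = "{\"model\":\"" + model + "\",\"messages\":[{\"role\":\"user\",\"content\":\""
                + userText + "\"}]}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = chatRequest("http://localhost:11434", "mistral", "Tell me a joke");
        // Only prints the target; no network call is made here.
        System.out.println(request.method() + " " + request.uri());
    }
}
```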

HuggingFace Models

Ollama can access all GGUF Hugging Face chat models out of the box. You can pull any of these models by name with ollama pull hf.co/<username>/<model-repository>, or configure the auto-pulling strategy described in the Auto-pulling Models section:

spring.ai.ollama.chat.options.model=hf.co/bartowski/gemma-2-2b-it-GGUF
spring.ai.ollama.init.pull-model-strategy=always

  • spring.ai.ollama.chat.options.model: Specifies the Hugging Face GGUF model to use.

  • spring.ai.ollama.init.pull-model-strategy=always: (optional) Enables automatic model pulling at startup time. For production, you should pre-download the models to avoid delays: ollama pull hf.co/bartowski/gemma-2-2b-it-GGUF.

Sample Controller

Create a new Spring Boot project and add the spring-ai-starter-model-ollama to your pom (or gradle) dependencies.

Add an application.yaml file under the src/main/resources directory to enable and configure the Ollama chat model:

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: mistral
          temperature: 0.7

Replace the base-url with your Ollama server URL.

This will create an OllamaChatModel implementation that you can inject into your classes. Here is an example of a simple @RestController class that uses the chat model for text generation.

@RestController
public class ChatController {

    private final OllamaChatModel chatModel;

    @Autowired
    public ChatController(OllamaChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map<String,String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }

}

Manual Configuration

If you don’t want to use the Spring Boot auto-configuration, you can manually configure the OllamaChatModel in your application. The OllamaChatModel implements the ChatModel and StreamingChatModel and uses the Low-level OllamaApi Client to connect to the Ollama service.

To use it, add the spring-ai-ollama dependency to your project’s Maven pom.xml or Gradle build.gradle build files:

Maven:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama</artifactId>
</dependency>

Gradle:

dependencies {
    implementation 'org.springframework.ai:spring-ai-ollama'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

The spring-ai-ollama dependency also provides access to the OllamaEmbeddingModel. For more information about the OllamaEmbeddingModel, refer to the Ollama Embedding Model section.

Next, create an OllamaChatModel instance and use it to send requests for text generation:

var ollamaApi = OllamaApi.builder().build();

var chatModel = OllamaChatModel.builder()
                    .ollamaApi(ollamaApi)
                    .defaultOptions(
                        OllamaOptions.builder()
                            .model(OllamaModel.MISTRAL)
                            .temperature(0.9)
                            .build())
                    .build();

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

// Or with streaming responses
Flux<ChatResponse> streamingResponse = chatModel.stream(
    new Prompt("Generate the names of 5 famous pirates."));

The OllamaOptions provides the configuration information for all chat requests.

Low-level OllamaApi Client

The OllamaApi provides a lightweight Java client for the Ollama Chat Completion API.

The following class diagram illustrates the OllamaApi chat interfaces and building blocks:

[Figure: OllamaApi chat completion class diagram]

The OllamaApi is a low-level API and is not recommended for direct use. Use the OllamaChatModel instead.

Here is a simple snippet showing how to use the API programmatically:

OllamaApi ollamaApi = new OllamaApi("YOUR_HOST:YOUR_PORT");

// Sync request
var request = ChatRequest.builder("orca-mini")
    .stream(false) // not streaming
    .messages(List.of(
            Message.builder(Role.SYSTEM)
                .content("You are a geography teacher. You are talking to a student.")
                .build(),
            Message.builder(Role.USER)
                .content("What is the capital of Bulgaria and what is the size? "
                        + "What is the national anthem?")
                .build()))
    .options(OllamaOptions.builder().temperature(0.9).build())
    .build();

ChatResponse response = ollamaApi.chat(request);

// Streaming request
var request2 = ChatRequest.builder("orca-mini")
    .stream(true) // streaming
    .messages(List.of(Message.builder(Role.USER)
        .content("What is the capital of Bulgaria and what is the size? " + "What is the national anthem?")
        .build()))
    .options(OllamaOptions.builder().temperature(0.9).build().toMap())
    .build();

Flux<ChatResponse> streamingResponse = ollamaApi.streamingChat(request2);