OpenAI Text-to-Speech (TTS)

Introduction

The Audio API provides a speech endpoint based on OpenAI’s TTS (text-to-speech) model, enabling users to:

  • Narrate a written blog post.

  • Produce spoken audio in multiple languages.

  • Give real-time audio output using streaming.

Prerequisites

  1. Create an OpenAI account and obtain an API key. You can sign up at the OpenAI signup page and generate an API key on the API Keys page.

  2. Add the spring-ai-openai dependency to your project’s build file. For more information, refer to the Dependency Management section.

Auto-configuration

There has been a significant change in the artifact names of the Spring AI auto-configuration and starter modules. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the OpenAI Text-to-Speech client. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.
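
With the starter on the classpath and an API key configured (for example via spring.ai.openai.api-key), the auto-configured OpenAiAudioSpeechModel bean can be injected into your own components. The following is a minimal sketch under those assumptions; the SpeechController class, the endpoint path, and the spoken text are illustrative only:

@RestController
class SpeechController {

    private final OpenAiAudioSpeechModel speechModel;

    // Spring injects the auto-configured OpenAiAudioSpeechModel bean.
    SpeechController(OpenAiAudioSpeechModel speechModel) {
        this.speechModel = speechModel;
    }

    @GetMapping(value = "/speech", produces = "audio/mpeg")
    byte[] speech() {
        // Uses the defaults configured via the spring.ai.openai.audio.speech.* properties.
        SpeechResponse response = this.speechModel.call(new SpeechPrompt("Hello from Spring AI!"));
        return response.getResult().getOutput();
    }
}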

Speech Properties

Connection Properties

The prefix spring.ai.openai is used as the property prefix that lets you connect to OpenAI.

| Property | Description | Default |
|---|---|---|
| spring.ai.openai.base-url | The URL to connect to | https://api.openai.com |
| spring.ai.openai.api-key | The API Key | - |
| spring.ai.openai.organization-id | Optionally, you can specify which organization is used for an API request. | - |
| spring.ai.openai.project-id | Optionally, you can specify which project is used for an API request. | - |

For users who belong to multiple organizations (or are accessing their projects through their legacy user API key), you can optionally specify which organization and project are used for an API request. Usage from these API requests will count as usage for the specified organization and project.

Configuration Properties

Enabling and disabling the audio speech auto-configuration is now controlled via top-level properties with the prefix spring.ai.model.audio.speech.

To enable it, set spring.ai.model.audio.speech=openai (it is enabled by default).

To disable it, set spring.ai.model.audio.speech=none (or any value that does not match openai).

This change was made to allow the configuration of multiple models.

The prefix spring.ai.openai.audio.speech is used as the property prefix that lets you configure the OpenAI Text-to-Speech client.

| Property | Description | Default |
|---|---|---|
| spring.ai.model.audio.speech | Enable Audio Speech Model | openai |
| spring.ai.openai.audio.speech.base-url | The URL to connect to | https://api.openai.com |
| spring.ai.openai.audio.speech.api-key | The API Key | - |
| spring.ai.openai.audio.speech.organization-id | Optionally, you can specify which organization is used for an API request. | - |
| spring.ai.openai.audio.speech.project-id | Optionally, you can specify which project is used for an API request. | - |
| spring.ai.openai.audio.speech.options.model | ID of the model to use for generating the audio. For OpenAI’s TTS API, use one of the available models: tts-1 or tts-1-hd. | tts-1 |
| spring.ai.openai.audio.speech.options.voice | The voice to use for synthesis. For OpenAI’s TTS API, one of the available voices for the chosen model: alloy, echo, fable, onyx, nova, and shimmer. | alloy |
| spring.ai.openai.audio.speech.options.response-format | The format of the audio output. Supported formats are mp3, opus, aac, flac, wav, and pcm. | mp3 |
| spring.ai.openai.audio.speech.options.speed | The speed of the voice synthesis. The acceptable range is from 0.25 (slowest) to 4.0 (fastest). | 1.0 |

You can override the common spring.ai.openai.base-url, spring.ai.openai.api-key, spring.ai.openai.organization-id and spring.ai.openai.project-id properties. The spring.ai.openai.audio.speech.base-url, spring.ai.openai.audio.speech.api-key, spring.ai.openai.audio.speech.organization-id and spring.ai.openai.audio.speech.project-id properties, if set, take precedence over the common properties. This is useful if you want to use different OpenAI accounts for different models and different model endpoints.

All properties prefixed with spring.ai.openai.audio.speech.options can be overridden at runtime.
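
For example, a minimal application.properties sketch (the API key value is a placeholder and the option values are illustrative, taken from the table above) that sets the shared API key once and overrides only the speech defaults:

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.audio.speech.options.model=tts-1-hd
spring.ai.openai.audio.speech.options.voice=nova
spring.ai.openai.audio.speech.options.response-format=wav
spring.ai.openai.audio.speech.options.speed=1.25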

Runtime Options

The OpenAiAudioSpeechOptions class provides the options to use when making a text-to-speech request. On start-up, the options specified by the spring.ai.openai.audio.speech properties are used, but you can override these at runtime.

For example:

// Per-request options take precedence over the defaults configured via properties.
OpenAiAudioSpeechOptions speechOptions = OpenAiAudioSpeechOptions.builder()
    .model("tts-1")
    .voice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
    .responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
    .speed(1.0f)
    .build();

SpeechPrompt speechPrompt = new SpeechPrompt("Hello, this is a text-to-speech example.", speechOptions);
SpeechResponse response = openAiAudioSpeechModel.call(speechPrompt);
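
The generated audio is returned in the SpeechResponse as a byte array. A short follow-up sketch (the file name is arbitrary and error handling is omitted) that saves it to disk:

// Extract the audio bytes and write them to an MP3 file
// (matching the response format requested above).
byte[] audio = response.getResult().getOutput();
Files.write(Paths.get("speech.mp3"), audio);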

Manual Configuration

Add the spring-ai-openai dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-openai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create an OpenAiAudioSpeechModel:

var openAiAudioApi = OpenAiAudioApi.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .build();

var openAiAudioSpeechModel = new OpenAiAudioSpeechModel(openAiAudioApi);

var speechOptions = OpenAiAudioSpeechOptions.builder()
    .responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
    .speed(1.0f)
    .model(OpenAiAudioApi.TtsModel.TTS_1.value)
    .build();

var speechPrompt = new SpeechPrompt("Hello, this is a text-to-speech example.", speechOptions);
SpeechResponse response = openAiAudioSpeechModel.call(speechPrompt);

// Accessing metadata (rate limit info)
OpenAiAudioSpeechResponseMetadata metadata = response.getMetadata();

byte[] responseAsBytes = response.getResult().getOutput();

Streaming Real-time Audio

The Speech API provides support for real-time audio streaming using chunked transfer encoding. This means that the audio can be played before the full file has been generated and made accessible.

var openAiAudioApi = OpenAiAudioApi.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .build();

var openAiAudioSpeechModel = new OpenAiAudioSpeechModel(openAiAudioApi);

OpenAiAudioSpeechOptions speechOptions = OpenAiAudioSpeechOptions.builder()
    .voice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
    .speed(1.0f)
    .responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
    .model(OpenAiAudioApi.TtsModel.TTS_1.value)
    .build();

SpeechPrompt speechPrompt = new SpeechPrompt("Today is a wonderful day to build something people love!", speechOptions);

Flux<SpeechResponse> responseStream = openAiAudioSpeechModel.stream(speechPrompt);
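
Each SpeechResponse in the returned Flux carries one chunk of the audio as a byte array. A minimal sketch of consuming the stream (here the chunks are simply concatenated into a buffer; a real application would typically feed each chunk to an audio player or HTTP response as it arrives):

// Block on the reactive stream and append each audio chunk to a buffer.
ByteArrayOutputStream audioBuffer = new ByteArrayOutputStream();
responseStream.toStream()
    .map(speechResponse -> speechResponse.getResult().getOutput())
    .forEach(audioBuffer::writeBytes);

byte[] fullAudio = audioBuffer.toByteArray();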

Example Code