OpenAI Text-to-Speech (TTS)
Introduction
The Audio API provides a speech endpoint based on OpenAI's TTS (text-to-speech) model, enabling users to:

- Narrate a written blog post.
- Produce spoken audio in multiple languages.
- Give real-time audio output using streaming.
Prerequisites
- Create an OpenAI account and obtain an API key. You can sign up at the OpenAI signup page and generate an API key on the API Keys page.
- Add the spring-ai-openai dependency to your project's build file. For more information, refer to the Dependency Management section.
Auto-configuration
There has been a significant change in the Spring AI auto-configuration and starter module artifact names. Please refer to the upgrade notes for more information.
Spring AI provides Spring Boot auto-configuration for the OpenAI Text-to-Speech Client.
To enable it, add the following dependency to your project's Maven pom.xml file:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
implementation 'org.springframework.ai:spring-ai-starter-model-openai'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
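If you manage versions with the Spring AI BOM, the following is a minimal Maven sketch of the BOM import. The version property name is a placeholder; see the Dependency Management section for the authoritative setup:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <!-- Placeholder version property; set it to the Spring AI release you use -->
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>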
Speech Properties
Connection Properties
The prefix spring.ai.openai is used as the property prefix that lets you connect to OpenAI.
Property | Description | Default |
---|---|---|
spring.ai.openai.base-url | The URL to connect to | https://api.openai.com |
spring.ai.openai.api-key | The API Key | - |
spring.ai.openai.organization-id | Optionally, you can specify which organization is used for an API request. | - |
spring.ai.openai.project-id | Optionally, you can specify which project is used for an API request. | - |
For users who belong to multiple organizations (or access their projects through a legacy user API key), you can optionally specify which organization and project are used for an API request. Usage from these API requests counts as usage for the specified organization and project.
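A minimal application.properties sketch using the connection properties above; the API key is read from an environment variable, and the organization/project ids are placeholders:

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.base-url=https://api.openai.com
# Optional placeholders, only needed for multi-organization accounts
# spring.ai.openai.organization-id=<your-organization-id>
# spring.ai.openai.project-id=<your-project-id>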
Configuration Properties
Enabling and disabling of the audio speech auto-configurations are now configured via top-level properties with the prefix spring.ai.model.audio.speech.
To enable, set spring.ai.model.audio.speech=openai (it is enabled by default).
To disable, set spring.ai.model.audio.speech=none (or any value which does not match openai).
This change is done to allow configuration of multiple models.
The prefix spring.ai.openai.audio.speech is used as the property prefix that lets you configure the OpenAI Text-to-Speech client.
Property | Description | Default |
---|---|---|
spring.ai.model.audio.speech | Enable Audio Speech Model | openai |
spring.ai.openai.audio.speech.base-url | The URL to connect to | https://api.openai.com |
spring.ai.openai.audio.speech.api-key | The API Key | - |
spring.ai.openai.audio.speech.organization-id | Optionally, you can specify which organization is used for an API request. | - |
spring.ai.openai.audio.speech.project-id | Optionally, you can specify which project is used for an API request. | - |
spring.ai.openai.audio.speech.options.model | ID of the model to use for generating the audio. For OpenAI's TTS API, use one of the available models: tts-1 or tts-1-hd. | tts-1 |
spring.ai.openai.audio.speech.options.voice | The voice to use for synthesis. For OpenAI's TTS API, use one of the available voices for the chosen model: alloy, echo, fable, onyx, nova, or shimmer. | alloy |
spring.ai.openai.audio.speech.options.response-format | The format of the audio output. Supported formats are mp3, opus, aac, flac, wav, and pcm. | mp3 |
spring.ai.openai.audio.speech.options.speed | The speed of the voice synthesis. The acceptable range is from 0.25 (slowest) to 4.0 (fastest). | 1.0 |
You can override the common spring.ai.openai.base-url, spring.ai.openai.api-key, spring.ai.openai.organization-id, and spring.ai.openai.project-id properties; the corresponding spring.ai.openai.audio.speech connection properties, if set, take precedence over the common ones.

All properties prefixed with spring.ai.openai.audio.speech.options can be overridden at runtime, as described in the Runtime Options section below.
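For instance, the options above can be set directly in application.properties; a minimal sketch using the documented defaults:

spring.ai.model.audio.speech=openai
spring.ai.openai.audio.speech.options.model=tts-1
spring.ai.openai.audio.speech.options.voice=alloy
spring.ai.openai.audio.speech.options.response-format=mp3
spring.ai.openai.audio.speech.options.speed=1.0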
Runtime Options
The OpenAiAudioSpeechOptions class provides the options to use when making a text-to-speech request.
On start-up, the options specified by spring.ai.openai.audio.speech are used, but you can override these at runtime.
For example:
OpenAiAudioSpeechOptions speechOptions = OpenAiAudioSpeechOptions.builder()
.model("tts-1")
.voice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
.responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
.speed(1.0f)
.build();
SpeechPrompt speechPrompt = new SpeechPrompt("Hello, this is a text-to-speech example.", speechOptions);
SpeechResponse response = openAiAudioSpeechModel.call(speechPrompt);
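The speech response carries the generated audio as a byte array (see the manual configuration example below). A minimal sketch, assuming MP3 output and a writable working directory, that saves it to disk:

import java.nio.file.Files;
import java.nio.file.Path;

// The result output is the raw audio (MP3 here, per the response format above).
byte[] audio = response.getResult().getOutput();
Files.write(Path.of("speech.mp3"), audio);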
Manual Configuration
Add the spring-ai-openai dependency to your project's Maven pom.xml file:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-openai</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
implementation 'org.springframework.ai:spring-ai-openai'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Next, create an OpenAiAudioSpeechModel:
var openAiAudioApi = OpenAiAudioApi.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .build();

var openAiAudioSpeechModel = new OpenAiAudioSpeechModel(openAiAudioApi);

var speechOptions = OpenAiAudioSpeechOptions.builder()
    .responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
    .speed(1.0f)
    .model(OpenAiAudioApi.TtsModel.TTS_1.value)
    .build();

var speechPrompt = new SpeechPrompt("Hello, this is a text-to-speech example.", speechOptions);
SpeechResponse response = openAiAudioSpeechModel.call(speechPrompt);

// Accessing metadata (rate limit info)
OpenAiAudioSpeechResponseMetadata metadata = response.getMetadata();

// The result output is the raw audio as a byte array
byte[] responseAsBytes = response.getResult().getOutput();
Streaming Real-time Audio
The Speech API provides support for real-time audio streaming using chunked transfer encoding. This means the audio can be played before the full file has been generated and made accessible.
var openAiAudioApi = OpenAiAudioApi.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .build();

var openAiAudioSpeechModel = new OpenAiAudioSpeechModel(openAiAudioApi);

OpenAiAudioSpeechOptions speechOptions = OpenAiAudioSpeechOptions.builder()
    .voice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
    .speed(1.0f)
    .responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
    .model(OpenAiAudioApi.TtsModel.TTS_1.value)
    .build();

SpeechPrompt speechPrompt = new SpeechPrompt("Today is a wonderful day to build something people love!", speechOptions);

Flux<SpeechResponse> responseStream = openAiAudioSpeechModel.stream(speechPrompt);
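The stream emits SpeechResponse chunks as they arrive. A minimal sketch, assuming the MP3 chunks can simply be concatenated and that blocking is acceptable for a demo, that writes them to a file as they are produced:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

try (OutputStream out = new FileOutputStream("streamed-speech.mp3")) {
    responseStream
        // Each SpeechResponse carries a chunk of the generated audio as bytes.
        .map(speechResponse -> speechResponse.getResult().getOutput())
        .doOnNext(chunk -> {
            try {
                out.write(chunk);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        })
        // Block only for this demo; in a reactive application, compose instead of blocking.
        .blockLast();
} catch (IOException e) {
    throw new UncheckedIOException(e);
}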
Example Code
- The OpenAiSpeechModelIT.java test provides some general examples of how to use the library.