OpenAI Transcriptions

Spring AI supports OpenAI’s Transcription model.

Prerequisites

You will need to create an API key with OpenAI to access its transcription models. Create an account on the OpenAI signup page and generate the token on the API Keys page. The Spring AI project defines a configuration property named spring.ai.openai.api-key that you should set to the value of the API Key obtained from openai.com. One way to set that configuration property is to export the matching environment variable, SPRING_AI_OPENAI_API_KEY.
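A minimal sketch for a POSIX shell, assuming Spring Boot's relaxed binding maps the SPRING_AI_OPENAI_API_KEY environment variable onto spring.ai.openai.api-key. The key value shown is a placeholder, not a real key:

```shell
# Placeholder value: replace it with the API key generated on the API Keys page.
export SPRING_AI_OPENAI_API_KEY="your-api-key-here"
```

The variable must be exported in the same shell session (or service environment) that starts the application.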

Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the OpenAI Transcription Client. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Or add it to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Transcription Properties

Connection Properties

The prefix spring.ai.openai is used as the property prefix that lets you connect to OpenAI.

| Property | Description | Default |
|---|---|---|
| spring.ai.openai.base-url | The URL to connect to | https://api.openai.com |
| spring.ai.openai.api-key | The API Key | - |
| spring.ai.openai.organization-id | Optionally, you can specify which organization is used for an API request. | - |
| spring.ai.openai.project-id | Optionally, you can specify which project is used for an API request. | - |

For users that belong to multiple organizations (or are accessing their projects through their legacy user API key), optionally, you can specify which organization and project is used for an API request. Usage from these API requests will count as usage for the specified organization and project.

Configuration Properties

Enabling and disabling of the audio transcription auto-configuration is now configured via top-level properties with the prefix spring.ai.model.audio.transcription.

To enable, set spring.ai.model.audio.transcription=openai (it is enabled by default).

To disable, set spring.ai.model.audio.transcription=none (or any value that doesn’t match openai).

This change allows multiple models to be configured.

The prefix spring.ai.openai.audio.transcription is used as the property prefix that lets you configure the OpenAI transcription model.

| Property | Description | Default |
|---|---|---|
| spring.ai.model.audio.transcription | Enable OpenAI Audio Transcription Model | openai |
| spring.ai.openai.audio.transcription.base-url | The URL to connect to | https://api.openai.com |
| spring.ai.openai.audio.transcription.api-key | The API Key | - |
| spring.ai.openai.audio.transcription.organization-id | Optionally, you can specify which organization is used for an API request. | - |
| spring.ai.openai.audio.transcription.project-id | Optionally, you can specify which project is used for an API request. | - |
| spring.ai.openai.audio.transcription.options.model | ID of the model to use. Only whisper-1 (which is powered by OpenAI's open source Whisper V2 model) is currently available. | whisper-1 |
| spring.ai.openai.audio.transcription.options.response-format | The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt. | json |
| spring.ai.openai.audio.transcription.options.prompt | An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language. | - |
| spring.ai.openai.audio.transcription.options.language | The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency. | - |
| spring.ai.openai.audio.transcription.options.temperature | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. | 0 |
| spring.ai.openai.audio.transcription.options.timestamp_granularities | The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word or segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency. | segment |
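As an illustration, the transcription options listed above could be set in an application.properties file along the following lines. This is a sketch under assumptions: the values are examples rather than recommendations, and OPENAI_API_KEY is a placeholder name for an environment variable that holds your key.

```properties
# Connection: read the key from an environment variable (placeholder name)
spring.ai.openai.api-key=${OPENAI_API_KEY}

# Transcription defaults applied at start-up
spring.ai.openai.audio.transcription.options.model=whisper-1
spring.ai.openai.audio.transcription.options.response-format=verbose_json
spring.ai.openai.audio.transcription.options.language=en
spring.ai.openai.audio.transcription.options.temperature=0
# Timestamp granularities require response-format=verbose_json
spring.ai.openai.audio.transcription.options.timestamp_granularities=segment
```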

You can override the common spring.ai.openai.base-url, spring.ai.openai.api-key, spring.ai.openai.organization-id and spring.ai.openai.project-id properties. If set, the spring.ai.openai.audio.transcription.base-url, spring.ai.openai.audio.transcription.api-key, spring.ai.openai.audio.transcription.organization-id and spring.ai.openai.audio.transcription.project-id properties take precedence over the common properties. This is useful if you want to use different OpenAI accounts for different models and different model endpoints.
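The override described here might look like the following application.properties sketch. The two environment variable names are hypothetical placeholders chosen for illustration; only the property keys come from the tables in this section.

```properties
# Common connection settings, used by OpenAI models that define no override
spring.ai.openai.api-key=${OPENAI_API_KEY}

# Transcription-specific settings: when set, these take precedence over the
# common properties for the transcription model only
spring.ai.openai.audio.transcription.api-key=${OPENAI_TRANSCRIPTION_API_KEY}
spring.ai.openai.audio.transcription.base-url=https://api.openai.com
```

With this layout, transcription requests would be billed to the account behind the transcription-specific key, while other OpenAI models keep using the common one.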

All properties prefixed with spring.ai.openai.audio.transcription.options can be overridden at runtime.

Runtime Options

The OpenAiAudioTranscriptionOptions class provides the options to use when making a transcription. On start-up, the options specified by spring.ai.openai.audio.transcription are used, but you can override these at runtime.

For example:

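import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.ai.openai.OpenAiAudioTranscriptionOptions;
import org.springframework.ai.openai.api.OpenAiAudioApi;

// Assumes audioFile (a Resource) and openAiTranscriptionModel (an OpenAiAudioTranscriptionModel) are defined elsewhere.
OpenAiAudioApi.TranscriptResponseFormat responseFormat = OpenAiAudioApi.TranscriptResponseFormat.VTT;

// Options set here override the start-up defaults for this request only.
OpenAiAudioTranscriptionOptions transcriptionOptions = OpenAiAudioTranscriptionOptions.builder()
    .language("en")
    .prompt("Ask not this, but ask that")
    .temperature(0f)
    .responseFormat(responseFormat)
    .build();
AudioTranscriptionPrompt transcriptionRequest = new AudioTranscriptionPrompt(audioFile, transcriptionOptions);
AudioTranscriptionResponse response = openAiTranscriptionModel.call(transcriptionRequest);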

Manual Configuration

Add the spring-ai-openai dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai</artifactId>
</dependency>

Or add it to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-openai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create an OpenAiAudioTranscriptionModel:

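import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.ai.openai.OpenAiAudioTranscriptionModel;
import org.springframework.ai.openai.OpenAiAudioTranscriptionOptions;
import org.springframework.ai.openai.api.OpenAiAudioApi;
import org.springframework.core.io.FileSystemResource;

// Read the API key from the environment rather than hard-coding it.
var openAiAudioApi = new OpenAiAudioApi(System.getenv("OPENAI_API_KEY"));

var openAiAudioTranscriptionModel = new OpenAiAudioTranscriptionModel(openAiAudioApi);

var transcriptionOptions = OpenAiAudioTranscriptionOptions.builder()
    .responseFormat(OpenAiAudioApi.TranscriptResponseFormat.TEXT)
    .temperature(0f)
    .build();

var audioFile = new FileSystemResource("/path/to/your/resource/speech/jfk.flac");

AudioTranscriptionPrompt transcriptionRequest = new AudioTranscriptionPrompt(audioFile, transcriptionOptions);
AudioTranscriptionResponse response = openAiAudioTranscriptionModel.call(transcriptionRequest);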

Example Code