Azure OpenAI Transcriptions

Prerequisites

Obtain your Azure OpenAI endpoint and api-key from the Azure OpenAI Service section of the Azure Portal. Spring AI defines two configuration properties: spring.ai.azure.openai.api-key, which you set to the API key obtained from Azure, and spring.ai.azure.openai.endpoint, which you set to the endpoint URL obtained when provisioning your model in Azure. One way to set them is to export the environment variables SPRING_AI_AZURE_OPENAI_API_KEY and SPRING_AI_AZURE_OPENAI_ENDPOINT, which Spring Boot binds to these properties.

Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the Azure OpenAI Transcription Generation Client. To enable it, add the following dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-azure-openai</artifactId>
</dependency>

Or add it to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-azure-openai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Transcription Properties

The audio transcription auto-configurations are now enabled and disabled through top-level properties with the prefix spring.ai.model.audio.transcription.

To enable, set spring.ai.model.audio.transcription=azure-openai (enabled by default).

To disable, set spring.ai.model.audio.transcription=none (or any value that does not match azure-openai).

This change allows multiple models to be configured.
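
The enable/disable rule above amounts to a match on a single property value. The following is a minimal sketch of that rule in plain Java, not Spring AI's actual auto-configuration code. The class and method names (TranscriptionToggle, isAzureTranscriptionEnabled) are hypothetical, and an exact string match is assumed for simplicity.

```java
import java.util.Map;

// Hypothetical helper illustrating the rule described above; this is not Spring AI code.
final class TranscriptionToggle {

    static final String PROPERTY = "spring.ai.model.audio.transcription";

    // Enabled when the property is absent (the default) or equals "azure-openai".
    static boolean isAzureTranscriptionEnabled(Map<String, String> properties) {
        String value = properties.get(PROPERTY);
        return value == null || value.equals("azure-openai");
    }

    public static void main(String[] args) {
        // Property not set: the Azure OpenAI transcription model stays enabled.
        System.out.println(isAzureTranscriptionEnabled(Map.of())); // prints true
        // Any non-matching value, such as "none", disables it.
        System.out.println(isAzureTranscriptionEnabled(Map.of(PROPERTY, "none"))); // prints false
    }
}
```

Because the property names one model rather than a boolean flag, another provider's value in the same property would switch this one off, which is how the single property supports multiple models.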

The prefix spring.ai.azure.openai.audio.transcription is the property prefix for configuring the Azure OpenAI transcription model.

Property	Description	Default
spring.ai.azure.openai.audio.transcription.enabled (Removed and no longer valid)	Enable Azure OpenAI transcription model.	true
spring.ai.model.audio.transcription	Enable Azure OpenAI transcription model.	azure-openai
spring.ai.azure.openai.audio.transcription.options.model	ID of the model to use. Only whisper is currently available.	whisper
spring.ai.azure.openai.audio.transcription.options.deployment-name	The deployment name under which the model is deployed.
spring.ai.azure.openai.audio.transcription.options.response-format	The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.	json
spring.ai.azure.openai.audio.transcription.options.prompt	An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.
spring.ai.azure.openai.audio.transcription.options.language	The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
spring.ai.azure.openai.audio.transcription.options.temperature	The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.	0
spring.ai.azure.openai.audio.transcription.options.timestamp-granularities	The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word and segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.	segment
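
The constraints listed in the table (allowed response formats, the 0 to 1 temperature range, and the verbose_json requirement for timestamp granularities) can be checked before a request is sent. The following is a rough sketch of such a pre-flight check in plain Java. The class TranscriptionOptionsCheck and its validate method are hypothetical and are not part of Spring AI; the service itself remains the authority on what it accepts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check; the names here are illustrative and not part of Spring AI.
final class TranscriptionOptionsCheck {

    static final Set<String> RESPONSE_FORMATS = Set.of("json", "text", "srt", "verbose_json", "vtt");
    static final Set<String> GRANULARITIES = Set.of("word", "segment");

    // Returns a list of human-readable problems; an empty list means the values look consistent.
    // Assumes granularities is non-null and contains no null elements.
    static List<String> validate(String responseFormat, float temperature, List<String> granularities) {
        List<String> problems = new ArrayList<>();
        if (responseFormat == null || !RESPONSE_FORMATS.contains(responseFormat)) {
            problems.add("unknown response-format: " + responseFormat);
        }
        if (temperature < 0f || temperature > 1f) {
            problems.add("temperature must be between 0 and 1");
        }
        for (String granularity : granularities) {
            if (!GRANULARITIES.contains(granularity)) {
                problems.add("unknown timestamp granularity: " + granularity);
            }
        }
        // Timestamp granularities are only populated for verbose_json output.
        if (!granularities.isEmpty() && !"verbose_json".equals(responseFormat)) {
            problems.add("timestamp-granularities requires response-format verbose_json");
        }
        return problems;
    }
}
```

For instance, validate("json", 0f, List.of("word")) reports one problem, because word timestamps were requested without verbose_json.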

Runtime Options

The AzureOpenAiAudioTranscriptionOptions class provides the options to use when making a transcription. At start-up, the options specified under spring.ai.azure.openai.audio.transcription are used as defaults, but you can override them at runtime.

For example:

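import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.ai.azure.openai.AzureOpenAiAudioTranscriptionOptions;

// Request WebVTT output instead of the default JSON.
AzureOpenAiAudioTranscriptionOptions.TranscriptResponseFormat responseFormat = AzureOpenAiAudioTranscriptionOptions.TranscriptResponseFormat.VTT;

// Per-request options; these override the start-up defaults.
AzureOpenAiAudioTranscriptionOptions transcriptionOptions = AzureOpenAiAudioTranscriptionOptions.builder()
    .language("en")
    .prompt("Ask not this, but ask that")
    .temperature(0f)
    .responseFormat(responseFormat)
    .build();
// audioFile is a Spring Resource that points at the audio to transcribe.
AudioTranscriptionPrompt transcriptionRequest = new AudioTranscriptionPrompt(audioFile, transcriptionOptions);
AudioTranscriptionResponse response = azureOpenAiTranscriptionModel.call(transcriptionRequest);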
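
The relationship between start-up options and runtime options can be pictured as a merge in which every value set at runtime wins over the configured default. The following is a conceptual sketch in plain Java under that assumption. OptionOverrides and merge are hypothetical names, the options are modeled as a simple map, and Spring AI's own merge logic may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual illustration only; this is not Spring AI's implementation.
final class OptionOverrides {

    // Start from the start-up defaults, then let every runtime value that is set win.
    static Map<String, Object> merge(Map<String, Object> defaults, Map<String, Object> runtime) {
        Map<String, Object> effective = new LinkedHashMap<>(defaults);
        runtime.forEach((key, value) -> {
            if (value != null) {
                effective.put(key, value);
            }
        });
        return effective;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = Map.of("response-format", "json", "temperature", 0f);
        Map<String, Object> runtime = Map.of("response-format", "vtt", "language", "en");
        // The effective options keep temperature from the defaults,
        // take response-format from the runtime options, and add language.
        System.out.println(merge(defaults, runtime));
    }
}
```

Options that are left unset at runtime fall back to the configured defaults, so a request only needs to carry the values it wants to change.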

Manual Configuration

Add the spring-ai-azure-openai dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-openai</artifactId>
</dependency>

Or add it to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-azure-openai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create an AzureOpenAiAudioTranscriptionModel and use it to transcribe an audio file:

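import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.ai.azure.openai.AzureOpenAiAudioTranscriptionModel;
import org.springframework.ai.azure.openai.AzureOpenAiAudioTranscriptionOptions;
import org.springframework.ai.azure.openai.AzureOpenAiAudioTranscriptionOptions.TranscriptResponseFormat;
import org.springframework.core.io.FileSystemResource;

// Build the Azure OpenAI client from the key and endpoint held in environment variables.
var openAIClient = new OpenAIClientBuilder()
    .credential(new AzureKeyCredential(System.getenv("AZURE_OPENAI_API_KEY")))
    .endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
    .buildClient();

// No default options are supplied here; options are passed with each request instead.
var azureOpenAiAudioTranscriptionModel = new AzureOpenAiAudioTranscriptionModel(openAIClient, null);

var transcriptionOptions = AzureOpenAiAudioTranscriptionOptions.builder()
    .responseFormat(TranscriptResponseFormat.TEXT)
    .temperature(0f)
    .build();

var audioFile = new FileSystemResource("/path/to/your/resource/speech/jfk.flac");

AudioTranscriptionPrompt transcriptionRequest = new AudioTranscriptionPrompt(audioFile, transcriptionOptions);
AudioTranscriptionResponse response = azureOpenAiAudioTranscriptionModel.call(transcriptionRequest);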