Using Chat/Embedding Response Usage

Overview

Spring AI has enhanced its Model Usage handling by introducing the getNativeUsage() method in the Usage interface and providing a DefaultUsage implementation. This change simplifies how different AI models can track and report their usage metrics while maintaining consistency across the framework.

Key Changes

Usage Interface Enhancement

The Usage interface now includes a new method:

Object getNativeUsage();

This method allows access to the model-specific native usage data, enabling more detailed usage tracking when needed.
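
The DefaultUsage implementation mentioned above can carry such a provider-specific payload next to the standard counters. The sketch below is illustrative only: FakeProviderUsage is a made-up provider type, and the snippet assumes the DefaultUsage constructor that takes prompt, completion, and total token counts plus the native object (verify the exact constructor signatures against the Spring AI version you are using).

import org.springframework.ai.chat.metadata.DefaultUsage;
import org.springframework.ai.chat.metadata.Usage;

class NativeUsageMappingExample {

    // Hypothetical provider payload, used here only for illustration
    record FakeProviderUsage(int inputTokens, int outputTokens) {
    }

    Usage toUsage(FakeProviderUsage providerUsage) {
        // Map the provider counters onto the standard metrics and keep the raw
        // payload reachable later through Usage#getNativeUsage()
        return new DefaultUsage(
                providerUsage.inputTokens(),
                providerUsage.outputTokens(),
                providerUsage.inputTokens() + providerUsage.outputTokens(),
                providerUsage);
    }

}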

Using with ChatModel

Here’s a complete example showing how to track usage with OpenAI’s ChatModel:

@SpringBootConfiguration
public class Configuration {

    @Bean
    public OpenAiApi chatCompletionApi() {
        return OpenAiApi.builder()
            .apiKey(System.getenv("OPENAI_API_KEY"))
            .build();
    }

    @Bean
    public OpenAiChatModel openAiClient(OpenAiApi openAiApi) {
        return OpenAiChatModel.builder()
            .openAiApi(openAiApi)
            .build();
    }

}

@Service
public class ChatService {

    private final OpenAiChatModel chatModel;

    public ChatService(OpenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    public void demonstrateUsage() {
        // Create a chat prompt
        Prompt prompt = new Prompt("What is the weather like today?");

        // Get the chat response
        ChatResponse response = this.chatModel.call(prompt);

        // Access the usage information
        Usage usage = response.getMetadata().getUsage();

        // Get standard usage metrics
        System.out.println("Prompt Tokens: " + usage.getPromptTokens());
        System.out.println("Completion Tokens: " + usage.getCompletionTokens());
        System.out.println("Total Tokens: " + usage.getTotalTokens());

        // Access native OpenAI usage data with detailed token information
        if (usage.getNativeUsage() instanceof org.springframework.ai.openai.api.OpenAiApi.Usage) {
            org.springframework.ai.openai.api.OpenAiApi.Usage nativeUsage =
                (org.springframework.ai.openai.api.OpenAiApi.Usage) usage.getNativeUsage();

            // Detailed prompt token information
            System.out.println("Prompt Tokens Details:");
            System.out.println("- Audio Tokens: " + nativeUsage.promptTokensDetails().audioTokens());
            System.out.println("- Cached Tokens: " + nativeUsage.promptTokensDetails().cachedTokens());

            // Detailed completion token information
            System.out.println("Completion Tokens Details:");
            System.out.println("- Reasoning Tokens: " + nativeUsage.completionTokenDetails().reasoningTokens());
            System.out.println("- Accepted Prediction Tokens: " + nativeUsage.completionTokenDetails().acceptedPredictionTokens());
            System.out.println("- Audio Tokens: " + nativeUsage.completionTokenDetails().audioTokens());
            System.out.println("- Rejected Prediction Tokens: " + nativeUsage.completionTokenDetails().rejectedPredictionTokens());
        }
    }
}

Using with ChatClient

If you are using the ChatClient, you can access the usage information using the ChatResponse object:

// Create a chat prompt
Prompt prompt = new Prompt("What is the weather like today?");

// Create a chat client
ChatClient chatClient = ChatClient.create(chatModel);

// Get the chat response
ChatResponse response = chatClient.prompt(prompt)
        .call()
        .chatResponse();

// Access the usage information
Usage usage = response.getMetadata().getUsage();
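
Because the same Usage contract is returned for every supported model, bookkeeping code only needs to be written once. The TokenBudget class below is a hypothetical helper (not part of Spring AI) that keeps a running total of tokens consumed across calls:

import java.util.concurrent.atomic.AtomicLong;

import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;

public class TokenBudget {

    private final AtomicLong totalTokens = new AtomicLong();

    // Add the tokens reported for a single response to the running total
    public void record(ChatResponse response) {
        Usage usage = response.getMetadata().getUsage();
        this.totalTokens.addAndGet(usage.getTotalTokens());
    }

    public long consumed() {
        return this.totalTokens.get();
    }

}

Calling budget.record(response) after each chatClient.prompt(...).call().chatResponse() invocation then gives a model-agnostic view of total consumption via budget.consumed().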

Benefits

Standardization: Provides a consistent way to handle usage across different AI models
Flexibility: Supports model-specific usage data through the native usage feature
Simplification: Reduces boilerplate code with the default implementation
Extensibility: Easy to extend for specific model requirements while maintaining compatibility

Type Safety Considerations

When working with native usage data, consider type casting carefully:

// Safe way to access native usage
if (usage.getNativeUsage() instanceof org.springframework.ai.openai.api.OpenAiApi.Usage) {
    org.springframework.ai.openai.api.OpenAiApi.Usage nativeUsage =
        (org.springframework.ai.openai.api.OpenAiApi.Usage) usage.getNativeUsage();
    // Work with native usage data
}
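
On Java 16 and later, pattern matching for instanceof performs the same check without a separate cast:

// Equivalent check using pattern matching for instanceof (Java 16+)
if (usage.getNativeUsage() instanceof org.springframework.ai.openai.api.OpenAiApi.Usage nativeUsage) {
    // nativeUsage is already typed as OpenAiApi.Usage here
    System.out.println("Cached Tokens: " + nativeUsage.promptTokensDetails().cachedTokens());
}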