Count the number of cached prompt tokens for Azure OpenAI Service

youyang 0 Reputation points
2024-11-22T03:34:42.1933333+00:00

Hi, Azure team, I deployed the gpt-4o-mini-2024-07-18 model on Azure OpenAI Service and call it using the AzureOpenAI client:

        from openai import AzureOpenAI

        # Placeholder values shown; the real key and endpoint go here.
        client = AzureOpenAI(
            api_key="<api key>",
            azure_endpoint="https://xxxx.openai.azure.com/",
            api_version="2024-10-01-preview",
        )

and send messages using:

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=False,
        temperature=0.4,
    )

However, when I print completion.usage, it outputs:

usage=CompletionUsage(completion_tokens=212, prompt_tokens=12554, total_tokens=12766)

I can't find fields like "prompt_tokens_details" or "cached_tokens" as shown in https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching. Is there any workaround if I want to count the number of cached tokens in the prompt?
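
For reference, this is roughly what I was hoping to do, assuming a recent openai Python package whose CompletionUsage model exposes prompt_tokens_details (which package version first added that field is an assumption on my part):

    # Sketch only: prompt_tokens_details may be absent or None on older
    # openai package versions or API versions, so guard the access.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=False,
        temperature=0.4,
    )

    usage = completion.usage
    details = getattr(usage, "prompt_tokens_details", None)
    cached_tokens = (details.cached_tokens or 0) if details else 0
    print(f"prompt tokens: {usage.prompt_tokens}, cached: {cached_tokens}")

If I read the linked doc correctly, caching only applies once the prompt exceeds 1024 tokens, which my 12,554-token prompt should satisfy.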

Thanks
