Deployed model using unclear tokens

Matej Jakubčík 85 Reputation points
2025-03-05T08:23:08.54+00:00

I have deployed a gpt4 model in Azure foundry. The problem is that sometimes in the metrics it is displaying that the model is using tokens but I did not do anything with it. How is it possible? What could be the cause of that? What can I do in order to control the number of tokens this model process in order to not be surprised with costly bill?

Azure AI Search
Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
1,222 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Alekhya Vaddepally 0 Reputation points Microsoft External Staff
    2025-03-05T14:27:11.4866667+00:00

    Hi Matej Jakubčík,

    When you deploy a model like GPT-4 in Azure, the token usage can occur even if you are not actively making requests.

    these are the possible reasons for why you are seeing token usage:

    System Activity or Background Processes may send automated requests to consume tokens

    If Retained Sessions application sessions open, interactions might still be happening without you realizing it, and also other applications, scripts, or users with access may be making requests.

    To control the number of tokens processed and avoid unexpected costs, consider the following steps:

    Use Azure Monitor and Application Insights to track requests being made to the model. This will help identify the source of token usage.

    You can also define token limits per request and set quotas using Azure OpenAI service configuration.

    check only authorized applications and users can send requests. Rotate keys if needed.

    Reduce unnecessary token usage by refining your prompt structures and limiting response length

    Set up budget alerts in Azure Cost Management to avoid unexpected charges.

    By implementing these strategies, you can gain more control over token usage and manage costs effectively.

    https://learn.microsoft.com/en-us/azure/ai-foundry/model-inference/how-to/manage-costs#understand-model-inference-billing-model

    https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs#understand-the-azure-openai-full-billing-model

    https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview

    https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits

    If the answer is helpful, please click Accept Answer and kindly upvote it so that other people who faces similar issue may get benefitted from it.

    Let me know if you have any further Queries.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.