Hi Matej Jakubčík,
When you deploy a model like GPT-4 in Azure, the token usage can occur even if you are not actively making requests.
these are the possible reasons for why you are seeing token usage:
System Activity or Background Processes may send automated requests to consume tokens
If Retained Sessions application sessions open, interactions might still be happening without you realizing it, and also other applications, scripts, or users with access may be making requests.
To control the number of tokens processed and avoid unexpected costs, consider the following steps:
Use Azure Monitor and Application Insights to track requests being made to the model. This will help identify the source of token usage.
You can also define token limits per request and set quotas using Azure OpenAI service configuration.
check only authorized applications and users can send requests. Rotate keys if needed.
Reduce unnecessary token usage by refining your prompt structures and limiting response length
Set up budget alerts in Azure Cost Management to avoid unexpected charges.
By implementing these strategies, you can gain more control over token usage and manage costs effectively.
https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview
https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits
If the answer is helpful, please click Accept Answer and kindly upvote it so that other people who faces similar issue may get benefitted from it.
Let me know if you have any further Queries.