Plan to manage costs for model inference in Azure AI Services
This article describes how you can plan for and manage costs for model inference in Azure AI Services. After you start using model inference in Azure AI Services resources, use Cost Management features to set budgets and monitor costs.
Although this article is about planning for and managing costs for model inference in Azure AI Services, you're billed for all Azure services and resources used in your Azure subscription.
Prerequisites
- Cost analysis in Cost Management supports most Azure account types, but not all of them. To view the full list of supported account types, see Understand Cost Management data.
- To view cost data, you need at least read access for an Azure account. For information about assigning access to cost management data, see Assign access to data.
Understand the model inference billing model
Models deployed in Azure AI Services are charged per 1,000 tokens. Language models understand and process text by breaking it down into tokens. For reference, each token is roughly four characters for typical English text. Costs per token vary depending on which model series you choose. Models that can process images also break images down into tokens. The number of tokens per image depends on the model and the resolution of the input image.
Token costs apply to both input and output. For example, suppose you have a 1,000-token JavaScript code sample that you ask a model to convert to Python. You would be charged approximately 1,000 tokens for the input request, and 1,000 more tokens for the output received in response, for a total of 2,000 tokens.
In practice, for this type of completion call, the token input/output wouldn't be perfectly 1:1. A conversion from one programming language to another could result in a longer or shorter output depending on many factors. One such factor is the value assigned to the max_tokens parameter.
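The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the prices are placeholders rather than real rates, and separate input and output rates are an assumption, so check the pricing of the model you deploy. The token estimate uses the rough four-characters-per-token figure and isn't a substitute for the model's actual tokenizer.

```python
def estimate_tokens(text):
    """Rough token estimate: about four characters per token for English text."""
    return round(len(text) / 4)


def estimate_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate the charge for one request, billed per 1,000 tokens.

    Both prices are hypothetical placeholders supplied by the caller.
    """
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# The example from the text: 1,000 input tokens and 1,000 output tokens.
# The prices 0.5 and 1.5 (per 1,000 tokens) are made up for illustration.
example_cost = estimate_cost(1000, 1000, 0.5, 1.5)
print(example_cost)
```

Because output length varies from call to call, an estimate like this is best treated as a planning figure and compared against the actual meters in Cost Analysis.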
Cost breakdown
To understand what makes up the cost, use the Cost Analysis tool in the Azure portal. Follow these steps to understand the cost of inference:
Go to Azure AI Foundry Portal.
In the upper-right corner of the screen, select the name of your Azure AI Services resource or, if you're working on an AI project, the name of the project.
Select the name of the project. The Azure portal opens in a new window.
Under Cost Management, select Cost analysis.
By default, cost analysis is scoped to the selected resource group.
Important
It's important to scope Cost Analysis to the resource group where the Azure AI Services resource is deployed. Cost meters associated with some model providers, like Mistral AI or Cohere, are displayed under the resource group instead of the Azure AI Services resource.
Change Group by to Meter. You can now see that, for this particular resource group, the costs come from different model series.
The following sections explain the entries in detail.
Azure OpenAI and Microsoft models
Azure OpenAI and Microsoft's family of models (like Phi) are billed directly through Microsoft and show up as billing meters under each Azure AI Services resource. When you inspect your bill, you see billing meters accounting for inputs and outputs for each consumed model.
Provider models
Models provided by another provider, like Mistral AI, Cohere, Meta AI, or AI21 Labs, are billed through Azure Marketplace. Unlike Microsoft billing meters, those entries are associated with the resource group where your Azure AI Services resource is deployed, not with the Azure AI Services resource itself. You see entries under the Service Name SaaS accounting for inputs and outputs for each consumed model.
Using Azure Prepayment
You can pay for Azure OpenAI and Microsoft model charges with your Azure Prepayment credit. However, you can't use Azure Prepayment credit to pay for other providers' models because they're billed through Azure Marketplace.
HTTP Error response code and billing status
If the service performs processing, you're charged even if the status code isn't successful (not 200). For example, you're charged for a 400 error caused by a content filter or input limit, or for a 408 error caused by a time-out.
If the service doesn't perform processing, you aren't charged. For example, you aren't charged for a 401 error caused by authentication or for a 429 error caused by exceeding the rate limit.
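As a rough illustration, the examples above can be encoded as a small lookup. This sketch is a simplification: the function name is hypothetical, and it covers only the status codes named in this article. What determines billing is whether the service performed processing, not the code alone, so a 400 returned for a reason other than a content filter or input limit might not be billed.

```python
# Status codes this article gives as examples. Assumption: the code alone is
# used as a stand-in for "the service performed processing".
BILLED_EXAMPLES = {
    200: "successful request",
    400: "content filter or input limit",
    408: "time-out",
}
NOT_BILLED_EXAMPLES = {
    401: "authentication failure",
    429: "rate limit exceeded",
}


def is_likely_billed(status_code):
    """Return True or False for codes covered by the article, None otherwise."""
    if status_code in BILLED_EXAMPLES:
        return True
    if status_code in NOT_BILLED_EXAMPLES:
        return False
    return None  # Not covered by the article; check your bill instead.


for code in (200, 400, 401, 408, 429, 500):
    print(code, is_likely_billed(code))
```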
Other costs
Enabling capabilities such as sending data to Azure Monitor Logs and alerting incurs extra costs for those services. These costs are visible under those other services and at the subscription level, but aren't visible when scoped just to your Azure AI services resource.
Monitor costs
Azure resource usage unit costs vary by time intervals, such as seconds, minutes, hours, and days, or by unit usage, such as bytes and megabytes. As soon as Azure AI services use starts, costs can be incurred and you can see the costs in the cost analysis.
To get more detailed billing information and understand what makes up that cost, use the Cost Analysis tool in the Azure portal:
Go to Azure AI Foundry Portal.
In the upper-right corner of the screen, select the name of your Azure AI Services resource or, if you're working on an AI project, the name of the project.
Select the name of the project. The Azure portal opens in a new window.
Under Cost Management, select Cost analysis.
By default, cost analysis is scoped to the resource group you have selected.
Because this view shows the cost of the entire resource group, it's useful to break the cost down by resource. To do so, select View > Cost by resource.
Now you can see the resources generating each of the billing meters.
Azure OpenAI models and Microsoft models, as explained before, are displayed as meters under each Azure AI Services resource.
Some providers' models are displayed as meters under Global resources. The word Global isn't related to the SKU of the model deployment (for instance, Global standard). If you have multiple Azure AI Services resources, your bill contains one entry for each model for each Azure AI Services resource. The resource meters have the format [model-name]-[GUID], where [GUID] is a unique identifier associated with a given Azure AI Services resource. You see billing meters accounting for inputs and outputs for each model you have consumed.
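If you process these entries programmatically, a small parser can split a meter resource name into its model name and GUID. The following sketch assumes that [GUID] uses the standard 8-4-4-4-12 hexadecimal layout, which this article doesn't confirm. The function name and the sample value are hypothetical.

```python
import re

# Assumption: the GUID suffix uses the common 8-4-4-4-12 hexadecimal layout.
METER_RESOURCE_PATTERN = re.compile(
    r"^(?P<model>.+)-"
    r"(?P<guid>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def split_meter_resource(name):
    """Split '[model-name]-[GUID]' into (model_name, guid), or return None."""
    match = METER_RESOURCE_PATTERN.match(name)
    if match is None:
        return None
    return match.group("model"), match.group("guid")


# Hypothetical sample value, for illustration only.
sample_name = "example-model-00000000-0000-0000-0000-000000000000"
parsed = split_meter_resource(sample_name)
print(parsed)
```

Grouping parsed entries by GUID shows which Azure AI Services resource each model charge belongs to.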
It's important to understand scope when you evaluate costs associated with Azure AI Services. If your resources are part of the same resource group, you can scope Cost Analysis at that level to understand the effect on costs. If your resources are spread across multiple resource groups, you can scope to the subscription level.
Create budgets
You can create budgets to manage costs and create alerts that notify stakeholders of spending anomalies and overspending risks. Alerts are based on spending compared to budget and cost thresholds. You create budgets and alerts for Azure subscriptions and resource groups. They're useful as part of an overall cost monitoring strategy.
You can create budgets with filters for specific resources or services in Azure if you want more granular monitoring. Filters help ensure that you don't accidentally create new resources that cost you more money. For more information about the filter options available when you create a budget, see Group and filter options.
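To illustrate how an alert based on spending compared to a budget behaves, the following sketch reports which percentage thresholds the current spend has reached. It's a conceptual model only and doesn't call any Azure API. The function name, the threshold values, and the amounts are hypothetical.

```python
def crossed_thresholds(actual_spend, budget_amount, thresholds_percent):
    """Return the alert thresholds (in percent of budget) already reached."""
    if budget_amount <= 0:
        raise ValueError("budget_amount must be positive")
    percent_used = actual_spend / budget_amount * 100
    return [t for t in sorted(thresholds_percent) if percent_used >= t]


# Hypothetical budget of 100 with alerts at 50%, 80%, and 100% of the budget.
alerts = crossed_thresholds(actual_spend=85, budget_amount=100,
                            thresholds_percent=[50, 80, 100])
print(alerts)
```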
Export cost data
You can also export your cost data to a storage account, which is helpful when you need others to do extra data analysis for costs. For example, a finance team can analyze the data using Excel or Power BI. You can export your costs on a daily, weekly, or monthly schedule and set a custom date range. We recommend exporting cost data as the way to retrieve cost datasets.
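Once cost data is exported, a short script can summarize it. The following sketch totals cost per meter from a CSV file. The column names MeterName and CostInBillingCurrency are assumptions, because export schemas differ by account type and export version, so adjust them to match your file. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def cost_by_meter(csv_file, meter_column="MeterName",
                  cost_column="CostInBillingCurrency"):
    """Sum the cost column per meter. Column names are assumptions."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row[meter_column]] += float(row[cost_column])
    return dict(totals)


# Made-up sample rows standing in for an exported file.
sample_export = io.StringIO(
    "MeterName,CostInBillingCurrency\n"
    "model-a input tokens,1.25\n"
    "model-a output tokens,2.50\n"
    "model-a input tokens,0.75\n"
)
totals = cost_by_meter(sample_export)
print(totals)
```

For a real export, open the downloaded file with open(path, newline="") and pass the file object instead of the in-memory sample.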
Next steps
- Learn how to optimize your cloud investment with cost management.
- Learn more about managing costs with cost analysis.
- Learn about how to prevent unexpected costs.
- Take the Cost Management guided learning course.