Overview: Deploy AI models in Azure AI Foundry portal
The model catalog in Azure AI Foundry portal is the hub to discover and use a wide range of models for building generative AI applications. Models must be deployed before they can receive inference requests. Azure AI Foundry offers a comprehensive suite of deployment options for those models, depending on your needs and model requirements.
Deploying models
Deployment options vary depending on the model offering:
- Azure OpenAI models: The latest OpenAI models, offered with enterprise features from Azure and flexible billing options.
- Models-as-a-Service models: These models don't require compute quota from your subscription and are billed per token in a pay-as-you-go fashion.
- Open and custom models: The model catalog offers access to a large variety of models across modalities, including open-access models. You can host open models in your own subscription on managed infrastructure, selecting the virtual machines and number of instances for capacity management.
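Whichever offering you choose, a deployed model is ultimately consumed through an endpoint URL, a credential, and a deployment name. As a minimal illustration (not tied to any one option above), the following Python sketch calls a model already deployed to Azure OpenAI Service using the official `openai` package; the endpoint, API key, and deployment name are placeholders for your own resource.

```python
# Minimal sketch: call a deployed Azure OpenAI model with the "openai" package.
# The endpoint, key, and deployment name below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-06-01",
)

# In Azure OpenAI, "model" refers to your deployment name, not the base model name.
response = client.chat.completions.create(
    model="<your-deployment-name>",
    messages=[{"role": "user", "content": "Summarize what model deployment means."}],
)
print(response.choices[0].message.content)
```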
Azure AI Foundry offers four different deployment options:
| Name | Azure OpenAI Service | Azure AI model inference | Serverless API | Managed compute |
| --- | --- | --- | --- | --- |
| Which models can be deployed? | Azure OpenAI models | Azure OpenAI models and Models-as-a-Service | Models-as-a-Service | Open and custom models |
| Deployment resource | Azure OpenAI resource | Azure AI services resource | AI project resource | AI project resource |
| Requires Hubs/Projects | No | No | Yes | Yes |
| Data processing options | Regional, Data-zone, Global | Global | Regional | Regional |
| Private networking | Yes | Yes | Yes | Yes |
| Content filtering | Yes | Yes | Yes | No |
| Custom content filtering | Yes | Yes | No | No |
| Key-less authentication | Yes | Yes | No | No |
| Best suited when | You plan to use only OpenAI models. | You plan to take advantage of the flagship models in the Azure AI catalog, including OpenAI. | You plan to use a single model from a specific provider (excluding OpenAI). | You plan to use open or custom models and have enough compute quota available in your subscription. |
| Billing bases | Token usage & PTU | Token usage | Token usage¹ | Compute core hours² |
| Deployment instructions | Deploy to Azure OpenAI Service | Deploy to Azure AI model inference | Deploy to Serverless API | Deploy to Managed compute |
¹ A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in pay-as-you-go. After you delete the endpoint, no further charges accrue.
² Billing is on a per-minute basis, depending on the product tier and the number of instances used in the deployment since the moment of creation. After you delete the endpoint, no further charges accrue.
Tip
To learn more about how to track costs, see Monitor costs for models offered through Azure Marketplace.
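The key-less authentication row in the table above refers to Microsoft Entra ID authentication. As a minimal sketch, assuming your identity has an appropriate role on the resource (for example, Cognitive Services OpenAI User), you can replace the API key with a token provider from the `azure-identity` package:

```python
# Sketch of key-less (Microsoft Entra ID) authentication against Azure OpenAI.
# Assumes your signed-in identity has a suitable role on the resource.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",  # Azure AI services token scope
)

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    azure_ad_token_provider=token_provider,  # no API key needed
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # placeholder
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```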
How should I think about deployment options?
Azure AI Foundry encourages customers to explore the deployment options and pick the one that best suits their business and technical needs. In general, you can use the following thinking process:
1. Start with Azure AI model inference, which is the option with the broadest scope. It lets you iterate and prototype faster in your application without having to rebuild your architecture each time you decide to change something. If you are using Azure AI Foundry Hubs or Projects, enable this capability by turning on Azure AI model inference. A client sketch follows this list.
2. When you want to use a specific model:
   - When you are interested in Azure OpenAI models, use the Azure OpenAI Service, which offers a wide range of capabilities for those models and is designed specifically for them.
   - When you are interested in a particular model from Models-as-a-Service and you don't expect to use any other type of model, use Serverless API endpoints. They allow deployment of a single model with its own endpoint URL and keys.
   - When your model isn't available in Models-as-a-Service and you have compute quota available in your subscription, use Managed Compute, which supports deploying open and custom models. It also allows a high degree of customization of the inference server, protocols, and deployment configuration.
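To make the first step concrete, here is a minimal sketch using the `azure-ai-inference` package against an Azure AI model inference endpoint. The endpoint URL, key, and model name are placeholders; the point is that the same client code keeps working when you switch models.

```python
# Minimal sketch: model-agnostic chat completion through Azure AI model inference.
# Endpoint, key, and model name are placeholders for your own deployment.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),                  # placeholder
)

response = client.complete(
    model="<model-deployment-name>",  # switch models without changing the rest of the code
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What deployment options does Azure AI Foundry offer?"),
    ],
)
print(response.choices[0].message.content)
```

The same `ChatCompletionsClient` can also target a Serverless API endpoint; in that case the endpoint hosts a single model, so the `model` parameter can be omitted.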