Overview: Deploy AI models in Azure AI Foundry portal

The model catalog in Azure AI Foundry portal is the hub to discover and use a wide range of models for building generative AI applications. A model must be deployed before it can receive inference requests. Azure AI Foundry offers a comprehensive suite of deployment options for models, depending on your needs and model requirements.

Deploying models

Deployment options vary depending on the model offering:

  • Azure OpenAI models: The latest OpenAI models, with enterprise features from Azure and flexible billing options.
  • Models-as-a-Service models: These models don't require compute quota from your subscription and are billed per token in a pay-as-you-go fashion.
  • Open and custom models: The model catalog offers access to a large variety of models across modalities, including open-access models. You can host open models in your own subscription on managed infrastructure, choosing the virtual machine size and number of instances for capacity management.

Azure AI Foundry offers four different deployment options:

| Name | Azure OpenAI service | Azure AI model inference | Serverless API | Managed compute |
|---|---|---|---|---|
| Which models can be deployed? | Azure OpenAI models | Azure OpenAI models and Models-as-a-Service | Models-as-a-Service | Open and custom models |
| Deployment resource | Azure OpenAI resource | Azure AI services resource | AI project resource | AI project resource |
| Requires Hubs/Projects | No | No | Yes | Yes |
| Data processing options | Regional, Data-zone, Global | Global | Regional | Regional |
| Private networking | Yes | Yes | Yes | Yes |
| Content filtering | Yes | Yes | Yes | No |
| Custom content filtering | Yes | Yes | No | No |
| Key-less authentication | Yes | Yes | No | No |
| Best suited when | You plan to use only OpenAI models | You plan to take advantage of the flagship models in the Azure AI catalog, including OpenAI | You plan to use a single model from a specific provider (excluding OpenAI) | You plan to use open models and have enough compute quota available in your subscription |
| Billing bases | Token usage & provisioned throughput units | Token usage | Token usage¹ | Compute core hours² |
| Deployment instructions | Deploy to Azure OpenAI Service | Deploy to Azure AI model inference | Deploy to Serverless API | Deploy to Managed compute |

¹ A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in pay-as-you-go. After you delete the endpoint, no further charges accrue.

² Billing is per minute from the moment of creation, based on the product tier and the number of instances in the deployment. After you delete the endpoint, no further charges accrue.
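The two billing models lead to very different cost profiles. As a rough sketch, the arithmetic below compares per-token (pay-as-you-go) billing with per-instance-hour (managed compute) billing; the rates and usage figures are entirely hypothetical, so check the Azure pricing pages for real numbers:

```python
# Hypothetical rates -- consult the Azure pricing pages for actual prices.
TOKEN_PRICE_PER_1K = 0.0005      # pay-as-you-go: price per 1,000 tokens
COMPUTE_PRICE_PER_HOUR = 1.20    # managed compute: price per instance-hour

# Pay-as-you-go: you pay only for tokens processed.
monthly_tokens = 50_000_000
serverless_cost = monthly_tokens / 1000 * TOKEN_PRICE_PER_1K

# Managed compute: you pay for instances while the deployment exists,
# regardless of how much traffic it serves.
instances = 2
hours_per_month = 24 * 30
managed_cost = instances * hours_per_month * COMPUTE_PRICE_PER_HOUR

print(f"Serverless (token-based): ${serverless_cost:,.2f}/month")
print(f"Managed compute:          ${managed_cost:,.2f}/month")
```

The key takeaway is structural, not the specific numbers: token-based billing scales with usage and drops to near zero when idle, while managed compute accrues charges for as long as the deployment exists.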

Tip

To learn more about how to track costs, see Monitor costs for models offered through Azure Marketplace.

How should I think about deployment options?

Azure AI Foundry encourages you to explore the various deployment options and choose the one that best suits your business and technical needs. In general, consider using the following approach to select a deployment option:

  • Start with Azure AI model inference, which is the option with the largest scope. This option allows you to iterate and prototype faster in your application without having to rebuild your architecture each time you decide to change something. If you're using Azure AI Foundry hubs or projects, enable this option by turning on the Azure AI model inference feature.

  • When you're looking to use a specific model:

    • If you're interested in Azure OpenAI models, use the Azure OpenAI Service. This option is designed for Azure OpenAI models and offers a wide range of capabilities for them.

    • If you're interested in a particular model from Models-as-a-Service, and you don't expect to use any other type of model, use Serverless API endpoints. Serverless endpoints deploy a single model, each with its own endpoint URL and keys.

  • When your model isn't available in Models-as-a-Service and you have compute quota available in your subscription, use Managed Compute, which supports deployment of open and custom models. It also allows a high level of customization of the deployment inference server, protocols, and detailed configuration.
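The reason Azure AI model inference speeds up iteration is that deployed models share one endpoint and one request shape, so swapping models doesn't require rebuilding your application. The sketch below illustrates this, assuming the OpenAI-compatible chat completions payload; the model deployment names are hypothetical, and a real call would POST this body to your endpoint with your credentials:

```python
def build_chat_request(model: str, user_prompt: str) -> dict:
    """Build an OpenAI-style chat completions request body.

    With Azure AI model inference, the endpoint and payload shape stay
    the same across providers -- only the `model` field changes.
    """
    return {
        "model": model,  # the deployment name to route the request to
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }

# Same application code, two different models (hypothetical names):
req_a = build_chat_request("gpt-4o-mini", "Summarize our Q3 report.")
req_b = build_chat_request("Mistral-Large", "Summarize our Q3 report.")
```

Because only the `model` value differs between the two requests, prototyping against one model and later switching to another is a one-line change rather than an architectural one.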