Deployment types in Azure AI model inference

Azure AI model inference makes models available using the model deployment concept in Azure AI Services resources. Model deployments are also Azure resources and, when created, they give access to a given model under a certain configuration. That configuration includes the infrastructure required to process requests.

Azure AI model inference provides customers with choices on the hosting structure that fits their business and usage patterns. Those options translate to different deployment types (or SKUs) that are available at model deployment time in the Azure AI Services resource.

[Screenshot: customizing the deployment type for a given model deployment.]

Different model providers offer different deployment SKUs that you can select from. When selecting a deployment type, consider your data residency needs and your call volume and capacity requirements.
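
As an illustration, the following is a minimal sketch of creating a model deployment with a specific deployment type, assuming the azure-mgmt-cognitiveservices and azure-identity Python packages. Names in angle brackets are placeholders, and the model name and version are examples to verify against the model catalog.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# The SKU name selects the deployment type (for example, "GlobalStandard"
# or "ProvisionedManaged"). For standard deployments of Azure OpenAI
# models, capacity is expressed in units of 1,000 tokens per minute.
poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<ai-services-resource>",
    deployment_name="gpt-4o-mini",
    deployment=Deployment(
        sku=Sku(name="GlobalStandard", capacity=100),
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",
                name="gpt-4o-mini",
                version="2024-07-18",
            ),
        ),
    ),
)
print(poller.result().sku.name)
```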

Deployment types for Azure OpenAI models

The service offers two main types of deployments: standard and provisioned. For a given deployment type, customers can align their workloads with their data processing requirements by choosing an Azure geography (Standard or Provisioned-Managed), a Microsoft-specified data zone (DataZone-Standard or DataZone Provisioned-Managed), or Global (Global-Standard or Global Provisioned-Managed) processing option.
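
For reference, these are the SKU name strings each option maps to when creating deployments through the management API; this is a non-authoritative list to confirm against the API version you target.

```python
# Friendly deployment type names mapped to the SKU name strings used by
# the management API. Confirm against the API version you target.
DEPLOYMENT_TYPE_SKUS = {
    "Standard": "Standard",
    "Provisioned-Managed": "ProvisionedManaged",
    "DataZone-Standard": "DataZoneStandard",
    "DataZone Provisioned-Managed": "DataZoneProvisionedManaged",
    "Global-Standard": "GlobalStandard",
    "Global Provisioned-Managed": "GlobalProvisionedManaged",
}
```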

To learn more about deployment options for Azure OpenAI models, see the Azure OpenAI documentation.

Deployment types for Models-as-a-Service models

Models from third-party providers with pay-as-you-go billing (collectively called Models-as-a-Service) are available in Azure AI model inference under standard deployments with a Global processing option (Global-Standard).

Models-as-a-Service also offers regional deployment options through Serverless API endpoints in Azure AI Foundry. Prompts and outputs are processed within the geography specified during deployment. However, those deployments can't be accessed through the Azure AI model inference endpoint in Azure AI Services.
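
As a hypothetical example, deploying a third-party model under the Global-Standard type follows the same pattern as the earlier sketch; it reuses the management client created there, and the DeepSeek-R1 model format, name, and version shown are assumptions to confirm in the model catalog.

```python
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<ai-services-resource>",
    deployment_name="DeepSeek-R1",
    deployment=Deployment(
        # Models-as-a-Service models are offered through Azure AI model
        # inference only under the Global-Standard deployment type.
        sku=Sku(name="GlobalStandard", capacity=1),
        properties=DeploymentProperties(
            model=DeploymentModel(format="DeepSeek", name="DeepSeek-R1", version="1"),
        ),
    ),
)
poller.result()
```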

Global-Standard

Global deployments use Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global-Standard provides the highest default quota and eliminates the need to load-balance across multiple resources. Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure location. Learn more about data residency.
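
Regardless of where a request is processed, you call the deployment through the resource's Azure AI model inference endpoint. The following is a minimal sketch assuming the azure-ai-inference Python package; the endpoint URL, API key, and deployment name are placeholders.

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<ai-services-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential("<api-key>"),
)

# `model` names the deployment the request is routed to; with a
# Global-Standard deployment, the service, not the caller, decides
# which data center processes it.
response = client.complete(
    messages=[UserMessage(content="Say hello.")],
    model="gpt-4o-mini",
)
print(response.choices[0].message.content)
```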

Control deployment options

Administrators can control which model deployment types are available to their users by using Azure Policy. Learn more in How to control AI model deployment with custom policies.
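
As a sketch of the idea, a custom policy definition can deny any model deployment whose SKU isn't in an allowed list. This assumes the azure-mgmt-resource and azure-identity Python packages, and the deployments/sku.name alias used here is an assumption to verify against the aliases published by the Microsoft.CognitiveServices resource provider.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import PolicyClient
from azure.mgmt.resource.policy.models import PolicyDefinition

policy_client = PolicyClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Deny any model deployment whose SKU isn't in the allowed list. The
# sku.name alias is an assumption; verify it against the aliases the
# Microsoft.CognitiveServices provider publishes.
policy_rule = {
    "if": {
        "allOf": [
            {
                "field": "type",
                "equals": "Microsoft.CognitiveServices/accounts/deployments",
            },
            {
                "not": {
                    "field": "Microsoft.CognitiveServices/accounts/deployments/sku.name",
                    "in": ["GlobalStandard", "DataZoneStandard"],
                }
            },
        ]
    },
    "then": {"effect": "deny"},
}

policy_client.policy_definitions.create_or_update(
    policy_definition_name="allowed-model-deployment-types",
    parameters=PolicyDefinition(
        display_name="Allowed model deployment types",
        mode="All",
        policy_rule=policy_rule,
    ),
)
```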