What is Azure AI model inference?

Article
01/24/2025

Azure AI model inference provides access to the most powerful models available in the Azure AI model catalog. The models come from key model providers in the industry, including OpenAI, Microsoft, Meta, Mistral, Cohere, G42, and AI21 Labs. These models can be integrated with software solutions to deliver a wide range of tasks that include content generation, summarization, image understanding, semantic search, and code generation.

Azure AI model inference provides a way to consume models as APIs without hosting them on your infrastructure. Models are hosted in a Microsoft-managed infrastructure, which enables API-based access to the model provider's model. API-based access can dramatically reduce the cost of accessing a model and simplify the provisioning experience.

Azure AI model inference is part of Azure AI Services, and users can access the service through REST APIs, SDKs in several languages such as Python, C#, JavaScript, and Java. You can also use the Azure AI model inference from Azure AI Foundry by configuring a connection.

Models

You can get access to the key model providers in the industry including OpenAI, Microsoft, Meta, Mistral, Cohere, G42, and AI21 Labs. Model providers define the license terms and set the price for use of their models. The following list shows all the models available:

Tip

See the Models article for a detailed view of the models, capabilities, and details.

Provider	Models
AI21 Labs	- AI21-Jamba-1.5-Mini - AI21-Jamba-1.5-Large
Azure OpenAI	- o1 - gpt-4o - o1-preview - o1-mini - gpt-4o-mini - text-embedding-3-large - text-embedding-3-small
Cohere	- Cohere-embed-v3-english - Cohere-embed-v3-multilingual - Cohere-command-r-plus-08-2024 - Cohere-command-r-08-2024 - Cohere-command-r-plus - Cohere-command-r
Core42	- jais-30b-chat
Meta	- Llama-3.3-70B-Instruct - Llama-3.2-11B-Vision-Instruct - Llama-3.2-90B-Vision-Instruct - Meta-Llama-3.1-405B-Instruct - Meta-Llama-3-8B-Instruct - Meta-Llama-3.1-70B-Instruct - Meta-Llama-3.1-8B-Instruct - Meta-Llama-3-70B-Instruct
Microsoft	- Phi-3-mini-128k-instruct - Phi-3-mini-4k-instruct - Phi-3-small-8k-instruct - Phi-3-medium-128k-instruct - Phi-3-medium-4k-instruct - Phi-3.5-vision-instruct - Phi-3.5-MoE-instruct - Phi-3-small-128k-instruct - Phi-3.5-mini-instruct - Phi-4
Mistral AI	- Ministral-3B - Mistral-large - Mistral-small - Mistral-Nemo - Mistral-large-2407 - Mistral-Large-2411 - Codestral-2501
NTT Data	- Tsuzumi-7b

Pricing

For models from non-Microsoft providers (for example, Meta AI and Mistral models), billing is through Azure Marketplace. For such models, you're required to subscribe to the particular model offering in accordance with the Microsoft Commercial Marketplace Terms of Use. Users accept license terms for use of the models. Pricing information for consumption is provided during deployment.

For Microsoft models (for example, Phi-3 models and Azure OpenAI models) billing is via Azure meters as First Party Consumption Services. As described in the Product Terms, you purchase First Party Consumption Services by using Azure meters, but they aren't subject to Azure service terms.

Tip

Learn how to monitor and manage cost in Azure AI model inference.

Responsible AI

At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure AI models have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content.

Microsoft helps guard against abuse and unintended harm by taking the following actions:

Incorporating Microsoft's principles for responsible AI use
Adopting a code of conduct for use of the service
Building content filters to support customers
Providing responsible AI information and guidance that customers should consider when using Azure OpenAI.

Getting started

Azure AI model inference is a new feature offering on Azure AI Services resources. You can get started with it the same way as any other Azure product where you create and configure your resource for Azure AI model inference, or instance of the service, in your Azure Subscription. You can create as many resources as needed and configure them independently in case you have multiple teams with different requirements.

Once you create an Azure AI Services resource, you must deploy a model before you can start making API calls. By default, no models are available on it, so you can control which ones to start from. See the tutorial Create your first model deployment in Azure AI model inference.

Next steps

Create your first model deployment in Azure AI model inference

Partager via