What is Azure AI model inference?
Azure AI model inference provides access to the most powerful models available in the Azure AI model catalog. The models come from key model providers in the industry, including OpenAI, Microsoft, Meta, Mistral, Cohere, G42, and AI21 Labs. These models can be integrated with software solutions to deliver a wide range of tasks that include content generation, summarization, image understanding, semantic search, and code generation.
Azure AI model inference provides a way to consume models as APIs without hosting them on your infrastructure. Models are hosted in a Microsoft-managed infrastructure, which enables API-based access to the model provider's model. API-based access can dramatically reduce the cost of accessing a model and simplify the provisioning experience.
Azure AI model inference is part of Azure AI Services, and users can access the service through REST APIs, SDKs in several languages such as Python, C#, JavaScript, and Java. You can also use the Azure AI model inference from Azure AI Foundry by configuring a connection.
Models
You can get access to the key model providers in the industry including OpenAI, Microsoft, Meta, Mistral, Cohere, G42, and AI21 Labs. Model providers define the license terms and set the price for use of their models. The following list shows all the models available:
Tip
See the Models article for a detailed view of the models, capabilities, and details.
Provider | Models |
---|---|
AI21 Labs | - AI21-Jamba-1.5-Mini - AI21-Jamba-1.5-Large |
Azure OpenAI | - o1 - gpt-4o - o1-preview - o1-mini - gpt-4o-mini - text-embedding-3-large - text-embedding-3-small |
Cohere | - Cohere-embed-v3-english - Cohere-embed-v3-multilingual - Cohere-command-r-plus-08-2024 - Cohere-command-r-08-2024 - Cohere-command-r-plus - Cohere-command-r |
Core42 | - jais-30b-chat |
Meta | - Llama-3.3-70B-Instruct - Llama-3.2-11B-Vision-Instruct - Llama-3.2-90B-Vision-Instruct - Meta-Llama-3.1-405B-Instruct - Meta-Llama-3-8B-Instruct - Meta-Llama-3.1-70B-Instruct - Meta-Llama-3.1-8B-Instruct - Meta-Llama-3-70B-Instruct |
Microsoft | - Phi-3-mini-128k-instruct - Phi-3-mini-4k-instruct - Phi-3-small-8k-instruct - Phi-3-medium-128k-instruct - Phi-3-medium-4k-instruct - Phi-3.5-vision-instruct - Phi-3.5-MoE-instruct - Phi-3-small-128k-instruct - Phi-3.5-mini-instruct - Phi-4 |
Mistral AI | - Ministral-3B - Mistral-large - Mistral-small - Mistral-Nemo - Mistral-large-2407 - Mistral-Large-2411 - Codestral-2501 |
NTT Data | - Tsuzumi-7b |
Pricing
For models from non-Microsoft providers (for example, Meta AI and Mistral models), billing is through Azure Marketplace. For such models, you're required to subscribe to the particular model offering in accordance with the Microsoft Commercial Marketplace Terms of Use. Users accept license terms for use of the models. Pricing information for consumption is provided during deployment.
For Microsoft models (for example, Phi-3 models and Azure OpenAI models) billing is via Azure meters as First Party Consumption Services. As described in the Product Terms, you purchase First Party Consumption Services by using Azure meters, but they aren't subject to Azure service terms.
Tip
Learn how to monitor and manage cost in Azure AI model inference.
Responsible AI
At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure AI models have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content.
Microsoft helps guard against abuse and unintended harm by taking the following actions:
- Incorporating Microsoft's principles for responsible AI use
- Adopting a code of conduct for use of the service
- Building content filters to support customers
- Providing responsible AI information and guidance that customers should consider when using Azure OpenAI.
Getting started
Azure AI model inference is a new feature offering on Azure AI Services resources. You can get started with it the same way as any other Azure product where you create and configure your resource for Azure AI model inference, or instance of the service, in your Azure Subscription. You can create as many resources as needed and configure them independently in case you have multiple teams with different requirements.
Once you create an Azure AI Services resource, you must deploy a model before you can start making API calls. By default, no models are available on it, so you can control which ones to start from. See the tutorial Create your first model deployment in Azure AI model inference.