Models available in Azure AI model inference

Azure AI model inference in Azure AI Foundry gives you access to flagship models in Azure AI and lets you consume them as APIs without hosting them on your infrastructure.

[Animation: the model catalog section in the Azure AI Foundry portal and the models available.]

Model availability varies by model provider, deployment SKU, and cloud. All models available in Azure AI model inference support the Global standard deployment type, which uses global capacity to guarantee throughput. Azure OpenAI models also support regional deployments and sovereign clouds: Azure Government, Azure Germany, and Azure China 21Vianet.

Learn more about specific deployment capabilities for Azure OpenAI at Azure OpenAI Model availability.

Tip

The Azure AI model catalog offers a larger selection of models from a wider range of providers. However, those models might require you to host them on your own infrastructure, which includes creating an AI hub and project. Azure AI model inference provides a way to consume the models as APIs without hosting them on your infrastructure, with pay-as-you-go billing. Learn more about the Azure AI model catalog.

You can see all the models available to you in the model catalog in the Azure AI Foundry portal.
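
As an example of what "consume as APIs" looks like in practice, the following sketch calls a chat completion model through the Azure AI model inference endpoint with the azure-ai-inference Python package. The endpoint URL, key, and the Mistral-large deployment name are placeholders; substitute the values from your own Azure AI Foundry project.

```python
# Minimal sketch of calling a model through the Azure AI model inference
# endpoint with the azure-ai-inference package (pip install azure-ai-inference).
# The endpoint URL, API key, and model name below are placeholders.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],  # e.g. https://<resource>.services.ai.azure.com/models
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.complete(
    model="Mistral-large",  # any chat completion model deployed to your resource
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain what a token limit is in one sentence."),
    ],
)

print(response.choices[0].message.content)
```

The same client works for any of the chat completion models listed on this page; only the model name changes.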

AI21 Labs

The Jamba family models are AI21's production-grade, Mamba-based large language models (LLMs) that use AI21's hybrid Mamba-Transformer architecture. They're instruction-tuned versions of AI21's hybrid structured state space model (SSM) transformer Jamba model, built for reliable commercial use in terms of quality and performance.

| Model | Type | Tier | Capabilities |
| ----- | ---- | ---- | ------------ |
| AI21-Jamba-1.5-Mini | chat-completion | Global standard | - Input: text (262,144 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, es, pt, de, ar, and he<br>- Tool calling: Yes<br>- Response formats: Text, JSON, structured outputs |
| AI21-Jamba-1.5-Large | chat-completion | Global standard | - Input: text (262,144 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, es, pt, de, ar, and he<br>- Tool calling: Yes<br>- Response formats: Text, JSON, structured outputs |

See this model collection in Azure AI Foundry portal.

Azure OpenAI

Azure OpenAI Service offers a diverse set of models with different capabilities and price points. These models include:

  • State-of-the-art models designed to tackle reasoning and problem-solving tasks with increased focus and capability
  • Models that can understand and generate natural language and code
  • Models that can transcribe and translate speech to text

| Model | Type | Tier | Capabilities |
| ----- | ---- | ---- | ------------ |
| o1 | chat-completion | Global standard | - Input: text and image (200,000 tokens)<br>- Output: text (100,000 tokens)<br>- Languages: en, it, af, es, de, fr, id, ru, pl, uk, el, lv, zh, ar, tr, ja, sw, cy, ko, is, bn, ur, ne, th, pa, mr, and te<br>- Tool calling: Yes<br>- Response formats: Text, JSON, structured outputs |
| o1-preview | chat-completion | Global standard<br>Standard | - Input: text (128,000 tokens)<br>- Output: text (32,768 tokens)<br>- Languages: en, it, af, es, de, fr, id, ru, pl, uk, el, lv, zh, ar, tr, ja, sw, cy, ko, is, bn, ur, ne, th, pa, mr, and te<br>- Tool calling: Yes<br>- Response formats: Text, JSON, structured outputs |
| o1-mini | chat-completion | Global standard<br>Standard | - Input: text (128,000 tokens)<br>- Output: text (65,536 tokens)<br>- Languages: en, it, af, es, de, fr, id, ru, pl, uk, el, lv, zh, ar, tr, ja, sw, cy, ko, is, bn, ur, ne, th, pa, mr, and te<br>- Tool calling: Yes<br>- Response formats: Text, JSON, structured outputs |
| gpt-4o-realtime-preview | real-time | Global standard | - Input: control, text, and audio (131,072 tokens)<br>- Output: text and audio (16,384 tokens)<br>- Languages: en<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| gpt-4o | chat-completion | Global standard<br>Standard<br>Batch<br>Provisioned<br>Global provisioned<br>Data Zone | - Input: text and image (131,072 tokens)<br>- Output: text (16,384 tokens)<br>- Languages: en, it, af, es, de, fr, id, ru, pl, uk, el, lv, zh, ar, tr, ja, sw, cy, ko, is, bn, ur, ne, th, pa, mr, and te<br>- Tool calling: Yes<br>- Response formats: Text, JSON, structured outputs |
| gpt-4o-mini | chat-completion | Global standard<br>Standard<br>Batch<br>Provisioned<br>Global provisioned<br>Data Zone | - Input: text, image, and audio (131,072 tokens)<br>- Output: text (16,384 tokens)<br>- Languages: en, it, af, es, de, fr, id, ru, pl, uk, el, lv, zh, ar, tr, ja, sw, cy, ko, is, bn, ur, ne, th, pa, mr, and te<br>- Tool calling: Yes<br>- Response formats: Text, JSON, structured outputs |
| text-embedding-3-large | embeddings | Global standard<br>Standard<br>Provisioned<br>Global provisioned | - Input: text (8,191 tokens)<br>- Output: Vector (3,072 dim.)<br>- Languages: en |
| text-embedding-3-small | embeddings | Global standard<br>Standard<br>Provisioned<br>Global provisioned | - Input: text (8,191 tokens)<br>- Output: Vector (1,536 dim.)<br>- Languages: en |
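
The embeddings models above return a vector for each input string rather than generated text. A minimal sketch, assuming a text-embedding-3-small deployment exists in your resource and using placeholder endpoint and key values:

```python
# Minimal sketch: generating embeddings with an Azure OpenAI embeddings model
# through the Azure AI model inference endpoint. The endpoint, key, and the
# deployment name "text-embedding-3-small" are assumptions for illustration.
import os

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

client = EmbeddingsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.embed(
    model="text-embedding-3-small",
    input=["The quick brown fox", "jumps over the lazy dog"],
)

for item in response.data:
    # text-embedding-3-small returns 1,536-dimensional vectors (see the table above)
    print(item.index, len(item.embedding))
```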

See this model collection in Azure AI Foundry portal.

Cohere

The Cohere family includes models optimized for different use cases, such as chat completion and embeddings. Cohere models are tuned for reasoning, summarization, and question answering.

| Model | Type | Tier | Capabilities |
| ----- | ---- | ---- | ------------ |
| Cohere-embed-v3-english | embeddings<br>image-embeddings | Global standard | - Input: text (512 tokens)<br>- Output: Vector (1,024 dim.)<br>- Languages: en |
| Cohere-embed-v3-multilingual | embeddings<br>image-embeddings | Global standard | - Input: text (512 tokens)<br>- Output: Vector (1,024 dim.)<br>- Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar |
| Cohere-command-r-plus-08-2024 | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Cohere-command-r-08-2024 | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Cohere-command-r-plus | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Cohere-command-r | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
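
For models whose table entry lists Tool calling: Yes, you can pass tool definitions and let the model decide when to call them. A hedged sketch with Cohere-command-r-plus-08-2024; the get_weather function, its schema, and the deployment name are illustrative assumptions:

```python
# Minimal sketch of tool calling with a model that supports it (here a Cohere
# Command R model). The get_weather tool, its schema, and the deployment name
# are illustrative assumptions, not part of the service.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ChatCompletionsToolDefinition,
    FunctionDefinition,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

weather_tool = ChatCompletionsToolDefinition(
    function=FunctionDefinition(
        name="get_weather",
        description="Get the current weather for a city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    )
)

response = client.complete(
    model="Cohere-command-r-plus-08-2024",
    messages=[UserMessage(content="What's the weather in Paris?")],
    tools=[weather_tool],
)

# If the model decided to call the tool, the call (name and JSON arguments)
# is returned instead of a plain text answer.
message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    print(message.content)
```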

See this model collection in Azure AI Foundry portal.

Core42

Core42 offers autoregressive bilingual LLMs for Arabic and English with state-of-the-art capabilities in Arabic.

| Model | Type | Tier | Capabilities |
| ----- | ---- | ---- | ------------ |
| jais-30b-chat | chat-completion | Global standard | - Input: text (8,192 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en and ar<br>- Tool calling: Yes<br>- Response formats: Text, JSON |

See this model collection in Azure AI Foundry portal.

Meta

Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models. Meta models range in scale to include:

  • Small language models (SLMs) like 1B and 3B Base and Instruct models for on-device and edge inferencing
  • Mid-size large language models (LLMs) like 7B, 8B, and 70B Base and Instruct models
  • High-performance models like Meta Llama 3.1-405B Instruct for synthetic data generation and distillation use cases

| Model | Type | Tier | Capabilities |
| ----- | ---- | ---- | ------------ |
| Llama-3.3-70B-Instruct | chat-completion | Global standard | - Input: text (128,000 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en, de, fr, it, pt, hi, es, and th<br>- Tool calling: No*<br>- Response formats: Text |
| Llama-3.2-11B-Vision-Instruct | chat-completion | Global standard | - Input: text and image (128,000 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en<br>- Tool calling: No*<br>- Response formats: Text |
| Llama-3.2-90B-Vision-Instruct | chat-completion | Global standard | - Input: text and image (128,000 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en<br>- Tool calling: No*<br>- Response formats: Text |
| Meta-Llama-3.1-405B-Instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en, de, fr, it, pt, hi, es, and th<br>- Tool calling: No*<br>- Response formats: Text |
| Meta-Llama-3-8B-Instruct | chat-completion | Global standard | - Input: text (8,192 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en<br>- Tool calling: No*<br>- Response formats: Text |
| Meta-Llama-3.1-70B-Instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en, de, fr, it, pt, hi, es, and th<br>- Tool calling: No*<br>- Response formats: Text |
| Meta-Llama-3.1-8B-Instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en, de, fr, it, pt, hi, es, and th<br>- Tool calling: No*<br>- Response formats: Text |
| Meta-Llama-3-70B-Instruct | chat-completion | Global standard | - Input: text (8,192 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en<br>- Tool calling: No*<br>- Response formats: Text |
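
Chat completion models such as the Llama family can also stream their output as it's generated, which keeps long responses responsive. A minimal sketch, again with placeholder endpoint, key, and deployment name:

```python
# Minimal sketch of streaming a chat completion (here with a Llama deployment).
# The endpoint, key, and deployment name are placeholders.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.complete(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[UserMessage(content="Write a haiku about context windows.")],
    stream=True,
)

# With stream=True, the call returns an iterator of incremental updates.
for update in response:
    if update.choices and update.choices[0].delta.content:
        print(update.choices[0].delta.content, end="")
print()
```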

See this model collection in Azure AI Foundry portal.

Microsoft

Phi is a family of lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets, which include both synthetic data and filtered, publicly available website data, with a focus on high-quality and reasoning-dense properties. The models underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.

| Model | Type | Tier | Capabilities |
| ----- | ---- | ---- | ------------ |
| Phi-3-mini-128k-instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en<br>- Tool calling: No<br>- Response formats: Text |
| Phi-3-mini-4k-instruct | chat-completion | Global standard | - Input: text (4,096 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en<br>- Tool calling: No<br>- Response formats: Text |
| Phi-3-small-8k-instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en<br>- Tool calling: No<br>- Response formats: Text |
| Phi-3-medium-128k-instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en<br>- Tool calling: No<br>- Response formats: Text |
| Phi-3-medium-4k-instruct | chat-completion | Global standard | - Input: text (4,096 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en<br>- Tool calling: No<br>- Response formats: Text |
| Phi-3.5-vision-instruct | chat-completion | Global standard | - Input: text and image (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en<br>- Tool calling: No<br>- Response formats: Text |
| Phi-3.5-MoE-instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, ar, zh, cs, da, nl, fi, fr, de, he, hu, it, ja, ko, no, pl, pt, ru, es, sv, th, tr, and uk<br>- Tool calling: No<br>- Response formats: Text |
| Phi-3-small-128k-instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en<br>- Tool calling: No<br>- Response formats: Text |
| Phi-3.5-mini-instruct | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, ar, zh, cs, da, nl, fi, fr, de, he, hu, it, ja, ko, no, pl, pt, ru, es, sv, th, tr, and uk<br>- Tool calling: No<br>- Response formats: Text |
| Phi-4 | chat-completion | Global standard | - Input: text (16,384 tokens)<br>- Output: text (16,384 tokens)<br>- Languages: en, ar, bn, cs, da, de, el, es, fa, fi, fr, gu, ha, he, hi, hu, id, it, ja, jv, kn, ko, ml, mr, nl, no, or, pa, pl, ps, pt, ro, ru, sv, sw, ta, te, th, tl, tr, uk, ur, vi, yo, and zh<br>- Tool calling: No<br>- Response formats: Text |

See this model collection in Azure AI Foundry portal.

Mistral AI

Mistral AI offers two categories of models: premium models, including Mistral Large and Mistral Small, and open models, including Mistral Nemo.

| Model | Type | Tier | Capabilities |
| ----- | ---- | ---- | ------------ |
| Ministral-3B | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: fr, de, es, it, and en<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Mistral-large | chat-completion | Global standard | - Input: text (32,768 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: fr, de, es, it, and en<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Mistral-small | chat-completion | Global standard | - Input: text (32,768 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: fr, de, es, it, and en<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Mistral-Nemo | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, de, es, it, zh, ja, ko, pt, nl, and pl<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Mistral-large-2407 | chat-completion | Global standard | - Input: text (131,072 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, de, es, it, zh, ja, ko, pt, nl, and pl<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Mistral-Large-2411 | chat-completion | Global standard | - Input: text (128,000 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en, fr, de, es, it, zh, ja, ko, pt, nl, and pl<br>- Tool calling: Yes<br>- Response formats: Text, JSON |
| Codestral-2501 | chat-completion | Global standard | - Input: text (262,144 tokens)<br>- Output: text (4,096 tokens)<br>- Languages: en<br>- Tool calling: No<br>- Response formats: Text |
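
Models whose table entry lists JSON among the response formats can be asked to return a well-formed JSON object. The sketch below uses the REST surface of the Azure AI model inference endpoint directly; the URL shape, the api-version value, and the api-key header name are assumptions based on common Azure AI services conventions, so check your resource's documentation before relying on them.

```python
# Hedged sketch: requesting JSON output ("Response formats: ... JSON") from a
# model over the Azure AI model inference REST API. The endpoint, api-version,
# header name, and deployment name are assumptions for illustration.
import os
import requests

url = f"{os.environ['AZURE_INFERENCE_ENDPOINT']}/chat/completions"
payload = {
    "model": "Mistral-Nemo",
    "messages": [
        {"role": "system", "content": "Answer as a JSON object with keys 'city' and 'country'."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    # Models that list JSON as a response format accept this hint.
    "response_format": {"type": "json_object"},
}

resp = requests.post(
    url,
    params={"api-version": "2024-05-01-preview"},  # assumed preview API version
    headers={"api-key": os.environ["AZURE_INFERENCE_CREDENTIAL"]},  # key auth header name may differ
    json=payload,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```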

See this model collection in Azure AI Foundry portal.

NTT Data

Tsuzumi is an autoregressive, language-optimized transformer. The tuned versions use supervised fine-tuning (SFT). Tsuzumi handles both Japanese and English with high efficiency.

| Model | Type | Tier | Capabilities |
| ----- | ---- | ---- | ------------ |
| Tsuzumi-7b | chat-completion | Global standard | - Input: text (8,192 tokens)<br>- Output: text (8,192 tokens)<br>- Languages: en and jp<br>- Tool calling: No<br>- Response formats: Text |

Next steps