Azure AI model inference service frequently asked questions

If you can't find answers to your questions in this document, and still need help check the Azure AI services support options guide.

General

What's the difference between Azure OpenAI service and Azure AI model inference service?

Azure OpenAI service gives customers access to advanced language models from OpenAI. Azure AI model inference service gives customers access to all the flagship models in Azure AI, including Azure OpenAI, Cohere, Mistral AI, Meta Llama, AI21 labs, etc. This access is under the same service, endpoint, and credentials. Customers can seamlessly switch between models without changing their code.

Both Azure OpenAI Service and Azure AI model inference service are part of the Azure AI services family and build on top of the same security and enterprise promise of Azure.

While Azure AI model inference service focus on inference, Azure OpenAI Service can be used with more advanced APIs like batch, fine-tuning, assistants, and files.

What's the difference between OpenAI and Azure OpenAI?

Azure AI Models and Azure OpenAI Service give customers access to advanced language models from OpenAI with the security and enterprise promise of Azure. Azure OpenAI codevelops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.

Customers get the security capabilities of Microsoft Azure while running the same models as OpenAI. It offers private networking, regional availability, and responsible AI content filtering.

Learn more about the Azure OpenAI service.

What's the difference between Azure AI model inference and Azure AI Foundry?

Azure AI services are a suite of AI services that provide prebuilt APIs for common AI scenarios. One of them is Azure AI model inference service which focuses on inference service of different state-of-the-art models. Azure AI Foundry portal is a web-based tool that allows you to build, train, and deploy machine learning models. Azure AI services can be used in Azure AI Foundry portal to enhance your models with prebuilt AI capabilities.

What's the difference between Azure AI model inference service and Serverless API model deployments in Azure AI Foundry portal?

Both technologies allow you to deploy models without requiring compute resources as they are based on the Models as a Service idea. Serverless API model deployments allow you to deploy a single model under a unique endpoint and credentials. You need to create a different endpoint for each model you want to deploy. On top of that, they are always created in the context of the project and while they can be shared by creating connections from other projects, they live in the context of a given project.

Azure AI model inference service allows you to deploy multiple models under the same endpoint and credentials. You can switch between models without changing your code. They are also in the context of a shared resource, the Azure AI Services resource, which implies you can connect the resource to any project or hub that requires to consume the models you made available. Azure AI model inference service comes with a built-in model routing capability that routes the request to the right model based on the model name you pass in the request.

These two model deployment options have some differences in terms of their capabilities too. You can read about them at [../concepts/deployment-overview.md]

Models

Why aren't all the models in the Azure AI model catalog supported in Azure AI model inference in Azure AI Services?

The Azure AI model inference service in AI services supports all the models in the Azure AI catalog with pay-as-you-go billing (per-token). For more information, see the Models section.

The Azure AI model catalog contains a wider list of models, however, those models require compute quota from your subscription. They also need to have a project or AI hub where to host the deployment. For more information, see deployment options in Azure AI Foundry portal.

Why I can't add OpenAI o1-preview or OpenA o1-mini-preview to my resource?

The Azure OpenAI Service o1 models require registration and are eligible only to customers on the Enterprise Agreement Offer. Subscriptions not under the Enterprise Agreement Offer are subject to denial. We onboard eligible customers as we have space. Due to high demand, eligible customers may remain on the waitlist until space is available.

Other models (see list) don't require registration. Learn more about limited access to Azure OpenAI Service.

SDKs and programming languages

Which are the supported SDKs and programming languages for Azure AI model inference service?

You can use Azure Inference SDK with any model that is supported by:

  • The Azure AI Inference SDK
  • The AzureOpenAI class in OpenAI SDK
  • The Azure OpenAI SDK

Cohere SDK, Mistral SDK, and model provider-specific SDKs are not supported when connected to Azure AI model inference service.

For more information, see supported SDKs and programming languages.

Does Azure AI model inference service work with the latest Python library released by OpenAI (version>=1.0)?

The latest release of the OpenAI Python library (version>=1.0) supports Azure AI services.

I'm making a request for a model that Azure AI model inference service supports, but I'm getting a 404 error. What should I do?

Ensure you created a deployment for the given model and that the deployment name matches exactly the value you're passing in model parameter. Although routing isn't case sensitive, ensure there's no special punctuation or spaces typos.

I'm using the azure-ai-inference package for Python and I get a 401 error when I try to authenticate using keys. What should I do?

Azure AI Services resource requires the version azure-ai-inference>=1.0.0b5 for Python. Ensure you're using that version.

I'm using OpenAI SDK and indicated the Azure OpenAI inference endpoint as base URL (https://<resource-name>.openai.azure.com). However, I get a 404 error. What should I do?

Ensure you're using the correct endpoint for the Azure OpenAI service and the right set of credentials. Also, ensure that you're using the class AzureOpenAI from the OpenAI SDK as the authentication mechanism and URLs used are different.

Does Azure AI model inference service support custom API headers? We append other custom headers to our API requests and are seeing HTTP 431 failure errors.

Our current APIs allow up to 10 custom headers, which are passed through the pipeline, and returned. We notice some customers now exceed this header count resulting in HTTP 431 errors. There's no solution for this error, other than to reduce header volume. We recommend customers not depend on custom headers in future system architectures.

Pricing and Billing

How is Azure AI model inference service billed?

You're billed for inputs and outputs to the APIs, typically in tokens. There are no cost associated with the resource itself or the deployments.

The token price varies per each model and you're billed per 1,000 tokens. You can see the pricing details before deploying a given model.

Where can I see the bill details?

Billing and costs are displayed in Microsoft Cost Management + Billing. You can see the usage details in the Azure portal.

Billing isn't shown in Azure AI Foundry portal.

How can I place a spending limit to my bill?

You can set up a spending limit in the Azure portal under Cost Management. This limit prevents you from spending more than the amount you set. Once the spending limit is reached, the subscription is disabled and you can't use the endpoint until the next billing cycle. For more information, see Tutorial: Create and manage budgets.

Data and Privacy

Do you use my company data to train any of the models?

Azure AI model inference doesn't use customer data to retrain models. Your data is never shared with model providers.