How to deploy Azure OpenAI models with Azure AI Foundry

Article
11/21/2024

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, you learn to create Azure OpenAI model deployments in Azure AI Foundry portal.

Azure OpenAI Service offers a diverse set of models with different capabilities and price points. When you deploy Azure OpenAI models in Azure AI Foundry portal, you can consume the deployments, using prompt flow or another tool. Model availability varies by region. To learn more about the details of each model see Azure OpenAI Service models.

To modify and interact with an Azure OpenAI model in the Azure AI Foundry playground, first you need to deploy a base Azure OpenAI model to your project. Once the model is deployed and available in your project, you can consume its REST API endpoint as-is or customize further with your own data and other components (embeddings, indexes, and more).

Prerequisites

An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.
An Azure AI Foundry project.

Deploy an Azure OpenAI model from the model catalog

Follow the steps below to deploy an Azure OpenAI model such as gpt-4o-mini to a real-time endpoint from the Azure AI Foundry portal model catalog:

Sign in to Azure AI Foundry.
If you’re not already in your project, select it.
Select Model catalog from the left navigation pane.

In the Collections filter, select Azure OpenAI.
Select a model such as gpt-4o-mini from the Azure OpenAI collection.
Select Deploy to open the deployment window.
Select the resource that you want to deploy the model to. If you don't have a resource, you can create one.
Specify the deployment name and modify other default settings depending on your requirements.
Select Deploy.
You land on the deployment details page. Select Open in playground.
Select View Code to obtain code samples that can be used to consume the deployed model in your application.

Deploy an Azure OpenAI model from your project

Alternatively, you can initiate deployment by starting from your project in Azure AI Foundry portal.

Go to your project in Azure AI Foundry portal.
From the left sidebar of your project, go to My assets > Models + endpoints.
Select + Deploy model > Deploy base model.
In the Collections filter, select Azure OpenAI.
Select a model such as gpt-4o-mini from the Azure OpenAI collection.
Select Confirm to open the deployment window.
Specify the deployment name and modify other default settings depending on your requirements.
Select Deploy.
You land on the deployment details page. Select Open in playground.
Select View Code to obtain code samples that can be used to consume the deployed model in your application.

Inferencing the Azure OpenAI model

To perform inferencing on the deployed model, you can use the playground or code samples. The playground is a web-based interface that allows you to interact with the model in real-time. You can use the playground to test the model with different prompts and see the model's responses.

For more examples of how to consume the deployed model in your application, see the following Azure OpenAI quickstarts:

Regional availability and quota limits of a model

For Azure OpenAI models, the default quota for models varies by model and region. Certain models might only be available in some regions. For more information on availability and quota limits, see Azure OpenAI Service quotas and limits.

Quota for deploying and inferencing a model

For Azure OpenAI models, deploying and inferencing consume quota that is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). When you sign up for Azure AI Foundry, you receive default quota for most of the available models. Then, you assign TPM to each deployment as it is created, thus reducing the available quota for that model by the amount you assigned. You can continue to create deployments and assign them TPMs until you reach your quota limit.

Once you reach your quota limit, the only way for you to create new deployments of that model is to:

Request more quota by submitting a quota increase form.
Adjust the allocated quota on other model deployments to free up tokens for new deployments on the Azure OpenAI Portal.

To learn more about quota, see Azure AI Foundry quota and Manage Azure OpenAI Service quota.

Learn more about what you can do in Azure AI Foundry
Get answers to frequently asked questions in the Azure AI FAQ article

Share via

How to deploy Azure OpenAI models with Azure AI Foundry

Prerequisites

Deploy an Azure OpenAI model from the model catalog

Deploy an Azure OpenAI model from your project

Inferencing the Azure OpenAI model

Regional availability and quota limits of a model

Quota for deploying and inferencing a model

Feedback

Additional resources

Share via

How to deploy Azure OpenAI models with Azure AI Foundry

Prerequisites

Deploy an Azure OpenAI model from the model catalog

Deploy an Azure OpenAI model from your project

Inferencing the Azure OpenAI model

Regional availability and quota limits of a model

Quota for deploying and inferencing a model

Related content

Feedback

Additional resources