How to deploy Azure OpenAI models with Azure AI Foundry

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, you learn to create Azure OpenAI model deployments in Azure AI Foundry portal.

Azure OpenAI Service offers a diverse set of models with different capabilities and price points. When you deploy Azure OpenAI models in Azure AI Foundry portal, you can consume the deployments, using prompt flow or another tool. Model availability varies by region. To learn more about the details of each model see Azure OpenAI Service models.

To modify and interact with an Azure OpenAI model in the Azure AI Foundry playground, first you need to deploy a base Azure OpenAI model to your project. Once the model is deployed and available in your project, you can consume its REST API endpoint as-is or customize further with your own data and other components (embeddings, indexes, and more).

Prerequisites

Deploy an Azure OpenAI model from the model catalog

Follow the steps below to deploy an Azure OpenAI model such as gpt-4o-mini to a real-time endpoint from the AI Foundry portal model catalog:

  1. Sign in to Azure AI Foundry.
  2. If you’re not already in your project, select it.
  3. Select Model catalog from the left navigation pane.
  1. In the Collections filter, select Azure OpenAI.

    A screenshot showing how to filter by Azure OpenAI models in the catalog.

  2. Select a model such as gpt-4o-mini from the Azure OpenAI collection.

  3. Select Deploy to open the deployment window.

  4. Select the resource that you want to deploy the model to. If you don't have a resource, you can create one.

  5. Specify the deployment name and modify other default settings depending on your requirements.

  6. Select Deploy.

  7. You land on the deployment details page. Select Open in playground.

  8. Select View Code to obtain code samples that can be used to consume the deployed model in your application.

Deploy an Azure OpenAI model from your project

Alternatively, you can initiate deployment by starting from your project in AI Foundry portal.

  1. Go to your project in AI Foundry portal.
  2. From the left sidebar of your project, go to My assets > Models + endpoints.
  3. Select + Deploy model > Deploy base model.
  4. In the Collections filter, select Azure OpenAI.
  5. Select a model such as gpt-4o-mini from the Azure OpenAI collection.
  6. Select Confirm to open the deployment window.
  7. Specify the deployment name and modify other default settings depending on your requirements.
  8. Select Deploy.
  9. You land on the deployment details page. Select Open in playground.
  10. Select View Code to obtain code samples that can be used to consume the deployed model in your application.

Inferencing the Azure OpenAI model

To perform inferencing on the deployed model, you can use the playground or code samples. The playground is a web-based interface that allows you to interact with the model in real-time. You can use the playground to test the model with different prompts and see the model's responses.

For more examples of how to consume the deployed model in your application, see the following Azure OpenAI quickstarts:

Regional availability and quota limits of a model

For Azure OpenAI models, the default quota for models varies by model and region. Certain models might only be available in some regions. For more information on availability and quota limits, see Azure OpenAI Service quotas and limits.

Quota for deploying and inferencing a model

For Azure OpenAI models, deploying and inferencing consume quota that is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). When you sign up for Azure AI Foundry, you receive default quota for most of the available models. Then, you assign TPM to each deployment as it is created, thus reducing the available quota for that model by the amount you assigned. You can continue to create deployments and assign them TPMs until you reach your quota limit.

Once you reach your quota limit, the only way for you to create new deployments of that model is to:

To learn more about quota, see Azure AI Foundry quota and Manage Azure OpenAI Service quota.