Add and configure models to Azure AI model inference service

You can decide which models are available for inference in the resource's model inference endpoint and how they're configured. When a given model is configured, you can then generate predictions from it by indicating its model name or deployment name in your requests. No further changes are required in your code to use it.

In this article, you learn how to add a new model to the Azure AI model inference service in Azure AI services.

Prerequisites

To complete this article, you need:

  * An Azure subscription.

  * An Azure AI services resource.

Add a model

Unlike GitHub Models, where all the models are already configured, the Azure AI Services resource allows you to control which models are available in your endpoint and under which configuration.

You can add all the models you need to the endpoint by using Azure AI Foundry for GitHub. In the following example, we add a Mistral-Large model to the service:

  1. Go to the Model catalog section in Azure AI Foundry for GitHub.

  2. Scroll to the model you're interested in and select it.

  3. Review the details of the model in the model card.

  4. Select Deploy.

  5. For model providers that require extra contract terms, you're asked to accept those terms. For instance, Mistral models ask you to accept additional terms. In those cases, accept the terms by selecting Subscribe and deploy.

    A screenshot showing how to agree to the terms and conditions of a Mistral-Large model.

  6. You can configure the deployment settings at this time. By default, the deployment receives the name of the model you're deploying. The deployment name is used in the model parameter for requests to route to this particular model deployment. This setting also allows you to configure specific names for your models when you attach specific configurations, for instance, o1-preview-safe for a model with a strict content filter (see the sketch after this procedure).

Tip

Each model may support different deployment types, providing different data residency or throughput guarantees. See deployment types for more details.

  7. Use the Customize option if you need to change settings like content filter or rate limiting (if available).

    A screenshot showing how to customize the deployment if needed.

  8. Select Deploy.

  9. Once the deployment completes, the new model is listed in the page and is ready to be used.
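
The following is a minimal sketch of how the deployment name routes requests, assuming a hypothetical deployment named o1-preview-safe and the azure-ai-inference Python package described in the next section:

    import os
    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import UserMessage
    from azure.core.credentials import AzureKeyCredential

    client = ChatCompletionsClient(
        endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
        credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
    )

    # "o1-preview-safe" is a hypothetical deployment name chosen at deployment
    # time; the request is routed to that specific deployment.
    response = client.complete(
        messages=[UserMessage(content="Hello!")],
        model="o1-preview-safe",
    )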

Use the model

Deployed models in Azure AI services can be consumed using the Azure AI model inference endpoint for the resource.

To use it:

  1. Get the Azure AI model inference endpoint URL and keys from the deployment page or the Overview page. If you're using Microsoft Entra ID authentication, you don't need a key (see the sketch after this procedure).

    A screenshot showing how to get the URL and key associated with the deployment.

  2. Use the model inference endpoint URL and the key from the previous step when constructing your client. The following example uses the Azure AI Inference package:

    Install the package azure-ai-inference using your package manager, like pip:

    pip install "azure-ai-inference>=1.0.0b5"
    

    Warning

    The Azure AI Services resource requires version azure-ai-inference>=1.0.0b5 of the Python package.

    Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

    import os
    from azure.ai.inference import ChatCompletionsClient
    from azure.core.credentials import AzureKeyCredential
    
    # Create a client for the resource's model inference endpoint,
    # authenticating with the endpoint key.
    client = ChatCompletionsClient(
        endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
        credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
    )
    

    Explore our samples and read the API reference documentation to get started.

  3. When constructing your request, set the model parameter to the model deployment name you created.

    from azure.ai.inference.models import SystemMessage, UserMessage
    
    # The model parameter takes the deployment name, routing the request to
    # the "mistral-large" deployment created earlier.
    response = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
        ],
        model="mistral-large",
    )
    
    print(response.choices[0].message.content)
    
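If you prefer Microsoft Entra ID authentication over keys, you can pass a token credential instead. A minimal sketch, assuming the azure-identity package is installed and your identity has access to the resource; the scope shown is the Azure Cognitive Services scope:

    import os
    from azure.ai.inference import ChatCompletionsClient
    from azure.identity import DefaultAzureCredential

    # DefaultAzureCredential resolves your signed-in identity (Azure CLI,
    # managed identity, environment variables, and so on); no key is needed.
    client = ChatCompletionsClient(
        endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
        credential=DefaultAzureCredential(),
        credential_scopes=["https://cognitiveservices.azure.com/.default"],
    )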

Tip

When using the endpoint, you can change the model parameter to any available model deployment in your resource.
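
For instance, a quick sketch that reuses the client from the previous example to query two deployments; the deployment names are illustrative:

    from azure.ai.inference.models import UserMessage

    # Query several deployments with the same client by changing only the
    # model parameter. Replace the names with deployments in your resource.
    for deployment_name in ["mistral-large", "o1-preview-safe"]:
        response = client.complete(
            messages=[UserMessage(content="Say hello in one sentence.")],
            model=deployment_name,
        )
        print(deployment_name, "->", response.choices[0].message.content)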

Additionally, Azure OpenAI models can be consumed using the Azure OpenAI service endpoint in the resource. This endpoint is exclusive to each model deployment and has its own URL.
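
As a sketch, such a deployment can be called with the openai Python package, assuming a hypothetical Azure OpenAI deployment named gpt-4o-mini and environment variables holding that endpoint and key:

    import os
    from openai import AzureOpenAI

    # The Azure OpenAI endpoint is specific to the resource; the deployment
    # name below is a hypothetical example.
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # the deployment name, not the base model name
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)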

Model deployment customization

When creating model deployments, you can configure other settings, including content filtering and rate limits. To configure more settings, select the Customize option in the deployment wizard.

Note

Configurations may vary depending on the model you're deploying.

Next steps