Azure AI Model Inference REST API reference

The Azure AI model inference is an API that exposes a common set of capabilities for foundational models and that can be used by developers to consume predictions from a diverse set of models in a uniform and consistent way. Developers can talk with different models deployed in Azure AI Foundry portal without changing the underlying code they are using.

Benefits

Foundational models, such as language models, have indeed made remarkable strides in recent years. These advancements have revolutionized various fields, including natural language processing and computer vision, and they have enabled applications like chatbots, virtual assistants, and language translation services.

While foundational models excel in specific domains, they lack a uniform set of capabilities. Some models are better at specific task and even across the same task, some models may approach the problem in one way while others in another. Developers can benefit from this diversity by using the right model for the right job allowing them to:

  • Improve the performance in a specific downstream task.
  • Use more efficient models for simpler tasks.
  • Use smaller models that can run faster on specific tasks.
  • Compose multiple models to develop intelligent experiences.

Having a uniform way to consume foundational models allow developers to realize all those benefits without sacrificing portability or changing the underlying code.

Inference SDK support

The Azure AI Inference package allows you to consume all models supporting the Azure AI model inference API and easily change among them. Azure AI Inference package is part of the Azure AI Foundry SDK.

Language Documentation Package Examples
C# Reference azure-ai-inference (NuGet) C# examples
Java Reference azure-ai-inference (Maven) Java examples
JavaScript Reference @azure/ai-inference (npm) JavaScript examples
Python Reference azure-ai-inference (PyPi) Python examples

Capabilities

The following section describes some of the capabilities the API exposes:

Modalities

The API indicates how developers can consume predictions for the following modalities:

  • Get info: Returns the information about the model deployed under the endpoint.
  • Text embeddings: Creates an embedding vector representing the input text.
  • Chat completions: Creates a model response for the given chat conversation.
  • Image embeddings: Creates an embedding vector representing the input text and image.

Extensibility

The Azure AI Model Inference API specifies a set of modalities and parameters that models can subscribe to. However, some models may have further capabilities that the ones the API indicates. On those cases, the API allows the developer to pass them as extra parameters in the payload.

By setting a header extra-parameters: pass-through, the API will attempt to pass any unknown parameter directly to the underlying model. If the model can handle that parameter, the request completes.

The following example shows a request passing the parameter safe_prompt supported by Mistral-Large, which isn't specified in the Azure AI Model Inference API.

Request

POST /chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
extra-parameters: pass-through
{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Explain Riemann's conjecture in 1 paragraph"
    }
    ],
    "temperature": 0,
    "top_p": 1,
    "response_format": { "type": "text" },
    "safe_prompt": true
}

Note

The default value for extra-parameters is error which returns an error if an extra parameter is indicated in the payload. Alternatively, you can set extra-parameters: drop to drop any unknown parameter in the request. Use this capability in case you happen to be sending requests with extra parameters that you know the model won't support but you want the request to completes anyway. A typical example of this is indicating seed parameter.

Models with disparate set of capabilities

The Azure AI Model Inference API indicates a general set of capabilities but each of the models can decide to implement them or not. A specific error is returned on those cases where the model can't support a specific parameter.

The following example shows the response for a chat completion request indicating the parameter reponse_format and asking for a reply in JSON format. In the example, since the model doesn't support such capability an error 422 is returned to the user.

Request

POST /chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Explain Riemann's conjecture in 1 paragraph"
    }
    ],
    "temperature": 0,
    "top_p": 1,
    "response_format": { "type": "json_object" },
}

Response

{
    "status": 422,
    "code": "parameter_not_supported",
    "detail": {
        "loc": [ "body", "response_format" ],
        "input": "json_object"
    },
    "message": "One of the parameters contain invalid values."
}

Tip

You can inspect the property details.loc to understand the location of the offending parameter and details.input to see the value that was passed in the request.

Content safety

The Azure AI model inference API supports Azure AI Content Safety. When using deployments with Azure AI Content Safety on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering (preview) system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

The following example shows the response for a chat completion request that has triggered content safety.

Request

POST /chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
    }
    ],
    "temperature": 0,
    "top_p": 1,
}

Response

{
    "status": 400,
    "code": "content_filter",
    "message": "The response was filtered",
    "param": "messages",
    "type": null
}

Getting started

Azure AI model inference API is available on Azure AI Services resources. You can get started with it the same way as any other Azure product where you create and configure your resource for Azure AI model inference, or instance of the service, in your Azure Subscription. You can create as many resources as needed and configure them independently in case you have multiple teams with different requirements.

Once you create an Azure AI Services resource, you must deploy a model before you can start making API calls. By default, no models are available on it, so you can control which ones to start from. See the tutorial Create your first model deployment in Azure AI model inference.