Get Chat Completions

Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. The method makes a REST API call to the /chat/completions route on the given endpoint.

POST https://{endpoint}/chat/completions?api-version=2024-05-01-preview

URI Parameters

Name In Required Type Description
api-version
query True

string

The API version to use for this operation.

Request Header

Name Required Type Description
extra-parameters

ExtraParameters

Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload. This sets the HTTP request header extra-parameters.

Request Body

Name Required Type Description
messages True ChatRequestMessage[]:

The collection of context messages associated with this chat completions request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles.

frequency_penalty

number

A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2].

max_tokens

integer

The maximum number of tokens to generate.

modalities

ChatCompletionsModality[]

The modalities that the model is allowed to use for the chat completions response. The default modality is text. Requesting an unsupported modality combination results in a 422 error.

model

string

ID of the specific AI model to use, if more than one model is available on the endpoint.

presence_penalty

number

A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2].

response_format ChatCompletionsResponseFormat:

An object specifying the format that the model must output.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensures the model will match your supplied JSON schema.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.

seed

integer

If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.

stop

string[]

A collection of textual sequences that will end completions generation.

stream

boolean

A value indicating whether chat completions should be streamed for this request.

temperature

number

The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1].

tool_choice

If specified, the model will configure which of the provided tools it can use for the chat completions response.

tools

ChatCompletionsToolDefinition[]

A list of tools the model may request to call. Currently, only functions are supported as a tool. The model may respond with a function call request and provide the input arguments in JSON format for that function.

top_p

number

An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1].
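The request body above can be assembled and sent with the Python standard library. The endpoint URL and API key below are placeholders for your own deployment's values, and the prompt and parameter choices are illustrative; note the request sets temperature but leaves top_p at its default, per the guidance above.

```python
import json
from urllib import request

# Hypothetical endpoint and key; substitute your deployment's values.
ENDPOINT = "https://example-endpoint.example.com"
API_KEY = "<your-api-key>"

body = {
    "messages": [{"role": "user", "content": "Explain Riemann's conjecture"}],
    "temperature": 0.7,  # adjust temperature OR top_p, not both
    "max_tokens": 256,
}

req = request.Request(
    url=f"{ENDPOINT}/chat/completions?api-version=2024-05-01-preview",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "api-key": API_KEY},
    method="POST",
)

# Sending is network-dependent; uncomment to execute against a live endpoint:
# with request.urlopen(req) as resp:
#     completion = json.load(resp)
#     print(completion["choices"][0]["message"]["content"])
```

The example uses api-key authentication; an OAuth2 bearer token in an Authorization header works as well, per the Security section below.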

Responses

Name Type Description
200 OK

ChatCompletions

The request has succeeded.

Other Status Codes

Azure.Core.Foundations.ErrorResponse

An unexpected error response.

Headers

x-ms-error-code: string

Security

api-key

Type: apiKey
In: header

OAuth2Auth

Type: oauth2
Flow: implicit
Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize

Scopes

Name Description
https://ml.azure.com/.default

Examples

Audio modality chat completion
maximum set chat completion
minimum set chat completion

Audio modality chat completion

Sample request

POST https://{endpoint}/chat/completions?api-version=2024-05-01-preview


{
  "modalities": [
    "text",
    "audio"
  ],
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "<base64 encoded audio data>",
            "format": "wav"
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": null,
      "audio": {
        "id": "abcdef1234"
      }
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "<base64 encoded audio data>",
            "format": "wav"
          }
        }
      ]
    }
  ],
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "temperature": 0,
  "top_p": 0,
  "seed": 21,
  "model": "my-model-name"
}

Sample response

{
  "id": "kgousajxgzyhugvqekuswuqbk",
  "object": "chat.completion",
  "created": 1696522361,
  "model": "my-model-name",
  "usage": {
    "completion_tokens": 19,
    "prompt_tokens": 28,
    "total_tokens": 16,
    "completion_tokens_details": {
      "audio_tokens": 5,
      "total_tokens": 5
    },
    "prompt_tokens_details": {
      "audio_tokens": 10,
      "cached_tokens": 0
    }
  },
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": null,
        "audio": {
          "id": "abcdef1234",
          "format": "wav",
          "data": "<base64 encoded audio data>",
          "expires_at": 1896522361,
          "transcript": "This is a sample transcript"
        }
      }
    }
  ]
}
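The audio object in the response carries its payload as a Base64 string. A minimal sketch of decoding it for playback, using a stand-in byte string in place of real model-generated WAV data:

```python
import base64

# Hypothetical response fragment shaped like the sample above; in practice
# audio["data"] holds the real base64-encoded WAV bytes from the service.
audio = {
    "id": "abcdef1234",
    "format": "wav",
    "data": base64.b64encode(b"RIFF....WAVEfmt ").decode("ascii"),
}

# Decode back to raw WAV bytes.
wav_bytes = base64.b64decode(audio["data"])
# open("reply.wav", "wb").write(wav_bytes)  # persist for playback
```

Keep the audio id if you plan to reference the reply in later turns (see ChatRequestAudioReference below); the raw data can be discarded once decoded.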

maximum set chat completion

Sample request

POST https://{endpoint}/chat/completions?api-version=2024-05-01-preview


{
  "modalities": [
    "text"
  ],
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "Explain Riemann's conjecture"
    },
    {
      "role": "assistant",
      "content": "The Riemann Conjecture is a deep mathematical conjecture around prime numbers and how they can be predicted. It was first published in Riemann's groundbreaking 1859 paper. The conjecture states that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 1/21. Many consider it to be the most important unsolved problem in pure mathematics. The Riemann hypothesis is a way to predict the probability that numbers in a certain range are prime that was also devised by German mathematician Bernhard Riemann in 18594."
    },
    {
      "role": "user",
      "content": "Ist it proved?"
    }
  ],
  "frequency_penalty": 0,
  "stream": true,
  "presence_penalty": 0,
  "temperature": 0,
  "top_p": 0,
  "max_tokens": 255,
  "response_format": {
    "type": "text"
  },
  "stop": [
    "<|endoftext|>"
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "my-function-name",
        "description": "A function useful to know if a theroem is proved or not"
      }
    }
  ],
  "seed": 21,
  "model": "my-model-name"
}

Sample response

{
  "id": "kgousajxgzyhugvqekuswuqbk",
  "object": "chat.completion",
  "created": 18,
  "model": "my-model-name",
  "usage": {
    "completion_tokens": 19,
    "prompt_tokens": 28,
    "total_tokens": 16
  },
  "choices": [
    {
      "index": 7,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "yrobmilsrugmbwukmzo",
            "type": "function",
            "function": {
              "name": "my-function-name",
              "arguments": "{ \"arg1\": \"value1\", \"arg2\": \"value2\" }"
            }
          }
        ]
      }
    }
  ]
}

minimum set chat completion

Sample request

POST https://{endpoint}/chat/completions?api-version=2024-05-01-preview

{
  "messages": [
    {
      "role": "user",
      "content": "Explain Riemann's conjecture"
    }
  ]
}

Sample response

{
  "id": "kgousajxgzyhugvqekuswuqbk",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "my-model-name",
  "usage": {
    "prompt_tokens": 205,
    "completion_tokens": 5,
    "total_tokens": 210
  },
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "The Riemann Conjecture is a deep mathematical conjecture around prime numbers and how they can be predicted. It was first published in Riemann's groundbreaking 1859 paper. The conjecture states that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 1/21. Many consider it to be the most important unsolved problem in pure mathematics. The Riemann hypothesis is a way to predict the probability that numbers in a certain range are prime that was also devised by German mathematician Bernhard Riemann in 18594"
      }
    }
  ]
}

Definitions

Name Description
AudioContentFormat

A representation of the possible audio formats for audio.

Azure.Core.Foundations.Error

The error object.

Azure.Core.Foundations.ErrorResponse

A response containing error details.

Azure.Core.Foundations.InnerError

An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/Microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses.

ChatChoice

The representation of a single prompt completion as part of an overall chat completions request. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.

ChatCompletions

Representation of the response data from a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.

ChatCompletionsAudio

A representation of the audio generated by the model.

ChatCompletionsModality

The modalities that the model is allowed to use for the chat completions response.

ChatCompletionsOptions

The configuration information for a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.

ChatCompletionsResponseFormatJsonObject

A response format for Chat Completions that restricts responses to emitting valid JSON objects. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message.

ChatCompletionsResponseFormatJsonSchema

A response format for Chat Completions that restricts responses to emitting valid JSON objects, with a JSON schema specified by the caller.

ChatCompletionsResponseFormatJsonSchemaDefinition

The definition of the required JSON schema in the response, and associated metadata.

ChatCompletionsResponseFormatText

A response format for Chat Completions that emits text responses. This is the default response format.

ChatCompletionsToolCall

A function tool call requested by the AI model.

ChatCompletionsToolDefinition

The definition of a chat completions tool that can call a function.

ChatRequestAssistantMessage

A request chat message representing response or action from the assistant.

ChatRequestAudioReference

A reference to an audio response generated by the model.

ChatRequestSystemMessage

A request chat message containing system instructions that influence how the model will generate a chat completions response.

ChatRequestToolMessage

A request chat message representing requested output from a configured tool.

ChatRequestUserMessage

A request chat message representing user input to the assistant.

ChatResponseMessage

A representation of a chat message as received in a response.

ChatRole

A description of the intended purpose of a message within a chat completions interaction.

CompletionsFinishReason

Representation of the manner in which a completions response concluded.

CompletionsUsage

Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.

CompletionsUsageDetails

A breakdown of tokens used in a completion.

ExtraParameters

Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload.

FunctionCall

The name and arguments of a function that should be called, as generated by the model.

FunctionDefinition

The definition of a caller-specified function that chat completions may invoke in response to matching user input.

PromptUsageDetails

A breakdown of tokens used in the prompt/chat history.

AudioContentFormat

A representation of the possible audio formats for audio.

Value Description
mp3

Specifies audio in MP3 format.

wav

Specifies audio in WAV format.

Azure.Core.Foundations.Error

The error object.

Name Type Description
code

string

One of a server-defined set of error codes.

details

Azure.Core.Foundations.Error[]

An array of details about specific errors that led to this reported error.

innererror

Azure.Core.Foundations.InnerError

An object containing more specific information than the current object about the error.

message

string

A human-readable representation of the error.

target

string

The target of the error.

Azure.Core.Foundations.ErrorResponse

A response containing error details.

Name Type Description
error

Azure.Core.Foundations.Error

The error object.

Azure.Core.Foundations.InnerError

An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/Microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses.

Name Type Description
code

string

One of a server-defined set of error codes.

innererror

Azure.Core.Foundations.InnerError

Inner error.

ChatChoice

The representation of a single prompt completion as part of an overall chat completions request. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.

Name Type Description
finish_reason

CompletionsFinishReason

The reason that this chat completions choice completed its generation.

index

integer

The ordered index associated with this chat completions choice.

message

ChatResponseMessage

The chat message for a given chat completions prompt.

ChatCompletions

Representation of the response data from a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.

Name Type Description
choices

ChatChoice[]

The collection of completions choices associated with this completions response. Generally, n choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.

created

integer

The first timestamp associated with generation activity for this completions response, represented as seconds since the beginning of the Unix epoch of 00:00 on 1 Jan 1970.

id

string

A unique identifier associated with this chat completions response.

model

string

The model used for the chat completion.

object enum:

chat.completion

The response object type, which is always chat.completion.

usage

CompletionsUsage

Usage information for tokens processed and generated as part of this completions operation.

ChatCompletionsAudio

A representation of the audio generated by the model.

Name Type Description
data

string

Base64-encoded audio data.

expires_at

integer

The Unix timestamp (in seconds) at which the audio piece expires and can no longer be referenced by its ID in multi-turn conversations.

format

AudioContentFormat

The format of the audio content. If format is not provided, it will match the format used in the input audio request.

id

string

Unique identifier for the audio response. This value can be used in chat history messages instead of passing the full audio object.

transcript

string

The transcript of the audio file.

ChatCompletionsModality

The modalities that the model is allowed to use for the chat completions response.

Value Description
audio

The model is allowed to generate audio.

text

The model is only allowed to generate text.

ChatCompletionsOptions

The configuration information for a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.

Name Type Default value Description
frequency_penalty

number

0

A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2].

max_tokens

integer

The maximum number of tokens to generate.

messages ChatRequestMessage[]:

The collection of context messages associated with this chat completions request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles.

modalities

ChatCompletionsModality[]

The modalities that the model is allowed to use for the chat completions response. The default modality is text. Requesting an unsupported modality combination results in a 422 error.

model

string

ID of the specific AI model to use, if more than one model is available on the endpoint.

presence_penalty

number

0

A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2].

response_format ChatCompletionsResponseFormat:

An object specifying the format that the model must output.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensures the model will match your supplied JSON schema.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.

seed

integer

If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.

stop

string[]

A collection of textual sequences that will end completions generation.

stream

boolean

A value indicating whether chat completions should be streamed for this request.

temperature

number

0.7

The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1].

tool_choice

If specified, the model will configure which of the provided tools it can use for the chat completions response.

tools

ChatCompletionsToolDefinition[]

A list of tools the model may request to call. Currently, only functions are supported as a tool. The model may respond with a function call request and provide the input arguments in JSON format for that function.

top_p

number

1

An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1].

ChatCompletionsResponseFormatJsonObject

A response format for Chat Completions that restricts responses to emitting valid JSON objects. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message.

Name Type Description
type string:

json_object

The response format type to use for chat completions.
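As noted in the request-body description above, enabling JSON mode requires both the response_format object and an explicit instruction in the prompt to produce JSON. A minimal sketch of a request body that satisfies both conditions (the prompt text itself is illustrative):

```python
import json

# Request body enabling JSON mode. The system message also asks for JSON,
# as required; without it the model may emit only whitespace until the
# token limit is reached.
body = {
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "List three primary colors."},
    ],
}

payload = json.dumps(body)  # serialized request payload
```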

ChatCompletionsResponseFormatJsonSchema

A response format for Chat Completions that restricts responses to emitting valid JSON objects, with a JSON schema specified by the caller.

Name Type Description
json_schema

ChatCompletionsResponseFormatJsonSchemaDefinition

The definition of the required JSON schema in the response, and associated metadata.

type string:

json_schema

The response format type to use for chat completions.

ChatCompletionsResponseFormatJsonSchemaDefinition

The definition of the required JSON schema in the response, and associated metadata.

Name Type Default value Description
description

string

A description of the response format, used by the AI model to determine how to generate responses in this format.

name

string

The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.

schema

The definition of the JSON schema

strict

boolean

False

Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true.
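Putting the fields above together, a response_format payload using a caller-supplied schema with strict adherence might look like the following. The schema name and shape are illustrative, not part of the API:

```python
# Sketch of a json_schema response format with strict adherence enabled.
# The schema itself is hypothetical; recall that only a subset of JSON
# Schema is supported when strict is true.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "theorem_status",  # a-z, A-Z, 0-9, _ and -, max length 64
        "description": "Whether a theorem is proved.",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"proved": {"type": "boolean"}},
            "required": ["proved"],
            "additionalProperties": False,
        },
    },
}
```

This object is passed as the response_format field of the request body described earlier.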

ChatCompletionsResponseFormatText

A response format for Chat Completions that emits text responses. This is the default response format.

Name Type Description
type string:

text

The response format type to use for chat completions.

ChatCompletionsToolCall

A function tool call requested by the AI model.

Name Type Description
function

FunctionCall

The details of the function call requested by the AI model.

id

string

The ID of the tool call.

type enum:

function

The type of tool call. Currently, only function is supported.

ChatCompletionsToolDefinition

The definition of a chat completions tool that can call a function.

Name Type Description
function

FunctionDefinition

The function definition details for the function tool.

type enum:

function

The type of the tool. Currently, only function is supported.

ChatRequestAssistantMessage

A request chat message representing response or action from the assistant.

Name Type Description
audio

ChatRequestAudioReference

The audio generated by a previous response in a multi-turn conversation.

content

string

The content of the message.

role string:

assistant

The chat role associated with this message.

tool_calls

ChatCompletionsToolCall[]

The tool calls that must be resolved and have their outputs appended to subsequent input messages for the chat completions request to resolve as configured.

ChatRequestAudioReference

A reference to an audio response generated by the model.

Name Type Description
id

string

Unique identifier for the audio response. This value corresponds to the id of a previous audio completion.

ChatRequestSystemMessage

A request chat message containing system instructions that influence how the model will generate a chat completions response.

Name Type Description
content

string

The contents of the system message.

role string:

system

The chat role associated with this message.

ChatRequestToolMessage

A request chat message representing requested output from a configured tool.

Name Type Description
content

string

The content of the message.

role string:

tool

The chat role associated with this message.

tool_call_id

string

The ID of the tool call resolved by the provided content.
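The assistant and tool message types above work as a pair: when the model requests a tool call, the caller runs the tool locally and appends both the assistant message (with its tool_calls) and a tool message carrying the output, matched by tool_call_id, to the next request. A sketch with a hypothetical function result:

```python
import json

# Suppose the model returned this tool call (shape matches the sample above).
tool_call = {
    "id": "yrobmilsrugmbwukmzo",
    "type": "function",
    "function": {
        "name": "my-function-name",
        "arguments": '{ "arg1": "value1" }',
    },
}

args = json.loads(tool_call["function"]["arguments"])
result = {"proved": False}  # hypothetical output of running the tool locally

# Append both messages so the next request can resolve the call.
followup_messages = [
    {"role": "assistant", "content": None, "tool_calls": [tool_call]},
    {"role": "tool", "tool_call_id": tool_call["id"],
     "content": json.dumps(result)},
]
```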

ChatRequestUserMessage

A request chat message representing user input to the assistant.

Name Type Description
content

The contents of the user message, with available input types varying by selected model.

role string:

user

The chat role associated with this message.

ChatResponseMessage

A representation of a chat message as received in a response.

Name Type Description
audio

ChatCompletionsAudio

The audio generated by the model as a response to the messages if the model is configured to generate audio.

content

string

The content of the message.

role

ChatRole

The chat role associated with the message.

tool_calls

ChatCompletionsToolCall[]

The tool calls that must be resolved and have their outputs appended to subsequent input messages for the chat completions request to resolve as configured.

ChatRole

A description of the intended purpose of a message within a chat completions interaction.

Value Description
assistant

The role that provides responses to system-instructed, user-prompted input.

developer

The role that provides instructions to the model prioritized ahead of user messages.

system

The role that instructs or sets the behavior of the assistant.

tool

The role that represents extension tool activity within a chat completions operation.

user

The role that provides input for chat completions.

CompletionsFinishReason

Representation of the manner in which a completions response concluded.

Value Description
content_filter

Completions generated a response that was identified as potentially sensitive per content moderation policies.

length

Completions exhausted available token limits before generation could complete.

stop

Completions ended normally and reached its end of token generation.

tool_calls

Completion ended with the model calling a provided tool for output.

CompletionsUsage

Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.

Name Type Description
completion_tokens

integer

The number of tokens generated across all completions emissions.

completion_tokens_details

CompletionsUsageDetails

Breakdown of tokens used in a completion.

prompt_tokens

integer

The number of tokens in the provided prompts for the completions request.

prompt_tokens_details

PromptUsageDetails

Breakdown of tokens used in the prompt/chat history.

total_tokens

integer

The total number of tokens processed for the completions request and response.

CompletionsUsageDetails

A breakdown of tokens used in a completion.

Name Type Description
audio_tokens

integer

The number of tokens corresponding to audio input.

total_tokens

integer

The total number of tokens processed for the completions request and response.

ExtraParameters

Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload.

Value Description
drop

The service will ignore (drop) extra parameters in the request payload. It will only pass the known parameters to the back-end AI model.

error

The service will return an error if it detects extra parameters in the request payload. This is the service default.

pass-through

The service will pass extra parameters to the back-end AI model.
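Since ExtraParameters is set through the extra-parameters HTTP request header (see the Request Header section above), opting in is a matter of adding one header to the request:

```python
# Ask the service to forward unknown payload parameters to the back-end
# model instead of rejecting the request (rejecting is the default).
headers = {
    "Content-Type": "application/json",
    "extra-parameters": "pass-through",  # or "drop" / "error"
}
```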

FunctionCall

The name and arguments of a function that should be called, as generated by the model.

Name Type Description
arguments

string

The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function.

name

string

The name of the function to call.
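Because the model may emit invalid JSON or hallucinate parameters, as the arguments description warns, it is worth validating before dispatching. A minimal sketch (the helper name and required-parameter check are illustrative, not part of the API):

```python
import json

def parse_arguments(raw, required):
    """Validate model-generated function arguments before calling the tool.

    Returns the parsed dict, or None if the JSON is invalid, is not an
    object, or is missing a required parameter.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(args, dict) or not required.issubset(args):
        return None
    return args

ok = parse_arguments('{ "arg1": "value1", "arg2": "value2" }', {"arg1"})
bad = parse_arguments('not json', {"arg1"})
```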

FunctionDefinition

The definition of a caller-specified function that chat completions may invoke in response to matching user input.

Name Type Description
description

string

A description of what the function does. The model will use this description when selecting the function and interpreting its parameters.

name

string

The name of the function to be called.

parameters

The parameters the function accepts, described as a JSON Schema object.

PromptUsageDetails

A breakdown of tokens used in the prompt/chat history.

Name Type Description
audio_tokens

integer

The number of tokens corresponding to audio input.

cached_tokens

integer

The total number of tokens cached.