Get Chat Completions
Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. The method makes a REST API call to the `/chat/completions` route on the given endpoint.
POST https:///chat/completions?api-version=2024-05-01-preview
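The request can be issued with any HTTP client. As a rough sketch, the Python snippet below posts the minimal request body from the examples section using the `requests` library and the `api-key` header described under Security; the endpoint host and API key are placeholders, not values defined by this reference.

```python
# Minimal sketch of calling the route above with the `requests` library.
# The endpoint and key are placeholder assumptions, not part of this API spec.
import requests

ENDPOINT = "https://<your-endpoint>"   # placeholder host
API_KEY = "<your-api-key>"             # placeholder key, sent via the api-key header

response = requests.post(
    f"{ENDPOINT}/chat/completions",
    params={"api-version": "2024-05-01-preview"},
    headers={"api-key": API_KEY},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "Explain Riemann's conjecture"},
        ]
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```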
URI Parameters
Name | In | Required | Type | Description |
---|---|---|---|---|
api-version | query | True | string | The API version to use for this operation. |
Request Header
Name | Required | Type | Description |
---|---|---|---|
extra-parameters | | ExtraParameters | Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload. This sets the HTTP request header `extra-parameters`. |
Request Body
Name | Required | Type | Description |
---|---|---|---|
messages | True | ChatRequestMessage[] | The collection of context messages associated with this chat completions request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles. |
frequency_penalty | | number | A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. |
max_tokens | | integer | The maximum number of tokens to generate. |
modalities | | ChatCompletionsModality[] | The modalities that the model is allowed to use for the chat completions response. The default modality is `text`. |
model | | string | ID of the specific AI model to use, if more than one model is available on the endpoint. |
presence_penalty | | number | A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. |
response_format | | ChatCompletionsResponseFormat | An object specifying the format that the model must output. Setting to `{ "type": "json_schema", "json_schema": {...} }` enables Structured Outputs, which ensures the model will match your supplied JSON schema. Setting to `{ "type": "json_object" }` enables JSON mode, which ensures the message the model generates is valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if `finish_reason` is `length`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
seed | | integer | If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. |
stop | | string[] | A collection of textual sequences that will end completions generation. |
stream | | boolean | A value indicating whether chat completions should be streamed for this request. |
temperature | | number | The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
tool_choice | | | If specified, the model will configure which of the provided tools it can use for the chat completions response. |
tools | | ChatCompletionsToolDefinition[] | A list of tools the model may request to call. Currently, only functions are supported as a tool. The model may respond with a function call request and provide the input arguments in JSON format for that function. |
top_p | | number | An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
Responses
Name | Type | Description |
---|---|---|
200 OK | ChatCompletions | The request has succeeded. |
Other Status Codes | Azure.Core.Foundations.ErrorResponse | An unexpected error response. Headers: x-ms-error-code: string |
Security
api-key
Type: apiKey
In: header
OAuth2Auth
Type: oauth2
Flow: implicit
Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
Scopes
Name | Description |
---|---|
https://ml.azure.com/.default | |
Examples
Audio modality chat completion
maximum set chat completion
minimum set chat completion
Audio modality chat completion
Sample request
POST https:///chat/completions?api-version=2024-05-01-preview
{
"modalities": [
"text",
"audio"
],
"messages": [
{
"role": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "<base64 encoded audio data>",
"format": "wav"
}
}
]
},
{
"role": "assistant",
"content": null,
"audio": {
"id": "abcdef1234"
}
},
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "<base64 encoded audio data>",
"format": "wav"
}
}
]
}
],
"frequency_penalty": 0,
"presence_penalty": 0,
"temperature": 0,
"top_p": 0,
"seed": 21,
"model": "my-model-name"
}
Sample response
{
"id": "kgousajxgzyhugvqekuswuqbk",
"object": "chat.completion",
"created": 1696522361,
"model": "my-model-name",
"usage": {
"completion_tokens": 19,
"prompt_tokens": 28,
"total_tokens": 16,
"completion_tokens_details": {
"audio_tokens": 5,
"total_tokens": 5
},
"prompt_tokens_details": {
"audio_tokens": 10,
"cached_tokens": 0
}
},
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": null,
"tool_calls": null,
"audio": {
"id": "abcdef1234",
"format": "wav",
"data": "<base64 encoded audio data>",
"expires_at": 1896522361,
"transcript": "This is a sample transcript"
}
}
}
]
}
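When the audio modality is requested, the assistant message carries base64-encoded audio as shown above. The sketch below unpacks it, assuming `resp` holds the parsed JSON of a response shaped like the sample (the embedded payload here is a stand-in, not real audio).

```python
import base64

# `resp` stands in for the parsed JSON of the sample response above
# (e.g. obtained via response.json()); the data field is a stand-in payload.
resp = {
    "choices": [{
        "message": {
            "audio": {
                "id": "abcdef1234",
                "format": "wav",
                "data": base64.b64encode(b"raw audio bytes").decode(),
                "transcript": "This is a sample transcript",
            }
        }
    }]
}

audio = resp["choices"][0]["message"]["audio"]
with open(f"reply.{audio['format']}", "wb") as f:
    f.write(base64.b64decode(audio["data"]))  # decode the base64 audio payload
print(audio["transcript"])

# In later turns, the audio can be referenced by ID instead of resending it
# (see ChatRequestAudioReference): {"role": "assistant", "audio": {"id": audio["id"]}}
```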
maximum set chat completion
Sample request
POST https:///chat/completions?api-version=2024-05-01-preview
{
"modalities": [
"text"
],
"messages": [
{
"role": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"content": "Explain Riemann's conjecture"
},
{
"role": "assistant",
"content": "The Riemann Conjecture is a deep mathematical conjecture around prime numbers and how they can be predicted. It was first published in Riemann's groundbreaking 1859 paper. The conjecture states that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 1/21. Many consider it to be the most important unsolved problem in pure mathematics. The Riemann hypothesis is a way to predict the probability that numbers in a certain range are prime that was also devised by German mathematician Bernhard Riemann in 18594."
},
{
"role": "user",
"content": "Ist it proved?"
}
],
"frequency_penalty": 0,
"stream": true,
"presence_penalty": 0,
"temperature": 0,
"top_p": 0,
"max_tokens": 255,
"response_format": {
"type": "text"
},
"stop": [
"<|endoftext|>"
],
"tools": [
{
"type": "function",
"function": {
"name": "my-function-name",
"description": "A function useful to know if a theroem is proved or not"
}
}
],
"seed": 21,
"model": "my-model-name"
}
Sample response
{
"id": "kgousajxgzyhugvqekuswuqbk",
"object": "chat.completion",
"created": 18,
"model": "my-model-name",
"usage": {
"completion_tokens": 19,
"prompt_tokens": 28,
"total_tokens": 16
},
"choices": [
{
"index": 7,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "yrobmilsrugmbwukmzo",
"type": "function",
"function": {
"name": "my-function-name",
"arguments": "{ \"arg1\": \"value1\", \"arg2\": \"value2\" }"
}
}
]
}
}
]
}
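When `finish_reason` is `tool_calls`, as in the response above, the caller is expected to run the requested function and feed its output back in a tool-role message (see ChatRequestToolMessage). A sketch under the assumption that `my-function-name` maps to a hypothetical local Python function:

```python
import json

# Hypothetical local implementation of the tool declared in the request above.
def check_proved(arg1: str, arg2: str) -> str:
    return f"status for {arg1}/{arg2}: open problem"

LOCAL_TOOLS = {"my-function-name": check_proved}

# `resp` stands in for the parsed sample response above.
resp = {"choices": [{"finish_reason": "tool_calls", "message": {"tool_calls": [{
    "id": "yrobmilsrugmbwukmzo",
    "type": "function",
    "function": {"name": "my-function-name",
                 "arguments": "{ \"arg1\": \"value1\", \"arg2\": \"value2\" }"},
}]}}]}

follow_up_messages = []
for call in resp["choices"][0]["message"]["tool_calls"]:
    # Per the FunctionCall notes below, the model may emit invalid JSON or
    # invented parameters, so validate the arguments before dispatching.
    args = json.loads(call["function"]["arguments"])
    result = LOCAL_TOOLS[call["function"]["name"]](**args)
    # The tool output goes back as a tool-role message tied to the call ID.
    follow_up_messages.append(
        {"role": "tool", "tool_call_id": call["id"], "content": result}
    )
```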
minimum set chat completion
Sample request
POST https:///chat/completions?api-version=2024-05-01-preview
{
"messages": [
{
"role": "user",
"content": "Explain Riemann's conjecture"
}
]
}
Sample response
{
"id": "kgousajxgzyhugvqekuswuqbk",
"object": "chat.completion",
"created": 1234567890,
"model": "my-model-name",
"usage": {
"prompt_tokens": 205,
"completion_tokens": 5,
"total_tokens": 210
},
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "The Riemann Conjecture is a deep mathematical conjecture around prime numbers and how they can be predicted. It was first published in Riemann's groundbreaking 1859 paper. The conjecture states that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 1/21. Many consider it to be the most important unsolved problem in pure mathematics. The Riemann hypothesis is a way to predict the probability that numbers in a certain range are prime that was also devised by German mathematician Bernhard Riemann in 18594"
}
}
]
}
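As the response_format notes above stress, JSON mode only works if the prompt itself asks for JSON. A hedged sketch of such a request body; the wording of the instruction is illustrative, not mandated by this API.

```python
# Sketch of a JSON-mode request body. Per the response_format notes above, the
# prompt itself must instruct the model to produce JSON.
body = {
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "You are a helpful assistant. Respond only with a JSON object."},
        {"role": "user", "content": "Name three prime numbers."},
    ],
}

# When parsing the reply, check finish_reason first: a value of "length" means
# the generation hit max_tokens and the JSON content may be cut off mid-object.
```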
Definitions
Name | Description |
---|---|
AudioContentFormat | A representation of the possible audio formats for audio. |
Azure.Core.Foundations.Error | The error object. |
Azure.Core.Foundations.ErrorResponse | A response containing error details. |
Azure.Core.Foundations.InnerError | An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/Microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses. |
ChatChoice | The representation of a single prompt completion as part of an overall chat completions request. Generally, `n` choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated. |
ChatCompletions | Representation of the response data from a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. |
ChatCompletionsAudio | A representation of the audio generated by the model. |
ChatCompletionsModality | The modalities that the model is allowed to use for the chat completions response. |
ChatCompletionsOptions | The configuration information for a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. |
ChatCompletionsResponseFormatJsonObject | A response format for Chat Completions that restricts responses to emitting valid JSON objects. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message. |
ChatCompletionsResponseFormatJsonSchema | A response format for Chat Completions that restricts responses to emitting valid JSON objects, with a JSON schema specified by the caller. |
ChatCompletionsResponseFormatJsonSchemaDefinition | The definition of the required JSON schema in the response, and associated metadata. |
ChatCompletionsResponseFormatText | A response format for Chat Completions that emits text responses. This is the default response format. |
ChatCompletionsToolCall | A function tool call requested by the AI model. |
ChatCompletionsToolDefinition | The definition of a chat completions tool that can call a function. |
ChatRequestAssistantMessage | A request chat message representing response or action from the assistant. |
ChatRequestAudioReference | A reference to an audio response generated by the model. |
ChatRequestSystemMessage | A request chat message containing system instructions that influence how the model will generate a chat completions response. |
ChatRequestToolMessage | A request chat message representing requested output from a configured tool. |
ChatRequestUserMessage | A request chat message representing user input to the assistant. |
ChatResponseMessage | A representation of a chat message as received in a response. |
ChatRole | A description of the intended purpose of a message within a chat completions interaction. |
CompletionsFinishReason | Representation of the manner in which a completions response concluded. |
CompletionsUsage | Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers. |
CompletionsUsageDetails | A breakdown of tokens used in a completion. |
ExtraParameters | Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload. |
FunctionCall | The name and arguments of a function that should be called, as generated by the model. |
FunctionDefinition | The definition of a caller-specified function that chat completions may invoke in response to matching user input. |
PromptUsageDetails | A breakdown of tokens used in the prompt/chat history. |
AudioContentFormat
A representation of the possible audio formats for audio.
Value | Description |
---|---|
mp3 | Specifies audio in MP3 format. |
wav | Specifies audio in WAV format. |
Azure.Core.Foundations.Error
The error object.
Name | Type | Description |
---|---|---|
code | string | One of a server-defined set of error codes. |
details | Azure.Core.Foundations.Error[] | An array of details about specific errors that led to this reported error. |
innererror | Azure.Core.Foundations.InnerError | An object containing more specific information than the current object about the error. |
message | string | A human-readable representation of the error. |
target | string | The target of the error. |
Azure.Core.Foundations.ErrorResponse
A response containing error details.
Name | Type | Description |
---|---|---|
error | Azure.Core.Foundations.Error | The error object. |
Azure.Core.Foundations.InnerError
An object containing more specific information about the error. As per Microsoft One API guidelines - https://github.com/Microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses.
Name | Type | Description |
---|---|---|
code | string | One of a server-defined set of error codes. |
innererror | Azure.Core.Foundations.InnerError | Inner error. |
ChatChoice
The representation of a single prompt completion as part of an overall chat completions request. Generally, `n` choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated.
Name | Type | Description |
---|---|---|
finish_reason | CompletionsFinishReason | The reason that this chat completions choice completed its generation. |
index | integer | The ordered index associated with this chat completions choice. |
message | ChatResponseMessage | The chat message for a given chat completions prompt. |
ChatCompletions
Representation of the response data from a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.
Name | Type | Description |
---|---|---|
choices | ChatChoice[] | The collection of completions choices associated with this completions response. Generally, `n` choices are generated per provided prompt with a default value of 1. Token limits and other settings may limit the number of choices generated. |
created | integer | The first timestamp associated with generation activity for this completions response, represented as seconds since the beginning of the Unix epoch of 00:00 on 1 Jan 1970. |
id | string | A unique identifier associated with this chat completions response. |
model | string | The model used for the chat completion. |
object | enum: chat.completion | The response object type, which is always `chat.completion`. |
usage | CompletionsUsage | Usage information for tokens processed and generated as part of this completions operation. |
ChatCompletionsAudio
A representation of the audio generated by the model.
Name | Type | Description |
---|---|---|
data | string | Base64 encoded audio data. |
expires_at | integer | The Unix timestamp (in seconds) at which the audio piece expires and can no longer be referenced by its ID in multi-turn conversations. |
format | AudioContentFormat | The format of the audio content. If format is not provided, it will match the format used in the input audio request. |
id | string | Unique identifier for the audio response. This value can be used in chat history messages instead of passing the full audio object. |
transcript | string | The transcript of the audio file. |
ChatCompletionsModality
The modalities that the model is allowed to use for the chat completions response.
Value | Description |
---|---|
audio | The model is allowed to generate audio. |
text | The model is only allowed to generate text. |
ChatCompletionsOptions
The configuration information for a chat completions request. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data.
Name | Type | Default value | Description |
---|---|---|---|
frequency_penalty | number | 0 | A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. |
max_tokens | integer | | The maximum number of tokens to generate. |
messages | ChatRequestMessage[] | | The collection of context messages associated with this chat completions request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles. |
modalities | ChatCompletionsModality[] | | The modalities that the model is allowed to use for the chat completions response. The default modality is `text`. |
model | string | | ID of the specific AI model to use, if more than one model is available on the endpoint. |
presence_penalty | number | 0 | A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. |
response_format | ChatCompletionsResponseFormat | | An object specifying the format that the model must output. Setting to `{ "type": "json_schema", "json_schema": {...} }` enables Structured Outputs, which ensures the model will match your supplied JSON schema. Setting to `{ "type": "json_object" }` enables JSON mode, which ensures the message the model generates is valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if `finish_reason` is `length`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
seed | integer | | If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. |
stop | string[] | | A collection of textual sequences that will end completions generation. |
stream | boolean | | A value indicating whether chat completions should be streamed for this request. |
temperature | number | 0.7 | The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
tool_choice | | | If specified, the model will configure which of the provided tools it can use for the chat completions response. |
tools | ChatCompletionsToolDefinition[] | | A list of tools the model may request to call. Currently, only functions are supported as a tool. The model may respond with a function call request and provide the input arguments in JSON format for that function. |
top_p | number | 1 | An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. |
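The `stream` option above switches the response to incremental delivery. This page does not document the wire format of streamed chunks; the sketch below assumes the server-sent-events convention common to OpenAI-compatible chat APIs (`data: {...}` lines terminated by `data: [DONE]`), so treat it as illustrative only.

```python
import json
import requests

resp = requests.post(
    "https://<your-endpoint>/chat/completions",   # placeholder endpoint
    params={"api-version": "2024-05-01-preview"},
    headers={"api-key": "<your-api-key>"},        # placeholder key
    json={"stream": True,
          "messages": [{"role": "user", "content": "Explain Riemann's conjecture"}]},
    stream=True,  # let requests yield the body incrementally
)
for line in resp.iter_lines():
    # Assumed SSE framing: payload lines look like b"data: {...}".
    if not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)  # one incremental completion chunk
```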
ChatCompletionsResponseFormatJsonObject
A response format for Chat Completions that restricts responses to emitting valid JSON objects. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message.
Name | Type | Description |
---|---|---|
type | string: json_object | The response format type to use for chat completions. |
ChatCompletionsResponseFormatJsonSchema
A response format for Chat Completions that restricts responses to emitting valid JSON objects, with a JSON schema specified by the caller.
Name | Type | Description |
---|---|---|
json_schema | ChatCompletionsResponseFormatJsonSchemaDefinition | The definition of the required JSON schema in the response, and associated metadata. |
type | string: json_schema | The response format type to use for chat completions. |
ChatCompletionsResponseFormatJsonSchemaDefinition
The definition of the required JSON schema in the response, and associated metadata.
Name | Type | Default value | Description |
---|---|---|---|
description | string | | A description of the response format, used by the AI model to determine how to generate responses in this format. |
name | string | | The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64. |
schema | | | The definition of the JSON schema. |
strict | boolean | False | Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the `schema` field. |
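Putting the fields above together, a request fragment using a caller-specified schema might look like the sketch below; the schema itself is a made-up example, not part of this reference.

```python
# Illustrative response_format payload built from the fields documented above.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "proof_status",          # a-z, A-Z, 0-9, _ and -, max 64 chars
        "description": "Reports whether a theorem is proved.",
        "strict": True,                  # follow the schema exactly
        "schema": {                      # the JSON Schema definition itself
            "type": "object",
            "properties": {
                "theorem": {"type": "string"},
                "proved": {"type": "boolean"},
            },
            "required": ["theorem", "proved"],
        },
    },
}
```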
ChatCompletionsResponseFormatText
A response format for Chat Completions that emits text responses. This is the default response format.
Name | Type | Description |
---|---|---|
type | string: text | The response format type to use for chat completions. |
ChatCompletionsToolCall
A function tool call requested by the AI model.
Name | Type | Description |
---|---|---|
function | FunctionCall | The details of the function call requested by the AI model. |
id | string | The ID of the tool call. |
type | enum: function | The type of tool call. Currently, only `function` is supported. |
ChatCompletionsToolDefinition
The definition of a chat completions tool that can call a function.
Name | Type | Description |
---|---|---|
function | FunctionDefinition | The function definition details for the function tool. |
type | enum: function | The type of the tool. Currently, only `function` is supported. |
ChatRequestAssistantMessage
A request chat message representing response or action from the assistant.
Name | Type | Description |
---|---|---|
audio | ChatRequestAudioReference | The audio generated by a previous response in a multi-turn conversation. |
content | string | The content of the message. |
role | string: assistant | The chat role associated with this message. |
tool_calls | ChatCompletionsToolCall[] | The tool calls that must be resolved and have their outputs appended to subsequent input messages for the chat completions request to resolve as configured. |
ChatRequestAudioReference
A reference to an audio response generated by the model.
Name | Type | Description |
---|---|---|
id | string | Unique identifier for the audio response. This value corresponds to the id of a previous audio completion. |
ChatRequestSystemMessage
A request chat message containing system instructions that influence how the model will generate a chat completions response.
Name | Type | Description |
---|---|---|
content | string | The contents of the system message. |
role | string: system | The chat role associated with this message. |
ChatRequestToolMessage
A request chat message representing requested output from a configured tool.
Name | Type | Description |
---|---|---|
content | string | The content of the message. |
role | string: tool | The chat role associated with this message. |
tool_call_id | string | The ID of the tool call resolved by the provided content. |
ChatRequestUserMessage
A request chat message representing user input to the assistant.
Name | Type | Description |
---|---|---|
content | | The contents of the user message, with available input types varying by selected model. |
role | string: user | The chat role associated with this message. |
ChatResponseMessage
A representation of a chat message as received in a response.
Name | Type | Description |
---|---|---|
audio | ChatCompletionsAudio | The audio generated by the model as a response to the messages if the model is configured to generate audio. |
content | string | The content of the message. |
role | ChatRole | The chat role associated with the message. |
tool_calls | ChatCompletionsToolCall[] | The tool calls that must be resolved and have their outputs appended to subsequent input messages for the chat completions request to resolve as configured. |
ChatRole
A description of the intended purpose of a message within a chat completions interaction.
Value | Description |
---|---|
assistant | The role that provides responses to system-instructed, user-prompted input. |
developer | The role that provides instructions to the model prioritized ahead of user messages. |
system | The role that instructs or sets the behavior of the assistant. |
tool | The role that represents extension tool activity within a chat completions operation. |
user | The role that provides input for chat completions. |
CompletionsFinishReason
Representation of the manner in which a completions response concluded.
Value | Description |
---|---|
content_filter | Completions generated a response that was identified as potentially sensitive per content moderation policies. |
length | Completions exhausted available token limits before generation could complete. |
stop | Completions ended normally and reached its end of token generation. |
tool_calls | Completion ended with the model calling a provided tool for output. |
CompletionsUsage
Representation of the token counts processed for a completions request. Counts consider all tokens across prompts, choices, choice alternates, best_of generations, and other consumers.
Name | Type | Description |
---|---|---|
completion_tokens | integer | The number of tokens generated across all completions emissions. |
completion_tokens_details | CompletionsUsageDetails | Breakdown of tokens used in a completion. |
prompt_tokens | integer | The number of tokens in the provided prompts for the completions request. |
prompt_tokens_details | PromptUsageDetails | Breakdown of tokens used in the prompt/chat history. |
total_tokens | integer | The total number of tokens processed for the completions request and response. |
CompletionsUsageDetails
A breakdown of tokens used in a completion.
Name | Type | Description |
---|---|---|
audio_tokens | integer | The number of tokens corresponding to audio input. |
total_tokens | integer | The total number of tokens processed for the completions request and response. |
ExtraParameters
Controls what happens if extra parameters, undefined by the REST API, are passed in the JSON request payload.
Value | Description |
---|---|
drop | The service will ignore (drop) extra parameters in the request payload. It will only pass the known parameters to the back-end AI model. |
error | The service will error if it detects extra parameters in the request payload. This is the service default. |
pass-through | The service will pass extra parameters to the back-end AI model. |
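These values are supplied through the `extra-parameters` request header (see Request Header above). A sketch of opting in to pass-through; `custom_knob` is a made-up, model-specific parameter used only for illustration.

```python
import requests

resp = requests.post(
    "https://<your-endpoint>/chat/completions",   # placeholder endpoint
    params={"api-version": "2024-05-01-preview"},
    headers={
        "api-key": "<your-api-key>",              # placeholder key
        "extra-parameters": "pass-through",       # forward unknown fields to the model
    },
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "custom_knob": 0.5,  # hypothetical parameter not defined by this REST API
    },
)
```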
FunctionCall
The name and arguments of a function that should be called, as generated by the model.
Name | Type | Description |
---|---|---|
arguments | string | The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function. |
name | string | The name of the function to call. |
FunctionDefinition
The definition of a caller-specified function that chat completions may invoke in response to matching user input.
Name | Type | Description |
---|---|---|
description | string | A description of what the function does. The model will use this description when selecting the function and interpreting its parameters. |
name | string | The name of the function to be called. |
parameters | | The parameters the function accepts, described as a JSON Schema object. |
PromptUsageDetails
A breakdown of tokens used in the prompt/chat history.
Name | Type | Description |
---|---|---|
audio_tokens | integer | The number of tokens corresponding to audio input. |
cached_tokens | integer | The total number of tokens cached. |