Reference: Completions | Azure AI Foundry
Important
Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Creates a completion for the provided prompt and parameters.
POST /completions?api-version=2024-04-01-preview
Name | In | Required | Type | Description |
---|---|---|---|---|
api-version | query | True | string | The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview". |
Request Header
Name | Required | Type | Description |
---|---|---|---|
extra-parameters | | string | The behavior of the API when extra parameters are indicated in the payload. Using pass-through makes the API pass the parameter to the underlying model. Use this value when you want to pass parameters that you know the underlying model can support. Using ignore makes the API drop any unsupported parameter. Use this value when you need to use the same payload across different models, but one of the extra parameters may make a model error out if not supported. Using error makes the API reject any extra parameter in the payload. Only parameters specified in this API can be indicated, or a 400 error is returned. |
azureml-model-deployment | | string | Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments. |
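As an illustration of the two optional headers described above, here is a minimal sketch in Python. The helper name `build_headers` and the `Content-Type` value are assumptions made for this example and are not part of the API; the three accepted `extra-parameters` values are the ones listed in the table.

```python
from typing import Optional

# The three behaviors described for the extra-parameters header.
EXTRA_PARAMETER_MODES = {"pass-through", "ignore", "error"}


def build_headers(extra_parameters: Optional[str] = None,
                  deployment: Optional[str] = None) -> dict:
    """Build the optional request headers (hypothetical helper)."""
    # Content-Type is an assumption: the request body is JSON.
    headers = {"Content-Type": "application/json"}
    if extra_parameters is not None:
        if extra_parameters not in EXTRA_PARAMETER_MODES:
            raise ValueError(
                f"unsupported extra-parameters value: {extra_parameters!r}")
        headers["extra-parameters"] = extra_parameters
    if deployment is not None:
        # Only meaningful for endpoints that host multiple deployments.
        headers["azureml-model-deployment"] = deployment
    return headers
```

Both headers are left out entirely when no value is given, so the same helper can be used against endpoints with a single deployment.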
Request Body
Name | Required | Type | Description |
---|---|---|---|
prompt | True | | The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that <\|endoftext\|> is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
frequency_penalty | | number | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
max_tokens | | integer | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. |
presence_penalty | | number | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
seed | | integer | If specified, the model makes a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend. |
stop | | | Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. |
stream | | boolean | Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. |
temperature | | number | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering temperature or top_p but not both. |
top_p | | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering top_p or temperature but not both. |
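The request body fields above can be assembled as in the following sketch. The helper `build_completion_body` is hypothetical. Rejecting unknown fields on the client side is an assumption of this example (it mirrors what the service does when `extra-parameters` is set to error), and the only range checked is the 0 to 2 bound documented for `temperature`.

```python
from typing import Any, Dict

# Optional body fields documented in the table above.
ALLOWED_OPTIONS = {
    "frequency_penalty", "max_tokens", "presence_penalty", "seed",
    "stop", "stream", "temperature", "top_p",
}


def build_completion_body(prompt: Any, **options: Any) -> Dict[str, Any]:
    """Assemble a request body, dropping options left as None."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        # Client-side guard (an assumption of this sketch).
        raise ValueError(f"not part of this API: {sorted(unknown)}")
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    body: Dict[str, Any] = {"prompt": prompt}
    body.update({k: v for k, v in options.items() if v is not None})
    return body
```

For example, `build_completion_body("This is a very good text", max_tokens=256, temperature=0)` yields a body with only the three fields that were set.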
Responses
Name | Type | Description |
---|---|---|
200 OK | CreateCompletionResponse | OK |
401 Unauthorized | UnauthorizedError | Access token is missing or invalid. Headers: x-ms-error-code: string |
404 Not Found | NotFoundError | Modality not supported by the model. Check the documentation of the model to see which routes are available. Headers: x-ms-error-code: string |
422 Unprocessable Entity | UnprocessableContentError | The request contains unprocessable content. Headers: x-ms-error-code: string |
429 Too Many Requests | TooManyRequestsError | You have hit your assigned rate limit and your requests need to be paced. Headers: x-ms-error-code: string |
Other Status Codes | ContentFilterError | Bad request. Headers: x-ms-error-code: string |
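A client can use the status code to decide which schema to expect in the response body. The sketch below mirrors the table above; the function names are hypothetical, and treating only 429 as a signal to slow down is an assumption based on the rate-limit description.

```python
# Status codes and the schema each one returns, per the Responses table.
STATUS_TO_SCHEMA = {
    200: "CreateCompletionResponse",
    401: "UnauthorizedError",
    404: "NotFoundError",
    422: "UnprocessableContentError",
    429: "TooManyRequestsError",
}


def schema_for_status(status: int) -> str:
    """Return the schema name documented for an HTTP status code."""
    # The table maps all other status codes to ContentFilterError.
    return STATUS_TO_SCHEMA.get(status, "ContentFilterError")


def should_back_off(status: int) -> bool:
    """True when the caller has hit its rate limit and should pace requests."""
    return status == 429
```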
Security
Authorization
The token with the Bearer prefix, e.g. Bearer abcde12345
Type: apiKey
In: header
AADToken
Azure Active Directory OAuth2 authentication
Type: oauth2
Flow: application
Token URL: https://login.microsoftonline.com/common/oauth2/v2.0/token
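For the apiKey scheme, the header value is the token with the Bearer prefix. A minimal sketch, assuming a hypothetical helper `authorization_header` that accepts the token with or without the prefix:

```python
def authorization_header(token: str) -> dict:
    """Return the Authorization header, adding the Bearer prefix if absent."""
    token = token.strip()
    if not token:
        raise ValueError("token must not be empty")
    if not token.startswith("Bearer "):
        token = f"Bearer {token}"
    return {"Authorization": token}
```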
Examples
Creates a completion for the provided prompt and parameters
Sample Request
POST /completions?api-version=2024-04-01-preview
{
  "prompt": "This is a very good text",
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "max_tokens": 256,
  "seed": 42,
  "stop": "<|endoftext|>",
  "stream": false,
  "temperature": 0,
  "top_p": 1
}
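The sample request above could be prepared from Python with the standard library, as sketched here. The endpoint `https://example.invalid` and the token are placeholders, the helper name `make_completion_request` is hypothetical, and the `Content-Type` header is an assumption. The request is only constructed, not sent.

```python
import json
import urllib.request

API_VERSION = "2024-04-01-preview"


def make_completion_request(endpoint: str, token: str,
                            body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /completions request."""
    url = f"{endpoint.rstrip('/')}/completions?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed for a JSON body
        },
        method="POST",
    )


# Placeholder endpoint and token; replace with real values before sending.
request = make_completion_request(
    "https://example.invalid",
    "abcde12345",
    {
        "prompt": "This is a very good text",
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "max_tokens": 256,
        "seed": 42,
        "stop": "<|endoftext|>",
        "stream": False,
        "temperature": 0,
        "top_p": 1,
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is omitted here because it needs a live endpoint.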
Sample Response
Status code: 200
{
  "id": "1234567890",
  "model": "llama2-7b",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "text": ", indeed it is a good one."
    }
  ],
  "created": 1234567890,
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  }
}
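Reading the generated text out of a response like the one above takes only a few lines. This sketch parses the sample response verbatim; the variable names are illustrative.

```python
import json

# The sample response shown above, as it would arrive on the wire.
SAMPLE_RESPONSE = """
{
  "id": "1234567890",
  "model": "llama2-7b",
  "choices": [
    {"index": 0, "finish_reason": "stop", "text": ", indeed it is a good one."}
  ],
  "created": 1234567890,
  "object": "text_completion",
  "usage": {"prompt_tokens": 15, "completion_tokens": 8, "total_tokens": 23}
}
"""

response = json.loads(SAMPLE_RESPONSE)
first_choice = response["choices"][0]
generated_text = first_choice["text"]         # the completion itself
finish_reason = first_choice["finish_reason"]  # why generation stopped
total_tokens = response["usage"]["total_tokens"]
```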
Definitions
Name | Description |
---|---|
Choices | A list of completion choices. |
CompletionFinishReason | The reason the model stopped generating tokens. This is stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters. |
CompletionUsage | Usage statistics for the completion request. |
ContentFilterError | The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again. |
CreateCompletionRequest | |
CreateCompletionResponse | Represents a completion response from the API. |
Detail | |
TextCompletionObject | The object type, which is always "text_completion" |
UnprocessableContentError |
Choices
A list of completion choices.
Name | Type | Description |
---|---|---|
finish_reason | CompletionFinishReason | The reason the model stopped generating tokens. This is stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters. |
index | integer | The index of the choice in the list of choices. |
text | string | The generated text. |
CompletionFinishReason
The reason the model stopped generating tokens. This is stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters.
Name | Type | Description |
---|---|---|
content_filter | string | |
length | string | |
stop | string |
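The three values can be modeled as an enumeration so that unexpected values fail loudly. This is a sketch; the class mirrors the table above, and the helper `was_truncated` is a hypothetical convenience.

```python
from enum import Enum


class CompletionFinishReason(str, Enum):
    """The finish reasons listed in the table above."""
    STOP = "stop"                      # natural stop point or stop sequence
    LENGTH = "length"                  # max_tokens was reached
    CONTENT_FILTER = "content_filter"  # content omitted by a content filter


def was_truncated(finish_reason: str) -> bool:
    """True when generation ended because the token limit was reached."""
    return CompletionFinishReason(finish_reason) is CompletionFinishReason.LENGTH
```

A caller might check `was_truncated(choice["finish_reason"])` and retry with a larger `max_tokens` when it returns true.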
CompletionUsage
Usage statistics for the completion request.
Name | Type | Description |
---|---|---|
completion_tokens | integer | Number of tokens in the generated completion. |
prompt_tokens | integer | Number of tokens in the prompt. |
total_tokens | integer | Total number of tokens used in the request (prompt + completion). |
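The relationship between the three counters, and the rule that the prompt plus `max_tokens` cannot exceed the model's context length, can be expressed as two small checks. Both helper names are hypothetical, and the context length of 4096 in the usage note is an assumed figure, since it depends on the deployed model.

```python
def usage_is_consistent(usage: dict) -> bool:
    """Check that total_tokens equals prompt_tokens plus completion_tokens."""
    return usage["total_tokens"] == (
        usage["prompt_tokens"] + usage["completion_tokens"])


def max_tokens_budget(prompt_tokens: int, context_length: int) -> int:
    """Largest max_tokens that still fits in the model's context length."""
    return max(context_length - prompt_tokens, 0)


# Usage figures from the sample response.
sample_usage = {"prompt_tokens": 15, "completion_tokens": 8, "total_tokens": 23}
```

With an assumed context length of 4096 and the 15-token prompt from the sample, `max_tokens_budget(15, 4096)` leaves room for 4081 generated tokens.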
ContentFilterError
The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again.
Name | Type | Description |
---|---|---|
code | string | The error code. |
error | string | The error description. |
message | string | The error message. |
param | string | The parameter that triggered the content filter. |
status | integer | The HTTP status code. |
CreateCompletionRequest
Name | Type | Default Value | Description |
---|---|---|---|
frequency_penalty | number | 0 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
max_tokens | integer | 256 | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length. |
presence_penalty | number | 0 | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
prompt | | <\|endoftext\|> | The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that <\|endoftext\|> is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
seed | integer | | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend. |
stop | | | Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. |
stream | boolean | False | Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. |
temperature | number | 1 | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both. |
top_p | number | 1 | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
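When `stream` is set, the response arrives as data-only server-sent events that end with a `data: [DONE]` message. The following sketch parses such a stream from an iterable of text lines. The function names and the sample events are hypothetical, and the example assumes each event carries a JSON object with a `choices` list containing a `text` field.

```python
import json
from typing import Iterable, Iterator


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per data event until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text of the first choice across all chunks."""
    return "".join(
        chunk["choices"][0]["text"] for chunk in iter_stream_chunks(lines))


# Hypothetical stream contents, for illustration only.
sample_events = [
    'data: {"choices": [{"index": 0, "text": ", indeed", "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "text": " it is.", "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]
```

Because `iter_stream_chunks` is a generator, partial text can be shown to the user as each event arrives instead of waiting for the whole completion.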
CreateCompletionResponse
Represents a completion response from the API. Note: both the streamed and non-streamed response objects share the same shape (unlike the chat endpoint).
Name | Type | Description |
---|---|---|
choices | Choices[] | The list of completion choices the model generated for the input prompt. |
created | integer | The Unix timestamp (in seconds) of when the completion was created. |
id | string | A unique identifier for the completion. |
model | string | The model used for completion. |
object | TextCompletionObject | The object type, which is always "text_completion" |
system_fingerprint | string | This fingerprint represents the backend configuration that the model runs with. Can be used with the seed request parameter to understand when backend changes have been made that might impact determinism. |
usage | CompletionUsage | Usage statistics for the completion request. |
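Since determinism with `seed` is best effort, comparing `system_fingerprint` across responses is one way to notice a backend change. A minimal sketch, with a hypothetical helper name and made-up fingerprint values in the usage note:

```python
from typing import Optional


def backend_changed(previous: dict, current: dict) -> Optional[bool]:
    """Compare system_fingerprint between two responses.

    Returns True if the fingerprints differ, False if they match, and
    None if either response does not carry a fingerprint.
    """
    before = previous.get("system_fingerprint")
    after = current.get("system_fingerprint")
    if before is None or after is None:
        return None
    return before != after
```

For example, `backend_changed({"system_fingerprint": "fp_a"}, {"system_fingerprint": "fp_b"})` returns `True`, which suggests that differing outputs for the same seed may come from a backend change and not from the request.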
Detail
Name | Type | Description |
---|---|---|
loc | string[] | The parameter causing the issue |
value | string | The value passed to the parameter causing issues. |
TextCompletionObject
The object type, which is always "text_completion"
Name | Type | Description |
---|---|---|
text_completion | string |
ListObject
The object type, which is always "list".
Name | Type | Description |
---|---|---|
list | string |
NotFoundError
Name | Type | Description |
---|---|---|
error | string | The error description. |
message | string | The error message. |
status | integer | The HTTP status code. |
TooManyRequestsError
Name | Type | Description |
---|---|---|
error | string | The error description. |
message | string | The error message. |
status | integer | The HTTP status code. |
UnauthorizedError
Name | Type | Description |
---|---|---|
error | string | The error description. |
message | string | The error message. |
status | integer | The HTTP status code. |
UnprocessableContentError
Name | Type | Description |
---|---|---|
code | string | The error code. |
detail | Detail | |
error | string | The error description. |
message | string | The error message. |
status | integer | The HTTP status code. |
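The error schemas above map naturally onto small data classes. This sketch covers UnprocessableContentError and its nested Detail; the parsing helper and the example payload in the usage note are hypothetical, and the sketch assumes the fields appear in the error body under the names listed in the tables.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detail:
    loc: List[str]  # the parameter causing the issue
    value: str      # the value passed to that parameter


@dataclass
class UnprocessableContentError:
    code: str
    error: str
    message: str
    status: int
    detail: Optional[Detail] = None


def parse_unprocessable(body: dict) -> UnprocessableContentError:
    """Convert a decoded 422 error body into typed objects."""
    raw = body.get("detail")
    detail = Detail(loc=list(raw["loc"]), value=raw["value"]) if raw else None
    return UnprocessableContentError(
        code=body["code"],
        error=body["error"],
        message=body["message"],
        status=body["status"],
        detail=detail,
    )
```

Given a made-up body such as `{"code": "bad_value", "error": "Unprocessable Entity", "message": "Invalid temperature", "status": 422, "detail": {"loc": ["body", "temperature"], "value": "5"}}`, the `detail.loc` list points at the offending parameter.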