Reference: Completions | Azure AI Foundry

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Creates a completion for the provided prompt and parameters.

POST /completions?api-version=2024-04-01-preview
Name In Required Type Description
api-version query True string The version of the API in the format "YYYY-MM-DD" or "YYYY-MM-DD-preview".

Request Header

Name Required Type Description
extra-parameters string Controls how the API handles extra parameters in the payload. pass-through makes the API pass the parameter to the underlying model; use this value when you know the underlying model supports the parameters you are sending. ignore makes the API drop any unsupported parameter; use this value when you need the same payload to work across different models, but one of the extra parameters might cause a model to return an error if it is not supported. error makes the API reject any extra parameter in the payload; only parameters specified in this API can be included, or a 400 error is returned.
azureml-model-deployment string Name of the deployment you want to route the request to. Supported for endpoints that support multiple deployments.
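
The two request headers above can be assembled as in the following minimal sketch. The function name `build_routing_headers` and the deployment name used in the usage note are hypothetical; only the header names and the three `extra-parameters` values come from this article.

```python
# Values documented for the extra-parameters header.
ALLOWED_EXTRA_PARAMETERS = ("pass-through", "ignore", "error")


def build_routing_headers(extra_parameters=None, deployment=None):
    """Return a dict with the optional request headers (hypothetical helper)."""
    headers = {}
    if extra_parameters is not None:
        if extra_parameters not in ALLOWED_EXTRA_PARAMETERS:
            raise ValueError(
                f"extra-parameters must be one of {ALLOWED_EXTRA_PARAMETERS}"
            )
        headers["extra-parameters"] = extra_parameters
    if deployment is not None:
        # Only meaningful for endpoints that support multiple deployments.
        headers["azureml-model-deployment"] = deployment
    return headers
```

For example, `build_routing_headers("ignore", "my-deployment")` returns both headers, and calling it with no arguments returns an empty dict so that the service defaults apply.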

Request Body

Name Required Type Description
prompt True The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that <|endoftext|> is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document.
frequency_penalty number Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
max_tokens integer The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
presence_penalty number Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
seed integer If specified, the model makes a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
stop Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
stream boolean Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
temperature number What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering temperature or top_p but not both.
top_p number An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering top_p or temperature but not both.
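
To make the top_p description concrete, the following sketch shows one common way nucleus filtering is described: rank tokens by probability and keep the smallest set whose cumulative mass reaches top_p. This is an illustration only, not the service's implementation. The function name, the toy probability table, and the accepted range (0 < top_p <= 1) are assumptions.

```python
def nucleus(probs, top_p):
    """Return the tokens kept by nucleus (top_p) filtering.

    probs is a toy mapping of token -> probability (illustrative data only).
    """
    if not 0 < top_p <= 1:
        raise ValueError("top_p is assumed to lie in (0, 1]")
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, probability in ranked:
        kept.append(token)
        cumulative += probability
        # Stop once the kept tokens cover the requested probability mass.
        if cumulative >= top_p:
            break
    return kept


toy_probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
```

With this toy table, a top_p of 0.1 keeps only the most probable token, while a top_p of 1 keeps every token, which is why lowering top_p narrows the output.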

Responses

Name Type Description
200 OK CreateCompletionResponse OK
401 Unauthorized UnauthorizedError Access token is missing or invalid

Headers

x-ms-error-code: string
404 Not Found NotFoundError Modality not supported by the model. Check the documentation of the model to see which routes are available.

Headers

x-ms-error-code: string
422 Unprocessable Entity UnprocessableContentError The request contains unprocessable content

Headers

x-ms-error-code: string
429 Too Many Requests TooManyRequestsError You have hit your assigned rate limit and your requests need to be paced.

Headers

x-ms-error-code: string
Other Status Codes ContentFilterError Bad request

Headers

x-ms-error-code: string
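
The response table above can be turned into a small lookup on the client side. The following is a minimal sketch; the function names are hypothetical, and treating only 429 as retryable is an assumption about client policy rather than documented service behavior.

```python
# Status codes and response types as listed in the Responses table.
ERROR_TYPES = {
    401: "UnauthorizedError",
    404: "NotFoundError",
    422: "UnprocessableContentError",
    429: "TooManyRequestsError",
}


def classify_status(status):
    """Map an HTTP status code to the response type named in this article."""
    if status == 200:
        return "CreateCompletionResponse"
    # The table lists ContentFilterError under "Other Status Codes".
    return ERROR_TYPES.get(status, "ContentFilterError")


def should_retry(status):
    """Assumed policy: only rate-limited requests are worth retrying later."""
    return status == 429
```

A caller might read the x-ms-error-code response header alongside the status code to log the specific failure.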

Security

Authorization

The token with the Bearer prefix, for example: Bearer abcde12345

Type: apiKey
In: header

AADToken

Azure Active Directory OAuth2 authentication

Type: oauth2
Flow: application
Token URL: https://login.microsoftonline.com/common/oauth2/v2.0/token
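
As a minimal sketch of the Authorization header format described above, the following helper adds the Bearer prefix when it is missing. The function name is hypothetical, and how you obtain the token (key or OAuth2 flow) is outside the scope of the example.

```python
def authorization_header(token):
    """Return the Authorization header for a token (hypothetical helper)."""
    token = token.strip()
    if not token:
        raise ValueError("token must not be empty")
    prefix = "bearer "
    # Accept tokens that already carry the prefix, in any letter case.
    if token.lower().startswith(prefix):
        token = token[len(prefix):].strip()
    return {"Authorization": f"Bearer {token}"}
```

For example, `authorization_header("abcde12345")` yields the header value shown in the Authorization description.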

Examples

Creates a completion for the provided prompt and parameters

Sample Request

POST /completions?api-version=2024-04-01-preview

{
  "prompt": "This is a very good text",
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "max_tokens": 256,
  "seed": 42,
  "stop": "<|endoftext|>",
  "stream": false,
  "temperature": 0,
  "top_p": 1
}
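
The sample request can be built with the Python standard library as in the following sketch. The endpoint URL and token are placeholders, not real values, and the request is constructed without being sent.

```python
import json
import urllib.request

# Placeholder values (assumptions): replace with your endpoint and credential.
ENDPOINT = "https://example.invalid"
TOKEN = "abcde12345"

payload = {
    "prompt": "This is a very good text",
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "max_tokens": 256,
    "seed": 42,
    "stop": "<|endoftext|>",
    "stream": False,
    "temperature": 0,
    "top_p": 1,
}

request = urllib.request.Request(
    url=f"{ENDPOINT}/completions?api-version=2024-04-01-preview",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {TOKEN}",
    },
    method="POST",
)
# To send it against a real endpoint:
#     with urllib.request.urlopen(request) as response:
#         body = json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.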

Sample Response

Status code: 200

{
  "id": "1234567890",
  "model": "llama2-7b",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "text": ", indeed it is a good one."
    }
  ],
  "created": 1234567890,
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  }
}
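
The following sketch reads the sample response above: it takes the generated text from the first choice and checks that the usage counts add up. Variable names are illustrative.

```python
import json

# The sample response shown above, reproduced as a JSON string.
SAMPLE_RESPONSE = """
{
  "id": "1234567890",
  "model": "llama2-7b",
  "choices": [
    {"index": 0, "finish_reason": "stop", "text": ", indeed it is a good one."}
  ],
  "created": 1234567890,
  "object": "text_completion",
  "usage": {"prompt_tokens": 15, "completion_tokens": 8, "total_tokens": 23}
}
"""

response = json.loads(SAMPLE_RESPONSE)
first_choice = response["choices"][0]
completion_text = first_choice["text"]
usage = response["usage"]

# total_tokens is documented as prompt tokens plus completion tokens.
totals_match = (
    usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
)
```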

Definitions

Name Description
Choices A list of completion choices.
CompletionFinishReason The reason the model stopped generating tokens. This is stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters.
CompletionUsage Usage statistics for the completion request.
ContentFilterError The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again.
CreateCompletionRequest
CreateCompletionResponse Represents a completion response from the API.
Detail
TextCompletionObject The object type, which is always "text_completion"
UnprocessableContentError

Choices

A list of completion choices.

Name Type Description
finish_reason CompletionFinishReason The reason the model stopped generating tokens. This is stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters.
index integer The index of the choice in the list of choices.
text string The generated text.

CompletionFinishReason

The reason the model stopped generating tokens. This is stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, content_filter if content was omitted due to a flag from our content filters.

Name Type Description
content_filter string
length string
stop string
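
One way a client might model these three values is with an enumeration, as in the sketch below. The class and function names are illustrative, and the returned messages paraphrase the descriptions above.

```python
from enum import Enum


class CompletionFinishReason(str, Enum):
    """The finish_reason values listed in this article."""

    STOP = "stop"
    LENGTH = "length"
    CONTENT_FILTER = "content_filter"


def describe_finish(reason):
    """Return a short explanation for a finish_reason string."""
    # Raises ValueError if the value is not one of the three listed above.
    parsed = CompletionFinishReason(reason)
    if parsed is CompletionFinishReason.STOP:
        return "natural stop point or stop sequence reached"
    if parsed is CompletionFinishReason.LENGTH:
        return "max_tokens limit reached; output may be truncated"
    return "content omitted by a content filter"
```

Checking for length is useful when deciding whether to raise max_tokens and retry.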

CompletionUsage

Usage statistics for the completion request.

Name Type Description
completion_tokens integer Number of tokens in the generated completion.
prompt_tokens integer Number of tokens in the prompt.
total_tokens integer Total number of tokens used in the request (prompt + completion).

ContentFilterError

The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again.

Name Type Description
code string The error code.
error string The error description.
message string The error message.
param string The parameter that triggered the content filter.
status integer The HTTP status code.

CreateCompletionRequest

Name Type Default Value Description
frequency_penalty number 0 Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
max_tokens integer 256 The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
presence_penalty number 0 Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
prompt <|endoftext|> The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that <|endoftext|> is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document.
seed integer If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
stop Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
stream boolean False Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
temperature number 1 What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.
top_p number 1 An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
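
When stream is set, the response arrives as data-only server-sent events ending with data: [DONE]. The following sketch parses such a stream from an iterable of text lines. The function names and the example events are made up for illustration, and the sketch assumes each event carries a JSON object with a choices list.

```python
import json


def iter_completion_chunks(lines):
    """Yield the parsed JSON payload of each data event until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


def collect_text(lines, index=0):
    """Concatenate the streamed text for the choice with the given index."""
    parts = []
    for chunk in iter_completion_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index") == index:
                parts.append(choice.get("text", ""))
    return "".join(parts)


# Made-up events for illustration only.
example_stream = [
    'data: {"choices": [{"index": 0, "text": ", indeed"}]}',
    "",
    'data: {"choices": [{"index": 0, "text": " it is."}]}',
    "",
    "data: [DONE]",
]
```

Here `collect_text(example_stream)` joins the two fragments into a single string and stops at the [DONE] marker.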

CreateCompletionResponse

Represents a completion response from the API. Note: both the streamed and nonstreamed response objects share the same shape (unlike the chat endpoint).

Name Type Description
choices Choices[] The list of completion choices the model generated for the input prompt.
created integer The Unix timestamp (in seconds) of when the completion was created.
id string A unique identifier for the completion.
model string The model used for completion.
object TextCompletionObject The object type, which is always "text_completion"
system_fingerprint string This fingerprint represents the backend configuration that the model runs with.

Can be used with the seed request parameter to understand when backend changes have been made that might impact determinism.
usage CompletionUsage Usage statistics for the completion request.
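
The relationship between seed and system_fingerprint can be checked on the client, as in this minimal sketch. It assumes two responses produced with the same seed and parameters; the function names and the fingerprint values in the usage note are hypothetical.

```python
def same_backend(first, second):
    """True when both responses report the same, non-missing fingerprint."""
    fingerprint_a = first.get("system_fingerprint")
    fingerprint_b = second.get("system_fingerprint")
    return fingerprint_a is not None and fingerprint_a == fingerprint_b


def explain_difference(first, second):
    """Describe why two seeded runs may or may not have matched."""
    texts_match = first["choices"][0]["text"] == second["choices"][0]["text"]
    if texts_match:
        return "identical output"
    if same_backend(first, second):
        return "output differs on the same backend; determinism is best effort"
    return "backend configuration may have changed between requests"
```

For example, two responses with fingerprints "fp_a" and "fp_b" and different text would be reported as a possible backend change.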

Detail

Name Type Description
loc string[] The parameter causing the issue
value string The value passed to the parameter causing issues.

TextCompletionObject

The object type, which is always "text_completion"

Name Type Description
text_completion string

ListObject

The object type, which is always "list".

Name Type Description
list string

NotFoundError

Name Type Description
error string The error description.
message string The error message.
status integer The HTTP status code.

TooManyRequestsError

Name Type Description
error string The error description.
message string The error message.
status integer The HTTP status code.

UnauthorizedError

Name Type Description
error string The error description.
message string The error message.
status integer The HTTP status code.

UnprocessableContentError

Name Type Description
code string The error code.
detail Detail
error string The error description.
message string The error message.
status integer The HTTP status code.
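
A client could turn an UnprocessableContentError body into a readable message as in the sketch below. The function name and the example body are hypothetical; only the field names come from the tables above.

```python
def summarize_unprocessable(body):
    """Build a one-line summary from an UnprocessableContentError body."""
    detail = body.get("detail") or {}
    # loc is a list of strings locating the offending parameter.
    location = ".".join(detail.get("loc", [])) or "<unknown>"
    value = detail.get("value", "")
    return (
        f"{body.get('status')} {body.get('code')}: {body.get('message')} "
        f"(parameter: {location}, value: {value!r})"
    )


# Hypothetical error body for illustration only.
example_error = {
    "code": "invalid_parameter",
    "detail": {"loc": ["body", "temperature"], "value": "3"},
    "error": "Unprocessable Entity",
    "message": "temperature must be between 0 and 2",
    "status": 422,
}
```

Calling `summarize_unprocessable(example_error)` produces a message that names the parameter location and the rejected value.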