ChatCompletionsClient Class

A client that issues chat completions requests to an AI model deployed to a compatible inference endpoint.

Inheritance
azure.ai.inference._client.ChatCompletionsClient
ChatCompletionsClient

Constructor

ChatCompletionsClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, *, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: ChatCompletionsResponseFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any)

Parameters

Name Description
endpoint
Required
str

Service host. Required.

credential
Required
AzureKeyCredential or TokenCredential

Credential used to authenticate requests to the service. Is either an AzureKeyCredential type or a TokenCredential type. Required.

Keyword-Only Parameters

Name Description
frequency_penalty

A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. Default value is None.

presence_penalty

A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. Default value is None.

temperature

The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

top_p

An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

max_tokens
int

The maximum number of tokens to generate. Default value is None.

response_format

The format that the model must output. Use this to enable JSON mode instead of the default text mode. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message. Default value is None.

stop

A collection of textual sequences that will end completions generation. Default value is None.

tools

The available tool definitions that the chat completions request can use, including caller-defined functions. Default value is None.

tool_choice

If specified, constrains which of the provided tools the model can use for the chat completions response. Is either a str or ChatCompletionsToolChoicePreset type, or a ChatCompletionsNamedToolChoice type. Default value is None.

seed
int

If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Default value is None.

model
str

ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

model_extras

Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

api_version
str

The API version to use for this operation. Default value is "2024-05-01-preview". Note that overriding this default value may result in unsupported behavior.
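
A minimal construction sketch, assuming an API-key credential. The endpoint URL and key below are placeholders, and the optional settings shown act as client-level defaults that individual complete calls can override:

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key; substitute the values for your own deployment.
client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.inference.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
    temperature=0.7,   # optional client-level default
    max_tokens=1024,   # optional client-level default
)

For Microsoft Entra ID authentication, pass a TokenCredential implementation such as azure.identity.DefaultAzureCredential instead of AzureKeyCredential.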

Methods

close
complete

Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. When using this method with stream=True, the response is streamed back to the client. Iterate over the resulting StreamingChatCompletions object to get content updates as they arrive.

get_model_info

Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method will only work when using a Serverless API or Managed Compute endpoint. It will not work for GitHub Models or Azure OpenAI endpoints.

send_request

Runs the network request through the client's chained policies.


>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
>>> request
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = client.send_request(request)
>>> response
<HttpResponse: 200 OK>

For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request

close

close() -> None
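
Closing releases the client's underlying HTTP connections. As a sketch, assuming the standard Azure SDK context-manager support and reusing the endpoint and credential from the constructor example above:

# Close explicitly when finished:
client.close()

# Or let a with-block close the client on exit:
with ChatCompletionsClient(endpoint, credential) as client:
    ...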

complete

Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. When using this method with stream=True, the response is streamed back to the client. Iterate over the resulting StreamingChatCompletions object to get content updates as they arrive.

complete(*, messages: List[ChatRequestMessage] | List[Dict[str, Any]], stream: Literal[False] = False, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: ChatCompletionsResponseFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any) -> ChatCompletions

Parameters

Name Description
body
<xref:JSON> or IO[bytes]

Is either a MutableMapping[str, Any] type (like a dictionary) or an IO[bytes] type that specifies the full request payload. Required. This parameter belongs to the overloads of complete that accept the full request payload directly instead of the keyword arguments below.

Keyword-Only Parameters

Name Description
messages

The collection of context messages associated with this chat completions request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles. Required.

stream

A value indicating whether chat completions should be streamed for this request. Default value is False. If streaming is enabled, the response will be a StreamingChatCompletions. Otherwise the response will be a ChatCompletions.

frequency_penalty

A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. Default value is None.

presence_penalty

A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. Default value is None.

temperature

The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

top_p

An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

max_tokens
int

The maximum number of tokens to generate. Default value is None.

response_format

The format that the model must output. Use this to enable JSON mode instead of the default text mode. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message. Default value is None.

stop

A collection of textual sequences that will end completions generation. Default value is None.

tools

The available tool definitions that the chat completions request can use, including caller-defined functions. Default value is None.

tool_choice

If specified, constrains which of the provided tools the model can use for the chat completions response. Is either a str or ChatCompletionsToolChoicePreset type, or a ChatCompletionsNamedToolChoice type. Default value is None.

seed
int

If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Default value is None.

model
str

ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

model_extras

Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

Returns

Type Description

ChatCompletions for non-streaming, or Iterable[StreamingChatCompletionsUpdate] for streaming.

Exceptions

Type Description

azure.core.exceptions.HttpResponseError
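
A brief usage sketch for both modes, assuming a client constructed as shown earlier; the message classes come from azure.ai.inference.models and the prompt text is illustrative:

from azure.ai.inference.models import SystemMessage, UserMessage

# Non-streaming: blocks until generation finishes and returns ChatCompletions.
response = client.complete(
    messages=[
        SystemMessage("You are a helpful assistant."),
        UserMessage("Explain nucleus sampling in one sentence."),
    ],
)
print(response.choices[0].message.content)

# Streaming: iterate StreamingChatCompletionsUpdate objects as they arrive.
for update in client.complete(
    messages=[UserMessage("Explain nucleus sampling in one sentence.")],
    stream=True,
):
    if update.choices:  # some updates (e.g. the final one) may carry no choices
        print(update.choices[0].delta.content or "", end="")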

get_model_info

Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method will only work when using a Serverless API or Managed Compute endpoint. It will not work for GitHub Models or Azure OpenAI endpoints.

get_model_info(**kwargs: Any) -> ModelInfo

Returns

Type Description

ModelInfo. The ModelInfo is compatible with MutableMapping.

Exceptions

Type Description

azure.core.exceptions.HttpResponseError
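
A short sketch, assuming a Serverless API or Managed Compute endpoint as noted above; the printed attributes follow the ModelInfo model:

info = client.get_model_info()
# ModelInfo exposes model_name, model_type, and model_provider_name, among others.
print(f"{info.model_name} ({info.model_type}) from {info.model_provider_name}")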

send_request

Runs the network request through the client's chained policies.


>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
>>> request
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = client.send_request(request)
>>> response
<HttpResponse: 200 OK>

For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request

send_request(request: HttpRequest, *, stream: bool = False, **kwargs: Any) -> HttpResponse

Parameters

Name Description
request
Required

The network request you want to make. Required.

Keyword-Only Parameters

Name Description
stream

Whether the response payload will be streamed. Defaults to False.

Returns

Type Description

The response of your network call. Does not do error handling on your response.