ChatCompletionsClient Class
ChatCompletionsClient.

Inheritance: azure.ai.inference._client.ChatCompletionsClient → ChatCompletionsClient
Constructor
ChatCompletionsClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, *, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: ChatCompletionsResponseFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any)
Parameters
| Name | Description |
| --- | --- |
| endpoint | Service host. Required. |
| credential | Credential used to authenticate requests to the service. Either an AzureKeyCredential or a TokenCredential. Required. |
Keyword-Only Parameters
| Name | Description |
| --- | --- |
| frequency_penalty | A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. Default value is None. |
| presence_penalty | A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. Default value is None. |
| temperature | The sampling temperature to use that controls the apparent creativity of generated completions. Higher values make output more random, while lower values make results more focused and deterministic. Modifying temperature and top_p in the same request is not recommended, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None. |
| top_p | An alternative to sampling with temperature, called nucleus sampling. This value causes the model to consider only the tokens within the provided probability mass. For example, a value of 0.15 causes only the tokens comprising the top 15% of probability mass to be considered. Modifying temperature and top_p in the same request is not recommended, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None. |
| max_tokens | The maximum number of tokens to generate. Default value is None. |
| response_format | The format that the model must output. Use this to enable JSON mode instead of the default text mode. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message. Default value is None. |
| stop | A collection of textual sequences that will end completions generation. Default value is None. |
| tools | The available tool definitions that the chat completions request can use, including caller-defined functions. Default value is None. |
| tool_choice | If specified, configures which of the provided tools the model can use for the chat completions response. Either a str, a ChatCompletionsToolChoicePreset, or a ChatCompletionsNamedToolChoice. Default value is None. |
| seed | If specified, the system will make a best effort to sample deterministically, so that repeated requests with the same seed and parameters return the same result. Determinism is not guaranteed. Default value is None. |
| model | ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None. |
| model_extras | Additional, model-specific parameters that are not in the standard request payload. They are added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None. |
| api_version | The API version to use for this operation. Default value is "2024-05-01-preview". Note that overriding this default value may result in unsupported behavior. |
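A minimal construction sketch follows. The endpoint and key values are placeholders (read here from hypothetical environment variables; substitute your own configuration), and the client-level keyword arguments are optional:

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

# Hypothetical environment variables; substitute your own endpoint and key.
endpoint = os.environ["AZURE_INFERENCE_ENDPOINT"]
key = os.environ["AZURE_INFERENCE_CREDENTIAL"]

# Key-based authentication. A TokenCredential (for example, DefaultAzureCredential
# from the azure-identity package) can be passed instead for Microsoft Entra ID.
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
    temperature=0.7,  # optional client-level default applied to requests
)
```

Keyword-only parameters set here act as defaults; the same parameters passed to an individual complete call take precedence for that call.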
Methods
| Method | Description |
| --- | --- |
| close | Closes the client. |
| complete | Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. When using this method with stream=True, the response is streamed back to the client. Iterate over the resulting StreamingChatCompletions object to get content updates as they arrive. |
| get_model_info | Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. |
| send_request | Runs the network request through the client's chained policies. For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request |
close
close() -> None
complete
Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. When using this method with stream=True, the response is streamed back to the client. Iterate over the resulting StreamingChatCompletions object to get content updates as they arrive.
complete(*, messages: List[ChatRequestMessage] | List[Dict[str, Any]], stream: Literal[False] = False, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: ChatCompletionsResponseFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any) -> ChatCompletions
Parameters
| Name | Description |
| --- | --- |
| body | Either a MutableMapping[str, Any] (like a dictionary) or an IO[bytes] that specifies the full request payload. Required. |
Keyword-Only Parameters
| Name | Description |
| --- | --- |
| messages | The collection of context messages associated with this chat completions request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles. Required. |
| stream | A value indicating whether chat completions should be streamed for this request. Default value is False. If streaming is enabled, the response will be a StreamingChatCompletions; otherwise the response will be a ChatCompletions. |
| frequency_penalty | A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. Default value is None. |
| presence_penalty | A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. Default value is None. |
| temperature | The sampling temperature to use that controls the apparent creativity of generated completions. Higher values make output more random, while lower values make results more focused and deterministic. Modifying temperature and top_p in the same request is not recommended, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None. |
| top_p | An alternative to sampling with temperature, called nucleus sampling. This value causes the model to consider only the tokens within the provided probability mass. For example, a value of 0.15 causes only the tokens comprising the top 15% of probability mass to be considered. Modifying temperature and top_p in the same request is not recommended, as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None. |
| max_tokens | The maximum number of tokens to generate. Default value is None. |
| response_format | The format that the model must output. Use this to enable JSON mode instead of the default text mode. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message. Default value is None. |
| stop | A collection of textual sequences that will end completions generation. Default value is None. |
| tools | The available tool definitions that the chat completions request can use, including caller-defined functions (see the tool-calling sketch after the Exceptions table below). Default value is None. |
| tool_choice | If specified, configures which of the provided tools the model can use for the chat completions response. Either a str, a ChatCompletionsToolChoicePreset, or a ChatCompletionsNamedToolChoice. Default value is None. |
| seed | If specified, the system will make a best effort to sample deterministically, so that repeated requests with the same seed and parameters return the same result. Determinism is not guaranteed. Default value is None. |
| model | ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None. |
| model_extras | Additional, model-specific parameters that are not in the standard request payload. They are added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None. |
Returns
| Type | Description |
| --- | --- |
| ChatCompletions or Iterable[StreamingChatCompletionsUpdate] | ChatCompletions for non-streaming, or Iterable[StreamingChatCompletionsUpdate] for streaming. |
Exceptions
| Type | Description |
| --- | --- |
| azure.core.exceptions.HttpResponseError | Raised if the service request fails. |
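As a usage sketch (assuming a client constructed as in the constructor example above), the same method covers both modes; only the stream flag differs:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

# Non-streaming: the call returns a ChatCompletions object.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many feet are in a mile?"),
    ]
)
print(response.choices[0].message.content)

# Streaming: the call returns an iterable of StreamingChatCompletionsUpdate objects.
for update in client.complete(
    stream=True,
    messages=[UserMessage(content="Give me 3 good reasons to exercise every day.")],
):
    # Some updates (such as the final one) may carry no choices, so guard before indexing.
    if update.choices and update.choices[0].delta.content:
        print(update.choices[0].delta.content, end="")
```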
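The tools and tool_choice parameters enable function calling. The sketch below is illustrative rather than normative: get_temperature is a hypothetical local function, and the flow (request, detect tool_calls, reply with a ToolMessage, re-request) follows the pattern used in the SDK's tool samples:

```python
import json

from azure.ai.inference.models import (
    AssistantMessage,
    ChatCompletionsToolDefinition,
    CompletionsFinishReason,
    FunctionDefinition,
    ToolMessage,
    UserMessage,
)

# Hypothetical local function the model may ask the caller to invoke.
def get_temperature(city: str) -> str:
    return "31"  # stubbed result for illustration

weather_tool = ChatCompletionsToolDefinition(
    function=FunctionDefinition(
        name="get_temperature",
        description="Returns the current temperature in Celsius for a city.",
        parameters={
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city."}
            },
            "required": ["city"],
        },
    )
)

messages = [UserMessage(content="What is the temperature in Seattle?")]
response = client.complete(messages=messages, tools=[weather_tool])
choice = response.choices[0]

if choice.finish_reason == CompletionsFinishReason.TOOL_CALLS and choice.message.tool_calls:
    # Echo the assistant's tool request into the history, then append the tool result.
    messages.append(AssistantMessage(tool_calls=choice.message.tool_calls))
    for tool_call in choice.message.tool_calls:
        args = json.loads(tool_call.function.arguments or "{}")
        result = get_temperature(**args)
        messages.append(ToolMessage(content=result, tool_call_id=tool_call.id))
    # A second round trip lets the model compose a final answer from the tool output.
    response = client.complete(messages=messages, tools=[weather_tool])

print(response.choices[0].message.content)
```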
get_model_info
Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint.
This method only works with Serverless API or Managed Compute endpoints. It does not work with GitHub Models or Azure OpenAI endpoints.
get_model_info(**kwargs: Any) -> ModelInfo
Returns
| Type | Description |
| --- | --- |
| ModelInfo | The ModelInfo is compatible with MutableMapping. |
Exceptions
| Type | Description |
| --- | --- |
| azure.core.exceptions.HttpResponseError | Raised if the service request fails. |
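A short sketch, assuming the client targets one of the supported endpoint types:

```python
# Works only against Serverless API or Managed Compute endpoints,
# since those are the endpoint types that serve the /info route.
info = client.get_model_info()
print(f"Model name: {info.model_name}")
print(f"Model type: {info.model_type}")
print(f"Model provider: {info.model_provider_name}")
```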
send_request
Runs the network request through the client's chained policies.
>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
>>> request
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = client.send_request(request)
>>> response
<HttpResponse: 200 OK>
For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request
send_request(request: HttpRequest, *, stream: bool = False, **kwargs: Any) -> HttpResponse
Parameters
| Name | Description |
| --- | --- |
| request | The network request you want to make. Required. |
Keyword-Only Parameters
| Name | Description |
| --- | --- |
| stream | Whether the response payload will be streamed. Defaults to False. |
Returns
| Type | Description |
| --- | --- |
| HttpResponse | The response of your network call. Does not do error handling on your response. |