Assistants API (Preview) reference
Note
- File search can ingest up to 10,000 files per assistant - 500 times more than before. It is fast, supports parallel queries through multi-threaded searches, and features enhanced reranking and query rewriting.
- Vector store is a new object in the API. Once a file is added to a vector store, it's automatically parsed, chunked, and embedded, made ready to be searched. Vector stores can be used across assistants and threads, simplifying file management and billing.
- We've added support for the tool_choice parameter, which can be used to force the use of a specific tool (like file search, code interpreter, or a function) in a particular run.
This article provides reference documentation for Python and REST for the new Assistants API (Preview). More in-depth step-by-step guidance is provided in the getting started guide.
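As a rough illustration of the tool_choice note above, the sketch below builds the value that forces a specific tool. The object shapes follow the convention used by the OpenAI Assistants API and are an assumption here, since this article does not spell them out. The helper `build_tool_choice` and the function name `get_weather` are hypothetical.

```python
from typing import Any, Dict, Optional

# Tool types named in the note above.
_SUPPORTED_TOOL_TYPES = {"file_search", "code_interpreter", "function"}


def build_tool_choice(tool_type: str, function_name: Optional[str] = None) -> Dict[str, Any]:
    """Build a tool_choice value that forces one specific tool for a run."""
    if tool_type not in _SUPPORTED_TOOL_TYPES:
        raise ValueError(f"unsupported tool type: {tool_type!r}")
    if tool_type == "function":
        if not function_name:
            raise ValueError("function_name is required when tool_type is 'function'")
        # Assumed shape: a function is selected by name under a "function" key.
        return {"type": "function", "function": {"name": function_name}}
    return {"type": tool_type}


forced_file_search = build_tool_choice("file_search")
forced_function = build_tool_choice("function", "get_weather")  # hypothetical function name
print(forced_file_search)
print(forced_function)
```

A value built this way would typically be passed as the tool_choice argument when creating a run, for example through `client.beta.threads.runs.create(...)` in the Python SDK.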
Create an assistant
POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-05-01-preview
Create an assistant with a model and instructions.
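For readers calling the REST endpoint directly, the following minimal sketch assembles the POST request with the Python standard library without sending it. It assumes key-based authentication through an `api-key` header, which this article does not state. The helper name `build_create_assistant_request` and the placeholder values are hypothetical.

```python
import json
import urllib.request


def build_create_assistant_request(
    resource_name: str,
    api_key: str,
    deployment_name: str,
    api_version: str = "2024-05-01-preview",
) -> urllib.request.Request:
    """Assemble (but do not send) a create-assistant request."""
    url = (
        f"https://{resource_name}.openai.azure.com/openai/assistants"
        f"?api-version={api_version}"
    )
    body = {
        "model": deployment_name,  # the model deployment name, not the base model name
        "instructions": "You are an AI assistant that can write code to help answer math questions",
        "tools": [{"type": "code_interpreter"}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: key-based auth uses the "api-key" header.
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_assistant_request(
    "YOUR_RESOURCE_NAME", "YOUR_API_KEY", "YOUR_DEPLOYMENT_NAME"
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the request; that step is left out so the sketch runs without network access.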
Request body
Name | Type | Required | Description |
---|---|---|---|
model | string | Required | Model deployment name of the model to use. |
name | string or null | Optional | The name of the assistant. The maximum length is 256 characters. |
description | string or null | Optional | The description of the assistant. The maximum length is 512 characters. |
instructions | string or null | Optional | The system instructions that the assistant uses. The maximum length is 256,000 characters. |
tools | array | Optional | Defaults to []. A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can currently be of types code_interpreter or function. A function description can be a maximum of 1,024 characters. |
metadata | map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. |
temperature | number or null | Optional | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
top_p | number or null | Optional | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
response_format | string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. |
tool_resources | object | Optional | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs. |
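The limits in the request body table above can be checked on the client before a request is sent. The sketch below is one way to do that. The helper `validate_assistant_body` is hypothetical, and it covers only the limits stated in the table (the service may enforce more).

```python
from typing import Any, Dict, List

# Character limits taken from the request body table.
_TEXT_LIMITS = {"name": 256, "description": 512, "instructions": 256_000}


def validate_assistant_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems: List[str] = []

    if not isinstance(body.get("model"), str) or not body["model"]:
        problems.append("model is required")

    for field, limit in _TEXT_LIMITS.items():
        value = body.get(field)
        if value is not None and len(value) > limit:
            problems.append(f"{field} exceeds {limit} characters")

    if len(body.get("tools") or []) > 128:
        problems.append("more than 128 tools")

    metadata = body.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("more than 16 metadata pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append(f"metadata key {key[:20]!r} exceeds 64 characters")
        if len(str(value)) > 512:
            problems.append(f"metadata value for {key[:20]!r} exceeds 512 characters")

    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")

    return problems


ok_problems = validate_assistant_body({"model": "my-deployment", "name": "Math Tutor"})
bad_problems = validate_assistant_body({"name": "x" * 257, "temperature": 3})
print(ok_problems)
print(bad_problems)
```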
response_format types
string
auto is the default value.
object
Possible type values: text, json_object, json_schema.
json_schema
Name | Type | Description | Default | Required/Optional |
---|---|---|---|---|
description | string | A description of what the response format is for, used by the model to determine how to respond in the format. | | Optional |
name | string | The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64. | | Required |
schema | object | The schema for the response format, described as a JSON Schema object. | | Optional |
strict | boolean or null | Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true. | false | Optional |
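To make the json_schema fields above concrete, here is a minimal sketch that assembles a response_format object and checks the name rule from the table. Nesting the fields under a `json_schema` key follows the OpenAI API convention and is an assumption here. The helper `build_json_schema_format` and the `math_answer` schema are hypothetical.

```python
import re
from typing import Any, Dict, Optional

# Name rule from the table: a-z, A-Z, 0-9, underscores and dashes, at most 64 characters.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_json_schema_format(
    name: str,
    schema: Dict[str, Any],
    description: Optional[str] = None,
    strict: bool = False,
) -> Dict[str, Any]:
    """Assemble a response_format object of type json_schema."""
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid response format name: {name!r}")
    json_schema: Dict[str, Any] = {"name": name, "schema": schema, "strict": strict}
    if description is not None:
        json_schema["description"] = description
    # Assumed shape: the fields sit under a "json_schema" key next to "type".
    return {"type": "json_schema", "json_schema": json_schema}


# Hypothetical schema for a math answer.
answer_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "number"},
        "explanation": {"type": "string"},
    },
    "required": ["answer", "explanation"],
    "additionalProperties": False,
}

response_format = build_json_schema_format(
    "math_answer",
    answer_schema,
    description="A numeric answer with a short explanation.",
    strict=True,
)
print(response_format["json_schema"]["name"])
```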
tool_resources properties
code_interpreter
Name | Type | Description | Default |
---|---|---|---|
file_ids | array | A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool. | [] |
file_search
Name | Type | Description | Required/Optional |
---|---|---|---|
vector_store_ids | array | The vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant. | Optional |
vector_stores | array | A helper to create a vector store with file_ids and attach it to this assistant. There can be a maximum of 1 vector store attached to the assistant. | Optional |
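The two tool_resources tables above translate into a small nested object. The sketch below builds one and enforces the stated limits (20 files for code_interpreter, 1 vector store for file_search). The helper `build_tool_resources` and the IDs `file-abc123` and `vs_abc123` are hypothetical placeholders.

```python
from typing import Any, Dict, Optional, Sequence


def build_tool_resources(
    code_interpreter_file_ids: Optional[Sequence[str]] = None,
    vector_store_ids: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build a tool_resources object, checking the limits from the tables."""
    resources: Dict[str, Any] = {}

    if code_interpreter_file_ids is not None:
        if len(code_interpreter_file_ids) > 20:
            raise ValueError("code_interpreter accepts at most 20 files")
        resources["code_interpreter"] = {"file_ids": list(code_interpreter_file_ids)}

    if vector_store_ids is not None:
        if len(vector_store_ids) > 1:
            raise ValueError("file_search accepts at most 1 vector store")
        resources["file_search"] = {"vector_store_ids": list(vector_store_ids)}

    return resources


tool_resources = build_tool_resources(
    code_interpreter_file_ids=["file-abc123"],  # placeholder file ID
    vector_store_ids=["vs_abc123"],  # placeholder vector store ID
)
print(tool_resources)
```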
vector_stores
Name | Type | Description | Required/Optional |
---|---|---|---|
file_ids | array | A list of file IDs to add to the vector store. There can be a maximum of 10,000 files in a vector store. | Optional |
chunking_strategy | object | The chunking strategy used to chunk the file(s). If not set, will use the auto strategy. | Optional |
metadata | map | Set of 16 key-value pairs that can be attached to a vector store. This can be useful for storing additional information about the vector store in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. | Optional |
chunking_strategy
Name | Type | Description | Required/Optional |
---|---|---|---|
Auto Chunking Strategy | object | The default strategy. This strategy currently uses a max_chunk_size_tokens of 800 and chunk_overlap_tokens of 400. type is always auto. | Required |
Static Chunking Strategy | object | type is always static. | Required |
Static Chunking Strategy
Name | Type | Description | Required/Optional |
---|---|---|---|
max_chunk_size_tokens | integer | The maximum number of tokens in each chunk. The default value is 800. The minimum value is 100 and the maximum value is 4096. | Required |
chunk_overlap_tokens | integer | The number of tokens that overlap between chunks. The default value is 400. Note that the overlap must not exceed half of max_chunk_size_tokens. | Required |
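The constraints on the static chunking strategy can be sketched as a small validating builder. Placing the two token settings under a `static` key follows the OpenAI API convention and is an assumption, because the tables above do not show the nesting. The helper name `static_chunking_strategy` is hypothetical.

```python
from typing import Any, Dict


def static_chunking_strategy(
    max_chunk_size_tokens: int = 800,
    chunk_overlap_tokens: int = 400,
) -> Dict[str, Any]:
    """Build a static chunking strategy after checking the documented bounds."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    # The overlap must not exceed half of max_chunk_size_tokens.
    if chunk_overlap_tokens < 0 or chunk_overlap_tokens * 2 > max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must be between 0 and half of max_chunk_size_tokens")
    return {
        "type": "static",
        # Assumed nesting under a "static" key.
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


strategy = static_chunking_strategy(max_chunk_size_tokens=1200, chunk_overlap_tokens=200)
print(strategy)
```

With the defaults (800 and 400), the overlap sits exactly at the permitted half, so any smaller chunk size needs a correspondingly smaller overlap.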
Returns
An assistant object.
Example create assistant request
import os

from openai import AzureOpenAI

# Read the key and endpoint from environment variables.
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

assistant = client.beta.assistants.create(
    instructions="You are an AI assistant that can write code to help answer math questions",
    model="<REPLACE WITH MODEL DEPLOYMENT NAME>",  # replace with model deployment name.
    tools=[{"type": "code_interpreter"}],
)
List assistants
GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-05-01-preview
Returns a list of all assistants.
Query parameters
Parameter | Type | Required | Description |
---|---|---|---|
limit | integer | Optional | A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20. |
order | string | Optional | Defaults to desc. Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order. |
after | string | Optional | A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list. |
before | string | Optional | A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list. |
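The after cursor described above can be used to walk every page of results. The sketch below shows the loop against an in-memory stand-in so it runs without a network call. It assumes each page is a dictionary with a `data` list and a `has_more` flag, which is the shape used by OpenAI list responses but is not stated in this article. `iterate_all` and `fake_fetch_page` are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# A page fetcher takes (after, limit) and returns one page of results.
FetchPage = Callable[[Optional[str], int], Dict[str, Any]]


def iterate_all(fetch_page: FetchPage, limit: int = 20) -> Iterator[Dict[str, Any]]:
    """Yield every object by following the after cursor from page to page."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after, limit)
        items = page["data"]
        yield from items
        # Assumption: has_more signals whether another page exists.
        if not items or not page.get("has_more"):
            break
        after = items[-1]["id"]  # the last ID on this page becomes the next cursor


# In-memory stand-in for the list endpoint, holding 45 fake assistants.
_FAKE_ASSISTANTS = [{"id": f"asst_{i:03d}"} for i in range(45)]


def fake_fetch_page(after: Optional[str], limit: int) -> Dict[str, Any]:
    start = 0
    if after is not None:
        start = next(i for i, a in enumerate(_FAKE_ASSISTANTS) if a["id"] == after) + 1
    chunk = _FAKE_ASSISTANTS[start:start + limit]
    return {"data": chunk, "has_more": start + limit < len(_FAKE_ASSISTANTS)}


all_ids = [assistant["id"] for assistant in iterate_all(fake_fetch_page, limit=20)]
print(len(all_ids), all_ids[0], all_ids[-1])
```

Against the real service, the fetcher could wrap a call such as `client.beta.assistants.list(after=..., limit=...)` and adapt its result to the same shape.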
Returns
A list of assistant objects.
Example list assistants
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

# limit is an integer between 1 and 100.
my_assistants = client.beta.assistants.list(
    order="desc",
    limit=20,
)
print(my_assistants.data)
Retrieve assistant
GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview
Retrieves an assistant.
Path parameters
Parameter | Type | Required | Description |
---|---|---|---|
assistant_id | string | Required | The ID of the assistant to retrieve. |
Returns
The assistant object matching the specified ID.
Example retrieve assistant
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

my_assistant = client.beta.assistants.retrieve("asst_abc123")
print(my_assistant)
Modify assistant
POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview
Modifies an assistant.
Path parameters
Parameter | Type | Required | Description |
---|---|---|---|
assistant_id | string | Required | The ID of the assistant to modify. |
Request body
Parameter | Type | Required | Description |
---|---|---|---|
model | string | Optional | The model deployment name of the model to use. |
name | string or null | Optional | The name of the assistant. The maximum length is 256 characters. |
description | string or null | Optional | The description of the assistant. The maximum length is 512 characters. |
instructions | string or null | Optional | The system instructions that the assistant uses. The maximum length is 32768 characters. |
tools | array | Optional | Defaults to []. A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter or function. A function description can be a maximum of 1,024 characters. |
metadata | map | Optional | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. |
temperature | number or null | Optional | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
top_p | number or null | Optional | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
response_format | string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. |
tool_resources | object | Optional | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs. |
Returns
The modified assistant object.
Example modify assistant
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

my_updated_assistant = client.beta.assistants.update(
    "asst_abc123",
    instructions="You are an HR bot, and you have access to files to answer employee questions about company policies. Always respond with info from either of the files.",
    name="HR Helper",
    tools=[{"type": "code_interpreter"}],  # the tool type uses an underscore, not a hyphen
    model="gpt-4",  # replace with your model deployment name
)
print(my_updated_assistant)
Delete assistant
DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview
Delete an assistant.
Path parameters
Parameter | Type | Required | Description |
---|---|---|---|
assistant_id | string | Required | The ID of the assistant to delete. |
Returns
Deletion status.
Example delete assistant
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

response = client.beta.assistants.delete("asst_abc123")
print(response)
File upload API reference
Assistants use the same API for file upload as fine-tuning. When uploading a file, you must specify an appropriate value for the purpose parameter.
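As a minimal sketch of the upload step, the helper below passes a file to the files API with a purpose value. It assumes `assistants` is the appropriate purpose for files used by assistant tools, which this article does not state. The helper name `upload_for_assistants` is hypothetical, and `client` is expected to be an AzureOpenAI client like the one in the other examples.

```python
from typing import Any


def upload_for_assistants(client: Any, path: str) -> Any:
    """Upload a local file so that assistant tools can use it."""
    # Open in binary mode; the handle is closed once the upload call returns.
    with open(path, "rb") as handle:
        # Assumption: "assistants" is the purpose value for assistant files.
        return client.files.create(file=handle, purpose="assistants")
```

The ID of the returned file object could then be listed under file_ids in tool_resources.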
Assistant object
Field | Type | Description |
---|---|---|
id | string | The identifier, which can be referenced in API endpoints. |
object | string | The object type, which is always assistant. |
created_at | integer | The Unix timestamp (in seconds) for when the assistant was created. |
name | string or null | The name of the assistant. The maximum length is 256 characters. |
description | string or null | The description of the assistant. The maximum length is 512 characters. |
model | string | The model deployment name to use. |
instructions | string or null | The system instructions that the assistant uses. The maximum length is 32768 characters. |
tools | array | A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter or function. A function description can be a maximum of 1,024 characters. |
metadata | map | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. |
temperature | number or null | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
top_p | number or null | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
response_format | string or object | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. |
tool_resources | object | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs. |
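The top_p description in the table can be illustrated with a toy example. The sketch below keeps the most likely tokens until their cumulative probability reaches top_p, then renormalizes. It illustrates the idea of nucleus sampling only and is not the service's implementation. The function name `nucleus_filter` and the token probabilities are made up for the example.

```python
from typing import Dict


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    kept: Dict[str, float] = {}
    cumulative = 0.0
    # Visit tokens from most to least likely.
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Made-up next-token distribution.
next_token_probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}

narrow = nucleus_filter(next_token_probs, top_p=0.1)  # only the top token survives
wider = nucleus_filter(next_token_probs, top_p=0.7)   # the two most likely tokens survive
print(sorted(narrow), sorted(wider))
```

Lower top_p values shrink the candidate set, which is why the table suggests adjusting either top_p or temperature rather than both.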