Assistants API (Preview) reference

Note

  • File search can ingest up to 10,000 files per assistant, 500 times more than before. It is fast, supports parallel queries through multi-threaded searches, and features enhanced reranking and query rewriting.
    • Vector store is a new object in the API. Once a file is added to a vector store, it's automatically parsed, chunked, embedded, and made ready to be searched. Vector stores can be used across assistants and threads, simplifying file management and billing.
  • We've added support for the tool_choice parameter, which can be used to force the use of a specific tool (such as file search, code interpreter, or a function) in a particular run.
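Because tool_choice applies to a single run, forcing a tool means sending a tool_choice object with the run request. The snippet below is a minimal sketch that only builds request bodies locally. It assumes the OpenAI-style tool_choice shapes ({"type": "file_search"}, {"type": "code_interpreter"}, or {"type": "function", "function": {"name": ...}}), which this note does not spell out; the helpers forced_tool_choice and run_body and the function name get_weather are hypothetical.

```python
from typing import Optional

# Built-in tool types named in this article; "function" is handled separately.
BUILT_IN_TOOLS = {"file_search", "code_interpreter"}


def forced_tool_choice(tool_type: str, function_name: Optional[str] = None) -> dict:
    """Return a tool_choice object that forces one tool (assumed OpenAI-style shape)."""
    if tool_type in BUILT_IN_TOOLS:
        return {"type": tool_type}
    if tool_type == "function":
        if not function_name:
            raise ValueError("function_name is required when forcing a function tool")
        return {"type": "function", "function": {"name": function_name}}
    raise ValueError(f"unsupported tool type: {tool_type!r}")


def run_body(assistant_id: str, tool_choice: dict) -> dict:
    """Minimal run request body; a real run accepts many more fields."""
    return {"assistant_id": assistant_id, "tool_choice": tool_choice}


if __name__ == "__main__":
    # get_weather is a hypothetical function name used only for illustration.
    print(run_body("asst_abc123", forced_tool_choice("file_search")))
    print(forced_tool_choice("function", "get_weather"))
```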

This article provides reference documentation for Python and REST for the new Assistants API (Preview). More in-depth step-by-step guidance is provided in the getting started guide.

Create an assistant

POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-08-01-preview

Create an assistant with a model and instructions.

Request body

Name Type Required Description
model string Required Model deployment name of the model to use.
name string or null Optional The name of the assistant. The maximum length is 256 characters.
description string or null Optional The description of the assistant. The maximum length is 512 characters.
instructions string or null Optional The system instructions that the assistant uses. The maximum length is 256,000 characters.
tools array Optional Defaults to []. A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, file_search, or function. A function description can be a maximum of 1,024 characters.
metadata map Optional Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
temperature number or null Optional Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p number or null Optional Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
response_format string or object Optional Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
tool_resources object Optional A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.

response_format types

string

auto is the default value.

object

Possible type values: text, json_object, json_schema.

json_schema

Name Type Description Default Required/Optional
description string A description of what the response format is for, used by the model to determine how to respond in the format. Optional
name string The name of the response format. Must contain only a-z, A-Z, 0-9, underscores, and dashes, with a maximum length of 64. Required
schema object The schema for the response format, described as a JSON Schema object. Optional
strict boolean or null Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true. false Optional
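To make the json_schema fields concrete, here is a sketch of a response_format value built in Python. It assumes the OpenAI-style nesting {"type": "json_schema", "json_schema": {...}}, which the table above does not show; json_schema_response_format and the math_answer schema are hypothetical. The name check mirrors the documented character set and 64-character limit.

```python
import re
from typing import Optional

# Name rule from the table: only a-z, A-Z, 0-9, underscores, and dashes; at most 64 characters.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def json_schema_response_format(name: str, schema: dict,
                                description: Optional[str] = None,
                                strict: bool = False) -> dict:
    """Build a response_format object of type json_schema (hypothetical helper)."""
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid response format name: {name!r}")
    json_schema = {"name": name, "schema": schema, "strict": strict}
    if description is not None:
        json_schema["description"] = description
    return {"type": "json_schema", "json_schema": json_schema}


# Illustrative schema: a single required string field and no extra properties.
answer_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
    "additionalProperties": False,
}

response_format = json_schema_response_format("math_answer", answer_schema, strict=True)
print(response_format)
```

When strict is true only a subset of JSON Schema is supported, so the example keeps the schema deliberately simple.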

tool_resources properties

code_interpreter

Name Type Description Default
file_ids array A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool. []

file_search

Name Type Description Required/Optional
vector_store_ids array The vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant. Optional
vector_stores array A helper to create a vector store with file_ids and attach it to this assistant. There can be a maximum of 1 vector store attached to the assistant. Optional
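As a rough illustration of how these limits combine, the following sketch assembles a tool_resources object locally and rejects inputs that exceed them. build_tool_resources is a hypothetical helper, the file and vector store IDs are placeholders, and treating vector_store_ids and the vector_stores helper as sharing the single-store limit is an assumption.

```python
from typing import Optional, Sequence

# Limits taken from the tool_resources tables in this article.
MAX_CODE_INTERPRETER_FILES = 20
MAX_VECTOR_STORES = 1
MAX_VECTOR_STORE_FILES = 10_000


def build_tool_resources(code_interpreter_file_ids: Sequence[str] = (),
                         vector_store_ids: Sequence[str] = (),
                         new_store_file_ids: Optional[Sequence[str]] = None) -> dict:
    """Assemble a tool_resources object and check the documented limits locally.

    new_store_file_ids uses the vector_stores helper to create one new store.
    Counting existing and new stores together against the limit of 1 is an assumption.
    """
    if len(code_interpreter_file_ids) > MAX_CODE_INTERPRETER_FILES:
        raise ValueError("code_interpreter accepts at most 20 files")
    resources: dict = {}
    if code_interpreter_file_ids:
        resources["code_interpreter"] = {"file_ids": list(code_interpreter_file_ids)}

    file_search: dict = {}
    if vector_store_ids:
        file_search["vector_store_ids"] = list(vector_store_ids)
    if new_store_file_ids is not None:
        if len(new_store_file_ids) > MAX_VECTOR_STORE_FILES:
            raise ValueError("a vector store holds at most 10,000 files")
        file_search["vector_stores"] = [{"file_ids": list(new_store_file_ids)}]
    store_count = (len(file_search.get("vector_store_ids", []))
                   + len(file_search.get("vector_stores", [])))
    if store_count > MAX_VECTOR_STORES:
        raise ValueError("at most 1 vector store can be attached")
    if file_search:
        resources["file_search"] = file_search
    return resources


# Placeholder IDs for illustration only.
print(build_tool_resources(["file-abc123"], ["vs_abc123"]))
```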

vector_stores

Name Type Description Required/Optional
file_ids array A list of file IDs to add to the vector store. There can be a maximum of 10,000 files in a vector store. Optional
chunking_strategy object The chunking strategy used to chunk the file(s). If not set, the auto strategy is used. Optional
metadata map Set of up to 16 key-value pairs that can be attached to a vector store. This can be useful for storing additional information about the vector store in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long. Optional

chunking_strategy

Name Type Description Required/optional
Auto Chunking Strategy object The default strategy. This strategy currently uses a max_chunk_size_tokens of 800 and chunk_overlap_tokens of 400. type is always auto. Required
Static Chunking Strategy object Uses the max_chunk_size_tokens and chunk_overlap_tokens values described below. type is always static. Required

Static Chunking Strategy

Name Type Description Required/Optional
max_chunk_size_tokens integer The maximum number of tokens in each chunk. The default value is 800. The minimum value is 100 and the maximum value is 4096. Required
chunk_overlap_tokens integer The number of tokens that overlap between chunks. The default value is 400. Note that the overlap must not exceed half of max_chunk_size_tokens. Required
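The static strategy's bounds can be checked before a request is sent, and the effect of chunk_overlap_tokens is easiest to see on a toy token list. The sketch below assumes a static chunking_strategy object shaped as {"type": "static", "static": {...}}; split_with_overlap is a hypothetical, simplified sliding window meant only to show what overlapping chunks look like, not how the service actually chunks files.

```python
def static_chunking_strategy(max_chunk_size_tokens: int = 800,
                             chunk_overlap_tokens: int = 400) -> dict:
    """Build a static chunking_strategy object after checking the documented bounds."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    # The overlap must not exceed half of max_chunk_size_tokens.
    if chunk_overlap_tokens < 0 or 2 * chunk_overlap_tokens > max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must be at most half of max_chunk_size_tokens")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


def split_with_overlap(tokens: list, max_chunk_size_tokens: int,
                       chunk_overlap_tokens: int) -> list:
    """Toy fixed-size windowing: consecutive chunks share chunk_overlap_tokens tokens.

    Illustrates the idea only; it is not the service's chunking implementation.
    """
    step = max_chunk_size_tokens - chunk_overlap_tokens
    if step <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size_tokens])
        if start + max_chunk_size_tokens >= len(tokens):
            break
    return chunks


print(static_chunking_strategy(1000, 200))
# Ten "tokens", windows of 4 with an overlap of 2 (far below the real minimum, for readability).
print(split_with_overlap(list(range(10)), 4, 2))
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```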

Returns

An assistant object.

Example create assistant request

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

assistant = client.beta.assistants.create(
  instructions="You are an AI assistant that can write code to help answer math questions",
  model="<REPLACE WITH MODEL DEPLOYMENT NAME>", # replace with model deployment name. 
  tools=[{"type": "code_interpreter"}]
)
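The example above goes through the Python SDK. For REST callers, a minimal sketch of the same create request using only the Python standard library follows; it builds the request without sending it. It assumes key-based authentication through the api-key header and the 2024-08-01-preview API version used in the Python examples, and build_create_assistant_body and create_assistant_request are hypothetical helpers that check only some of the documented limits.

```python
import json
import urllib.request
from typing import Optional

# Limits from the request body table for creating an assistant.
MAX_NAME, MAX_DESCRIPTION, MAX_TOOLS = 256, 512, 128
MAX_METADATA_PAIRS, MAX_METADATA_KEY, MAX_METADATA_VALUE = 16, 64, 512
MAX_FUNCTION_DESCRIPTION = 1024


def build_create_assistant_body(model: str, *, name: Optional[str] = None,
                                description: Optional[str] = None,
                                instructions: Optional[str] = None,
                                tools: Optional[list] = None,
                                metadata: Optional[dict] = None,
                                temperature: Optional[float] = None) -> dict:
    """Assemble a create-assistant body, checking some documented limits locally."""
    if name is not None and len(name) > MAX_NAME:
        raise ValueError("name is longer than 256 characters")
    if description is not None and len(description) > MAX_DESCRIPTION:
        raise ValueError("description is longer than 512 characters")
    tools = tools or []
    if len(tools) > MAX_TOOLS:
        raise ValueError("at most 128 tools per assistant")
    for tool in tools:
        fn_description = tool.get("function", {}).get("description", "")
        if len(fn_description) > MAX_FUNCTION_DESCRIPTION:
            raise ValueError("function description is longer than 1,024 characters")
    metadata = metadata or {}
    if len(metadata) > MAX_METADATA_PAIRS:
        raise ValueError("at most 16 metadata pairs")
    for key, value in metadata.items():
        if len(key) > MAX_METADATA_KEY or len(value) > MAX_METADATA_VALUE:
            raise ValueError(f"metadata entry {key!r} exceeds the key/value length limits")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    body = {"model": model, "tools": tools}
    optional = {"name": name, "description": description, "instructions": instructions,
                "metadata": metadata or None, "temperature": temperature}
    # Omit optional fields that were not provided.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def create_assistant_request(resource: str, api_key: str, body: dict,
                             api_version: str = "2024-08-01-preview") -> urllib.request.Request:
    """Build (but do not send) the POST request for the create endpoint."""
    url = f"https://{resource}.openai.azure.com/openai/assistants?api-version={api_version}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


body = build_create_assistant_body(
    "<REPLACE WITH MODEL DEPLOYMENT NAME>",
    instructions="You are an AI assistant that can write code to help answer math questions",
    tools=[{"type": "code_interpreter"}],
)
request = create_assistant_request("YOUR_RESOURCE_NAME", "<API KEY>", body)
# Sending is left to the caller, for example with urllib.request.urlopen(request).
```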

List assistants

GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-08-01-preview

Returns a list of all assistants.

Query parameters

Parameter Type Required Description
limit integer Optional A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
order string Optional Defaults to desc. Sort order by the created_at timestamp of the objects: asc for ascending order and desc for descending order.
after string Optional A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
before string Optional A cursor for use in pagination. before is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, starting with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
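The after cursor supports a simple loop: request a page, then pass the ID of its last object as after until no more pages remain. The sketch below assumes each page exposes data and has_more fields, as the OpenAI-style list response does, and uses a hypothetical in-memory stand-in (make_fake_fetcher) instead of the real endpoint.

```python
from typing import Callable, Iterator, Optional

# fetch_page(after, limit) -> {"data": [...], "has_more": bool}; assumed page shape.
FetchPage = Callable[[Optional[str], int], dict]


def iterate_all(fetch_page: FetchPage, limit: int = 20) -> Iterator[dict]:
    """Yield every object by following the after cursor page by page."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    after = None
    while True:
        page = fetch_page(after, limit)
        yield from page["data"]
        if not page.get("has_more") or not page["data"]:
            return
        # The last object's ID becomes the cursor for the next page.
        after = page["data"][-1]["id"]


def make_fake_fetcher(ids: list) -> FetchPage:
    """Stand-in for the list endpoint, serving IDs from memory (illustration only)."""
    def fetch(after: Optional[str], limit: int) -> dict:
        start = ids.index(after) + 1 if after else 0
        batch = ids[start:start + limit]
        return {"data": [{"id": i} for i in batch], "has_more": start + limit < len(ids)}
    return fetch


all_ids = [f"asst_{n:03d}" for n in range(45)]
collected = [a["id"] for a in iterate_all(make_fake_fetcher(all_ids), limit=20)]
print(len(collected))  # → 45
```

With the openai Python package, iterating over the object returned by client.beta.assistants.list() generally fetches later pages for you, so a manual loop like this is mainly useful for REST callers.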

Returns

A list of assistant objects

Example list assistants

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

# limit is an integer between 1 and 100
my_assistants = client.beta.assistants.list(
    order="desc",
    limit=20,
)
print(my_assistants.data)

Retrieve assistant

GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview

Retrieves an assistant.

Path parameters

Parameter Type Required Description
assistant_id string Required The ID of the assistant to retrieve.

Returns

The assistant object matching the specified ID.

Example retrieve assistant

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

my_assistant = client.beta.assistants.retrieve("asst_abc123")
print(my_assistant)

Modify assistant

POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview

Modifies an assistant.

Path parameters

Parameter Type Required Description
assistant_id string Required The ID of the assistant to modify.

Request Body

Parameter Type Required Description
model string Optional The model deployment name of the model to use.
name string or null Optional The name of the assistant. The maximum length is 256 characters.
description string or null Optional The description of the assistant. The maximum length is 512 characters.
instructions string or null Optional The system instructions that the assistant uses. The maximum length is 256,000 characters.
tools array Optional Defaults to []. A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, file_search, or function. A function description can be a maximum of 1,024 characters.
metadata map Optional Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
temperature number or null Optional Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p number or null Optional Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
response_format string or object Optional Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
tool_resources object Optional A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.

Returns

The modified assistant object.

Example modify assistant

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

my_updated_assistant = client.beta.assistants.update(
  "asst_abc123",
  instructions="You are an HR bot, and you have access to files to answer employee questions about company policies. Always respond with info from either of the files.",
  name="HR Helper",
  tools=[{"type": "code_interpreter"}],
  model="gpt-4", #model = model deployment name
)

print(my_updated_assistant)

Delete assistant

DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview

Delete an assistant.

Path parameters

Parameter Type Required Description
assistant_id string Required The ID of the assistant to delete.

Returns

Deletion status.

Example delete assistant

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

response = client.beta.assistants.delete("asst_abc123")
print(response)

File upload API reference

Assistants use the same API for file upload as fine-tuning. When uploading a file, set the purpose parameter to the value that matches how the file will be used, for example purpose="assistants" for files used with assistants.

Assistant object

Field Type Description
id string The identifier, which can be referenced in API endpoints.
object string The object type, which is always assistant.
created_at integer The Unix timestamp (in seconds) for when the assistant was created.
name string or null The name of the assistant. The maximum length is 256 characters.
description string or null The description of the assistant. The maximum length is 512 characters.
model string The name of the model deployment to use.
instructions string or null The system instructions that the assistant uses. The maximum length is 256,000 characters.
tools array A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter, file_search, or function. A function description can be a maximum of 1,024 characters.
metadata map Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
temperature number or null Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p number or null Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
response_format string or object Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
tool_resources object A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
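For code that consumes raw REST responses, a small typed wrapper can make these fields easier to work with. This is a sketch that assumes responses are JSON objects carrying the fields listed above; the Assistant dataclass and the sample payload values are illustrative, not real service output.

```python
import json
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Assistant:
    """Typed view of the assistant object fields listed in the table above."""
    id: str
    created_at: int  # Unix timestamp in seconds
    model: str       # model deployment name
    name: Optional[str] = None
    description: Optional[str] = None
    instructions: Optional[str] = None
    tools: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    response_format: Any = None
    tool_resources: Optional[dict] = None

    @property
    def created(self) -> datetime:
        """created_at converted to a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.created_at, tz=timezone.utc)

    @classmethod
    def from_json(cls, text: str) -> "Assistant":
        raw = json.loads(text)
        if raw.get("object") != "assistant":
            raise ValueError(f"expected an assistant object, got {raw.get('object')!r}")
        known = {f.name for f in fields(cls)}
        # Ignore keys this sketch does not model, such as "object".
        return cls(**{k: v for k, v in raw.items() if k in known})


# Illustrative payload; the values are placeholders.
sample = """{
  "id": "asst_abc123",
  "object": "assistant",
  "created_at": 1700000000,
  "name": "Math Tutor",
  "description": null,
  "model": "gpt-4",
  "instructions": "You are an AI assistant that can write code to help answer math questions",
  "tools": [{"type": "code_interpreter"}],
  "metadata": {},
  "temperature": 1.0,
  "top_p": 1.0,
  "response_format": "auto",
  "tool_resources": {"code_interpreter": {"file_ids": []}}
}"""

assistant = Assistant.from_json(sample)
print(assistant.name, assistant.created.isoformat())
```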