Assistants API (Preview) reference

Article
02/27/2025

Note

- File search can ingest up to 10,000 files per assistant, 500 times more than before. It is fast, supports parallel queries through multi-threaded searches, and features enhanced reranking and query rewriting.
- Vector store is a new object in the API. Once a file is added to a vector store, it's automatically parsed, chunked, and embedded so that it's ready to be searched. Vector stores can be used across assistants and threads, simplifying file management and billing.
- We've added support for the `tool_choice` parameter, which can be used to force the use of a specific tool (like file search, code interpreter, or a function) in a particular run.

This article provides reference documentation for Python and REST for the new Assistants API (Preview). More in-depth step-by-step guidance is provided in the getting started guide.

Create an assistant

POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-05-01-preview

Create an assistant with a model and instructions.

Request body

Name	Type	Required	Description
model	string	Required	Model deployment name of the model to use.
name	string or null	Optional	The name of the assistant. The maximum length is 256 characters.
description	string or null	Optional	The description of the assistant. The maximum length is 512 characters.
instructions	string or null	Optional	The system instructions that the assistant uses. The maximum length is 256,000 characters.
tools	array	Optional	Defaults to []. A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can currently be of types `code_interpreter`, `file_search`, or `function`. A `function` description can be a maximum of 1,024 characters.
metadata	map	Optional	Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
temperature	number or null	Optional	Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p	number or null	Optional	Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
response_format	string or object	Optional	Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length.
tool_resources	object	Optional	A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs.
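
The limits in the request body table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_assistant_body` is a hypothetical helper name, and `"my-deployment"` is a placeholder deployment name. It only assembles and validates the JSON body; it does not call the service.

```python
import json

# Limits taken from the request body table above.
TEXT_LIMITS = {"name": 256, "description": 512, "instructions": 256_000}
MAX_TOOLS = 128
MAX_METADATA_PAIRS = 16
MAX_METADATA_KEY = 64
MAX_METADATA_VALUE = 512


def build_assistant_body(model, *, name=None, description=None, instructions=None,
                         tools=None, metadata=None, temperature=None, top_p=None):
    """Hypothetical helper: return a create-assistant body or raise ValueError."""
    if not model:
        raise ValueError("model is required (use your model deployment name)")
    body = {"model": model}

    # Optional text fields are only sent when provided.
    for key, value in (("name", name), ("description", description),
                       ("instructions", instructions)):
        if value is None:
            continue
        if len(value) > TEXT_LIMITS[key]:
            raise ValueError(f"{key} exceeds {TEXT_LIMITS[key]} characters")
        body[key] = value

    tools = list(tools or [])  # the documented default is []
    if len(tools) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools per assistant")
    body["tools"] = tools

    metadata = dict(metadata or {})
    if len(metadata) > MAX_METADATA_PAIRS:
        raise ValueError(f"at most {MAX_METADATA_PAIRS} metadata pairs")
    for key, value in metadata.items():
        if len(key) > MAX_METADATA_KEY or len(value) > MAX_METADATA_VALUE:
            raise ValueError(f"metadata entry {key!r} is too long")
    if metadata:
        body["metadata"] = metadata

    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if top_p is not None:
        body["top_p"] = top_p
    return body


body = build_assistant_body(
    "my-deployment",  # placeholder deployment name
    instructions="You are an AI assistant that can write code to help answer math questions.",
    tools=[{"type": "code_interpreter"}],
    metadata={"team": "docs"},
    temperature=0.2,
)
print(json.dumps(body, indent=2))
```

Following the table's recommendation, the example sets `temperature` and leaves `top_p` at its default.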

response_format types

string

auto is the default value.

object

Possible type values: text, json_object, json_schema.

json_schema

Name	Type	Description	Default	Required/Optional
`description`	string	A description of what the response format is for, used by the model to determine how to respond in the format.		Optional
`name`	string	The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.		Required
`schema`	object	The schema for the response format, described as a JSON Schema object.		Optional
`strict`	boolean or null	Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the `schema` field. Only a subset of JSON Schema is supported when `strict` is `true`.	false	Optional
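
As an illustration of the `json_schema` properties, the sketch below assembles a `response_format` value. The helper name `json_schema_response_format` and the example schema are hypothetical. The nesting of the properties under a `json_schema` key is an assumption not shown in the table above, so check it against the service before relying on it.

```python
import re

# name must be a-z, A-Z, 0-9, underscores, or dashes, with a maximum length of 64.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def json_schema_response_format(name, schema=None, description=None, strict=False):
    """Hypothetical helper: build a response_format object of type json_schema."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError("name must match [A-Za-z0-9_-] and be 1 to 64 characters")
    inner = {"name": name, "strict": strict}
    if description is not None:
        inner["description"] = description
    if schema is not None:
        inner["schema"] = schema
    # Assumption: the properties are nested under a "json_schema" key.
    return {"type": "json_schema", "json_schema": inner}


# Illustrative schema only; strict mode supports just a subset of JSON Schema.
answer_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "number"},
        "steps": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["answer", "steps"],
    "additionalProperties": False,
}

response_format = json_schema_response_format(
    "math_answer",
    schema=answer_schema,
    description="A numeric answer with the steps used to reach it.",
    strict=True,
)
print(response_format["json_schema"]["name"])
```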

tool_resources properties

code_interpreter

Name	Type	Description	Default
`file_ids`	array	A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.	`[]`

file_search

Name	Type	Description	Required/Optional
`vector_store_ids`	array	The vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant.	Optional
`vector_stores`	array	A helper to create a vector store with file_ids and attach it to this assistant. There can be a maximum of 1 vector store attached to the assistant.	Optional

vector_stores

Name	Type	Description	Required/Optional
`file_ids`	array	A list of file IDs to add to the vector store. There can be a maximum of 10,000 files in a vector store.	Optional
`chunking_strategy`	object	The chunking strategy used to chunk the file(s). If not set, will use the auto strategy.	Optional
`metadata`	map	Set of 16 key-value pairs that can be attached to a vector store. This can be useful for storing additional information about the vector store in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.	Optional

chunking_strategy

Name	Type	Description	Required/optional
`Auto Chunking Strategy`	object	The default strategy. This strategy currently uses a `max_chunk_size_tokens` of `800` and `chunk_overlap_tokens` of `400`. `type` is always `auto`.	Required
`Static Chunking Strategy`	object	A strategy with a caller-specified chunk size and overlap. `type` is always `static`.	Required

Static Chunking Strategy

Name	Type	Description	Required/Optional
`max_chunk_size_tokens`	integer	The maximum number of tokens in each chunk. The default value is `800`. The minimum value is `100` and the maximum value is `4096`.	Required
`chunk_overlap_tokens`	integer	The number of tokens that overlap between chunks. The default value is `400`. Note that the overlap must not exceed half of `max_chunk_size_tokens`.	Required
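
The constraints on the static chunking strategy can be expressed as a small validation routine. This is a sketch under stated assumptions: the helper names are hypothetical, `"assistant-file-id"` is a placeholder file ID, and the nesting of the two token settings under a `static` key is an assumption that the tables above do not spell out.

```python
MAX_FILES_PER_VECTOR_STORE = 10_000


def static_chunking_strategy(max_chunk_size_tokens=800, chunk_overlap_tokens=400):
    """Hypothetical helper: validate and build a static chunking strategy."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    # The overlap must not exceed half of max_chunk_size_tokens.
    if chunk_overlap_tokens < 0 or chunk_overlap_tokens * 2 > max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must be at most half the chunk size")
    return {
        "type": "static",
        # Assumption: the settings are nested under a "static" key.
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


def file_search_resources(file_ids, chunking_strategy=None):
    """Hypothetical helper: build tool_resources with one helper vector store."""
    file_ids = list(file_ids)
    if len(file_ids) > MAX_FILES_PER_VECTOR_STORE:
        raise ValueError("a vector store can hold at most 10,000 files")
    store = {"file_ids": file_ids}
    if chunking_strategy is not None:  # omitted means the auto strategy is used
        store["chunking_strategy"] = chunking_strategy
    # At most one vector store can be attached, so the list has a single entry.
    return {"file_search": {"vector_stores": [store]}}


tool_resources = file_search_resources(
    ["assistant-file-id"],  # placeholder file ID
    chunking_strategy=static_chunking_strategy(1000, 500),
)
print(tool_resources)
```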

Returns

An assistant object.

Example create assistant request

Python 1.x

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

assistant = client.beta.assistants.create(
    instructions="You are an AI assistant that can write code to help answer math questions",
    model="<REPLACE WITH MODEL DEPLOYMENT NAME>",  # replace with model deployment name
    tools=[{"type": "code_interpreter"}],
)

REST

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "instructions": "You are an AI assistant that can write code to help answer math questions.",
    "tools": [
      { "type": "code_interpreter" }
    ],
    "model": "gpt-4-1106-preview"
  }'

List assistants

GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-05-01-preview

Returns a list of all assistants.

Query parameters

Parameter	Type	Required	Description
`limit`	integer	Optional	A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
`order`	string	Optional - Defaults to desc	Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
`after`	string	Optional	A cursor for use in pagination. `after` is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
`before`	string	Optional	A cursor for use in pagination. `before` is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include before=obj_foo in order to fetch the previous page of the list.
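
The cursor parameters can be combined into a loop that walks every page. The sketch below is illustrative only: `list_query` and `iter_all` are hypothetical helpers, and `fake_fetch` stands in for the HTTP call so the example runs offline. It assumes each list response carries a `data` array and a `has_more` flag, which this reference does not describe.

```python
from urllib.parse import parse_qs, urlencode


def list_query(limit=20, order="desc", after=None, before=None):
    """Build the query string for a list request from the parameters above."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    params = {"limit": limit, "order": order}
    if after:
        params["after"] = after
    if before:
        params["before"] = before
    return urlencode(params)


def iter_all(fetch_page, limit=100):
    """Yield every object, using the last ID of each page as the `after` cursor."""
    after = None
    while True:
        page = fetch_page(list_query(limit=limit, after=after))
        items = page["data"]
        yield from items
        # Assumption: the response reports whether more pages exist via has_more.
        if not items or not page.get("has_more"):
            return
        after = items[-1]["id"]


# In-memory stand-in for the service, with made-up IDs, so the loop can be tried offline.
FAKE_IDS = [f"asst_{i:03d}" for i in range(5)]


def fake_fetch(query):
    params = parse_qs(query)
    limit = int(params["limit"][0])
    start = FAKE_IDS.index(params["after"][0]) + 1 if "after" in params else 0
    chunk = FAKE_IDS[start:start + limit]
    return {"data": [{"id": i} for i in chunk], "has_more": start + limit < len(FAKE_IDS)}


all_ids = [assistant["id"] for assistant in iter_all(fake_fetch, limit=2)]
print(all_ids)
```

In real use, `fetch_page` would issue the GET request with the query string appended to the list endpoint and return the parsed JSON response.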

Returns

A list of assistant objects

Example list assistants

Python 1.x

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

my_assistants = client.beta.assistants.list(
    order="desc",
    limit=20,  # limit is an integer between 1 and 100
)
print(my_assistants.data)

REST

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json'

Retrieve assistant

GET https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview

Retrieves an assistant.

Path parameters

Parameter	Type	Required	Description
`assistant_id`	string	Required	The ID of the assistant to retrieve.

Returns

The assistant object matching the specified ID.

Example retrieve assistant

Python 1.x

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

my_assistant = client.beta.assistants.retrieve("asst_abc123")
print(my_assistant)

REST

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json'

Modify assistant

POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview

Modifies an assistant.

Path parameters

Parameter	Type	Required	Description
`assistant_id`	string	Required	The ID of the assistant to modify.

Request Body

Parameter	Type	Required	Description
`model`	string	Optional	The model deployment name of the model to use.
`name`	string or null	Optional	The name of the assistant. The maximum length is 256 characters.
`description`	string or null	Optional	The description of the assistant. The maximum length is 512 characters.
`instructions`	string or null	Optional	The system instructions that the assistant uses. The maximum length is 32768 characters.
`tools`	array	Optional	Defaults to []. A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types `code_interpreter`, `file_search`, or `function`. A `function` description can be a maximum of 1,024 characters.
`metadata`	map	Optional	Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
`temperature`	number or null	Optional	Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
`top_p`	number or null	Optional	Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
`response_format`	string or object	Optional	Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length.
`tool_resources`	object	Optional	A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs.

Returns

The modified assistant object.

Example modify assistant

Python 1.x

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

my_updated_assistant = client.beta.assistants.update(
    "asst_abc123",
    instructions="You are an HR bot, and you have access to files to answer employee questions about company policies. Always respond with info from either of the files.",
    name="HR Helper",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4",  # model = model deployment name
)

print(my_updated_assistant)

REST

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
      "instructions": "You are an HR bot, and you have access to files to answer employee questions about company policies. Always respond with info from either of the files.",
      "tools": [{"type": "code_interpreter"}],
      "model": "gpt-4"
    }'

Delete assistant

DELETE https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview

Delete an assistant.

Path parameters

Parameter	Type	Required	Description
`assistant_id`	string	Required	The ID of the assistant to delete.

Returns

Deletion status.

Example delete assistant

Python 1.x

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

response = client.beta.assistants.delete("asst_abc123")
print(response)

REST

curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/assistants/{assistant_id}?api-version=2024-08-01-preview \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X DELETE

File upload API reference

Assistants use the same API for file upload as fine-tuning. When uploading a file you have to specify an appropriate value for the purpose parameter.

Assistant object

Field	Type	Description
`id`	string	The identifier, which can be referenced in API endpoints.
`object`	string	The object type, which is always assistant.
`created_at`	integer	The Unix timestamp (in seconds) for when the assistant was created.
`name`	string or null	The name of the assistant. The maximum length is 256 characters.
`description`	string or null	The description of the assistant. The maximum length is 512 characters.
`model`	string	The model deployment name of the model to use.
`instructions`	string or null	The system instructions that the assistant uses. The maximum length is 32768 characters.
`tools`	array	A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types `code_interpreter`, `file_search`, or `function`. A `function` description can be a maximum of 1,024 characters.
`metadata`	map	Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
`temperature`	number or null	Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
`top_p`	number or null	Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
`response_format`	string or object	Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length.
`tool_resources`	object	A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs.
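
To show how the fields of the assistant object might be consumed, the sketch below parses a response payload into a small dataclass. The `Assistant` class, the `parse_assistant` helper, and every value in the sample payload are hypothetical and for illustration only. Only a subset of the fields in the table is modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Assistant:
    """Hypothetical container for a subset of the assistant object fields."""
    id: str
    model: str
    created_at: datetime
    name: Optional[str] = None
    instructions: Optional[str] = None
    tools: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def parse_assistant(payload: dict) -> Assistant:
    # The object field is always "assistant" for this object type.
    if payload.get("object") != "assistant":
        raise ValueError("payload is not an assistant object")
    return Assistant(
        id=payload["id"],
        model=payload["model"],
        # created_at is a Unix timestamp in seconds.
        created_at=datetime.fromtimestamp(payload["created_at"], tz=timezone.utc),
        name=payload.get("name"),
        instructions=payload.get("instructions"),
        tools=payload.get("tools") or [],
        metadata=payload.get("metadata") or {},
    )


# Made-up sample payload for illustration.
sample = {
    "id": "asst_abc123",
    "object": "assistant",
    "created_at": 1700000000,
    "name": None,
    "model": "my-deployment",
    "instructions": "You are an AI assistant that can write code to help answer math questions.",
    "tools": [{"type": "code_interpreter"}],
    "metadata": {},
}

assistant = parse_assistant(sample)
print(assistant.id, assistant.created_at.isoformat())
```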
