Assistants API (GPT-4o base model): pass/upload an image and ask relevant questions based on the information contained in the image. Possible?

GenixPRO 66 Reputation points
2025-03-02T04:39:56.68+00:00

Hi Team,

In our mobile app, using the Assistants API we're able to pass text content in the system message, ask relevant questions, and get the Assistant's reply.

We want to upload an image to the Assistants API and be able to ask questions about the information contained in the image. Our understanding is that GPT-4o is a vision-native model and should be able to analyze/parse the image. Note: we're not using GPT-4 Turbo (it seems that model also has image capability?).

When using the Azure portal's Assistants playground, we're able to upload an image and chat, and the assistant replies with the information in the image.

Using the Assistants API, when we upload an image, the assistant is unable to "read" the image and reply with relevant information. What are we missing?

In a previous thread we were told to convert the image to base64 and pass it to the Assistant. However, doing so exceeds the string limit that the model can accept, so that likely won't work. How can we implement image input with the Assistants API?

Azure AI services

2 answers

  1. Amira Bedhiafi 29,711 Reputation points
    2025-03-02T22:39:14.8933333+00:00

You need to convert the image to a base64-encoded string; this is necessary because the API expects the image data in text form:

    
    import base64

    def image_to_base64(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    

    Then create the payload for the API request, including the base64-encoded image:

    
    import requests

    def create_payload(base64_image):
        return {
            "model": "gpt-4o",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "What is in this image?"},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
                        }
                    ]
                }
            ]
        }
    

    Send the request to the chat completions endpoint.

    
    def send_request(payload, api_key):
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}"
        }
        response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
        return response.json()

    # Example usage
    image_path = "path_to_your_image.jpg"
    base64_image = image_to_base64(image_path)
    payload = create_payload(base64_image)
    api_key = "your_api_key_here"
    response = send_request(payload, api_key)
    print(response)
    

    The response will contain the model's analysis of the image. You can extract and display this information as needed.

    
    def handle_response(response):
        if 'choices' in response:
            for choice in response['choices']:
                print(choice['message']['content'])
        else:
            print("No response received.")

    handle_response(response)
    

    If you are using Azure AI services, the process is similar but involves using Azure-specific endpoints and authentication methods. Ensure you have the correct endpoint and API key from your Azure portal.

    
    def send_request_azure(payload, api_key, endpoint):
        # Azure OpenAI key-based auth uses the "api-key" header rather than a Bearer token.
        headers = {
            "Content-Type": "application/json",
            "api-key": api_key
        }
        response = requests.post(endpoint, headers=headers, json=payload)
        return response.json()

    # Example usage for Azure
    azure_endpoint = "your_azure_endpoint_here"
    response_azure = send_request_azure(payload, api_key, azure_endpoint)
    print(response_azure)
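
    For reference, a typical Azure OpenAI chat completions endpoint embeds the deployment name and an api-version query parameter in the URL. A sketch with placeholder values only (substitute your own resource name, deployment name, and a supported api-version):

    # Placeholder endpoint shape for Azure OpenAI chat completions.
    azure_endpoint = (
        "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/"
        "YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-06-01"
    )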
    
    

  2. Prashanth Veeragoni 1,185 Reputation points Microsoft External Staff
    2025-03-05T04:34:23.81+00:00

    Hi GenixPRO,

    Yes, you can pass images to the Azure OpenAI Assistants API using the GPT-4o model, but there are some limitations and implementation details you need to consider.

    First, understanding the issue:

    GPT-4o is a vision-enabled model and supports image inputs.

    You were able to upload images via the Azure portal's Assistants Playground, meaning Azure OpenAI does support image analysis in some capacity.

    When using the Assistants API, your image upload is not being processed as expected.

    You tried base64 encoding, but the payload exceeded the model’s input size limit.

    Solution: Using file_id Instead of Base64 Encoding:

    Azure OpenAI Assistants API supports file-based image uploads instead of base64 encoding.

    Steps to Upload an Image & Get Responses from GPT-4o

    Upload the Image to Azure OpenAI Files API

    Pass the file_id to the Assistants API

    Ask Questions About the Image

    Receive a Response from the Assistant

    Step-by-Step Implementation:

    Step 1: Upload the Image

    Before using the image with your Assistant, you must first upload it to the Azure OpenAI Files API.

    API Request (Upload File)

    curl -X POST "https://YOUR_AZURE_OPENAI_ENDPOINT/openai/files?api-version=2024-05-01-preview" \
      -H "api-key: YOUR_AZURE_OPENAI_API_KEY" \
      -F "file=@your-image.jpg" \
      -F "purpose=assistants"
    

    Response:

    {
      "id": "file-xyz123",
      "object": "file",
      "bytes": 123456,
      "created_at": 1700000000,
      "filename": "your-image.jpg",
      "purpose": "assistants"
    }
    

    Extract the file_id from the response (file-xyz123).
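
    As a sketch of the same upload using the openai Python SDK (the endpoint, key, api-version, and file name below are placeholders; use the values for your own resource):

    import os
    from openai import AzureOpenAI  # openai>=1.x SDK

    # Placeholder configuration - replace with your own Azure OpenAI values.
    client = AzureOpenAI(
        azure_endpoint="https://YOUR_RESOURCE_NAME.openai.azure.com",
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-05-01-preview",
    )

    # Upload the image for use with Assistants; the returned id is the file_id.
    uploaded = client.files.create(
        file=open("your-image.jpg", "rb"),
        purpose="assistants",
    )
    print(uploaded.id)  # use this file_id when creating the message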

    Step 2: Pass Image File to the Assistants API

    Now that the file is uploaded, pass the file_id to the Assistants API.

    API Request (Message with Image Reference)

    {
      "model": "gpt-4o",
      "assistant_id": "your-assistant-id",
      "thread_id": "your-thread-id",
      "messages": [
        {
          "role": "user",
          "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_file", "file_id": "file-xyz123"}
          ]
        }
      ]
    }
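
    With the SDK, this step amounts to creating a thread and adding a user message that references the uploaded file; note that the image_file part nests the file_id inside an image_file object. A minimal sketch, reusing the client and uploaded file from Step 1:

    # Create a thread and attach a message that references the uploaded image.
    thread = client.beta.threads.create()

    message = client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=[
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_file", "image_file": {"file_id": uploaded.id}},
        ],
    )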
    
    

    Step 3: Get Assistant's Response

    Once you send the request, GPT-4o will process the image and reply with the extracted information.

    Expected Response (Example):

    {
      "role": "assistant",
      "content": [
        {"type": "text", "text": "This image contains a cat sitting on a chair."}
      ]
    }
    

    The text above is just an example; you will get a text response describing your own image.
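
    To retrieve that reply over the API, you run your assistant on the thread and then read the newest message once the run completes. A sketch, assuming the thread from Step 2 and a placeholder assistant id (if your SDK version lacks create_and_poll, create the run and poll runs.retrieve yourself):

    # Run the assistant on the thread and wait for it to finish.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id="your-assistant-id",
    )

    if run.status == "completed":
        # Messages are returned newest first; the first one is the assistant's reply.
        messages = client.beta.threads.messages.list(thread_id=thread.id)
        for part in messages.data[0].content:
            if part.type == "text":
                print(part.text.value)
    else:
        print(f"Run ended with status: {run.status}")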

    Why This Works:

    Avoids Base64 Size Issues – The image is uploaded as a file and referenced by file_id, rather than being embedded directly in the request.

    Matches Playground Behavior – The Assistants Playground also uses file-based image processing.

    Supports Large Images – OpenAI’s Assistants API is optimized for handling images this way.

    Hope this works

    Thank You.

