Assistants API (GPT-4o base model): pass/upload an image and ask relevant questions based on the information contained in the image. Possible?

GenixPRO 66 Reputation points
2025-03-02T04:39:56.68+00:00

Hi Team,

In our mobile app, using the Assistants API we're able to pass text content in the system message, ask relevant questions, and get the Assistant's reply.

We want to upload an image to the Assistants API and be able to ask questions about the information contained in the image. Our understanding is that GPT-4o is a vision-native model and should be able to analyze/parse the image. Note: we're not using GPT-4 Turbo (it seems that model also has image capability?).

When using the Azure portal's Assistants playground, we're able to upload an image and chat, and the assistant replies with the information in the image.

Using the Assistants API, when we upload an image, the assistant is unable to "read" the image and reply with relevant information. What are we missing?

In a previous thread we were told to convert the image to base64 and pass it to the Assistant. However, doing so exceeds the string limit that the model can accept, so that likely won't work. How can we implement image input with the Assistants API?

Azure AI services

2 answers

  1. Amira Bedhiafi 29,711 Reputation points
    2025-03-02T22:39:14.8933333+00:00

You need to convert the image to a base64-encoded string; this is necessary because the API expects the image data in text form:

    
    import base64

    def image_to_base64(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    

    Then create the payload for the API request, including the base64-encoded image:

    
    import requests

    def create_payload(base64_image):
        return {
            "model": "gpt-4o",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "What is in this image?"},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
                        }
                    ]
                }
            ]
        }
    

    Send the request to the chat completions endpoint.

    
    def send_request(payload, api_key):
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}"
        }
        response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
        return response.json()

    # Example usage
    image_path = "path_to_your_image.jpg"
    base64_image = image_to_base64(image_path)
    payload = create_payload(base64_image)
    api_key = "your_api_key_here"
    response = send_request(payload, api_key)
    print(response)
    

    The response will contain the model's analysis of the image. You can extract and display this information as needed.

    
    def handle_response(response):
        if 'choices' in response:
            for choice in response['choices']:
                print(choice['message']['content'])
        else:
            print("No response received.")

    handle_response(response)
    

    If you are using Azure AI services, the process is similar but involves using Azure-specific endpoints and authentication methods. Ensure you have the correct endpoint and API key from your Azure portal.

    
    def send_request_azure(payload, api_key, endpoint):
        # Azure OpenAI key-based auth uses the "api-key" header rather than a Bearer token.
        headers = {
            "Content-Type": "application/json",
            "api-key": api_key
        }
        response = requests.post(endpoint, headers=headers, json=payload)
        return response.json()

    # Example usage for Azure
    azure_endpoint = "your_azure_endpoint_here"
    response_azure = send_request_azure(payload, api_key, azure_endpoint)
    print(response_azure)
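
    For reference, a typical Azure OpenAI chat completions endpoint embeds the deployment name and an api-version query parameter in the URL. A sketch with placeholder values only (substitute your own resource name, deployment name, and a supported api-version):

    # Placeholder endpoint shape for Azure OpenAI chat completions.
    azure_endpoint = (
        "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/"
        "YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-06-01"
    )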
    
    

  2. Prashanth Veeragoni 1,185 Reputation points Microsoft External Staff
    2025-03-05T04:34:23.81+00:00

    Hi GenixPRO,

    Yes, you can pass images to the Azure OpenAI Assistants API using the GPT-4o model, but there are some limitations and implementation details you need to consider.

    First, understanding the issue:

    GPT-4o is a vision-enabled model and supports image inputs.

    You were able to upload images via the Azure portal's Assistants Playground, meaning Azure OpenAI does support image analysis in some capacity.

    When using the Assistants API, your image upload is not being processed as expected.

    You tried base64 encoding, but the payload exceeded the model’s input size limit.

    Solution: Using file_id Instead of Base64 Encoding:

    Azure OpenAI Assistants API supports file-based image uploads instead of base64 encoding.

    Steps to Upload an Image & Get Responses from GPT-4o

    Upload the Image to Azure OpenAI Files API

    Pass the file_id to the Assistants API

    Ask Questions About the Image

    Receive a Response from the Assistant

    Step-by-Step Implementation:

    Step 1: Upload the Image

    Before using the image with your Assistant, you must first upload it to the Azure OpenAI Files API.

    API Request (Upload File)

    curl -X POST "https://YOUR_AZURE_OPENAI_ENDPOINT/openai/files?api-version=2024-05-01-preview" \
      -H "api-key: YOUR_AZURE_OPENAI_API_KEY" \
      -F "file=@your-image.jpg" \
      -F "purpose=assistants"
    

    Response:

    {
      "id": "file-xyz123",
      "object": "file",
      "bytes": 123456,
      "created_at": 1700000000,
      "filename": "your-image.jpg",
      "purpose": "assistants"
    }
    

    Extract the file_id from the response (file-xyz123).
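
    As a sketch of the same upload using the openai Python SDK (the endpoint, key, api-version, and file name below are placeholders; use the values for your own resource):

    import os
    from openai import AzureOpenAI  # openai>=1.x SDK

    # Placeholder configuration - replace with your own Azure OpenAI values.
    client = AzureOpenAI(
        azure_endpoint="https://YOUR_RESOURCE_NAME.openai.azure.com",
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-05-01-preview",
    )

    # Upload the image for use with Assistants; the returned id is the file_id.
    uploaded = client.files.create(
        file=open("your-image.jpg", "rb"),
        purpose="assistants",
    )
    print(uploaded.id)  # use this file_id when creating the message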

    Step 2: Pass Image File to the Assistants API

    Now that the file is uploaded, pass the file_id to the Assistants API.

    API Request (Message with Image Reference)

    {
      "model": "gpt-4o",
      "assistant_id": "your-assistant-id",
      "thread_id": "your-thread-id",
      "messages": [
        {
          "role": "user",
          "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_file", "file_id": "file-xyz123"}
          ]
        }
      ]
    }
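
    With the SDK, this step amounts to creating a thread and adding a user message that references the uploaded file; note that the image_file part nests the file_id inside an image_file object. A minimal sketch, reusing the client and uploaded file from Step 1:

    # Create a thread and attach a message that references the uploaded image.
    thread = client.beta.threads.create()

    message = client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=[
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_file", "image_file": {"file_id": uploaded.id}},
        ],
    )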
    
    

    Step 3: Get Assistant's Response

    Once you send the request, GPT-4o will process the image and reply with the extracted information.

    Expected Response (Example):

    {
      "role": "assistant",
      "content": [
        {"type": "text", "text": "This image contains a cat sitting on a chair."}
      ]
    }
    

    The text above is just an example; you will get a text response describing your own image.
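
    To retrieve that reply over the API, you run your assistant on the thread and then read the newest message once the run completes. A sketch, assuming the thread from Step 2 and a placeholder assistant id (if your SDK version lacks create_and_poll, create the run and poll runs.retrieve yourself):

    # Run the assistant on the thread and wait for it to finish.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id="your-assistant-id",
    )

    if run.status == "completed":
        # Messages are returned newest first; the first one is the assistant's reply.
        messages = client.beta.threads.messages.list(thread_id=thread.id)
        for part in messages.data[0].content:
            if part.type == "text":
                print(part.text.value)
    else:
        print(f"Run ended with status: {run.status}")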

    Why This Works:

    Avoids Base64 Size Issues – The image is uploaded as a file and referenced by file_id, rather than being embedded directly in the request.

    Matches Playground Behavior – The Assistants Playground also uses file-based image processing.

    Supports Large Images – OpenAI’s Assistants API is optimized for handling images this way.

    Hope this works

    Thank You.

