Assistants API (base model: GPT4o) unable to parse uploaded image as attachment and answer questions related to info. in image.

GenixPRO 66 Reputation points
2025-02-26T08:50:16.7+00:00
  1. When using Assistants Playground on Azure portal: We create an Assistant using Assistants Playground. Then upload a PNG image attachment of a table with some records/content. In our prompt (see sample below) we ask questions related to this info. We get a reply from the Assistant created in Playground.

*Note: We see that Assistant is calling lib. and defining path for image etc. See below.


Load and extract text from all image files to analyze {...}

from PIL import Image

import pytesseract

Define paths for the uploaded images

image_paths = [

'/mnt/data/assistant-9s4pfyeg8W5RSwkHjdYqA4',

'/mnt/data/assistant-Uhoc9WZoxz7Tw8sRKeWxC4',

'/mnt/data/assistant-HE9LPS

  1. When using Assistants API (w/ Assistant Thread ID) from our mobile app: Using Assistant Thread ID (created above), we tried to upload the same PNG image attachment and pass the same prompt. However, we keep getting a standard reply "It seems there was an issue with extracting the text from the image. Let's try again"

In this case, we've simply uploaded image to the thread [and not to vector store].

Sample Prompt:

Attached herein are PNG, JPG or JPEG image files. Use code interpreter to sequentially extract information from each file; read, understand, and interpret all information to make relevant inference. Then answer the following questions using the information contained in these files and any other contextual information shared earlier. <followed by questions>

Question: How do make the Assistant API work in this case, so that it can reply with information extracted from image?

Azure AI services
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
3,174 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Manas Mohanty 945 Reputation points Microsoft Vendor
    2025-02-27T11:40:46.6933333+00:00

    Hi GenixPRO

    I am able to replicate the issue with PNG file on azure portal UI.

    Used tesseract in below sample code to do an OCR on image to convert to text before using code interpreter.

    Here is my sample code. You can check the other samples here in github

    
    import pytesseract
    
    from openai import AzureOpenAI
    
    import os
    
    import time
    
    from PIL import Image
    
    client = AzureOpenAI(
    
    api_key=os.getenv("AZURE_OPENAI_API_KEY","<endpointkey>"),
    
    api_version="2024-05-01-preview",
    
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT","<endpointurl>")
    
    )
    
    # Perform OCR to extract text from the image, please set environment variable to tesseract location and install the package prior
    
    extracted_text = pytesseract.image_to_string(Image.open("C:/Users/Screenshots/sampleinterpreter.png"))
    
    # Save the extracted text to a .txt file
    
    text_file_path = "C:/Users/Screenshots/extracted_text.txt"
    
    with open(text_file_path, "w") as text_file:
    
    text_file.write(extracted_text)
    
    # Upload the text file with an "assistants" purpose
    
    file = client.files.create(
    
    file=open(text_file_path, "rb"),
    
    purpose='assistants'
    
    )
    
    # Create an assistant using the file ID
    
    assistant = client.beta.assistants.create(
    
    instructions="You are an AI assistant that can write and help analyze code to answer. The input passed is extracted text from an image file that needs to be analyzed.",
    
    model="gpt-4o",
    
    tools=[{"type": "code_interpreter"}],
    
    tool_resources={"code_interpreter": {"file_ids": [file.id]}}
    
    )
    
    # Create a thread
    
    thread = client.beta.threads.create()
    
    # Add a user question to the thread
    
    message = client.beta.threads.messages.create(
    
    thread_id=thread.id,
    
    role="user",
    
    content=f"hi, Can you analyze the code and see if anything wrong and what the code is doing here. The extracted text from the image is as follows:\n\n{extracted_text}" # Replace this with your prompt
    
    )
    
    # Run the thread
    
    run = client.beta.threads.runs.create(
    
    thread_id=thread.id,
    
    assistant_id=assistant.id
    
    )
    
    # Looping until the run completes or fails
    
    while run.status in ['queued', 'in_progress', 'cancelling']:
    
    time.sleep(1)
    
    run = client.beta.threads.runs.retrieve(
    
    thread_id=thread.id,
    
    run_id=run.id
    
    )
    
    if run.status == 'completed':
    
    messages = client.beta.threads.messages.list(
    
    thread_id=thread.id
    
    )
    
    # Print the messages as a paragraph
    
    for message in messages:
    
    if message.role == "assistant":
    
    content = message.content[0].text.value
    
    print(f"The assistant's response: {content}")
    
    elif run.status == 'requires_action':
    
    # the assistant requires calling some functions
    
    # and submit the tool outputs back to the run
    
    pass
    
    else:
    
    print(run.status)
    
    

    Output#

    
    SyncCursorPage[Message](data=[Message(id='msg_dCopLLzpv1JdycDKQBWcbmY2', assistant_id='asst_ut47eXqPtxA9VdeThLGbFZIe', attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value="Here is a cleaned-up and reconstructed version of the code based on the extracted input:\n\n### Key Features of the Code\n1. **Sorting Functionality (`sort_by`):**\n - `sort_by` accepts a list of callback functions (`cbs`), which determine the sorting logic.\n - Each callback can optionally specify whether sorting should be done in descending order using the `desc` attribute.\n\n2. **Custom Sorting Logic:**\n - The code iterates through the list of callbacks.\n - For each callback, it extracts values from objects `a` and `b` to compare them.\n - Depending on the `desc` attribute and whether the values are strings or numbers, it calculates the difference for sorting. If non-zero, the difference determines the order.\n\n3. **Descending Mode (`desc`):**\n - A utility function (`desc`) is provided to wrap a callback with metadata indicating descending order.\n\n4. **Example Callback Function (`sample_callback`):**\n - Demonstrates simple extraction of the `value` field from items.\n\n### How It Works\n- Use `sort_by` to define comparison logic for sorting an array, and optionally specify sorting in descending order.\n- Pass in an array of callback definitions (`cbs`), where each callback specifies how to extract comparison values.\n\n#### Example `callbacks` Output:\npython\n[{'desc': True, 'cb': <function sample_callback>}]\n\n\nThis structure indicates the callback for sorting in descending order.\n\nLet me know if you'd like help with examples or further clarification!"), type='text')], t_k75Vg4KI31emNavmVbaKxiou', attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value="This code snippet appears to be JavaScript code dealing with a sort function (`sortBy`) and a descending comparator function (`desc`). However, there are syntactical issues and potential logic issues with the extracted text. Below, I'll highlight the extracted code issues and explain what the code is attempting to do.\n\n### Translation of the extracted code with issues highlighted:\nThe code appears garbled and incomplete in several places. Here's how it looks reformatted based on the extracted text:\njavascript\nconst sortBy = (cbs) => (a, b) => {\n for (let i = 0; i < cbs.length; i++) {\n const cb = cbs[i].desc ? cbs[i].cb : cbs[i];\n const aa = cb(a);\n const bb = cb(b);\n const diff = cbs[i].desc\n ? (typeof aa === 'string'\n ? bb.localeCompare(aa)\n : bb - aa)\n : (typeof aa === 'string'\n ? aa.localeCompare(bb)\n : aa - bb);\n if (diff !== 0) return diff;\n }\n return 0;\n};\n\nconst desc = (cb) => ({ desc: true, cb });\n\n\n### Issues and Observations in the Code\n1. **Syntax Errors**:\n - The initial `sortBy` function has improper syntax in its arrow function and parameters. It uses `>` incorrectly for the arrow function replacement.\n - The extracted conditional blocks (`diff`) have misplaced syntax: the `isString(aa)` function is used without definition or context.\n\n2. **Logical Errors**:\n - There is a confusion in comparing strings using `localeCompare`. Ensure it is dealing with strings only when using this method.\n\n3. **Missing Context**:\n - `isString` is likely meant to check if `aa` and `bb` are strings but is missing. Native JavaScript doesn't have `isString`; instead `typeof variable === 'string'` should be used.\n - The variable `cbs` is expected to be an array of objects, where each object may contain a `desc` key and a `cb` function, but no example `cbs` structure is provided.\n - There’s no explanation of what `a` and `b` represent (likely elements to be sorted).\n\n4. **Undefined `Q`**:\n - `Q` is referenced but has no definition or mention of use in this context.\n\n### What the Code is Doing\n#### `sortBy` Function:\n- `sortBy(cbs)` generates a comparator function for sorting an array of elements (`a` and `b`).\n- The `cbs` parameter is expected to be a list of callback functions or objects with `{ cb, desc }`. Each `cb` transforms elements for comparison and `desc` indicates whether sorting should be in descending order.\n- Iterates through `cbs` and compares elements (`a` and `b`) using each callback:\n - If a difference (`diff`) is found using one of the callbacks, it returns the difference.\n - Strings are compared using `.localeCompare()`.\n - Numbers or other types are compared using subtraction (`-`).\n- If all callbacks result in zero difference (i.e., equal elements), the function returns
    
    

    Hope it helps

    Thank you.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.