Can you provide guidance on how to use images as input in the Azure AI assistant? Which AI models support image input, and in which regions are these features available?

Sahil Saini 0 Reputation points
2025-01-01T10:07:44.05+00:00

Hi,

I am working on building a doubt-solving AI chatbot for students, and I want to create an assistant capable of handling both text and image inputs. The goal is to design an AI-powered assistant that can answer questions from users based on text queries as well as images (e.g., scanned textbook pages, handwritten notes, or diagrams).

Specifically, I want to understand which Azure AI models support image input in the Assistant (Preview). Additionally, I need to know the regions where this image input feature is supported.

Please help me to find how to use image input in the Assistant Model..

Thank you in advance!

Azure OpenAI Service
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
3,479 questions
Azure AI services
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
3,024 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Azar 25,145 Reputation points MVP
    2025-01-01T16:48:14.19+00:00

    Hi there Sahil Saini

    Thanks for using QandA platform

    U can use Azure Cognitive Services, specifically the Custom Vision models, which can process and analyze images such as scanned textbook pages, handwritten notes, or diagrams. For text extraction from images, the Azure Computer Vision service provides the Read API, which supports OCR to extract text from images. If you need to recognize specific objects, you can use Azure Custom Vision, where you can train a custom model to detect and understand images in the context of your chatbot.

    To get started, you can create a Computer Vision resource or a Custom Vision resource in your Azure portal. After uploading your image, use the Read API for text extraction or train a model with Custom Vision for specific image recognition tasks. The extracted information can then be fed into your AI assistant, which can process both text and image-based queries to provide more accurate and comprehensive responses.

    Cognitive Services are available in many regions, Azure region availability documentation.

    I know this might seem overwelming but try documenation have a good read and im sure you can get started

    Computer Vision API documentation

    Custom Vision API documentation.

    If this helps kindly accept the answer thanks much.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.