Video Assistants - Open AI launched its demo 9 months ago. Why is still not available on Azure?

GenixPRO 41 Reputation points
2025-01-15T02:09:16.5666667+00:00

Hi Team,

Open AI demo'ed their video assistant (https://www.youtube.com/watch?v=vgYi3Wr7v_g) ~9 months ago. Why do we not have the API on Azure yet? Or if it's available, can we have the quickstart link pls. It's likely based on Realtime API but the API call doesn't specify video params. We'd like to include realtime video as demo'ed in the video @ link above. How can this e accomplished directly (without using a separate API for video in addition to realtime API for audio).

Thanks.

Azure AI services
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
3,056 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Vikram Singh 415 Reputation points Microsoft Employee
    2025-01-15T04:44:51.5533333+00:00

    Hi @GenixPRO ,

    Welcome to Microsoft Q&A Forum, Thank you for your inquiry regarding the OpenAI video assistant demo. As of now, the specific video assistant API showcased in the demo is not yet available on Azure. OpenAI is continuously working on expanding its offerings, and we recommend keeping an eye on the Azure OpenAI Service documentation for updates on new features and APIs.

    For real-time video processing, you may consider using Azure Media Services in conjunction with the Azure OpenAI Service. While there isn't a direct API for video within the OpenAI framework, you can integrate Azure Media Services for video handling and use the OpenAI API for audio processing.

    For a quick start on using Azure OpenAI, please refer to the Quickstart guide which provides a comprehensive overview of how to set up and use the service.

    If the reply was helpful please don't forget to upvote and/or accept as answer.

    Thank you


  2. Vikram Singh 415 Reputation points Microsoft Employee
    2025-01-20T09:26:25.8+00:00

    Hi @GenixPRO ,

    To create a solution where an assistant can "see and hear" in real-time using Azure Media Services and Azure OpenAI Service, you would need to integrate multiple components. Here’s a breakdown of how you can approach this, along with the challenges and potential solutions.

    Proposed Solution Overview

    1. Azure Media Services: This service can handle video streaming and processing. You can use it to ingest video streams, encode them, and deliver them to your application.
    2. Azure OpenAI Service: This service can be used for processing audio input and generating responses. Currently, it primarily supports text and audio, but you can leverage it for voice interactions.

    Combining Azure Media Services and Azure OpenAI Service: Step-by-Step Implementation

    1. Video Ingestion: Use Azure Media Services to ingest the video stream from a camera or video source. This can be done using the Media Services API to create a live event or a streaming endpoint.
    2. Audio Processing: For the audio component, you can continue using the Realtime API from Azure OpenAI Service. This API can process audio input in real-time and generate text responses.
    3. Video Analysis: Since the OpenAI Realtime API does not currently support video input, you would need to implement a separate video analysis component. You can use Azure Cognitive Services, specifically the Computer Vision API, to analyze the video stream. This can help you extract relevant information from the video feed (e.g., recognizing objects, faces, or actions).
    4. Integration: You would need to create a middleware or a custom application that:
      1. Captures the video stream and sends it to Azure Media Services.
      2. Captures the audio stream and sends it to the Azure OpenAI Realtime API.
      3. Processes the video stream using Azure Cognitive Services to extract insights.
      4. Combines the insights from the video analysis with the audio responses from the OpenAI API to generate a cohesive response.

    Challenges and Considerations: This solution involves multiple services and may require significant development effort to integrate them effectively.

    Future Prospects:

    OpenAI's demo of video capabilities suggests that these features may be integrated into Azure's offerings in the future. Keeping an eye on Azure's updates and announcements will help you stay informed about new capabilities that might simplify this process.

    Quickstart Resources

    While there may not be a specific quickstart for the exact combination of Azure Media Services and Azure OpenAI Service, you can refer to the following resources to get started with each component:

    Azure Media Services Quickstart

    Azure OpenAI Service Quickstart

    Azure Cognitive Services Computer Vision

    If the reply was helpful, please don't forget to upvote and/or accept as answer.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.