GPT-4o Realtime API for speech and audio (Preview)
Note
This feature is currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family and supports low-latency, "speech in, speech out" conversational interactions. It's designed to handle real-time, low-latency conversational interactions, making it a great fit for use cases that involve live interaction between a user and a model, such as customer support agents, voice assistants, and real-time translators.
Most users of the Realtime API need to deliver and receive audio from an end user in real time, including applications that use WebRTC or a telephony system. The Realtime API isn't designed to connect directly to end-user devices; it relies on client integrations to terminate end-user audio streams.
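For orientation, client integrations reach the service over a WebSocket connection. The following minimal sketch (in Python; the resource name is a placeholder) shows how the Azure OpenAI Realtime WebSocket URL is composed. Authentication, event handling, and audio streaming are handled by the client libraries used later in this article:

```python
from urllib.parse import urlencode

# Placeholder resource and deployment names for illustration only.
resource = "my-azure-openai-resource"
deployment = "gpt-4o-realtime-preview"

# The Realtime API is served from the /openai/realtime path; the API version
# and the deployment name are passed as query parameters.
query = urlencode({"api-version": "2024-10-01-preview", "deployment": deployment})
url = f"wss://{resource}.openai.azure.com/openai/realtime?{query}"
print(url)
```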
Supported models
Currently only `gpt-4o-realtime-preview` version `2024-10-01-preview` supports real-time audio.

The `gpt-4o-realtime-preview` model is available for global deployments in the East US 2 and Sweden Central regions.
Important
The system stores your prompts and completions as described in the "Data Use and Access for Abuse Monitoring" section of the service-specific Product Terms for Azure OpenAI Service, except that the Limited Exception doesn't apply. Abuse monitoring is turned on for use of the `gpt-4o-realtime-preview` API even for customers who are otherwise approved for modified abuse monitoring.
API support
Support for the Realtime API was first added in API version `2024-10-01-preview`.
Note
For more information about the API and architecture, see the Azure OpenAI GPT-4o real-time audio repository on GitHub.
Deploy a model for real-time audio
To deploy the `gpt-4o-realtime-preview` model in the Azure AI Foundry portal:

- Go to the Azure AI Foundry portal and make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource (with or without model deployments).
- Select the Real-time audio playground from under Playgrounds in the left pane.
- Select Create new deployment to open the deployment window.
- Search for and select the `gpt-4o-realtime-preview` model and then select Confirm.
- In the deployment wizard, make sure to select the `2024-10-01` model version.
- Follow the wizard to finish deploying the model.
Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can interact with it in real time in the Azure AI Foundry portal Real-time audio playground or via the Realtime API.
Use the GPT-4o real-time audio
To chat with your deployed `gpt-4o-realtime-preview` model in the Azure AI Foundry Real-time audio playground, follow these steps:
- Go to the Azure OpenAI Service page in the Azure AI Foundry portal. Make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource and the deployed `gpt-4o-realtime-preview` model.
- Select the Real-time audio playground from under Playgrounds in the left pane.
- Select your deployed `gpt-4o-realtime-preview` model from the Deployment dropdown.
- Select Enable microphone to allow the browser to access your microphone. If you already granted permission, you can skip this step.
- Optionally, edit the contents of the Give the model instructions and context text box. Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, tell it what it should and shouldn't answer, and tell it how to format responses.
- Optionally, change settings such as threshold, prefix padding, and silence duration. These settings control the service's voice activity detection; see the sketch after these steps.
- Select Start listening to start the session. You can speak into the microphone to start a chat.
- You can interrupt the chat at any time by speaking. You can end the chat by selecting the Stop listening button.
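The playground's threshold, prefix padding, and silence duration settings correspond to the Realtime API's server-side voice activity detection (VAD) options. As a minimal sketch of the equivalent wire-level event (the numeric values here are illustrative, not recommendations):

```python
import json

# Sketch of the Realtime API "session.update" event that configures
# server-side voice activity detection (VAD). Values are illustrative.
session_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,            # activation threshold, 0.0-1.0
            "prefix_padding_ms": 300,    # audio kept from before detected speech
            "silence_duration_ms": 500,  # trailing silence that ends a turn
        }
    },
}

# A Realtime client sends this JSON over the WebSocket session; here we only
# serialize it to show the shape of the event.
print(json.dumps(session_update, indent=2))
```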
Prerequisites
- An Azure subscription. Create one for free.
- Node.js LTS or ESM support.
- An Azure OpenAI resource created in the East US 2 or Sweden Central regions. See Region availability.
- A deployment of the `gpt-4o-realtime-preview` model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.
Microsoft Entra ID prerequisites
For the recommended keyless authentication with Microsoft Entra ID, you need to:
- Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
- Assign the `Cognitive Services User` role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Deploy a model for real-time audio
To deploy the `gpt-4o-realtime-preview` model in the Azure AI Foundry portal:

- Go to the Azure AI Foundry portal and make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource (with or without model deployments).
- Select the Real-time audio playground from under Playgrounds in the left pane.
- Select Create new deployment to open the deployment window.
- Search for and select the `gpt-4o-realtime-preview` model and then select Confirm.
- In the deployment wizard, make sure to select the `2024-10-01` model version.
- Follow the wizard to finish deploying the model.
Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can interact with it in real time in the Azure AI Foundry portal Real-time audio playground or via the Realtime API.
Set up
- Create a new folder `realtime-audio-quickstart` to contain the application and open Visual Studio Code in that folder with the following command:

  ```console
  mkdir realtime-audio-quickstart && code realtime-audio-quickstart
  ```

- Create the `package.json` with the following command:

  ```console
  npm init -y
  ```

- Update the `package.json` to ECMAScript with the following command:

  ```console
  npm pkg set type=module
  ```

- Install the real-time audio client library for JavaScript with:

  ```console
  npm install https://github.com/Azure-Samples/aoai-realtime-audio-sdk/releases/download/js/v0.5.2/rt-client-0.5.2.tgz
  ```

- For the recommended keyless authentication with Microsoft Entra ID, install the `@azure/identity` package with:

  ```console
  npm install @azure/identity
  ```

  The sample code that follows also imports the `dotenv` package to load environment variables from a `.env` file; if you use it, install it with `npm install dotenv`.
Retrieve resource information
| Variable name | Value |
|---|---|
| `AZURE_OPENAI_ENDPOINT` | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| `AZURE_OPENAI_DEPLOYMENT_NAME` | This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |
| `OPENAI_API_VERSION` | Learn more about API Versions. |

Learn more about keyless authentication and setting environment variables.
Caution
To use the recommended keyless authentication with the SDK, make sure that the `AZURE_OPENAI_API_KEY` environment variable isn't set.
Text in audio out
Create the `text-in-audio-out.js` file with the following code:

```javascript
import { DefaultAzureCredential } from "@azure/identity";
import { LowLevelRTClient } from "rt-client";
import dotenv from "dotenv";
dotenv.config();

async function text_in_audio_out() {
    // Set environment variables or edit the corresponding values here.
    const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "yourEndpoint";
    const deployment = "gpt-4o-realtime-preview";
    if (!endpoint || !deployment) {
        throw new Error("You didn't set the environment variables.");
    }
    const client = new LowLevelRTClient(
        new URL(endpoint),
        new DefaultAzureCredential(),
        { deployment: deployment }
    );
    try {
        await client.send({
            type: "response.create",
            response: {
                modalities: ["audio", "text"],
                instructions: "Please assist the user."
            }
        });
        for await (const message of client.messages()) {
            switch (message.type) {
                case "response.done": {
                    break;
                }
                case "error": {
                    console.error(message.error);
                    break;
                }
                case "response.audio_transcript.delta": {
                    console.log(`Received text delta: ${message.delta}`);
                    break;
                }
                case "response.audio.delta": {
                    const buffer = Buffer.from(message.delta, "base64");
                    console.log(`Received ${buffer.length} bytes of audio data.`);
                    break;
                }
            }
            if (message.type === "response.done" || message.type === "error") {
                break;
            }
        }
    } finally {
        client.close();
    }
}

await text_in_audio_out();
```
Sign in to Azure with the following command:

```console
az login
```

Run the JavaScript file.

```console
node text-in-audio-out.js
```
Wait a few moments to get the response.
Output
The script gets a response from the model and prints the transcript and audio data received.
The output will look similar to the following:

```console
Received text delta: Hello
Received text delta: !
Received text delta: How
Received text delta: can
Received text delta: I
Received 4800 bytes of audio data.
Received 7200 bytes of audio data.
Received text delta: help
Received 12000 bytes of audio data.
Received text delta: you
Received text delta: today
Received text delta: ?
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 24000 bytes of audio data.
```
Web application sample
Our JavaScript web sample on GitHub demonstrates how to use the GPT-4o Realtime API to interact with the model in real time. The sample code includes a simple web interface that captures audio from the user's microphone and sends it to the model for processing. The model responds with text and audio, which the sample code renders in the web interface.
You can run the sample code locally on your machine by following these steps. Refer to the repository on GitHub for the most up-to-date instructions.
- If you don't have Node.js installed, download and install the LTS version of Node.js.

- Clone the repository to your local machine:

  ```console
  git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk.git
  ```

- Go to the `javascript/samples` folder in your preferred code editor:

  ```console
  cd ./javascript/samples
  ```

- Run `download-pkg.ps1` or `download-pkg.sh` to download the required packages.

- Go to the `web` folder from the `./javascript/samples` folder:

  ```console
  cd ./web
  ```

- Run `npm install` to install package dependencies.

- Run `npm run dev` to start the web server, navigating any firewall permissions prompts as needed.

- Go to any of the provided URIs from the console output (such as `http://localhost:5173/`) in a browser.

- Enter the following information in the web interface:
  - Endpoint: The resource endpoint of an Azure OpenAI resource. You don't need to append the `/realtime` path. An example structure might be `https://my-azure-openai-resource-from-portal.openai.azure.com`.
  - API Key: A corresponding API key for the Azure OpenAI resource.
  - Deployment: The name of the `gpt-4o-realtime-preview` model that you deployed in the previous section.
  - System Message: Optionally, you can provide a system message such as "You always talk like a friendly pirate."
  - Temperature: Optionally, you can provide a custom temperature.
  - Voice: Optionally, you can select a voice.

- Select the Record button to start the session. Accept permissions to use your microphone if prompted.

- You should see a `<< Session Started >>` message in the main output. Then you can speak into the microphone to start a chat.

- You can interrupt the chat at any time by speaking. You can end the chat by selecting the Stop button.
Prerequisites
- An Azure subscription. Create one for free.
- Python 3.8 or later. We recommend using Python 3.10 or later; the sample code in this quickstart uses a `match` statement, which requires Python 3.10 or later. If you don't have a suitable version of Python installed, you can follow the instructions in the VS Code Python Tutorial for the easiest way of installing Python on your operating system.
- An Azure OpenAI resource created in the East US 2 or Sweden Central regions. See Region availability.
- A deployment of the `gpt-4o-realtime-preview` model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.
Microsoft Entra ID prerequisites
For the recommended keyless authentication with Microsoft Entra ID, you need to:
- Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
- Assign the `Cognitive Services User` role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
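Before you run the quickstart, you can optionally verify that your keyless setup works. This short sketch (not part of the official quickstart) uses the `azure-identity` package, installed later in the setup, to request a token for the standard Azure Cognitive Services scope:

```python
from azure.identity import DefaultAzureCredential

# DefaultAzureCredential walks the standard credential chain
# (environment variables, managed identity, Azure CLI login, and so on).
credential = DefaultAzureCredential()

# "https://cognitiveservices.azure.com/.default" is the scope used when
# authenticating to Azure OpenAI with Microsoft Entra ID.
token = credential.get_token("https://cognitiveservices.azure.com/.default")
print("Token acquired; expires at (Unix time):", token.expires_on)
```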
Deploy a model for real-time audio
To deploy the `gpt-4o-realtime-preview` model in the Azure AI Foundry portal:

- Go to the Azure AI Foundry portal and make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource (with or without model deployments).
- Select the Real-time audio playground from under Playgrounds in the left pane.
- Select Create new deployment to open the deployment window.
- Search for and select the `gpt-4o-realtime-preview` model and then select Confirm.
- In the deployment wizard, make sure to select the `2024-10-01` model version.
- Follow the wizard to finish deploying the model.
Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can interact with it in real time in the Azure AI Foundry portal Real-time audio playground or via the Realtime API.
Set up
- Create a new folder `realtime-audio-quickstart` to contain the application and open Visual Studio Code in that folder with the following command:

  ```console
  mkdir realtime-audio-quickstart && code realtime-audio-quickstart
  ```

- Create a virtual environment. If you already have Python 3.10 or later installed, you can create one with `python3 -m venv .venv` and activate it with `source .venv/bin/activate` (on Windows, use `py -3 -m venv .venv` and `.venv\scripts\activate`).
  Activating the Python environment means that when you run `python` or `pip` from the command line, you use the Python interpreter contained in the `.venv` folder of your application. You can use the `deactivate` command to exit the Python virtual environment, and can later reactivate it when needed.

Tip

We recommend that you create and activate a new Python environment to install the packages you need for this tutorial. Don't install packages into your global Python installation. You should always use a virtual or conda environment when installing Python packages; otherwise, you can break your global installation of Python.
- Install the real-time audio client library for Python with:

  ```console
  pip install "https://github.com/Azure-Samples/aoai-realtime-audio-sdk/releases/download/py%2Fv0.5.3/rtclient-0.5.3.tar.gz"
  ```

- For the recommended keyless authentication with Microsoft Entra ID, install the `azure-identity` package with:

  ```console
  pip install azure-identity
  ```
Retrieve resource information
| Variable name | Value |
|---|---|
| `AZURE_OPENAI_ENDPOINT` | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| `AZURE_OPENAI_DEPLOYMENT_NAME` | This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |
| `OPENAI_API_VERSION` | Learn more about API Versions. |

Learn more about keyless authentication and setting environment variables.
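As a quick illustration, you can read these values in Python with `os.environ`. The fallback strings here are placeholders for this sketch only, not values to use as-is:

```python
import os

# Read the configuration set as environment variables. The fallback values
# are placeholders for illustration; substitute your own resource values.
endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "https://your-resource.openai.azure.com")
deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-realtime-preview")
api_version = os.environ.get("OPENAI_API_VERSION", "2024-10-01-preview")

print(f"Endpoint: {endpoint}\nDeployment: {deployment}\nAPI version: {api_version}")
```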
Text in audio out
Create the `text-in-audio-out.py` file with the following code:

```python
import os
import base64
import asyncio
from azure.identity.aio import DefaultAzureCredential
from rtclient import (
    ResponseCreateMessage,
    RTLowLevelClient,
    ResponseCreateParams
)

# Set environment variables or edit the corresponding values here.
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
deployment = "gpt-4o-realtime-preview"

async def text_in_audio_out():
    async with RTLowLevelClient(
        url=endpoint,
        azure_deployment=deployment,
        token_credential=DefaultAzureCredential(),
    ) as client:
        await client.send(
            ResponseCreateMessage(
                response=ResponseCreateParams(
                    modalities={"audio", "text"},
                    instructions="Please assist the user."
                )
            )
        )
        done = False
        while not done:
            message = await client.recv()
            match message.type:
                case "response.done":
                    done = True
                case "error":
                    done = True
                    print(message.error)
                case "response.audio_transcript.delta":
                    print(f"Received text delta: {message.delta}")
                case "response.audio.delta":
                    buffer = base64.b64decode(message.delta)
                    print(f"Received {len(buffer)} bytes of audio data.")
                case _:
                    pass

async def main():
    await text_in_audio_out()

asyncio.run(main())
```
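The script above only counts the audio bytes it receives. If you want to listen to the response, you can collect the decoded `response.audio.delta` payloads and write them to a WAV file. The following standard-library sketch assumes the Realtime API's default `pcm16` output format (16-bit mono PCM at 24 kHz); `audio_chunks` is a hypothetical accumulator you would fill inside the receive loop:

```python
import wave

# Hypothetical accumulator: append each base64-decoded "response.audio.delta"
# payload to this list inside the receive loop.
audio_chunks: list[bytes] = []

def write_wav(path: str, chunks: list[bytes]) -> None:
    """Write raw PCM16 chunks to a playable WAV file.

    Assumes the Realtime API default output format: 16-bit signed
    little-endian PCM, mono, sampled at 24 kHz.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples are 2 bytes each
        wav_file.setframerate(24000)  # 24 kHz sample rate
        wav_file.writeframes(b"".join(chunks))

# After the session ends:
# write_wav("response.wav", audio_chunks)
```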
Run the Python file.

```console
python text-in-audio-out.py
```
Wait a few moments to get the response.
Output
The script gets a response from the model and prints the transcript and audio data received.
The output will look similar to the following:

```console
Received text delta: Hello
Received text delta: !
Received text delta: How
Received 4800 bytes of audio data.
Received 7200 bytes of audio data.
Received text delta: can
Received 12000 bytes of audio data.
Received text delta: I
Received text delta: assist
Received text delta: you
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta: today
Received text delta: ?
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 28800 bytes of audio data.
```
Prerequisites
- An Azure subscription. Create one for free.
- Node.js LTS or ESM support.
- TypeScript installed globally.
- An Azure OpenAI resource created in the East US 2 or Sweden Central regions. See Region availability.
- A deployment of the `gpt-4o-realtime-preview` model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.
Microsoft Entra ID prerequisites
For the recommended keyless authentication with Microsoft Entra ID, you need to:
- Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
- Assign the `Cognitive Services User` role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Deploy a model for real-time audio
To deploy the `gpt-4o-realtime-preview` model in the Azure AI Foundry portal:

- Go to the Azure AI Foundry portal and make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource (with or without model deployments).
- Select the Real-time audio playground from under Playgrounds in the left pane.
- Select Create new deployment to open the deployment window.
- Search for and select the `gpt-4o-realtime-preview` model and then select Confirm.
- In the deployment wizard, make sure to select the `2024-10-01` model version.
- Follow the wizard to finish deploying the model.
Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can interact with it in real time in the Azure AI Foundry portal Real-time audio playground or via the Realtime API.
Set up
- Create a new folder `realtime-audio-quickstart` to contain the application and open Visual Studio Code in that folder with the following command:

  ```console
  mkdir realtime-audio-quickstart && code realtime-audio-quickstart
  ```

- Create the `package.json` with the following command:

  ```console
  npm init -y
  ```

- Update the `package.json` to ECMAScript with the following command:

  ```console
  npm pkg set type=module
  ```

- Install the real-time audio client library for JavaScript with:

  ```console
  npm install https://github.com/Azure-Samples/aoai-realtime-audio-sdk/releases/download/js/v0.5.2/rt-client-0.5.2.tgz
  ```

- For the recommended keyless authentication with Microsoft Entra ID, install the `@azure/identity` package with:

  ```console
  npm install @azure/identity
  ```

  The sample code that follows also imports the `dotenv` package to load environment variables from a `.env` file; if you use it, install it with `npm install dotenv`.
Retrieve resource information
| Variable name | Value |
|---|---|
| `AZURE_OPENAI_ENDPOINT` | This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. |
| `AZURE_OPENAI_DEPLOYMENT_NAME` | This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal. |
| `OPENAI_API_VERSION` | Learn more about API Versions. |

Learn more about keyless authentication and setting environment variables.
Caution
To use the recommended keyless authentication with the SDK, make sure that the `AZURE_OPENAI_API_KEY` environment variable isn't set.
Text in audio out
Create the `text-in-audio-out.ts` file with the following code:

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { LowLevelRTClient } from "rt-client";
import dotenv from "dotenv";
dotenv.config();

async function text_in_audio_out() {
    // Set environment variables or edit the corresponding values here.
    const endpoint: string = process.env["AZURE_OPENAI_ENDPOINT"] || "yourEndpoint";
    const deployment = "gpt-4o-realtime-preview";
    if (!endpoint || !deployment) {
        throw new Error("You didn't set the environment variables.");
    }
    const client = new LowLevelRTClient(
        new URL(endpoint),
        new DefaultAzureCredential(),
        { deployment: deployment }
    );
    try {
        await client.send({
            type: "response.create",
            response: {
                modalities: ["audio", "text"],
                instructions: "Please assist the user."
            }
        });
        for await (const message of client.messages()) {
            switch (message.type) {
                case "response.done": {
                    break;
                }
                case "error": {
                    console.error(message.error);
                    break;
                }
                case "response.audio_transcript.delta": {
                    console.log(`Received text delta: ${message.delta}`);
                    break;
                }
                case "response.audio.delta": {
                    const buffer = Buffer.from(message.delta, "base64");
                    console.log(`Received ${buffer.length} bytes of audio data.`);
                    break;
                }
            }
            if (message.type === "response.done" || message.type === "error") {
                break;
            }
        }
    } finally {
        client.close();
    }
}

await text_in_audio_out();
```
Create the `tsconfig.json` file to transpile the TypeScript code and copy the following code for ECMAScript.

```json
{
    "compilerOptions": {
        "module": "NodeNext",
        "target": "ES2022", // Supports top-level await
        "moduleResolution": "NodeNext",
        "skipLibCheck": true, // Avoid type errors from node_modules
        "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}
```
Transpile from TypeScript to JavaScript.

```console
tsc
```

Sign in to Azure with the following command:

```console
az login
```

Run the code with the following command:

```console
node text-in-audio-out.js
```
Wait a few moments to get the response.
Output
The script gets a response from the model and prints the transcript and audio data received.
The output will look similar to the following:

```console
Received text delta: Hello
Received text delta: !
Received text delta: How
Received text delta: can
Received text delta: I
Received 4800 bytes of audio data.
Received 7200 bytes of audio data.
Received text delta: help
Received 12000 bytes of audio data.
Received text delta: you
Received text delta: today
Received text delta: ?
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 24000 bytes of audio data.
```
Related content
- Learn more about How to use the Realtime API
- See the Realtime API reference
- Learn more about Azure OpenAI quotas and limits