Quickstart: Get started with Azure OpenAI audio generation

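The gpt-4o-audio-preview model introduces the audio modality into the existing /chat/completions API. The audio model expands the potential for AI applications in text and voice-based interactions and in audio analysis. The modalities supported by the gpt-4o-audio-preview model are text, audio, and text + audio.

The following table lists the supported modalities with example use cases.

Modality input	Modality output	Example use case
Text	Text + audio	Text to speech, audiobook generation
Audio	Text + audio	Audio transcription, audiobook generation
Audio	Text	Audio transcription
Text + audio	Text + audio	Audiobook generation
Text + audio	Text	Audio transcription

By using audio generation capabilities, you can build more dynamic and interactive AI applications. With a model that supports audio input and output, you can generate spoken audio responses to prompts and use audio input to prompt the model.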
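As a rough illustration of how the rows of the modality table map onto a request, the sketch below assembles the keyword arguments for a chat completions call. The helper name `build_audio_request` is hypothetical. The parameter names (`modalities`, `audio`, `input_audio`) are the ones used by the samples in this article, and the deployment name is assumed to be `gpt-4o-audio-preview`.

```python
from typing import Optional


def build_audio_request(
    text: Optional[str] = None,
    audio_b64: Optional[str] = None,
    want_audio: bool = True,
    deployment: str = "gpt-4o-audio-preview",  # assumed deployment name
) -> dict:
    """Build chat completions keyword arguments for one row of the modality table."""
    if text is None and audio_b64 is None:
        raise ValueError("Provide text, audio, or both")

    # Input side: text, audio, or text + audio in a single user message.
    content = []
    if text is not None:
        content.append({"type": "text", "text": text})
    if audio_b64 is not None:
        content.append(
            {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}}
        )

    request = {
        "model": deployment,
        "messages": [{"role": "user", "content": content}],
    }
    if want_audio:
        # Output side: text + audio. Request both modalities and pick a voice.
        request["modalities"] = ["text", "audio"]
        request["audio"] = {"voice": "alloy", "format": "wav"}
    # For text-only output the samples in this article simply omit these keys.
    return request
```

With a configured client, the result could be passed along as `client.chat.completions.create(**build_audio_request(text="Is a golden retriever a good family dog?"))`.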

Supported models

Currently, only gpt-4o-audio-preview version 2024-12-17 supports audio generation.

The gpt-4o-audio-preview model is available for global deployments in the East US 2 and Sweden Central regions.

Currently, the Alloy, Echo, and Shimmer voices are supported for audio output.

The maximum audio file size is 20 MB.
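The size limit can be checked locally before a file is encoded and sent. The following is a minimal sketch rather than part of the official samples. The helper name `encode_audio_file` is hypothetical, and it assumes that the limit applies to the raw file size and that 20 MB means 20 × 1024 × 1024 bytes.

```python
import base64
import os

# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes of raw file data.
MAX_AUDIO_BYTES = 20 * 1024 * 1024


def encode_audio_file(path: str, max_bytes: int = MAX_AUDIO_BYTES) -> str:
    """Return the file's contents as a base64 string, or raise if it is too large."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; the limit is {max_bytes} bytes")
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```

The returned string has the same form as the value that the samples in this article place in the `data` field of an `input_audio` content part.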

Note

The Realtime API uses the same underlying GPT-4o audio model as the completions API, but is optimized for low-latency, real-time audio interactions.

API support

Support for audio completions was first added in API version 2025-01-01-preview.

Deploy a model for audio generation

To deploy the gpt-4o-audio-preview model in the Azure AI Foundry portal:

Go to the Azure OpenAI Service page in the Azure AI Foundry portal. Make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource.
Select the [Chat] playground from under [Playgrounds] in the left pane.
Select [+ Create new deployment] > [From base models] to open the deployment window.
Search for and select the gpt-4o-audio-preview model, and then select [Deploy to selected resource].
In the deployment wizard, select the 2024-12-17 model version.
Follow the wizard to finish deploying the model.

Now that you have deployed the gpt-4o-audio-preview model, you can interact with it in the [Chat] playground of the Azure AI Foundry portal or through the chat completions API.

Use GPT-4o audio generation

To chat with your deployed gpt-4o-audio-preview model in the [Chat] playground of the Azure AI Foundry portal, follow these steps:

Go to the Azure OpenAI Service page in the Azure AI Foundry portal. Make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource and the deployed gpt-4o-audio-preview model.
Select the [Chat] playground from under [Resource playground] in the left pane.
Select your deployed gpt-4o-audio-preview model from the [Deployment] dropdown.
Start chatting with the model and listen to the audio responses.

You can:
- Record an audio prompt.
- Attach an audio file to the chat.
- Enter a text prompt.


Prerequisites

An Azure subscription. Create one for free.
Node.js (an LTS release, or a release with ESM support).
An Azure OpenAI resource created in the East US 2 or Sweden Central region. See Region availability.
Then, you need to deploy a gpt-4o-audio-preview model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI, which is used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under [Access control (IAM)] > [Add role assignment].

Set up

Create a new folder named audio-completions-quickstart to contain the application, and open Visual Studio Code in that folder with the following command:
```
mkdir audio-completions-quickstart && code audio-completions-quickstart
```
Create the package.json file with the following command:
```
npm init -y
```
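Leave the module type in package.json at its default (CommonJS). The samples in this article load modules with require and export with module.exports, neither of which works in a .js file once package.json sets type to module.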
Install the OpenAI client library for JavaScript, together with the dotenv package that the samples load at startup:
```
npm install openai dotenv
```
For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with:
```
npm install @azure/identity
```

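Retrieve resource information

To authenticate your application with your Azure OpenAI resource, you need to retrieve the following information. The first table applies to Microsoft Entra ID (keyless) authentication, and the second to API key authentication.

Microsoft Entra ID

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value is in the Keys and Endpoint section when you examine your resource in the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value is the custom name you chose for your deployment when you deployed the model. You can find it under [Resource Management] > [Model Deployments] in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API versions.

Learn more about keyless authentication and setting environment variables.

API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value is in the Keys and Endpoint section when you examine your resource in the Azure portal.
`AZURE_OPENAI_API_KEY`	This value is in the Keys and Endpoint section when you examine your resource in the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value is the custom name you chose for your deployment when you deployed the model. You can find it under [Resource Management] > [Model Deployments] in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API versions.

Learn more about finding API keys and setting environment variables.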

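Important

If you use an API key, store it securely somewhere else, such as in Azure Key Vault. Don't include the API key directly in your code, and never post it publicly.

For more information about AI services security, see Authenticate requests to Azure AI services.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.

Generate audio from text input

With Microsoft Entra ID (keyless) authentication, create the to-audio.js file with the following code: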

require("dotenv").config();
const { AzureOpenAI } = require("openai");
const { DefaultAzureCredential, getBearerTokenProvider } = require("@azure/identity");
const { writeFileSync } = require("node:fs");

// Keyless authentication    
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

// Set environment variables or edit the corresponding values here.
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiVersion = "2025-01-01-preview"; 
const deployment = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
    endpoint, 
    azureADTokenProvider, 
    apiVersion, 
    deployment 
}); 

async function main() {

    // Make the audio chat completions request
    const response = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview", 
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: [ 
        { 
            role: "user", 
            content: "Is a golden retriever a good family dog?" 
        } 
        ] 
    }); 

    // Inspect returned data
    console.log(response.choices[0]);

    // Write the output audio data to a file
    writeFileSync(
        "dog.wav",
        Buffer.from(response.choices[0].message.audio.data, "base64")
    );
}

main().catch((err) => {
  console.error("Error occurred:", err);
});

module.exports = { main };

Sign in to Azure with the following command:
```
az login
```
Run the JavaScript file.
```
node to-audio.js
```

With API key authentication, create the to-audio.js file with the following code:

require("dotenv").config();
const { AzureOpenAI } = require("openai");
const { writeFileSync } = require("node:fs");

// Set environment variables or edit the corresponding values here.
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiKey = process.env["AZURE_OPENAI_API_KEY"] || "AZURE_OPENAI_API_KEY";
const apiVersion = "2025-01-01-preview"; 
const deployment = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
    endpoint, 
    apiKey, 
    apiVersion, 
    deployment 
});  

async function main() {

    // Make the audio chat completions request
    const response = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview", 
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: [ 
        { 
            role: "user", 
            content: "Is a golden retriever a good family dog?" 
        } 
        ] 
    }); 

    // Inspect returned data
    console.log(response.choices[0]);

    // Write the output audio data to a file
    writeFileSync(
        "dog.wav",
        Buffer.from(response.choices[0].message.audio.data, "base64")
    );
}

main().catch((err) => {
  console.error("Error occurred:", err);
});

module.exports = { main };

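Run the JavaScript file.
```
node to-audio.js
```

Wait a few moments for the response.

Output for audio generation from text input

The script generates an audio file named dog.wav in the same directory as the script. The audio file contains the spoken response to the prompt, "Is a golden retriever a good family dog?"

Generate audio and text from audio input

With Microsoft Entra ID (keyless) authentication, create the from-audio.js file with the following code: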

require("dotenv").config();
const { AzureOpenAI } = require("openai");
const { DefaultAzureCredential, getBearerTokenProvider } = require("@azure/identity");
const fs = require('fs').promises;
const { writeFileSync } = require("node:fs");

// Keyless authentication    
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

// Set environment variables or edit the corresponding values here.
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiVersion = "2025-01-01-preview"; 
const deployment = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
    endpoint, 
    azureADTokenProvider, 
    apiVersion, 
    deployment 
});    

async function main() {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Make the audio chat completions request
    const response = await client.chat.completions.create({
        model: "gpt-4o-audio-preview",
        modalities: ["text", "audio"],
        audio: { voice: "alloy", format: "wav" }, 
        messages: [
            {
                role: "user",
                content: [
                    { 
                        type: "text", 
                        text: "Describe in detail the spoken audio input." 
                    },
                    { 
                        type: "input_audio", 
                        input_audio: { 
                            data: base64str, 
                            format: "wav" 
                        } 
                    }
                ]
            }
        ]
    });

    console.log(response.choices[0]); 

    // Write the output audio data to a file
    writeFileSync( 
        "analysis.wav", 
        Buffer.from(response.choices[0].message.audio.data, 'base64'), 
        { encoding: "utf-8" } 
    ); 
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };

Sign in to Azure with the following command:
```
az login
```
Run the JavaScript file.
```
node from-audio.js
```

With API key authentication, create the from-audio.js file with the following code:

require("dotenv").config();
const { AzureOpenAI } = require("openai");
const fs = require('fs').promises;
const { writeFileSync } = require("node:fs");

// Set environment variables or edit the corresponding values here.
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiKey = process.env["AZURE_OPENAI_API_KEY"] || "AZURE_OPENAI_API_KEY";
const apiVersion = "2025-01-01-preview"; 
const deployment = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
    endpoint, 
    apiKey, 
    apiVersion, 
    deployment 
});  

async function main() {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Make the audio chat completions request
    const response = await client.chat.completions.create({
        model: "gpt-4o-audio-preview",
        modalities: ["text", "audio"],
        audio: { voice: "alloy", format: "wav" }, 
        messages: [
            {
                role: "user",
                content: [
                    { 
                        type: "text", 
                        text: "Describe in detail the spoken audio input." 
                    },
                    { 
                        type: "input_audio", 
                        input_audio: { 
                            data: base64str, 
                            format: "wav" 
                        } 
                    }
                ]
            }
        ]
    });

    console.log(response.choices[0]); 

    // Write the output audio data to a file
    writeFileSync( 
        "analysis.wav", 
        Buffer.from(response.choices[0].message.audio.data, 'base64'), 
        { encoding: "utf-8" } 
    ); 
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };

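Run the JavaScript file.
```
node from-audio.js
```

Wait a few moments for the response.

Output for audio and text generation from audio input

The script generates a transcript of the summary of the spoken audio input. It also generates an audio file named analysis.wav in the same directory as the script. The audio file contains the spoken response to the prompt.

Generate audio and use multi-turn chat completions

With Microsoft Entra ID (keyless) authentication, create the multi-turn.js file with the following code: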

require("dotenv").config();
const { AzureOpenAI } = require("openai");
const { DefaultAzureCredential, getBearerTokenProvider } = require("@azure/identity");
const fs = require('fs').promises;

// Keyless authentication    
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

// Set environment variables or edit the corresponding values here.
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiVersion = "2025-01-01-preview"; 
const deployment = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
    endpoint, 
    azureADTokenProvider, 
    apiVersion, 
    deployment 
}); 

async function main() {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Initialize messages with the first turn's user input 
    const messages = [
        {
            role: "user",
            content: [
                { 
                    type: "text", 
                    text: "Describe in detail the spoken audio input." 
                },
                { 
                    type: "input_audio", 
                    input_audio: { 
                        data: base64str, 
                        format: "wav" 
                    } 
                }
            ]
        }
    ];

    // Get the first turn's response 

    const response = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview",
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: messages
    }); 

    console.log(response.choices[0]); 

    // Add a history message referencing the previous turn's audio by ID 
    messages.push({ 
        role: "assistant", 
        audio: { id: response.choices[0].message.audio.id }
    });

    // Add a new user message for the second turn
    messages.push({ 
        role: "user", 
        content: [ 
            { 
                type: "text", 
                text: "Very concisely summarize the favorability." 
            } 
        ] 
    }); 

    // Send the follow-up request with the accumulated messages
    const followResponse = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview",
        messages: messages
    });

    console.log(followResponse.choices[0].message.content); 
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };

Sign in to Azure with the following command:
```
az login
```
Run the JavaScript file.
```
node multi-turn.js
```

With API key authentication, create the multi-turn.js file with the following code:

require("dotenv").config();
const { AzureOpenAI } = require("openai");
const fs = require('fs').promises;

// Set environment variables or edit the corresponding values here.
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiKey = process.env["AZURE_OPENAI_API_KEY"] || "AZURE_OPENAI_API_KEY";
const apiVersion = "2025-01-01-preview"; 
const deployment = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
    endpoint, 
    apiKey, 
    apiVersion, 
    deployment 
});  

async function main() {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Initialize messages with the first turn's user input 
    const messages = [
        {
            role: "user",
            content: [
                { 
                    type: "text", 
                    text: "Describe in detail the spoken audio input." 
                },
                { 
                    type: "input_audio", 
                    input_audio: { 
                        data: base64str, 
                        format: "wav" 
                    } 
                }
            ]
        }
    ];

    // Get the first turn's response 

    const response = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview",
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: messages
    }); 

    console.log(response.choices[0]); 

    // Add a history message referencing the previous turn's audio by ID 
    messages.push({ 
        role: "assistant", 
        audio: { id: response.choices[0].message.audio.id }
    });

    // Add a new user message for the second turn
    messages.push({ 
        role: "user", 
        content: [ 
            { 
                type: "text", 
                text: "Very concisely summarize the favorability." 
            } 
        ] 
    }); 

    // Send the follow-up request with the accumulated messages
    const followResponse = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview",
        messages: messages
    });

    console.log(followResponse.choices[0].message.content); 
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };

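Run the JavaScript file.
```
node multi-turn.js
```

Wait a few moments for the response.

Output for multi-turn chat completions

The script generates a transcript of the summary of the spoken audio input. Then it makes a multi-turn chat completion to briefly summarize the spoken audio input.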



Use this guide to get started generating audio with the Azure OpenAI SDK for Python.

Prerequisites

An Azure subscription. Create one for free.
Python 3.8 or later. Python 3.10 or later is recommended. If you don't have a suitable version of Python installed, the VS Code Python tutorial describes the easiest way to install Python on your operating system.
An Azure OpenAI resource created in the East US 2 or Sweden Central region. See Region availability.
Then, you need to deploy a gpt-4o-audio-preview model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI, which is used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under [Access control (IAM)] > [Add role assignment].

Set up

Create a new folder named audio-completions-quickstart to contain the application, and open Visual Studio Code in that folder with the following command:
```
mkdir audio-completions-quickstart && code audio-completions-quickstart
```
Create a virtual environment. If you already have Python 3.10 or later installed, you can create one with the following commands.
Windows:
```
py -3 -m venv .venv
.venv\scripts\activate
```
Linux or macOS:
```
python3 -m venv .venv
source .venv/bin/activate
```
Activating the Python environment means that when you run python or pip from the command line, you use the Python interpreter contained in the .venv folder of your application. You can use the deactivate command to exit the Python virtual environment, and reactivate it later when needed.

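Tip

We recommend that you create and activate a new Python environment and use it to install the packages you need for this tutorial. Don't install packages into your global Python installation. Always use a virtual or conda environment when you install Python packages; otherwise you can break your global installation of Python.

Install the OpenAI client library for Python with: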
```
pip install openai
```
For the recommended keyless authentication with Microsoft Entra ID, install the azure-identity package with:
```
pip install azure-identity
```

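Retrieve resource information

To authenticate your application with your Azure OpenAI resource, you need to retrieve the following information. The first table applies to Microsoft Entra ID (keyless) authentication, and the second to API key authentication.

Microsoft Entra ID

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value is in the Keys and Endpoint section when you examine your resource in the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value is the custom name you chose for your deployment when you deployed the model. You can find it under [Resource Management] > [Model Deployments] in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API versions.

Learn more about keyless authentication and setting environment variables.

API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value is in the Keys and Endpoint section when you examine your resource in the Azure portal.
`AZURE_OPENAI_API_KEY`	This value is in the Keys and Endpoint section when you examine your resource in the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value is the custom name you chose for your deployment when you deployed the model. You can find it under [Resource Management] > [Model Deployments] in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API versions.

Learn more about finding API keys and setting environment variables.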

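Important

If you use an API key, store it securely somewhere else, such as in Azure Key Vault. Don't include the API key directly in your code, and never post it publicly.

For more information about AI services security, see Authenticate requests to Azure AI services.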
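One way to keep credentials out of source code is to read every setting from the environment and pick the authentication mode from what is set. The sketch below only inspects environment variables and returns the keyword arguments one might pass to `AzureOpenAI`. The helper name `client_settings_from_env` and the fallback API version are assumptions for illustration, not part of the SDK.

```python
import os
from typing import Mapping, Optional, Tuple


def client_settings_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[dict, bool]:
    """Return (constructor kwargs, keyless flag) built from environment variables."""
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    if not endpoint:
        raise RuntimeError("AZURE_OPENAI_ENDPOINT is not set")

    kwargs = {
        "azure_endpoint": endpoint,
        # Assumption: fall back to the API version used throughout this article.
        "api_version": env.get("OPENAI_API_VERSION", "2025-01-01-preview"),
    }
    api_key = env.get("AZURE_OPENAI_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
        return kwargs, False
    # Keyless path: the caller still has to supply azure_ad_token_provider,
    # for example one created with azure.identity.get_bearer_token_provider.
    return kwargs, True
```

When the keyless flag is true, the caller would add a token provider to the returned arguments before constructing the client; when it is false, the arguments already contain the key read from the environment.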

Generate audio from text input

With Microsoft Entra ID (keyless) authentication, create the to-audio.py file with the following code:

import base64
import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
client=AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=endpoint,
    api_version="2025-01-01-preview"
)

# Make the audio chat completions request
completion=client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(completion.choices[0])

# Write the output audio data to a file
wav_bytes=base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)

Run the Python file.
```
python to-audio.py
```

With API key authentication, create the to-audio.py file with the following code:

import base64 
import os 
from openai import AzureOpenAI 

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

client = AzureOpenAI(
    api_version="2025-01-01-preview",  
    api_key=api_key,
    azure_endpoint=endpoint
)

# Make the audio chat completions request
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(completion.choices[0])

# Write the output audio data to a file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)

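Run the Python file.
```
python to-audio.py
```

Wait a few moments for the response.

Output for audio generation from text input

The script generates an audio file named dog.wav in the same directory as the script. The audio file contains the spoken response to the prompt, "Is a golden retriever a good family dog?"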
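To confirm that the generated file is playable audio rather than an error payload, its header can be inspected with the standard library. This is a sketch under the assumption that the returned data is PCM WAV that Python's `wave` module can parse; the helper name `describe_wav` is hypothetical.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Return basic properties of WAV data held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

For example, after the script has run, `describe_wav(open("dog.wav", "rb").read())` would report the channel count, sample rate, and duration of the response.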

Generate audio and text from audio input

With Microsoft Entra ID (keyless) authentication, create the from-audio.py file with the following code:

import base64
import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
client=AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=endpoint,
    api_version="2025-01-01-preview"
)

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
    encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Make the audio chat completions request
completion = client.chat.completions.create( 
    model="gpt-4o-audio-preview", 
    modalities=["text", "audio"], 
    audio={"voice": "alloy", "format": "wav"}, 
    messages=[ 
        { 
            "role": "user", 
            "content": [ 
                {  
                    "type": "text", 
                    "text": "Describe in detail the spoken audio input." 
                }, 
                { 
                    "type": "input_audio", 
                    "input_audio": { 
                        "data": encoded_string, 
                        "format": "wav" 
                    } 
                } 
            ] 
        }, 
    ] 
) 

print(completion.choices[0].message.audio.transcript)

# Write the output audio data to a file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("analysis.wav", "wb") as f:
    f.write(wav_bytes)

Run the Python file.
```
python from-audio.py
```

With API key authentication, create the from-audio.py file with the following code:

import base64
import os
from openai import AzureOpenAI

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

client = AzureOpenAI(
    api_version="2025-01-01-preview",  
    api_key=api_key, 
    azure_endpoint=endpoint
)

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
    encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Make the audio chat completions request
completion = client.chat.completions.create( 
    model="gpt-4o-audio-preview", 
    modalities=["text", "audio"], 
    audio={"voice": "alloy", "format": "wav"}, 
    messages=[ 
        { 
            "role": "user", 
            "content": [ 
                {  
                    "type": "text", 
                    "text": "Describe in detail the spoken audio input." 
                }, 
                { 
                    "type": "input_audio", 
                    "input_audio": { 
                        "data": encoded_string, 
                        "format": "wav" 
                    } 
                } 
            ] 
        }, 
    ] 
) 

print(completion.choices[0].message.audio.transcript)

# Write the output audio data to a file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("analysis.wav", "wb") as f:
    f.write(wav_bytes)

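Run the Python file.
```
python from-audio.py
```

Wait a few moments for the response.

Output for audio and text generation from audio input

The script generates a transcript of the summary of the spoken audio input. It also generates an audio file named analysis.wav in the same directory as the script. The audio file contains the spoken response to the prompt.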
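The samples read `message.audio.transcript` and `message.audio.data` directly. A slightly more defensive variant, sketched below, returns both values together and tolerates a message that carries no audio. The helper name `extract_audio` is hypothetical, and the assumption is only that the message object exposes an `audio` attribute with `transcript` and base64-encoded `data`, as in the samples.

```python
import base64
from typing import Optional, Tuple


def extract_audio(message) -> Tuple[Optional[str], Optional[bytes]]:
    """Return (transcript, decoded audio bytes), or (None, None) if no audio is present."""
    audio = getattr(message, "audio", None)
    if audio is None:
        return None, None
    # The audio payload is base64 text; decode it to raw bytes for writing to disk.
    return audio.transcript, base64.b64decode(audio.data)
```

In the script above, this could be called as `transcript, wav_bytes = extract_audio(completion.choices[0].message)` before writing analysis.wav.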

Generate audio and use multi-turn chat completions

With Microsoft Entra ID (keyless) authentication, create the multi-turn.py file with the following code:

import base64 
import os 
from openai import AzureOpenAI 
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
client=AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=endpoint,
    api_version="2025-01-01-preview"
)

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
    encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Initialize messages with the first turn's user input 
messages = [
    { 
        "role": "user", 
        "content": [ 
            { "type": "text", "text": "Describe in detail the spoken audio input." }, 
            { "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }] 

# Get the first turn's response

completion = client.chat.completions.create( 
    model="gpt-4o-audio-preview", 
    modalities=["text", "audio"], 
    audio={"voice": "alloy", "format": "wav"}, 
    messages=messages
) 

print("Get the first turn's response:")
print(completion.choices[0].message.audio.transcript) 

print("Add a history message referencing the first turn's audio by ID:")
print(completion.choices[0].message.audio.id)

# Add a history message referencing the first turn's audio by ID 
messages.append({ 
    "role": "assistant", 
    "audio": { "id": completion.choices[0].message.audio.id } 
}) 

# Add the next turn's user message 
messages.append({ 
    "role": "user", 
    "content": "Very briefly, summarize the favorability." 
}) 

# Send the follow-up request with the accumulated messages
completion = client.chat.completions.create( 
    model="gpt-4o-audio-preview", 
    messages=messages
) 

print("Very briefly, summarize the favorability.")
print(completion.choices[0].message.content)

Run the Python file.
```
python multi-turn.py
```

With API key authentication, create the multi-turn.py file with the following code:

import base64 
import os 
from openai import AzureOpenAI 

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

client = AzureOpenAI(
    api_version="2025-01-01-preview",  
    api_key=api_key, 
    azure_endpoint=endpoint
)

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
    encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Initialize messages with the first turn's user input 
messages = [
    { 
        "role": "user", 
        "content": [ 
            { "type": "text", "text": "Describe in detail the spoken audio input." }, 
            { "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }] 

# Get the first turn's response 

completion = client.chat.completions.create( 
    model="gpt-4o-audio-preview", 
    modalities=["text", "audio"], 
    audio={"voice": "alloy", "format": "wav"}, 
    messages=messages
) 

print("Get the first turn's response:")
print(completion.choices[0].message.audio.transcript) 

print("Add a history message referencing the first turn's audio by ID:")
print(completion.choices[0].message.audio.id)

# Add a history message referencing the first turn's audio by ID 
messages.append({ 
    "role": "assistant", 
    "audio": { "id": completion.choices[0].message.audio.id } 
}) 

# Add the next turn's user message 
messages.append({ 
    "role": "user", 
    "content": "Very briefly, summarize the favorability." 
}) 

# Send the follow-up request with the accumulated messages 
completion = client.chat.completions.create( 
    model="gpt-4o-audio-preview", 
    messages=messages
) 

print("Very briefly, summarize the favorability.")
print(completion.choices[0].message.content)

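Run the Python file.
```
python multi-turn.py
```

Wait a few moments for the response.

Output for multi-turn chat completions

The script generates a transcript of the summary of the spoken audio input. Then it makes a multi-turn chat completion to briefly summarize the spoken audio input.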
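The multi-turn pattern comes down to two appended messages: an assistant message that refers to the previous audio response by ID, and the next user message. The sketch below isolates that step. The helper name `add_follow_up` is hypothetical, and the message shapes are copied from the sample above.

```python
from typing import List


def add_follow_up(messages: List[dict], audio_id: str, user_text: str) -> List[dict]:
    """Return a new message list with the audio reference and the next user turn appended."""
    return messages + [
        # History entry: refers to the earlier audio response by ID instead of resending it.
        {"role": "assistant", "audio": {"id": audio_id}},
        # The follow-up question for the second turn.
        {"role": "user", "content": user_text},
    ]
```

Because the function returns a new list, the first turn's messages stay unchanged and can be reused for a different follow-up question.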



Prerequisites

- An Azure subscription. You can create one for free.
- Python 3.8 or later; Python 3.10 or later is recommended. If you don't have a suitable version of Python installed, the VS Code Python tutorial describes the easiest way to install Python on your operating system.
- An Azure OpenAI resource created in the East US 2 or Sweden Central region. See the region availability page.
- A gpt-4o-audio-preview model deployed with your Azure OpenAI resource. For more information, see "Create a resource and deploy a model with Azure OpenAI".

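Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

- Install the Azure CLI, which is used for keyless authentication with Microsoft Entra ID.
- Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under [Access control (IAM)] > [Add role assignment].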

Set up

Create a new folder named audio-completions-quickstart to contain the application, and open Visual Studio Code in that folder with the following command:
```
mkdir audio-completions-quickstart && code audio-completions-quickstart
```
Create a virtual environment. If you already have Python 3.10 or later installed, you can create a virtual environment with the following commands:

Windows:
```
py -3 -m venv .venv
.venv\scripts\activate
```
Linux:
```
python3 -m venv .venv
source .venv/bin/activate
```
macOS:
```
python3 -m venv .venv
source .venv/bin/activate
```
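Activating the Python environment means that when you run python or pip from the command line, you use the Python interpreter contained in the .venv folder of your application. You can use the deactivate command to exit the Python virtual environment, and reactivate it later when needed.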
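
To confirm that activation worked, you can ask the interpreter itself. The following is a minimal sketch that relies only on the standard library; the helper name in_virtual_environment is hypothetical and not part of the quickstart. It covers environments created with venv, and may not detect other environment managers.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtual_environment())
```

If the second line prints False after you ran the activate command, the shell is probably still resolving python to the global installation.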

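Tip

We recommend that you create and activate a new Python environment to install the packages you need for this tutorial. Don't install packages into your global Python installation. Always use a virtual or conda environment when installing Python packages; otherwise you can break your global installation of Python.

Install the OpenAI client library for Python, plus the requests package that the REST examples in this section import, with:
```
pip install openai requests
```
For the recommended keyless authentication with Microsoft Entra ID, install the azure-identity package with: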
```
pip install azure-identity
```

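Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource.

Microsoft Entra ID:

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under [Resource Management] > [Model Deployments] in the Azure portal.
`OPENAI_API_VERSION`	See the API versions documentation.

See the documentation on keyless authentication and setting environment variables.

API key:

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under [Resource Management] > [Model Deployments] in the Azure portal.
`OPENAI_API_VERSION`	See the API versions documentation.

See the documentation on finding API keys and setting environment variables.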
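
Before running the scripts, it can help to fail early when one of these variables is missing. The sketch below reads the variable names from the tables above; the function name load_settings and the use_api_key flag are hypothetical helpers for illustration, not part of any SDK.

```python
import os
from typing import Mapping

# Variable names taken from the tables above.
REQUIRED = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "OPENAI_API_VERSION",
)


def load_settings(env: Mapping[str, str] = os.environ, use_api_key: bool = False) -> dict:
    """Collect the required settings, raising KeyError if any is unset or empty."""
    names = REQUIRED + (("AZURE_OPENAI_API_KEY",) if use_api_key else ())
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

Call load_settings() for keyless authentication, or load_settings(use_api_key=True) when you authenticate with a key.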

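Important

If you use an API key, store it securely somewhere else, such as in Azure Key Vault. Don't include the API key directly in your code, and never post it publicly.

For more information about AI services security, see "Authenticate requests to Azure AI services".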

Generate audio from text input

Microsoft Entra ID (keyless authentication): create the to-audio.py file with the following code.

import requests
import base64
import os
from azure.identity import DefaultAzureCredential

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-audio-preview/chat/completions?api-version={api_version}"
headers= { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json" }
body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Is a golden retriever a good family dog?"
        }
      ]
    }
  ]
}

# Make the audio chat completions request
completion = requests.post(url, headers=headers, json=body)
completion.raise_for_status()  # Surface HTTP errors before parsing the JSON body
audio_data = completion.json()['choices'][0]['message']['audio']['data']

# Write the output audio data to a file
wav_bytes = base64.b64decode(audio_data)
with open("dog.wav", "wb") as f: 
  f.write(wav_bytes)

Run the Python file.
```
python to-audio.py
```

API key: create the to-audio.py file with the following code.

import requests
import base64
import os

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-audio-preview/chat/completions?api-version={api_version}"
headers= { "api-key": api_key, "Content-Type": "application/json" }
body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Is a golden retriever a good family dog?"
        }
      ]
    }
  ]
}

# Make the audio chat completions request
completion = requests.post(url, headers=headers, json=body)
completion.raise_for_status()  # Surface HTTP errors before parsing the JSON body
audio_data = completion.json()['choices'][0]['message']['audio']['data']

# Write the output audio data to a file 
wav_bytes = base64.b64decode(audio_data)
with open("dog.wav", "wb") as f: 
  f.write(wav_bytes)

Run the Python file.
```
python to-audio.py
```

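Wait a few moments for the response.

Output for audio generation from text input

The script generates an audio file named dog.wav in the same directory as the script. The audio file contains the spoken response to the prompt, "Is a golden retriever a good family dog?"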
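
One way to check the generated file without playing it is to read its header. This is a minimal sketch using the standard-library wave module; it assumes the service returned uncompressed PCM WAV data, which is what wave can parse, and the helper name describe_wav is hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file from its header."""
    # wave.open raises wave.Error if the file is not a WAV it can parse.
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }


# Example usage after running to-audio.py:
# print(describe_wav("dog.wav"))
```

A duration of zero, or a wave.Error, suggests the response did not contain the expected audio payload.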

Generate audio and text from audio input

Microsoft Entra ID (keyless authentication): create the from-audio.py file with the following code.

import requests
import base64
import os
from azure.identity import DefaultAzureCredential

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
  encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-audio-preview/chat/completions?api-version={api_version}"
headers= { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json" }
body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": [
    { 
        "role": "user", 
        "content": [ 
            {  
                "type": "text", 
                "text": "Describe in detail the spoken audio input." 
            }, 
            { 
                "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }, 
  ]
}

completion = requests.post(url, headers=headers, json=body)
completion.raise_for_status()  # Surface HTTP errors before parsing the JSON body

print(completion.json()['choices'][0]['message']['audio']['transcript'])

# Write the output audio data to a file
audio_data = completion.json()['choices'][0]['message']['audio']['data'] 
wav_bytes = base64.b64decode(audio_data)
with open("analysis.wav", "wb") as f: 
  f.write(wav_bytes)

Run the Python file.
```
python from-audio.py
```

API key: create the from-audio.py file with the following code.

import requests
import base64
import os

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
  encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-audio-preview/chat/completions?api-version={api_version}"
headers= { "api-key": api_key, "Content-Type": "application/json" }
body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": [
    { 
        "role": "user", 
        "content": [ 
            {  
                "type": "text", 
                "text": "Describe in detail the spoken audio input." 
            }, 
            { 
                "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }, 
  ]
}

completion = requests.post(url, headers=headers, json=body)
completion.raise_for_status()  # Surface HTTP errors before parsing the JSON body

print(completion.json()['choices'][0]['message']['audio']['transcript'])

# Write the output audio data to a file
audio_data = completion.json()['choices'][0]['message']['audio']['data'] 
wav_bytes = base64.b64decode(audio_data)
with open("analysis.wav", "wb") as f: 
  f.write(wav_bytes)

Run the Python file.
```
python from-audio.py
```

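Wait a few moments for the response.

Output for audio and text generation from audio input

The script prints a transcript of the summary of the spoken audio input. It also generates an audio file named analysis.wav in the same directory as the script. The audio file contains the spoken response to the prompt.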
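
The from-audio.py script base64-encodes dog.wav without checking its size, while this article states a 20 MB maximum for audio files. The sketch below builds the same input_audio content part with a size guard. It assumes the limit applies to the raw file and that 20 MB means 20 * 1024 * 1024 bytes; both are assumptions, and audio_content_part is a hypothetical helper name.

```python
import base64
from pathlib import Path

# Assumption: the documented 20 MB limit applies to the raw file bytes.
MAX_AUDIO_BYTES = 20 * 1024 * 1024


def audio_content_part(path: str, audio_format: str = "wav") -> dict:
    """Build an input_audio content part, rejecting files over the size limit."""
    raw = Path(path).read_bytes()
    if len(raw) > MAX_AUDIO_BYTES:
        raise ValueError(f"{path} is {len(raw)} bytes; the limit is {MAX_AUDIO_BYTES}")
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(raw).decode("utf-8"),
            "format": audio_format,
        },
    }
```

The returned dictionary can take the place of the hand-written input_audio entry in the content list of the user message.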

Generate audio and use multi-turn chat completions

Microsoft Entra ID (keyless authentication): create the multi-turn.py file with the following code.

import requests
import base64
import os
from azure.identity import DefaultAzureCredential

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-audio-preview/chat/completions?api-version={api_version}"
headers= { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json" }

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
  encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Initialize messages with the first turn's user input 
messages = [
    { 
        "role": "user", 
        "content": [ 
            {  
                "type": "text", 
                "text": "Describe in detail the spoken audio input." 
            }, 
            { 
                "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }] 

body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": messages
}

# Get the first turn's response, including generated audio 
completion = requests.post(url, headers=headers, json=body)

print("Get the first turn's response:")
print(completion.json()['choices'][0]['message']['audio']['transcript']) 

print("Add a history message referencing the first turn's audio by ID:")
print(completion.json()['choices'][0]['message']['audio']['id'])

# Add a history message referencing the first turn's audio by ID 
messages.append({ 
    "role": "assistant", 
    "audio": { "id": completion.json()['choices'][0]['message']['audio']['id'] } 
}) 

# Add the next turn's user message 
messages.append({ 
    "role": "user", 
    "content": "Very briefly, summarize the favorability." 
}) 

body = {
  "model": "gpt-4o-audio-preview",
  "messages": messages
}

# Send the follow-up request with the accumulated messages
completion = requests.post(url, headers=headers, json=body) 

print("Very briefly, summarize the favorability.")
print(completion.json()['choices'][0]['message']['content'])

Run the Python file.
```
python multi-turn.py
```

API key: create the multi-turn.py file with the following code.

import requests
import base64
import os

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-audio-preview/chat/completions?api-version={api_version}"
headers= { "api-key": api_key, "Content-Type": "application/json" }

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
  encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Initialize messages with the first turn's user input 
messages = [
    { 
        "role": "user", 
        "content": [ 
            {  
                "type": "text", 
                "text": "Describe in detail the spoken audio input." 
            }, 
            { 
                "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }] 

body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": messages
}


# Get the first turn's response, including generated audio 
completion = requests.post(url, headers=headers, json=body)

print("Get the first turn's response:")
print(completion.json()['choices'][0]['message']['audio']['transcript']) 

print("Add a history message referencing the first turn's audio by ID:")
print(completion.json()['choices'][0]['message']['audio']['id'])

# Add a history message referencing the first turn's audio by ID 
messages.append({ 
    "role": "assistant", 
    "audio": { "id": completion.json()['choices'][0]['message']['audio']['id'] } 
}) 

# Add the next turn's user message 
messages.append({ 
    "role": "user", 
    "content": "Very briefly, summarize the favorability." 
}) 

body = {
  "model": "gpt-4o-audio-preview",
  "messages": messages
}

# Send the follow-up request with the accumulated messages
completion = requests.post(url, headers=headers, json=body) 

print("Very briefly, summarize the favorability.")
print(completion.json()['choices'][0]['message']['content'])

Run the Python file.
```
python multi-turn.py
```

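Wait a few moments for the response.

Output for multi-turn chat completions

The script prints a transcript of the summary of the spoken audio input. It then makes a multi-turn chat completion to briefly summarize the spoken audio input.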
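
The key step in the multi-turn script is that the assistant's earlier audio reply is referenced by its ID rather than sent again. As a rough sketch, that step can be isolated in a small function that works on the parsed JSON response; next_turn_messages is a hypothetical name, and the response shape is the one the script above already reads.

```python
def next_turn_messages(messages: list, response_json: dict, user_text: str) -> list:
    """Return a new message list extended with the audio reference and the next user turn."""
    # The first-turn response carries the generated audio's ID here.
    audio_id = response_json["choices"][0]["message"]["audio"]["id"]
    return messages + [
        # History entry that points at the earlier audio instead of embedding it.
        {"role": "assistant", "audio": {"id": audio_id}},
        {"role": "user", "content": user_text},
    ]
```

For example, next_turn_messages(messages, completion.json(), "Very briefly, summarize the favorability.") yields the list to send as "messages" in the follow-up request body.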

Reference documentation | Library source code | Package (npm) | Samples

Prerequisites

- An Azure subscription. Create one for free.
- Node.js (LTS or ESM support).
- TypeScript installed globally.
- An Azure OpenAI resource created in the East US 2 or Sweden Central region. See the region availability page.
- A gpt-4o-audio-preview model deployed with your Azure OpenAI resource. For more information, see "Create a resource and deploy a model with Azure OpenAI".

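Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

- Install the Azure CLI, which is used for keyless authentication with Microsoft Entra ID.
- Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under [Access control (IAM)] > [Add role assignment].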

Set up

Create a new folder named audio-completions-quickstart to contain the application, and open Visual Studio Code in that folder with the following command:
```
mkdir audio-completions-quickstart && code audio-completions-quickstart
```
Create the package.json with the following command:
```
npm init -y
```
Update the package.json to ECMAScript with the following command:
```
npm pkg set type=module
```
Install the OpenAI client library for JavaScript with:
```
npm install openai
```
For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with:
```
npm install @azure/identity
```

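Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource.

Microsoft Entra ID:

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under [Resource Management] > [Model Deployments] in the Azure portal.
`OPENAI_API_VERSION`	See the API versions documentation.

See the documentation on keyless authentication and setting environment variables.

API key:

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under [Resource Management] > [Model Deployments] in the Azure portal.
`OPENAI_API_VERSION`	See the API versions documentation.

See the documentation on finding API keys and setting environment variables.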

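Important

If you use an API key, store it securely somewhere else, such as in Azure Key Vault. Don't include the API key directly in your code, and never post it publicly.

For more information about AI services security, see "Authenticate requests to Azure AI services".

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.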

Generate audio from text input

Microsoft Entra ID (keyless authentication): create the to-audio.ts file with the following code.

import { writeFileSync } from "node:fs";
import { AzureOpenAI } from "openai";
import {
    DefaultAzureCredential,
    getBearerTokenProvider,
  } from "@azure/identity";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-audio-preview"; 

// Keyless authentication 
const getClient = (): AzureOpenAI => {
    const credential = new DefaultAzureCredential();
    const scope = "https://cognitiveservices.azure.com/.default";
    const azureADTokenProvider = getBearerTokenProvider(credential, scope);
    const client = new AzureOpenAI({
      endpoint: endpoint,
      apiVersion: apiVersion,
      azureADTokenProvider,
    });
    return client;
};

const client = getClient();

async function main(): Promise<void> {

    // Make the audio chat completions request
    const response = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview", 
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: [ 
        { 
            role: "user", 
            content: "Is a golden retriever a good family dog?" 
        } 
        ] 
    }); 

  // Inspect returned data 
  console.log(response.choices[0]); 

  // Write the output audio data to a file
  if (response.choices[0].message.audio) {
    // The data is already a Buffer, so no text encoding option is needed
    writeFileSync(
      "dog.wav",
      Buffer.from(response.choices[0].message.audio.data, 'base64')
    );
  } else {
    console.error("Audio data is null or undefined.");
  }
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Sign in to Azure with the following command:
```
az login
```
Run the code with the following command:
```
node to-audio.js
```

API key: create the to-audio.ts file with the following code.

import { writeFileSync } from "node:fs";
import { AzureOpenAI } from "openai";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiKey: string = process.env["AZURE_OPENAI_API_KEY"] || "AZURE_OPENAI_API_KEY";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
  endpoint, 
  apiKey, 
  apiVersion, 
  deployment 
});  

async function main(): Promise<void> {

    // Make the audio chat completions request
    const response = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview", 
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: [ 
        { 
            role: "user", 
            content: "Is a golden retriever a good family dog?" 
        } 
        ] 
    }); 

  // Inspect returned data 
  console.log(response.choices[0]); 

  // Write the output audio data to a file
  if (response.choices[0].message.audio) {
    // The data is already a Buffer, so no text encoding option is needed
    writeFileSync(
      "dog.wav",
      Buffer.from(response.choices[0].message.audio.data, 'base64')
    );
  } else {
    console.error("Audio data is null or undefined.");
  }
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Run the code with the following command:
```
node to-audio.js
```

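Wait a few moments for the response.

Output for audio generation from text input

The script generates an audio file named dog.wav in the same directory as the script. The audio file contains the spoken response to the prompt, "Is a golden retriever a good family dog?"

Generate audio and text from audio input

Microsoft Entra ID (keyless authentication): create the from-audio.ts file with the following code.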

import { AzureOpenAI } from "openai";
import { writeFileSync } from "node:fs";
import { promises as fs } from 'fs';
import {
    DefaultAzureCredential,
    getBearerTokenProvider,
  } from "@azure/identity";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-audio-preview"; 

// Keyless authentication 
const getClient = (): AzureOpenAI => {
    const credential = new DefaultAzureCredential();
    const scope = "https://cognitiveservices.azure.com/.default";
    const azureADTokenProvider = getBearerTokenProvider(credential, scope);
    const client = new AzureOpenAI({
      endpoint: endpoint,
      apiVersion: apiVersion,
      azureADTokenProvider,
    });
    return client;
};

const client = getClient();

async function main(): Promise<void> {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Make the audio chat completions request
    const response = await client.chat.completions.create({ 
      model: "gpt-4o-audio-preview",
      modalities: ["text", "audio"], 
      audio: { voice: "alloy", format: "wav" },
      messages: [ 
        { 
          role: "user", 
          content: [ 
            { 
              type: "text", 
              text: "Describe in detail the spoken audio input." 
            }, 
            { 
              type: "input_audio", 
              input_audio: { 
                data: base64str, 
                format: "wav" 
              } 
            } 
          ] 
        } 
      ] 
    }); 

    console.log(response.choices[0]); 

    // Write the output audio data to a file
    if (response.choices[0].message.audio) {
        writeFileSync("analysis.wav", Buffer.from(response.choices[0].message.audio.data, 'base64'));
    }
    else {
        console.error("Audio data is null or undefined.");
  }
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Sign in to Azure with the following command:
```
az login
```
Run the code with the following command:
```
node from-audio.js
```

API key: create the from-audio.ts file with the following code.

import { AzureOpenAI } from "openai";
import { writeFileSync } from "node:fs";
import { promises as fs } from 'fs';

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiKey: string = process.env["AZURE_OPENAI_API_KEY"] || "AZURE_OPENAI_API_KEY";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
  endpoint, 
  apiKey, 
  apiVersion, 
  deployment 
});  

async function main(): Promise<void> {

  // Buffer the audio for input to the chat completion
  const wavBuffer = await fs.readFile("dog.wav"); 
  const base64str = Buffer.from(wavBuffer).toString("base64"); 

  // Make the audio chat completions request
  const response = await client.chat.completions.create({ 
    model: "gpt-4o-audio-preview",
    modalities: ["text", "audio"], 
    audio: { voice: "alloy", format: "wav" },
    messages: [ 
      { 
        role: "user", 
        content: [ 
          { 
            type: "text", 
            text: "Describe in detail the spoken audio input." 
          }, 
          { 
            type: "input_audio", 
            input_audio: { 
              data: base64str, 
              format: "wav" 
            } 
          } 
        ] 
      } 
    ] 
  }); 

  console.log(response.choices[0]); 

  // Write the output audio data to a file
  if (response.choices[0].message.audio) {
      writeFileSync("analysis.wav", Buffer.from(response.choices[0].message.audio.data, 'base64'));
  }
  else {
      console.error("Audio data is null or undefined.");
}
}

main().catch((err: Error) => {
console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Run the code with the following command:
```
node from-audio.js
```

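Wait a few moments for the response.

Output for audio and text generation from audio input

The script prints a transcript of the summary of the spoken audio input. It also generates an audio file named analysis.wav in the same directory as the script. The audio file contains the spoken response to the prompt.

Generate audio and use multi-turn chat completions

Microsoft Entra ID (keyless authentication): create the multi-turn.ts file with the following code.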

import { AzureOpenAI } from "openai";
import { promises as fs } from 'fs';
import { ChatCompletionMessageParam } from "openai/resources/index.mjs";
import {
    DefaultAzureCredential,
    getBearerTokenProvider,
  } from "@azure/identity";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-audio-preview"; 

// Keyless authentication 
const getClient = (): AzureOpenAI => {
    const credential = new DefaultAzureCredential();
    const scope = "https://cognitiveservices.azure.com/.default";
    const azureADTokenProvider = getBearerTokenProvider(credential, scope);
    const client = new AzureOpenAI({
      endpoint: endpoint,
      apiVersion: apiVersion,
      azureADTokenProvider,
    });
    return client;
};

const client = getClient(); 

async function main(): Promise<void> {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Initialize messages with the first turn's user input 
    const messages: ChatCompletionMessageParam[] = [
      {
        role: "user",
        content: [
          { 
            type: "text", 
            text: "Describe in detail the spoken audio input." 
          },
          { 
            type: "input_audio", 
            input_audio: { 
              data: base64str, 
              format: "wav" 
            } 
          }
        ]
      }
    ];

    // Get the first turn's response 

    const response = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview",
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: messages
    }); 

    console.log(response.choices[0]); 

    // Add a history message referencing the previous turn's audio by ID 
    messages.push({ 
        role: "assistant", 
        audio: response.choices[0].message.audio ? { id: response.choices[0].message.audio.id } : undefined
    });

    // Add a new user message for the second turn
    messages.push({ 
        role: "user", 
        content: [ 
            { 
              type: "text", 
              text: "Very concisely summarize the favorability." 
            } 
        ] 
    }); 

    // Send the follow-up request with the accumulated messages
    const followResponse = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview",
        messages: messages
    });

    console.log(followResponse.choices[0].message.content); 
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Sign in to Azure with the following command:
```
az login
```
Run the code with the following command:
```
node multi-turn.js
```

API key: create the multi-turn.ts file with the following code.

import { AzureOpenAI } from "openai";
import { promises as fs } from 'fs';
import { ChatCompletionMessageParam } from "openai/resources/index.mjs";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env["AZURE_OPENAI_ENDPOINT"] || "AZURE_OPENAI_ENDPOINT";
const apiKey: string = process.env["AZURE_OPENAI_API_KEY"] || "AZURE_OPENAI_API_KEY";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-audio-preview"; 

const client = new AzureOpenAI({ 
  endpoint, 
  apiKey, 
  apiVersion, 
  deployment 
});  

async function main(): Promise<void> {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Initialize messages with the first turn's user input 
    const messages: ChatCompletionMessageParam[] = [
      {
        role: "user",
        content: [
          { 
            type: "text", 
            text: "Describe in detail the spoken audio input." 
          },
          { 
            type: "input_audio", 
            input_audio: { 
              data: base64str, 
              format: "wav" 
            } 
          }
        ]
      }
    ];

    // Get the first turn's response 

    const response = await client.chat.completions.create({ 
      model: "gpt-4o-audio-preview",
      modalities: ["text", "audio"], 
      audio: { voice: "alloy", format: "wav" }, 
      messages: messages
    }); 

    console.log(response.choices[0]); 

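    // Add a history message referencing the previous turn's audio by ID.
    // Stop early if the first response carried no audio, because an assistant
    // message with neither content nor audio is not a valid history entry.
    const audioId = response.choices[0].message.audio?.id;
    if (!audioId) {
        throw new Error("The first response did not include audio to reference.");
    }
    messages.push({ 
        role: "assistant", 
        audio: { id: audioId }
    });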

    // Add a new user message for the second turn
    messages.push({ 
        role: "user", 
        content: [ 
            { 
              type: "text", 
              text: "Very concisely summarize the favorability." 
            } 
        ] 
    }); 

    // Send the follow-up request with the accumulated messages
    const followResponse = await client.chat.completions.create({ 
        model: "gpt-4o-audio-preview",
        messages: messages
    });

    console.log(followResponse.choices[0].message.content); 
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };
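
The script above reads dog.wav and base64-encodes it before sending it as an `input_audio` part. Because this quickstart states that audio files can be at most 20 MB, a small pre-check can catch an oversized file before the request is made. The following is a minimal sketch, not part of the original sample: the helper name `toInputAudio`, the constant `MAX_AUDIO_BYTES`, and the reading of 20 MB as 20 × 1024 × 1024 bytes are all assumptions made for illustration.

```typescript
// Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
const MAX_AUDIO_BYTES = 20 * 1024 * 1024;

// Hypothetical helper: validates the size of the raw audio bytes and
// returns a content part shaped like the one used in multi-turn.ts.
function toInputAudio(bytes: Uint8Array, format: "wav" | "mp3") {
  if (bytes.byteLength > MAX_AUDIO_BYTES) {
    throw new Error(
      `Audio is ${bytes.byteLength} bytes, which exceeds the ${MAX_AUDIO_BYTES}-byte limit.`
    );
  }
  return {
    type: "input_audio" as const,
    input_audio: {
      // Base64-encode the raw bytes, as the sample does for dog.wav
      data: Buffer.from(bytes).toString("base64"),
      format,
    },
  };
}

// Usage with four in-memory bytes (the ASCII letters "RIFF") instead of a real file
const part = toInputAudio(new Uint8Array([82, 73, 70, 70]), "wav");
console.log(part.input_audio.data); // prints "UklGRg=="
```

In the sample, the result of `fs.readFile("dog.wav")` could be passed to such a helper in place of the inline `Buffer.from(...).toString("base64")` call.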

Create a tsconfig.json file to transpile the TypeScript code, and copy in the following configuration, which targets ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Run the code with the following command.
```
node multi-turn.js
```

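Wait a few moments for the response.

Output for multi-turn chat completions

The script first generates a transcript that describes the spoken audio input. It then sends a second, multi-turn chat completion request that references the first response's audio by ID and asks for a brief summary of the spoken audio input.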
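
The way the second turn is assembled can be sketched without calling the service. The example below uses simplified local types rather than the `openai` package types, and the helper name `appendFollowUp` and the audio ID `audio_abc123` are hypothetical placeholders. It only illustrates the shape of the history: the original user message, an assistant message that references the earlier audio by ID, and the new user prompt.

```typescript
// Simplified local message shapes (assumption: modeled on the messages in
// multi-turn.ts, not imported from the openai package).
type TextPart = { type: "text"; text: string };
type AudioPart = {
  type: "input_audio";
  input_audio: { data: string; format: "wav" | "mp3" };
};
type UserMessage = { role: "user"; content: Array<TextPart | AudioPart> };
type AssistantAudioMessage = { role: "assistant"; audio: { id: string } };
type Message = UserMessage | AssistantAudioMessage;

// Hypothetical helper: returns a new history with the assistant's audio
// reference and the next user prompt appended. It does not mutate the input
// and throws if the previous turn returned no audio ID.
function appendFollowUp(
  history: Message[],
  audioId: string | undefined,
  prompt: string
): Message[] {
  if (!audioId) {
    throw new Error("The previous response has no audio ID to reference.");
  }
  return [
    ...history,
    { role: "assistant", audio: { id: audioId } },
    { role: "user", content: [{ type: "text", text: prompt }] },
  ];
}

const firstTurn: Message[] = [
  {
    role: "user",
    content: [{ type: "text", text: "Describe in detail the spoken audio input." }],
  },
];

// "audio_abc123" stands in for response.choices[0].message.audio.id
const secondTurn = appendFollowUp(
  firstTurn,
  "audio_abc123",
  "Very concisely summarize the favorability."
);
console.log(secondTurn.map((m) => m.role).join(" -> ")); // prints "user -> assistant -> user"
```

Referencing the audio by ID keeps the follow-up request small, because the earlier audio does not have to be sent again.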
