Predicted outputs (preview)

02/04/2025

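Predicted outputs can improve model response latency for chat completions calls where only minimal changes are needed to a larger body of text. If you ask the model for a response where most of the expected content is already known, predicted outputs can significantly reduce the latency of the request. The capability is particularly well suited to coding scenarios such as autocomplete, error detection, and real-time editing, where speed and responsiveness matter to developers and end users. Rather than having the model regenerate all the text from scratch, you pass the known text in the prediction parameter to tell the model that most of the response is already known.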

Model support

gpt-4o-mini version: 2024-07-18
gpt-4o version: 2024-08-06
gpt-4o version: 2024-11-20

API support

2025-01-01-preview

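Unsupported features

Predicted outputs are currently text-only. The following features can't be used together with the prediction parameter and predicted outputs:

Tools/function calling
Audio models/inputs and outputs
n values greater than 1
logprobs
presence_penalty values greater than 0
frequency_penalty values greater than 0
max_completion_tokens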
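
The restrictions above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service or the OpenAI library: `find_prediction_conflicts` is a hypothetical helper, and it assumes the request is held as a plain dictionary of chat completions keyword arguments. It only inspects top-level options, so audio content embedded in messages is not detected.

```python
from typing import Any


def find_prediction_conflicts(params: dict[str, Any]) -> list[str]:
    """Return the request options that conflict with `prediction`.

    Hypothetical helper: mirrors the unsupported-features list above.
    """
    if params.get("prediction") is None:
        return []  # no prediction supplied, so nothing conflicts

    conflicts = []

    # Tools / function calling
    for key in ("tools", "tool_choice", "functions", "function_call"):
        if params.get(key):
            conflicts.append(key)

    # Audio output options (top-level only)
    if params.get("audio") or "audio" in (params.get("modalities") or []):
        conflicts.append("audio")

    # n values greater than 1
    if (params.get("n") or 1) > 1:
        conflicts.append("n")

    # logprobs
    if params.get("logprobs"):
        conflicts.append("logprobs")

    # Penalty values greater than 0
    for key in ("presence_penalty", "frequency_penalty"):
        if (params.get(key) or 0) > 0:
            conflicts.append(key)

    # max_completion_tokens
    if params.get("max_completion_tokens") is not None:
        conflicts.append("max_completion_tokens")

    return conflicts


request = {
    "model": "gpt-4o-mini",
    "prediction": {"type": "content", "content": "print('hello')"},
    "n": 2,
    "presence_penalty": 0.5,
}
print(find_prediction_conflicts(request))  # ['n', 'presence_penalty']
```

Returning the list of conflicting option names, rather than raising on the first one, lets the caller report every problem at once.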

Note

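The predicted outputs feature is currently unavailable for models in the Southeast Asia region.

Getting started

To demonstrate the basics of predicted outputs, we start by asking a model to refactor the code from the common FizzBuzz programming problem, replacing the instance of FizzBuzz with MSFTBuzz. The example code is passed to the model in two places: first as part of a user message in the messages array/list, and a second time as the content of the new prediction parameter.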

Python (Microsoft Entra ID)
Python (key-based authentication)

You might need to upgrade your OpenAI client library to access the prediction parameter.

pip install openai --upgrade

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  azure_ad_token_provider=token_provider,
  api_version="2025-01-01-preview"
)

code = """
for number in range(1, 101):
    if number % 3 == 0 and number % 5 == 0:
        print("FizzBuzz")
    elif number % 3 == 0:
        print("Fizz")
    elif number % 5 == 0:
        print("Buzz")
    else:
        print(number)
"""

instructions = """
Replace string `FizzBuzz` with `MSFTBuzz`. Respond only 
with code, and with no markdown formatting.
"""


completion = client.chat.completions.create(
    model="gpt-4o-mini", # replace with your unique model deployment name
    messages=[
        {
            "role": "user",
            "content": instructions
        },
        {
            "role": "user",
            "content": code
        }
    ],
    prediction={
        "type": "content",
        "content": code
    }
)

print(completion.model_dump_json(indent=2))

You might need to upgrade your OpenAI client library to access the prediction parameter. The following variant of the example uses key-based authentication.

pip install openai --upgrade

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version="2025-01-01-preview"
)

code = """
for number in range(1, 101):
    if number % 3 == 0 and number % 5 == 0:
        print("FizzBuzz")
    elif number % 3 == 0:
        print("Fizz")
    elif number % 5 == 0:
        print("Buzz")
    else:
        print(number)
"""

instructions = """
Replace string `FizzBuzz` with `MSFTBuzz`. Respond only 
with code, and with no markdown formatting.
"""


completion = client.chat.completions.create(
    model="gpt-4o-mini", # replace with your unique model deployment name
    messages=[
        {
            "role": "user",
            "content": instructions
        },
        {
            "role": "user",
            "content": code
        }
    ],
    prediction={
        "type": "content",
        "content": code
    }
)

print(completion.model_dump_json(indent=2))

Output

{
  "id": "chatcmpl-AskZk3P5QGmefqobDw4Ougo6jLxSP",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "for number in range(1, 101):\n    if number % 3 == 0 and number % 5 == 0:\n        print(\"MSFTBuzz\")\n    elif number % 3 == 0:\n        print(\"Fizz\")\n    elif number % 5 == 0:\n        print(\"Buzz\")\n    else:\n        print(number)",
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1737612112,
  "model": "gpt-4o-mini-2024-07-18",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_5154047bf2",
  "usage": {
    "completion_tokens": 77,
    "prompt_tokens": 124,
    "total_tokens": 201,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 6,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 4
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}

In the output, notice the new response parameters accepted_prediction_tokens and rejected_prediction_tokens:

  "usage": {
    "completion_tokens": 77,
    "prompt_tokens": 124,
    "total_tokens": 201,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 6,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 4
    }

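The accepted_prediction_tokens help reduce model response latency, but any rejected_prediction_tokens have the same cost implication as additional output tokens generated by the model. For this reason, while predicted outputs can improve model response times, they can also result in greater costs. You need to evaluate the performance gain and balance it against the potential increase in cost.

It's also important to understand that using predicted outputs doesn't guarantee a reduction in latency. A large request with a greater percentage of rejected prediction tokens than accepted prediction tokens could increase model response latency rather than decrease it.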
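
One way to keep an eye on this trade-off is to track the share of prediction tokens that were accepted. The sketch below is an illustration only: `prediction_acceptance_rate` is a hypothetical helper, it takes the completion_tokens_details object as a plain dictionary, and the source gives no threshold at which a prediction stops being worthwhile, so none is assumed here.

```python
from typing import Optional


def prediction_acceptance_rate(details: dict) -> Optional[float]:
    """Return accepted / (accepted + rejected) prediction tokens.

    Returns None when the response reports no prediction tokens at all.
    """
    accepted = details.get("accepted_prediction_tokens") or 0
    rejected = details.get("rejected_prediction_tokens") or 0
    total = accepted + rejected
    if total == 0:
        return None
    return accepted / total


# Values copied from the sample output above.
details = {
    "accepted_prediction_tokens": 6,
    "audio_tokens": 0,
    "reasoning_tokens": 0,
    "rejected_prediction_tokens": 4,
}
rate = prediction_acceptance_rate(details)
print(f"{rate:.0%} of prediction tokens were accepted")  # 60% of prediction tokens were accepted
```

With the Python client, the same dictionary can probably be obtained from `completion.usage.completion_tokens_details.model_dump()`, assuming the installed library version exposes those fields.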

Note

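Unlike prompt caching, which only works when a set minimum number of initial tokens at the beginning of a request are identical, predicted outputs are not constrained by token location. Even if the response text contains new output that is returned before the predicted output, accepted_prediction_tokens can still occur.

Streaming

The performance boost from predicted outputs is often most obvious when you return responses with streaming enabled.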

Python (Microsoft Entra ID)
Python (key-based authentication)

You might need to upgrade your OpenAI client library to access the prediction parameter.

pip install openai --upgrade

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  azure_ad_token_provider=token_provider,
  api_version="2025-01-01-preview"
)

code = """
for number in range(1, 101):
    if number % 3 == 0 and number % 5 == 0:
        print("FizzBuzz")
    elif number % 3 == 0:
        print("Fizz")
    elif number % 5 == 0:
        print("Buzz")
    else:
        print(number)
"""

instructions = """
Replace string `FizzBuzz` with `MSFTBuzz`. Respond only 
with code, and with no markdown formatting.
"""


completion = client.chat.completions.create(
    model="gpt-4o-mini", # replace with your unique model deployment name
    messages=[
        {
            "role": "user",
            "content": instructions
        },
        {
            "role": "user",
            "content": code
        }
    ],
    prediction={
        "type": "content",
        "content": code
    },
    stream=True
)

for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

You might need to upgrade your OpenAI client library to access the prediction parameter. The following variant of the streaming example uses key-based authentication.

pip install openai --upgrade

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version="2025-01-01-preview"
)

code = """
for number in range(1, 101):
    if number % 3 == 0 and number % 5 == 0:
        print("FizzBuzz")
    elif number % 3 == 0:
        print("Fizz")
    elif number % 5 == 0:
        print("Buzz")
    else:
        print(number)
"""

instructions = """
Replace string `FizzBuzz` with `MSFTBuzz`. Respond only 
with code, and with no markdown formatting.
"""


completion = client.chat.completions.create(
    model="gpt-4o-mini", # replace with your unique model deployment name
    messages=[
        {
            "role": "user",
            "content": instructions
        },
        {
            "role": "user",
            "content": code
        }
    ],
    prediction={
        "type": "content",
        "content": code
    },
    stream=True
)

for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
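
To judge whether streaming with a prediction actually helps, it can be useful to time the stream with and without the prediction parameter. The following is a rough sketch under stated assumptions: `time_stream` is a hypothetical helper, it accepts any zero-argument callable that starts a stream, and it only relies on the chunk shape used in the loop above (`chunk.choices[0].delta.content`). The stand-in chunks exist only so the sketch runs without a service call.

```python
import time
from types import SimpleNamespace


def time_stream(start_stream):
    """Start a stream and consume it.

    Returns (text, seconds_to_first_content, total_seconds). The timer starts
    before the request is issued, so network and queueing time are included.
    """
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in start_stream():
        # Some chunks carry no choices or no text; skip them as in the loop above.
        if chunk.choices and chunk.choices[0].delta.content is not None:
            if first is None:
                first = time.perf_counter() - start
            parts.append(chunk.choices[0].delta.content)
    total = time.perf_counter() - start
    return "".join(parts), first, total


def fake_chunk(text):
    """Build a stand-in object with the same attributes as a stream chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Stand-in stream: one chunk without choices, then two content chunks.
fake_stream = [SimpleNamespace(choices=[]), fake_chunk("MSFT"), fake_chunk("Buzz")]

text, first, total = time_stream(lambda: iter(fake_stream))
print(text)  # MSFTBuzz
print(f"first content after {first:.6f}s, finished after {total:.6f}s")
```

Against the real service, the callable would wrap the request itself, for example `lambda: client.chat.completions.create(..., stream=True)`, once with and once without `prediction`. A single measurement is noisy, so comparing several runs gives a fairer picture.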
