Use the Azure AI model inference endpoint to run models

02/26/2025

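Azure AI inference in Azure AI services lets customers run the most powerful models from flagship model providers using a single endpoint and a single set of credentials. This means you can switch between models and run them from your application without changing a single line of code.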

This article explains how to call those models through the inference endpoint.

Endpoints

Azure AI services exposes multiple endpoints, depending on the type of work you want to do:

Azure AI model inference endpoint
Azure OpenAI endpoint

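The Azure AI inference endpoint (usually in the form https://<resource-name>.services.ai.azure.com/models) lets customers use a single endpoint, with the same authentication and schema, to generate inference for the models deployed in the resource. All models support this capability. This endpoint follows the Azure AI Model Inference API.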

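Azure OpenAI models deployed to AI services also support the Azure OpenAI API (usually in the form https://<resource-name>.openai.azure.com). This endpoint exposes the full capabilities of OpenAI models and supports additional features such as assistants, threads, files, and batch inference.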
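
As a small illustration of the two URL forms above, the following Python sketch builds both base URLs from a resource name. The helper names and the resource name my-resource are hypothetical and exist only for this example; only the URL patterns come from this article.

```python
def model_inference_endpoint(resource_name: str) -> str:
    """Base URL of the Azure AI model inference endpoint (pattern from this article)."""
    return f"https://{resource_name}.services.ai.azure.com/models"


def azure_openai_endpoint(resource_name: str) -> str:
    """Base URL of the Azure OpenAI endpoint (pattern from this article)."""
    return f"https://{resource_name}.openai.azure.com"


if __name__ == "__main__":
    # "my-resource" is a placeholder, not a real resource.
    print(model_inference_endpoint("my-resource"))
    print(azure_openai_endpoint("my-resource"))
```

Keeping the resource name as the only variable makes it easy to point the same application at a different resource.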

To learn more about how to use the Azure OpenAI endpoint, see the Azure OpenAI service documentation.

Using the routing capability in the Azure AI model inference endpoint

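The inference endpoint routes each request to a specific deployment by matching the name parameter inside the request to the name of the deployment (the examples in this article pass it as model). In other words, a deployment works as an alias for a given model under a given configuration. This flexibility lets you deploy the same model multiple times in the service, each time under a different configuration if needed.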
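
One way to picture this routing is as a lookup from deployment name to a model plus its configuration. The following Python sketch is a conceptual model only, not the service's implementation: the Deployment class, the route function, the error type, the configuration keys, and the exact-match rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """A deployment: an alias for a model under a given configuration."""
    model: str
    config: dict = field(default_factory=dict)


class DeploymentNotFoundError(LookupError):
    """Raised when the requested name matches no deployment (hypothetical type)."""


def route(deployments: dict, requested_name: str) -> Deployment:
    """Return the deployment whose name equals the name sent in the request."""
    try:
        return deployments[requested_name]
    except KeyError:
        raise DeploymentNotFoundError(
            f"No deployment named {requested_name!r}"
        ) from None


# The same model deployed twice under different (made-up) configurations.
deployments = {
    "mistral-large": Deployment("Mistral-Large", {"content_filter": "default"}),
    "mistral-large-strict": Deployment("Mistral-Large", {"content_filter": "strict"}),
}

if __name__ == "__main__":
    print(route(deployments, "mistral-large"))
    print(route(deployments, "mistral-large-strict"))
```

Because the application only ever sends a deployment name, changing which model or configuration sits behind that name does not require a code change.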

For example, if you create a deployment named Mistral-large, you can invoke it as follows:

Install the package azure-ai-inference using your package manager, such as pip:

pip install azure-ai-inference

Then you can use the package to consume the model. The following example shows how to create a client for chat completions:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

# Create a chat completions client authenticated with the resource key.
client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

Explore the samples and read the API reference documentation to get started.

Install the package @azure-rest/ai-inference using npm:

npm install @azure-rest/ai-inference

Then you can use the package to consume the model. The following example shows how to create a client for chat completions:

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

// ModelClient is a factory function, so it is called without `new`.
const client = ModelClient(
    "https://<resource>.services.ai.azure.com/models",
    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
);

Install the Azure AI inference library with the following command:

dotnet add package Azure.AI.Inference --prerelease

Import the following namespaces:

using Azure;
using Azure.Identity;
using Azure.AI.Inference;

Then you can use the package to consume the model. The following example shows how to create a client for chat completions:

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

Add the package to your project (the following is a Maven dependency entry):

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-inference</artifactId>
    <version>1.0.0-beta.1</version>
</dependency>

Then you can use the package to consume the model. The following example shows how to create a client for chat completions:

ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

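You can use the reference section to explore the API design and the available parameters. For example, the reference section for chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions. Notice that the path /models is included in the root of the URL: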

Request

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json

For a chat model, you can create a request as follows:

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
    ],
    model="mistral-large"
)

print(response.choices[0].message.content)

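var messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "Explain Riemann's conjecture in 1 paragraph" },
];

var response = await client.path("/chat/completions").post({
    body: {
        messages: messages,
        model: "mistral-large"
    }
});

// Surface service errors before reading the result.
if (isUnexpected(response)) {
    throw response.body.error;
}

// The completion payload is in the response body.
console.log(response.body.choices[0].message.content);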

ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph")
    },
    Model = "mistral-large"
};

Response<ChatCompletions> response = client.Complete(requestOptions);
Console.WriteLine($"Response: {response.Value.Content}");

List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant"));
chatMessages.add(new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph"));

ChatCompletions chatCompletions = client.complete(new ChatCompletionsOptions(chatMessages));

for (ChatChoice choice : chatCompletions.getChoices()) {
    ChatResponseMessage message = choice.getMessage();
    System.out.println("Response:" + message.getContent());
}

Request

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": "Explain Riemann's conjecture in 1 paragraph"
        }
    ],
    "model": "mistral-large"
}
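
The REST request above can also be assembled with only the Python standard library, which can be handy for checking the URL, headers, and body before sending anything. This is a minimal sketch: the function name build_chat_request and the resource name my-resource are hypothetical, and the commented-out send step assumes the response has the same choices[0].message.content shape used by the SDK examples in this article.

```python
import json
import urllib.request

API_VERSION = "2024-05-01-preview"  # value taken from the request shown above


def build_chat_request(resource: str, api_key: str, deployment: str,
                       user_prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for a deployment."""
    url = (
        f"https://{resource}.services.ai.azure.com/models"
        f"/chat/completions?api-version={API_VERSION}"
    )
    body = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": user_prompt},
        ],
        # The deployment name decides which deployment serves the request.
        "model": deployment,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request(
        "my-resource", "<api-key>", "mistral-large",
        "Explain Riemann's conjecture in 1 paragraph",
    )
    print(request.get_method(), request.full_url)
    # To send it, a real resource and key are required:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response)["choices"][0]["message"]["content"])
```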

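If you specify a model name that doesn't match any model deployment, you get an error stating that the model doesn't exist. You can control which models are available to users by creating model deployments, as explained in Add and configure model deployments.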

Keyless authentication

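Models deployed to Azure AI model inference in Azure AI services support keyless authentication using Microsoft Entra ID. Keyless authentication enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. It's a strong choice for organizations adopting secure and scalable identity management solutions.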

To use keyless authentication, configure your resource and grant access to the users who run inference. Once configured, you can authenticate as follows:

Install the package azure-ai-inference using your package manager, such as pip:

pip install azure-ai-inference

Then you can use the package to consume the model. The following example shows how to create a client for chat completions that authenticates with Entra ID:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=DefaultAzureCredential(),
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
)

Install the package @azure-rest/ai-inference using npm:

npm install @azure-rest/ai-inference

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { DefaultAzureCredential } from "@azure/identity";

// Request tokens for the Cognitive Services scope.
const clientOptions = { credentials: { scopes: ["https://cognitiveservices.azure.com/.default"] } };

// ModelClient is a factory function, so it is called without `new`.
const client = ModelClient(
    "https://<resource>.services.ai.azure.com/models",
    new DefaultAzureCredential(),
    clientOptions,
);

Install the Azure AI inference library with the following command:

dotnet add package Azure.AI.Inference --prerelease

Install the Azure.Identity package:

dotnet add package Azure.Identity

Import the following namespaces:

using Azure;
using Azure.Core;
using Azure.Core.Pipeline;
using Azure.Identity;
using Azure.AI.Inference;

TokenCredential credential = new DefaultAzureCredential();
AzureAIInferenceClientOptions clientOptions = new AzureAIInferenceClientOptions();
// Attach a bearer token for the Cognitive Services scope to each request.
BearerTokenAuthenticationPolicy tokenPolicy = new BearerTokenAuthenticationPolicy(credential, new string[] { "https://cognitiveservices.azure.com/.default" });
clientOptions.AddPolicy(tokenPolicy, HttpPipelinePosition.PerRetry);

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    credential,
    clientOptions
);

Add the packages to your project (the following are Maven dependency entries):

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-inference</artifactId>
    <version>1.0.0-beta.1</version>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.13.3</version>
</dependency>

Then you can use the package to consume the model. The following example shows how to create a client for chat completions:

TokenCredential defaultCredential = new DefaultAzureCredentialBuilder().build();
ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(defaultCredential)
    .endpoint("https://<resource>.services.ai.azure.com/models")
    .buildClient();

Explore the samples and read the API reference documentation to get started.

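Use the reference section to explore the API design and the available parameters, and pass the authentication token in the Authorization header. For example, the reference section for chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions. Notice that the path /models is included in the root of the URL: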

Request

POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json

Tokens must be issued with the scope https://cognitiveservices.azure.com/.default.

For testing purposes, the easiest way to get a valid token for your user account is to use the Azure CLI. In a console, run the following Azure CLI command:

az account get-access-token --resource https://cognitiveservices.azure.com --query "accessToken" --output tsv
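
For quick tests, the same command can be wrapped in a short Python script that turns its output into the request headers shown earlier. This is a sketch under assumptions: the function names are hypothetical, the Azure CLI must be installed and signed in (az login), and on Windows the az executable may need to be invoked through a shell or as az.cmd.

```python
import subprocess

# Resource passed to the Azure CLI, as in the command above.
TOKEN_RESOURCE = "https://cognitiveservices.azure.com"


def bearer_headers(token: str) -> dict:
    """Build the headers for a keyless REST call from an access token."""
    return {
        "Authorization": f"Bearer {token.strip()}",
        "Content-Type": "application/json",
    }


def token_from_azure_cli() -> str:
    """Run the Azure CLI command from this article and return the raw token."""
    result = subprocess.run(
        [
            "az", "account", "get-access-token",
            "--resource", TOKEN_RESOURCE,
            "--query", "accessToken",
            "--output", "tsv",
        ],
        check=True,           # raise if the CLI reports an error
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return result.stdout.strip()


# Usage (requires a signed-in Azure CLI session):
#   headers = bearer_headers(token_from_azure_cli())
```

Tokens obtained this way expire, so this approach suits local testing rather than long-running applications.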

Limitations

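Azure OpenAI Batch can't be used with the Azure AI model inference endpoint. You have to use the dedicated deployment URL, as explained in Batch API support in the Azure OpenAI documentation.
The Realtime API isn't supported in the inference endpoint. Use the dedicated deployment URL.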
