Azure AI Model Inference API | Azure Machine Learning

09/04/2024

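The Azure AI Model Inference API exposes a common set of capabilities for foundational models, so developers can consume predictions from a diverse set of models in a uniform and consistent way. Developers can talk with different models deployed in Azure AI Studio without changing the underlying code they are using.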

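Benefits

Foundational models, such as language models, have made remarkable strides in recent years. These advancements have revolutionized fields such as natural language processing and computer vision, and they have enabled applications like chatbots, virtual assistants, and language translation services.

While foundational models excel in specific domains, they lack a uniform set of capabilities. Some models are better at specific tasks, and even for the same task, different models may approach the problem in different ways. By using the right model for the right job, developers can benefit from this diversity to:

Improve performance in a specific downstream task.
Use more efficient models for simpler tasks.
Use smaller models that can run faster on specific tasks.
Compose multiple models to develop intelligent experiences.

Having a uniform way to consume foundational models lets developers realize all of these benefits without sacrificing portability or changing the underlying code.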

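Availability

The Azure AI Model Inference API is available for the following models:

Models deployed to serverless API endpoints:

Cohere Embed V3 family of models
Cohere Command R family of models
Meta Llama 2 chat family of models
Meta Llama 3 instruct family of models
Mistral-Small
Mistral-Large
Jais family of models
Jamba family of models
Phi-3 family of models

Models deployed to managed inference:

Meta Llama 3 instruct family of models
Phi-3 family of models
Mixtral family of models

The API is compatible with Azure OpenAI model deployments.

Note

The Azure AI Model Inference API is available in managed inference (managed online endpoints) for models deployed after June 24, 2024. To take advantage of the API, redeploy your endpoint if the model was deployed before that date.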

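Capabilities

The following section describes some of the capabilities the API exposes. For the full specification of the API, see the reference section.

Modalities

The API indicates how developers can consume predictions for the following modalities:

Get info: Returns information about the model deployed under the endpoint.
Text embeddings: Creates an embedding vector representing the input text.
Text completions: Creates a completion for the provided prompt and parameters.
Chat completions: Creates a model response for the given chat conversation.
Image embeddings: Creates an embedding vector representing the input text and image.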
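
As a rough illustration of how the modalities listed above could map onto REST routes, consider the sketch below. Only /chat/completions appears in this article; the other route names, the MODALITY_ROUTES table, and the route_for helper are assumptions made for illustration and should be checked against the API reference.

```python
# Modality -> (HTTP method, route). Only /chat/completions is confirmed by
# this article; the other routes are assumptions to verify in the reference.
MODALITY_ROUTES = {
    "get_info": ("GET", "/info"),
    "text_embeddings": ("POST", "/embeddings"),
    "text_completions": ("POST", "/completions"),
    "chat_completions": ("POST", "/chat/completions"),
    "image_embeddings": ("POST", "/images/embeddings"),
}

API_VERSION = "2024-04-01-preview"  # version used in this article's requests


def route_for(endpoint: str, modality: str) -> tuple[str, str]:
    """Return (method, full URL) for a modality on a given endpoint."""
    method, path = MODALITY_ROUTES[modality]
    return method, f"{endpoint.rstrip('/')}{path}?api-version={API_VERSION}"


if __name__ == "__main__":
    # Hypothetical endpoint used only for illustration.
    print(route_for("https://example.invalid", "chat_completions"))
```

Keeping the routes in one table means that switching modality changes only a lookup key, which mirrors the uniformity the API aims for.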

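Inference SDK support

You can use streamlined inference clients in the language of your choice to consume predictions from models running the Azure AI Model Inference API.

Install the azure-ai-inference package using your package manager, such as pip:

pip install azure-ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions: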

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

If you are using an endpoint that supports Microsoft Entra ID, you can create the client as follows:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=DefaultAzureCredential(),
)

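Explore the samples and read the API reference documentation to get started.

Install the package @azure-rest/ai-inference using npm:

npm install @azure-rest/ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions: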

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL, 
    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
);

For endpoints that support Microsoft Entra ID, you can create the client as follows:

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { DefaultAzureCredential } from "@azure/identity";

const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL,
    new DefaultAzureCredential()
);

Explore the samples and read the API reference documentation to get started.

Install the Azure AI inference library with the following command:

dotnet add package Azure.AI.Inference --prerelease

For endpoints that support Microsoft Entra ID (formerly Azure Active Directory), install the Azure.Identity package:

dotnet add package Azure.Identity

Import the following namespaces:

using Azure;
using Azure.Identity;
using Azure.AI.Inference;

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

For endpoints that support Microsoft Entra ID (formerly Azure Active Directory):

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
    new DefaultAzureCredential(includeInteractiveCredentials: true)
);

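Explore the samples and read the API reference documentation to get started.

Use the reference section to explore the API design and the parameters that are available. For example, the reference section for chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions:

Request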

POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
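
To make the request line and headers above concrete, here is a minimal sketch that assembles the same request with Python's standard library, without sending it. The build_chat_request helper, the example.invalid endpoint, and the minimal body (only a messages array) are assumptions for illustration; see the reference section for the full set of body parameters.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, token: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST /chat/completions request."""
    url = f"{endpoint.rstrip('/')}/chat/completions?api-version=2024-04-01-preview"
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_chat_request(
        "https://example.invalid",  # hypothetical endpoint
        "<bearer-token>",
        [{"role": "user", "content": "How many languages are in the world?"}],
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out so the sketch runs without a deployed endpoint.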

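Extensibility

The Azure AI Model Inference API specifies a set of modalities and parameters that models can subscribe to. However, some models may have capabilities beyond those the API specifies. In those cases, the API lets the developer pass them as extra parameters in the payload.

When you set the header extra-parameters: pass-through, the API attempts to pass any unknown parameter directly to the underlying model. If the model can handle that parameter, the request completes.

The following example shows a request that passes the parameter safe_prompt, which is supported by Mistral-Large but is not specified in the Azure AI Model Inference API: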

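from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    # safe_prompt is specific to Mistral-Large; it is not part of the common API.
    model_extras={
        "safe_prompt": True
    }
)

print(response.choices[0].message.content)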

Tip

When you use the Azure AI Inference SDK, passing model_extras automatically configures the request with extra-parameters: pass-through.

var messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "How many languages are in the world?" },
];

var response = await client.path("/chat/completions").post({
    // extra-parameters is an HTTP header, so it belongs under headers.
    headers: {
        "extra-parameters": "pass-through"
    },
    body: {
        messages: messages,
        safe_prompt: true
    }
});

console.log(response.body.choices[0].message.content)

requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("How many languages are in the world?")
    },
    AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
};

response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");

Request

POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
extra-parameters: pass-through

{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Explain Riemann's conjecture in 1 paragraph"
    }
    ],
    "temperature": 0,
    "top_p": 1,
    "response_format": { "type": "text" },
    "safe_prompt": true
}

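Note

The default value for extra-parameters is error, which returns an error if the payload contains an extra parameter. Alternatively, you can set extra-parameters: drop to drop any unknown parameter from the request. Use this capability when you send requests with extra parameters that you know the model doesn't support, but you want the request to complete anyway. A typical example is the seed parameter.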
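
The three extra-parameters modes (error, drop, and pass-through) can be pictured with a small client-side simulation. This is only a sketch of the behavior described above, not the service's implementation; the KNOWN_PARAMETERS set and the apply_extra_parameters helper are hypothetical names introduced here, and the real list of recognized parameters is defined by the API reference.

```python
# Parameters this sketch treats as "known"; the real list is in the API reference.
KNOWN_PARAMETERS = {"messages", "temperature", "top_p", "response_format"}


def apply_extra_parameters(payload: dict, mode: str = "error") -> dict:
    """Mimic, client-side, how each extra-parameters mode treats unknown keys."""
    unknown = sorted(set(payload) - KNOWN_PARAMETERS)
    if mode == "pass-through":
        return dict(payload)  # unknown keys are forwarded to the model unchanged
    if mode == "drop":
        return {k: v for k, v in payload.items() if k in KNOWN_PARAMETERS}
    if mode == "error":
        if unknown:
            raise ValueError(f"Unknown parameters: {unknown}")
        return dict(payload)
    raise ValueError(f"Unsupported mode: {mode!r}")


if __name__ == "__main__":
    payload = {"messages": [], "temperature": 0, "safe_prompt": True}
    print(apply_extra_parameters(payload, "drop"))
```

With drop, the safe_prompt key is silently removed; with pass-through it is kept; with the default error mode the call fails, which matches the behavior the note describes.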

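Models with disparate sets of capabilities

The Azure AI Model Inference API indicates a general set of capabilities, but each model can decide whether to implement them. A specific error is returned when the model can't support a given parameter.

The following example shows the response to a chat completion request that indicates the parameter response_format and asks for a reply in JSON format. In this example, the model doesn't support that capability, so error 422 is returned to the user.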

import json
from azure.ai.inference.models import SystemMessage, UserMessage, ChatCompletionsResponseFormatJSON
from azure.core.exceptions import HttpResponseError

try:
    response = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many languages are in the world?"),
        ],
        response_format=ChatCompletionsResponseFormatJSON()
    )
except HttpResponseError as ex:
    if ex.status_code == 422:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "detail" in response:
            for offending in response["detail"]:
                param = ".".join(offending["loc"])
                value = offending["input"]
                print(
                    f"Looks like the model doesn't support the parameter '{param}' with value '{value}'"
                )
    else:
        raise ex

try {
    var messages = [
        { role: "system", content: "You are a helpful assistant" },
        { role: "user", content: "How many languages are in the world?" },
    ];
    
    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
            response_format: { type: "json_object" }
        }
    });
}
catch (error) {
    if (error.status_code == 422) {
        var response = JSON.parse(error.response._content)
        if (response.detail) {
            for (const offending of response.detail) {
                var param = offending.loc.join(".")
                var value = offending.input
                console.log(`Looks like the model doesn't support the parameter '${param}' with value '${value}'`)
            }
        }
    }
    else 
    {
        throw error
    }
}

try
{
    requestOptions = new ChatCompletionsOptions()
    {
        Messages = {
            new ChatRequestSystemMessage("You are a helpful assistant"),
            new ChatRequestUserMessage("How many languages are in the world?"),
        },
        ResponseFormat = new ChatCompletionsResponseFormatJSON()
    };

    response = client.Complete(requestOptions);
    Console.WriteLine(response.Value.Choices[0].Message.Content);
}
catch (RequestFailedException ex)
{
    if (ex.Status == 422)
    {
        Console.WriteLine($"Looks like the model doesn't support a parameter: {ex.Message}");
    }
    else
    {
        throw;
    }
}

Request

POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json

{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Explain Riemann's conjecture in 1 paragraph"
    }
    ],
    "temperature": 0,
    "top_p": 1,
    "response_format": { "type": "json_object" }
}

Response

{
    "status": 422,
    "code": "parameter_not_supported",
    "detail": {
        "loc": [ "body", "response_format" ],
        "input": "json_object"
    },
    "message": "One of the parameters contain invalid values."
}

Tip

You can inspect the property detail.loc to find the location of the offending parameter, and detail.input to see the value that was passed in the request.
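
The tip above can be sketched as a small parser over the 422 response body. The describe_unsupported helper is a hypothetical name; because the sample response shows detail as a single object while the SDK example iterates over it as a list, this sketch assumes either shape may occur and accepts both.

```python
import json


def describe_unsupported(body: str) -> list[str]:
    """Summarize each offending parameter reported in a 422 response body."""
    detail = json.loads(body).get("detail", [])
    # Accept a single object (as in the sample response) or a list of objects.
    items = detail if isinstance(detail, list) else [detail]
    return [
        f"{'.'.join(str(part) for part in item['loc'])} = {item['input']!r}"
        for item in items
    ]


if __name__ == "__main__":
    sample = """{
        "status": 422,
        "code": "parameter_not_supported",
        "detail": {"loc": ["body", "response_format"], "input": "json_object"},
        "message": "One of the parameters contain invalid values."
    }"""
    print(describe_unsupported(sample))
```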

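Content safety

The Azure AI Model Inference API supports Azure AI Content Safety. When you use deployments with Azure AI Content Safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering (preview) system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

The following example shows the response to a chat completion request that triggered content safety.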

import json
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
from azure.core.exceptions import HttpResponseError

try:
    response = client.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )

    print(response.choices[0].message.content)

except HttpResponseError as ex:
    if ex.status_code == 400:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
        else:
            raise ex
    else:
        raise ex

try {
    var messages = [
        { role: "system", content: "You are an AI assistant that helps people find information." },
        { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
    ]

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
        }
    });
    
    console.log(response.body.choices[0].message.content)
}
catch (error) {
    if (error.status_code == 400) {
        var response = JSON.parse(error.response._content)
        if (response.error) {
            console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`)
        }
        else
        {
            throw error
        }
    }
    else
    {
        throw error
    }
}

try
{
    requestOptions = new ChatCompletionsOptions()
    {
        Messages = {
            new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
            new ChatRequestUserMessage(
                "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
            ),
        },
    };

    response = client.Complete(requestOptions);
    Console.WriteLine(response.Value.Choices[0].Message.Content);
}
catch (RequestFailedException ex)
{
    if (ex.ErrorCode == "content_filter")
    {
        Console.WriteLine($"Your query has triggered Azure Content Safety: {ex.Message}");
    }
    else
    {
        throw;
    }
}

Request

POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json

{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
    }
    ],
    "temperature": 0,
    "top_p": 1
}

Response

{
    "status": 400,
    "code": "content_filter",
    "message": "The response was filtered",
    "param": "messages",
    "type": null
}
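
A response like the one above can be recognized with a small check on the status and error code. The content_filter_message helper below is a hypothetical name, and the sketch assumes the body may be either flat, as in the sample response, or nested under an error key, as the SDK example reads it.

```python
import json


def content_filter_message(status: int, body: str):
    """Return the filter message for a content-filter rejection, else None."""
    if status != 400:
        return None
    payload = json.loads(body)
    # Accept the flat shape from the sample response and a nested
    # {"error": {...}} shape like the one the SDK example reads.
    error = payload.get("error", payload)
    if error.get("code") == "content_filter":
        return error.get("message")
    return None


if __name__ == "__main__":
    sample = '{"status": 400, "code": "content_filter", "message": "The response was filtered"}'
    print(content_filter_message(400, sample))
```

Returning None for anything other than a content-filter rejection lets the caller re-raise or handle other errors separately, as the SDK examples do.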

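Getting started

The Azure AI Model Inference API is currently supported in certain models deployed as serverless API endpoints and managed online endpoints. Deploy any of the supported models and use the exact same code to consume their predictions.

The client library azure-ai-inference performs inference, including chat completions, for AI models deployed by Azure AI Studio and Azure Machine Learning studio. It supports serverless API endpoints and managed compute endpoints (formerly known as managed online endpoints).

Explore the samples and read the API reference documentation to get started.

The client library @azure-rest/ai-inference performs inference, including chat completions, for AI models deployed by Azure AI Studio and Azure Machine Learning studio. It supports serverless API endpoints and managed compute endpoints (formerly known as managed online endpoints).

Explore the samples and read the API reference documentation to get started.

The client library Azure.AI.Inference performs inference, including chat completions, for AI models deployed by Azure AI Studio and Azure Machine Learning studio. It supports serverless API endpoints and managed compute endpoints (formerly known as managed online endpoints).

Explore the samples and read the API reference documentation to get started.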
