Redigera

Dela via


Upgrade from GitHub Models to Azure AI model inference

If you want to develop a generative AI application, you can use GitHub Models to find and experiment with AI models for free. The playground and free API usage are rate limited by requests per minute, requests per day, tokens per request, and concurrent requests. If you get rate limited, you need to wait for the rate limit that you hit to reset before you can make more requests.

Once you're ready to bring your application to production, you can upgrade your experience by deploying an Azure AI Services resource in an Azure subscription and start using Azure AI model inference service. You don't need to change anything else in your code.

The following article explains how to get started from GitHub Models and deploy an Azure AI Services resource with Azure AI model inference.

Prerequisites

To complete this tutorial, you need:

  • A GitHub account with access to GitHub Models.
  • An Azure subscription. If you don't have one, you're prompted to create or update your Azure account to a pay as you go account when you're ready to deploy your model to production.

Upgrade to Azure AI model inference

The rate limits for the playground and free API usage are intended to help you experiment with models and develop your AI application. Once you're ready to bring your application to production, use a key and endpoint from a paid Azure account. You don't need to change anything else in your code.

To obtain the key and endpoint:

  1. Got to GitHub Models and select the model you're interested in.

  2. In the playground for your model, select Get API key.

  3. Select Get production key.

    An animation showing how to upgrade GitHub Models to get a production ready resource.

  4. If you don't have an Azure account, select Create my account and follow the steps to create one.

  5. If you have an Azure account, select Sign back in.

  6. If your existing account is a free account, you first have to upgrade to a Pay as you go plan. Once you upgrade, go back to the playground and select Get API key again, then sign in with your upgraded account.

  7. Once you've signed in to your Azure account, you're taken to Azure AI Studio > GitHub. It might take one or two minutes to load your initial model details in AI Studio.

  8. The page is loaded with your model's details. Select the Deploy button to deploy the model to your account.

  9. Once it's deployed, your model's API Key and endpoint are shown in the Overview. Use these values in your code to use the model in your production environment.

At this point, the model you selected is ready to consume.

Upgrade your code to use the new endpoint

Once your Azure AI Services resource is configured, you can start consuming it from your code. To consume the Azure AI Services resource, you need the endpoint URL and key, which are available in the Overview section:

Screenshot showing how to get the URL and key associated with the resource.

You can use any of the supported SDKs to get predictions out from the endpoint. The following SDKs are officially supported:

  • OpenAI SDK
  • Azure OpenAI SDK
  • Azure AI Inference SDK

See the supported languages and SDKs section for more details and examples. The following example shows how to use the Azure AI model inference SDK with the newly deployed model:

Install the package azure-ai-inference using your package manager, like pip:

pip install azure-ai-inference>=1.0.0b5

Warning

Azure AI Services resource requires the version azure-ai-inference>=1.0.0b5 for Python.

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

Explore our samples and read the API reference documentation to get yourself started.

Generate your first chat completion:

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
    ],
    model="mistral-large"
)

print(response.choices[0].message.content)

Use the parameter model="<deployment-name> to route your request to this deployment. Deployments work as an alias of a given model under certain configurations. See Routing concept page to learn how Azure AI Services route deployments.

Important

As opposite to GitHub Models where all the models are already configured, the Azure AI Services resource allows you to control which models are available in your endpoint and under which configuration. Add as many models as you plan to use before indicating them in the model parameter. Learn how to add more models to your resource.

Explore additional features

Azure AI model inference supports additional features not available in GitHub Models, including:

Got troubles?

See the FAQ section to explore more help.

Next steps