Consume serverless API endpoints from a different Azure AI Foundry project or hub

Статья
02/28/2025

In this article, you learn how to configure an existing serverless API endpoint in a different project or hub than the one that was used to create the deployment.

Important

Models that are in preview are marked as preview on their model cards in the model catalog.

Certain models in the model catalog can be deployed as serverless APIs. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.

The need to consume a serverless API endpoint in a different project or hub than the one that was used to create the deployment might arise in situations such as these:

You want to centralize your deployments in a given project or hub and consume them from different projects or hubs in your organization.
You need to deploy a model in a hub in a particular Azure region where serverless deployment for that model is available. However, you need to consume it from another region, where serverless deployment isn't available for the particular models.

Prerequisites

An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.
An Azure AI Foundry hub.
An Azure AI Foundry project.
A model deployed to a serverless API endpoint. This article assumes that you previously deployed the Meta-Llama-3-8B-Instruct model. To learn how to deploy this model as a serverless API, see Deploy models as serverless APIs.
You need to install the following software to work with Azure AI Foundry:
You can use any compatible web browser to navigate Azure AI Foundry.
The Azure CLI and the ml extension for Azure Machine Learning.
```
az extension add -n ml
```
If you already have the extension installed, ensure the latest version is installed.
```
az extension update -n ml
```
Once the extension is installed, configure it:
```
az account set --subscription <subscription>
az configure --defaults workspace=<project-name> group=<resource-group> location=<location>
```
Install the Azure Machine Learning SDK for Python.
```
pip install -U azure-ai-ml
```
Once installed, import necessary namespaces:
```
from azure.ai.ml import MLClient
from azure.identity import InteractiveBrowserCredential
from azure.ai.ml.entities import ServerlessEndpoint, ServerlessConnection
```

Create a serverless API endpoint connection

Follow these steps to create a connection:

Connect to the project or hub where the endpoint is deployed:

Go to Azure AI Foundry and navigate to the project where the endpoint you want to connect to is deployed.

Configure the CLI to point to the project:

az account set --subscription <subscription>
az configure --defaults workspace=<project-name> group=<resource-group> location=<location>

Create a client connected to your project:

client = MLClient(
    credential=InteractiveBrowserCredential(tenant_id="<tenant-id>"),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

Get the endpoint's URL and credentials for the endpoint you want to connect to. In this example, you get the details for an endpoint name meta-llama3-8b-qwerty.
1. From the left sidebar of your project in Azure AI Foundry portal, go to My assets > Models + endpoints to see the list of deployments in the project.
2. Select the deployment you want to connect to.
3. Copy the values for Target URI and Key.
```
az ml serverless-endpoint get-credentials -n meta-llama3-8b-qwerty
```
```
endpoint_name = "meta-llama3-8b-qwerty"
endpoint_keys = client.serverless_endpoints.get_keys(endpoint_name)
print(endpoint_keys.primary_key)
print(endpoint_keys.secondary_key)
```

Now, connect to the project or hub where you want to create the connection:

Go to the project where the connection needs to be created to.

Configure the CLI to point to the project:

az account set --subscription <subscription>
az configure --defaults workspace=<project-name> group=<resource-group> location=<location>

Create a client connected to your project:

client = MLClient(
    credential=InteractiveBrowserCredential(tenant_id="<tenant-id>"),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

Create the connection in the project:
1. From your project in Azure AI Foundry portal, go to the bottom part of the left sidebar and select Management center.
2. From the left sidebar of the management center, select Connected resources.
3. Select New connection.
4. Select Serverless Model.
5. For the Target URI, paste the value you copied previously.
6. For the Key, paste the value you copied previously.
7. Give the connection a name, in this case meta-llama3-8b-connection.
8. Select Add connection.
Create a connection definition:

connection.yml
```
name: meta-llama3-8b-connection
type: serverless
endpoint: https://meta-llama3-8b-qwerty-serverless.inference.ai.azure.com
api_key: 1234567890qwertyuiop
```
```
az ml connection create -f connection.yml
```
```
client.connections.create_or_update(ServerlessConnection(
    name="meta-llama3-8b-connection",
    endpoint="https://meta-llama3-8b-qwerty-serverless.inference.ai.azure.com",
    api_key="1234567890qwertyuiop"
))
```
At this point, the connection is available for consumption.
To validate that the connection is working:
1. Return to your project in Azure AI Foundry portal.
2. From the left sidebar of your project, go to Build and customize > Prompt flow.
3. Select Create to create a new flow.
4. Select Create in the Chat flow box.
5. Give your Prompt flow a name and select Create.
6. Select the chat node from the graph to go to the chat section.
7. For Connection, open the dropdown list to select the connection you just created, in this case meta-llama3-8b-connection.
8. Select Start compute session from the top navigation bar, to start a prompt flow automatic runtime.
9. Select the Chat option. You can now send messages and get responses.

Поделиться через

Consume serverless API endpoints from a different Azure AI Foundry project or hub

Prerequisites

Create a serverless API endpoint connection

Обратная связь

Дополнительные ресурсы

Поделиться через

Consume serverless API endpoints from a different Azure AI Foundry project or hub

Prerequisites

Create a serverless API endpoint connection

Related content

Обратная связь

Дополнительные ресурсы