Deploy an AI model on Azure Kubernetes Service (AKS) with the AI toolchain operator (preview)

11/19/2024

The AI toolchain operator (KAITO) is a managed add-on for AKS that simplifies running open-source (OSS) AI models on your AKS clusters. It automatically provisions the necessary GPU nodes and sets up the associated inference server as an endpoint for your AI models. Using this add-on reduces your onboarding time so that you can focus on AI model usage and development rather than infrastructure setup.

This article shows you how to enable the AI toolchain operator add-on and deploy an AI model on AKS.

Important

AKS preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren't meant for production use. For more information, see the following support articles:

Before you begin

This article assumes a basic understanding of Kubernetes concepts. For more information, see Kubernetes core concepts for AKS.
For all hosted model inference images and recommended infrastructure setup, see the KAITO GitHub repository.
The AI toolchain operator add-on currently supports KAITO version 0.1.0. Keep this in mind when you choose a model from the KAITO model repository.

Prerequisites

If you don't have an Azure subscription, create a free account before you begin.
- If you have multiple Azure subscriptions, make sure you select the correct subscription in which the resources will be created and charged using the az account set command.
  
  Note
  
  The subscription you use must have GPU VM quota for deployment of the model that you choose.
Azure CLI version 2.47.0 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
The Kubernetes command-line client, kubectl, installed and configured. For more information, see Install kubectl.
Install the Azure CLI AKS preview extension.
Register the AI toolchain operator add-on feature flag.
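
The Azure CLI version requirement in the list above can be checked with a short script. The following is a minimal sketch, not an official tool: the helper names `version_ge` and `check_az_version` are hypothetical, the script assumes bash, and it assumes a `sort` that supports the `-V` (version sort) option, as GNU coreutils does.

```shell
# Returns success when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compares the installed Azure CLI version with the
# 2.47.0 minimum that this article lists.
check_az_version() {
    local required="2.47.0"
    local installed
    installed="$(az version --query '"azure-cli"' -o tsv)"
    if version_ge "${installed}" "${required}"; then
        echo "Azure CLI ${installed} meets the ${required} minimum."
    else
        echo "Azure CLI ${installed} is older than ${required}." >&2
        return 1
    fi
}

# Usage: check_az_version
```

Version sort is used here because a plain string comparison would rank 2.9.0 above 2.47.0.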

Install the Azure CLI preview extension

Install the Azure CLI preview extension using the az extension add command.
```
az extension add --name aks-preview
```
Update the extension to make sure you have the latest version using the az extension update command.
```
az extension update --name aks-preview
```

Register the AI toolchain operator add-on feature flag

Register the AIToolchainOperatorPreview feature flag using the az feature register command.
```
az feature register --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview"
```
It takes a few minutes for the registration to complete.

Verify the registration using the az feature show command.

```
az feature show --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview"
```
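
Because registration takes a few minutes, it can be convenient to poll the state instead of rerunning the command by hand. The following is a sketch under stated assumptions: the function name `wait_for_feature`, the attempt count, and the sleep interval are illustrative choices, and the script assumes bash. It relies on the `properties.state` field that `az feature show` returns.

```shell
# Polls the feature flag state until it reports "Registered" or the attempts
# run out. The default attempt count and interval are arbitrary examples.
wait_for_feature() {
    local attempts="${1:-30}"
    local interval="${2:-20}"
    local state=""
    local i
    for ((i = 1; i <= attempts; i++)); do
        state="$(az feature show \
            --namespace "Microsoft.ContainerService" \
            --name "AIToolchainOperatorPreview" \
            --query "properties.state" -o tsv)"
        if [ "${state}" = "Registered" ]; then
            echo "Feature flag is registered."
            return 0
        fi
        echo "Current state: ${state:-unknown} (attempt ${i}/${attempts})"
        sleep "${interval}"
    done
    echo "Timed out waiting for registration." >&2
    return 1
}

# Usage: wait_for_feature
# After the state is Registered, refreshing the resource provider
# registration is a common follow-up step (not part of the original steps):
# az provider register --namespace "Microsoft.ContainerService"
```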

Export environment variables

To simplify the configuration steps in this article, you can define environment variables using the following commands. Make sure to replace the placeholder values with your own.
```
export AZURE_SUBSCRIPTION_ID="mySubscriptionID"
export AZURE_RESOURCE_GROUP="myResourceGroup"
export AZURE_LOCATION="myLocation"
export CLUSTER_NAME="myClusterName"
```
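
Later commands fail in confusing ways if one of these variables is empty, so a quick check before continuing can help. The following is a minimal sketch that assumes bash (it uses indirect expansion); the helper name `require_vars` is hypothetical.

```shell
# Hypothetical helper: reports every named variable that is unset or empty
# and returns a nonzero status if any are missing.
require_vars() {
    local name
    local missing=0
    for name in "$@"; do
        if [ -z "${!name:-}" ]; then
            echo "Missing environment variable: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Usage: validate the variables, then select the subscription.
# require_vars AZURE_SUBSCRIPTION_ID AZURE_RESOURCE_GROUP AZURE_LOCATION CLUSTER_NAME \
#     && az account set --subscription "${AZURE_SUBSCRIPTION_ID}"
```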

Enable the AI toolchain operator add-on on an AKS cluster

The following sections describe how to create an AKS cluster with the AI toolchain operator add-on enabled and deploy a default hosted AI model.

Create an AKS cluster with the AI toolchain operator add-on enabled

Create an Azure resource group using the az group create command.

```
az group create --name ${AZURE_RESOURCE_GROUP} --location ${AZURE_LOCATION}
```

Create an AKS cluster with the AI toolchain operator add-on enabled using the az aks create command with the --enable-ai-toolchain-operator and --enable-oidc-issuer flags.
```
az aks create --location ${AZURE_LOCATION} \
    --resource-group ${AZURE_RESOURCE_GROUP} \
    --name ${CLUSTER_NAME} \
    --enable-oidc-issuer \
    --enable-ai-toolchain-operator \
    --generate-ssh-keys
```
Note

AKS creates a managed identity when you enable the AI toolchain operator add-on. The managed identity is used to create GPU node pools in the managed AKS cluster. You must set its permissions manually by following the steps in the sections below.

On an existing AKS cluster, you can enable the AI toolchain operator add-on using the az aks update command.

```
az aks update --name ${CLUSTER_NAME} \
    --resource-group ${AZURE_RESOURCE_GROUP} \
    --enable-oidc-issuer \
    --enable-ai-toolchain-operator
```

Connect to your cluster

Configure kubectl to connect to your cluster using the az aks get-credentials command.

```
az aks get-credentials --resource-group ${AZURE_RESOURCE_GROUP} --name ${CLUSTER_NAME}
```

Verify the connection to your cluster using the kubectl get command.
```
kubectl get nodes
```

Export environment variables

Export environment variables for the MC (node) resource group, the principal ID of the managed identity, and the KAITO identity name using the following commands:

```
export MC_RESOURCE_GROUP=$(az aks show --resource-group ${AZURE_RESOURCE_GROUP} \
    --name ${CLUSTER_NAME} \
    --query nodeResourceGroup \
    -o tsv)
export PRINCIPAL_ID=$(az identity show --name "ai-toolchain-operator-${CLUSTER_NAME}" \
    --resource-group "${MC_RESOURCE_GROUP}" \
    --query 'principalId' \
    -o tsv)
export KAITO_IDENTITY_NAME="ai-toolchain-operator-${CLUSTER_NAME}"
```

Get the AKS OpenID Connect (OIDC) Issuer

Get the AKS OIDC Issuer URL and export it as an environment variable:

```
export AKS_OIDC_ISSUER=$(az aks show --resource-group "${AZURE_RESOURCE_GROUP}" \
    --name "${CLUSTER_NAME}" \
    --query "oidcIssuerProfile.issuerUrl" \
    -o tsv)
```

Create role assignment for the service principal

Create a new role assignment for the managed identity's service principal using the az role assignment create command.

```
az role assignment create --role "Contributor" \
    --assignee "${PRINCIPAL_ID}" \
    --scope "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourcegroups/${AZURE_RESOURCE_GROUP}"
```
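
To confirm that the assignment took effect, the roles at that scope can be listed. The following is a sketch, assuming the `PRINCIPAL_ID`, `AZURE_SUBSCRIPTION_ID`, and `AZURE_RESOURCE_GROUP` variables from the earlier steps are set; the function names `list_kaito_roles` and `has_contributor_role` are hypothetical.

```shell
# Lists the role names assigned to the managed identity at the resource
# group scope used in the role assignment step.
list_kaito_roles() {
    az role assignment list \
        --assignee "${PRINCIPAL_ID}" \
        --scope "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourcegroups/${AZURE_RESOURCE_GROUP}" \
        --query "[].roleDefinitionName" -o tsv
}

# Succeeds only when one of the listed roles is exactly "Contributor".
has_contributor_role() {
    list_kaito_roles | grep -qx "Contributor"
}

# Usage:
# has_contributor_role && echo "Contributor role is assigned."
```

Role assignments can take a short time to propagate, so an immediate check may need to be repeated.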

Establish a federated identity credential

Create the federated identity credential between the managed identity, AKS OIDC issuer, and subject using the az identity federated-credential create command.
```
az identity federated-credential create --name "kaito-federated-identity" \
    --identity-name "${KAITO_IDENTITY_NAME}" \
    -g "${MC_RESOURCE_GROUP}" \
    --issuer "${AKS_OIDC_ISSUER}" \
    --subject system:serviceaccount:"kube-system:kaito-gpu-provisioner" \
    --audience api://AzureADTokenExchange
```
Note

Until this step is complete, the gpu-provisioner controller pod remains in a crash loop. After the federated credential is created, the pod reaches a running state, and you can verify that the deployment is running in the following steps.
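
If the pod keeps crash looping, one thing worth checking is whether the federated credential exists with the expected subject. The following is a sketch, assuming the `KAITO_IDENTITY_NAME` and `MC_RESOURCE_GROUP` variables from the earlier steps are set; the function name `has_kaito_federated_credential` is hypothetical.

```shell
# Succeeds when the managed identity has a federated credential whose
# subject matches the gpu-provisioner service account used in this article.
has_kaito_federated_credential() {
    local subject="system:serviceaccount:kube-system:kaito-gpu-provisioner"
    az identity federated-credential list \
        --identity-name "${KAITO_IDENTITY_NAME}" \
        --resource-group "${MC_RESOURCE_GROUP}" \
        --query "[].subject" -o tsv | grep -qxF "${subject}"
}

# Usage:
# has_kaito_federated_credential && echo "Federated credential found."
```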

Verify that your deployment is running

Restart the KAITO GPU provisioner deployment using the kubectl rollout restart command:
```
kubectl rollout restart deployment/kaito-gpu-provisioner -n kube-system
```
Verify that the deployment is running using the kubectl get command:
```
kubectl get deployment -n kube-system | grep kaito
```
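
As an alternative to inspecting the deployment list, `kubectl rollout status` blocks until the rollout finishes or a timeout expires. The following is a minimal sketch; the function name `wait_for_gpu_provisioner` and the 300-second default timeout are illustrative assumptions.

```shell
# Waits for the restarted deployment to finish rolling out.
# The default timeout is an arbitrary example value.
wait_for_gpu_provisioner() {
    local timeout="${1:-300s}"
    kubectl rollout status deployment/kaito-gpu-provisioner \
        -n kube-system --timeout="${timeout}"
}

# Usage: wait_for_gpu_provisioner 120s
```

The command returns a nonzero status if the rollout does not complete in time, which makes it suitable for use in scripts.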

Deploy a default hosted AI model

Deploy the Falcon 7B-instruct model from the KAITO model repository using the kubectl apply command.

```
kubectl apply -f https://raw.githubusercontent.com/Azure/kaito/main/examples/inference/kaito_workspace_falcon_7b-instruct.yaml
```

Track the live resource changes in your workspace using the kubectl get command.
```
kubectl get workspace workspace-falcon-7b-instruct -w
```
Note

As you track the live resource changes in your workspace, note that machine readiness can take up to 10 minutes, and workspace readiness up to 20 minutes.

Check your service and get the service IP address using the kubectl get svc command.

```
export SERVICE_IP=$(kubectl get svc workspace-falcon-7b-instruct -o jsonpath='{.spec.clusterIP}')
```

Run the Falcon 7B-instruct model with a sample input of your choice using the following curl command:

```
kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- \
    curl -X POST http://$SERVICE_IP/chat \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d "{\"prompt\":\"YOUR QUESTION HERE\"}"
```
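
A prompt that contains double quotes or backslashes breaks the hand-escaped JSON in the command above. The following is a minimal sketch of building the request body from an arbitrary question; the helper names `json_escape` and `build_chat_payload` are hypothetical, and the escaping covers only backslashes and double quotes, not newlines or other control characters.

```shell
# Escapes backslashes and double quotes so the text can sit inside a JSON
# string. Newlines and other control characters are NOT handled.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Builds the request body with the single "prompt" field used in this article.
build_chat_payload() {
    printf '{"prompt":"%s"}' "$(json_escape "$1")"
}

# Usage:
# PAYLOAD="$(build_chat_payload 'What is "Kubernetes"?')"
# kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- \
#     curl -X POST "http://${SERVICE_IP}/chat" \
#     -H "accept: application/json" \
#     -H "Content-Type: application/json" \
#     -d "${PAYLOAD}"
```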

Clean up resources

If you no longer need these resources, you can delete them to avoid incurring extra Azure compute charges.

Delete the KAITO workspace and its associated resources using the kubectl delete workspace command.
```
kubectl delete workspace workspace-falcon-7b-instruct
```
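
If the cluster itself was created only for this walkthrough, the whole resource group can also be removed. This goes beyond the workspace deletion above and is shown as a sketch only: it deletes every resource in `AZURE_RESOURCE_GROUP`, including the AKS cluster. The function name `delete_resource_group` and its `--confirm` guard are hypothetical additions to reduce the chance of accidental deletion.

```shell
# Deletes the entire resource group, including the AKS cluster.
# Requires an explicit --confirm argument as a safety guard.
delete_resource_group() {
    if [ "${1:-}" != "--confirm" ]; then
        echo "Pass --confirm to delete resource group ${AZURE_RESOURCE_GROUP}." >&2
        return 1
    fi
    az group delete --name "${AZURE_RESOURCE_GROUP}" --yes --no-wait
}

# Usage: delete_resource_group --confirm
```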

Next steps

For more model deployment options, see the upstream KAITO GitHub repository.

Explore MLOps for AI and machine learning workflows on AKS
Learn about MLOps best practices for your AI pipelines on AKS
Learn how to deploy GPU workloads on AKS
