Introduction to Kubernetes compute target in Azure Machine Learning

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

The Azure Machine Learning CLI and Python SDK v2 provide support for a Kubernetes compute target. You can enable an existing Azure Kubernetes Service (AKS) cluster or Azure Arc-enabled Kubernetes (Arc Kubernetes) cluster as a Kubernetes compute target. Use the compute in Machine Learning to train or deploy models.

Diagram illustrating how Azure Machine Learning connects to Kubernetes.

This article describes how you can use the Kubernetes compute target in Machine Learning, including usage scenarios, recommended best practices, and a comparison of the KubernetesCompute and legacy AksCompute targets.

How the Kubernetes compute target works

Azure Machine Learning Kubernetes compute supports two kinds of Kubernetes cluster.

Compute Location Description
AKS cluster Within Azure With your self-managed AKS cluster in Azure, you can gain security and controls to meet compliance requirement and flexibility to manage your team's machine learning workload.
Arc Kubernetes cluster Outside Azure With Arc Kubernetes cluster, you can train or deploy models in any on-premises or multicloud infrastructure, or the edge.

With a simple cluster extension deployment on AKS or Arc Kubernetes cluster, Kubernetes cluster is seamlessly supported in Machine Learning to run training or inference workload. It's easy to enable and use an existing Kubernetes cluster for Machine Learning workload with the following process:

Here are the primary responsibilities in this process:

  • The IT-operation team is responsible for Steps 1, 2, and 3. This team prepares an AKS or Arc Kubernetes cluster, deploys the Machine Learning cluster extension, and attaches the Kubernetes cluster to the Machine Learning workspace. In addition to these essential compute setup steps, the IT-operation team also uses familiar tools, such as the Azure CLI or kubectl, to complete the following tasks for the Data-science team:

    • Configure network and security options, such as outbound proxy server connection or Azure firewall, inference router (azureml-fe) setup, SSL/TLS termination, and virtual network setup.

    • Create and manage instance types for different machine learning workload scenarios and gain efficient compute resource utilization.

    • Troubleshoot workload issues related to Kubernetes cluster.

  • The Data-science team begins their tasks after the IT-operations team finishes compute setup and creation of the compute targets. This team discovers a list of available compute targets and instance types in the Machine Learning workspace. The compute resources can be used for training or inference workload. The Data-science team specifies the compute target name and instance type name by using their preferred tools or APIs. They can use the Azure Machine Learning CLI v2, Python SDK v2, or the Machine Learning studio UI.

Kubernetes usage scenarios

With Arc Kubernetes cluster, you can build, train, and deploy models in any on-premises and multicloud infrastructure by using Kubernetes. This strategy opens some new use patterns previously not possible in a cloud setting environment. The following table provides a summary of the new use patterns enabled when you work with Azure Machine Learning Kubernetes compute:

Usage pattern Location of data Goals and requirements Scenario configuration
Train model in cloud, deploy model on-premises Cloud Use cloud compute to support elastic compute needs or special hardware such as a GPU.

Model deployment must be on-premises for security, compliance, or latency requirements.
- Azure-managed compute in cloud
- Customer-managed Kubernetes on-premises
- Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises
- Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining
Train model on-premises and cloud, deploy to both cloud and on-premises Cloud Combine on-premises investments with cloud scalability.

Bring cloud and on-premises compute under single pane of glass.

Access single source of truth for data in cloud and replicate on-premises (lazily on usage or proactively).

Enable cloud compute primary usage when on-premises resources aren't available (in use or in maintenance) or don't meet specific hardware requirements (GPU).
- Azure-managed compute in cloud.
Customer-managed Kubernetes on-premises
- Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises
- Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining
Train model on-premises, deploy model in cloud On-premises Store data on-premises to meet data-residency requirements.

Deploy model in the cloud for global-service access or to enable compute elasticity for scale and throughput.
- Azure-managed compute in cloud
- Customer-managed Kubernetes on-premises
- Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises
- Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining
Bring your own AKS in Azure Cloud Gain more security and controls.

Establish all private IP machine learning to prevent data exfiltration.
- AKS cluster behind an Azure virtual network
- Private endpoints in the same virtual network for Azure Machine Learning workspace and associated resources
Fully automated machine learning operations
Full machine learning lifecycle on-premises On-premises Secure sensitive data or proprietary IP, such as machine learning models, code, and scripts. - Outbound proxy server connection on-premises
- Azure ExpressRoute and Azure Arc private link to Azure resources
- Customer-managed Kubernetes on-premises
- Fully automated machine learning operations

Limitations for Kubernetes compute target

A KubernetesCompute target in Azure Machine Learning workloads (training and model inference) has the following limitations:

  • The availability of Preview features in Azure Machine Learning isn't guaranteed.
  • Models (including the foundational model) from the Model Catalog and Registry aren't supported on Kubernetes online endpoints.
  • The process of creating a model inference deployment inside the cluster has a timeout limit of 20 minutes. This includes downloading the image, downloading the model, and initializing the user scripts.
  • Azure Machine Learning extension supports Kubernetes Baseline Pod Security Standard.
  • Training on Kubernetes compute doesn't support auto scale nodes.

This section summarizes recommended best practices for working with a Kubernetes compute.

Separation of responsibilities between the IT-operations team and Data-science team. As described earlier, managing your own compute and infrastructure for machine learning workload is a complex task. The best approach is to have the IT-operations team handle the task, so the Data-science team can focus on machine learning models for organizational efficiency.

Create and manage instance types for different machine learning workload scenarios. Each machine learning workload uses different amounts of compute resources, such as CPU/GPU and memory. Azure Machine Learning implements the instance type as a Kubernetes custom resource definition (CRD) with properties of nodeSelector and resource request/limit. With a carefully curated list of instance types, IT-operations can target machine learning workload on specific nodes and manage compute resource utilization efficiently.

Multiple Azure Machine Learning workspaces share the same Kubernetes cluster. You can attach a Kubernetes cluster multiple times to the same Machine Learning workspace or different workspaces. This process creates multiple compute targets in a single workspace or multiple workspaces. Because many customers organize data science projects around Machine Learning workspace, multiple data science projects can now share the same Kubernetes cluster. This approach significantly reduces machine learning infrastructure management overheads and enhances IT cost saving.

Team/project workload isolation using Kubernetes namespace. When you attach a Kubernetes cluster to a Machine Learning workspace, you can specify a Kubernetes namespace for the compute target. All workloads run by the compute target are placed under the specified namespace.

Comparison of KubernetesCompute and legacy AksCompute targets

With the Azure Machine Learning CLI/Python SDK v1, you can deploy models on AKS by using the legacy AksCompute target. Both the KubernetesCompute and AksCompute targets support AKS integration, but the support approach is different. The following table summarizes the key differences:

Capability AksCompute (legacy) KubernetesCompute
Use the CLI/SDK v1 Yes No
Use the CLI/SDK v2 No Yes
Set up training No Yes
Apply real-time inference Yes Yes
Apply batch inference No Yes
Access new features for real-time inference No new features development Active roadmap available

In consideration of these differences, and the overall Machine Learning evolution to use the CLI/SDK v2, the recommended approach is to use Kubernetes compute target (KubernetesCompute) for AKS model deployment.

For more information, explore the following articles:

Machine learning examples

Machine learning examples are available in the Azure Machine Learning (azureml-examples) repository on GitHub. In any example, replace the compute target name with your Kubernetes compute target, and run the sample.

Here are several options: