Use node taints in an AKS enabled by Azure Arc cluster
Applies to: Azure Local, version 23H2
This article describes how to use node taints in an AKS cluster.
Overview
The AKS scheduling mechanism is responsible for placing pods onto nodes and is based on the upstream Kubernetes scheduler, kube-scheduler. You can keep a set of pods off particular nodes by applying node taints, which the AKS scheduler takes into account when it places pods.
Node taints work by marking a node so that the scheduler avoids placing certain pods on the marked nodes. You can place tolerations on a pod to allow the scheduler to schedule that pod on a node with a matching taint. Taints and tolerations work together to help you control how the scheduler places pods onto nodes. For more information, see example use cases of taints and tolerations.
Taints are key-value pairs with an effect. The effect field accepts three values: NoExecute, NoSchedule, and PreferNoSchedule.
- NoExecute: Pods already running on the node are immediately evicted if they don't have a matching toleration, and new pods without one aren't scheduled on the node. A pod with a matching toleration stays on the node unless the toleration specifies tolerationSeconds, in which case the pod is evicted after that period.
- NoSchedule: Only pods with a matching toleration are placed on this node. Existing pods aren't evicted.
- PreferNoSchedule: The scheduler tries to avoid placing pods that don't have a matching toleration on this node, but doesn't guarantee it.
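The matching rules behind these effects can be sketched in Python. This is a simplified model of the upstream Kubernetes rules, not AKS or kube-scheduler code. The names Taint, Toleration, tolerates, and placement are hypothetical, and tolerationSeconds isn't modeled. A toleration matches a taint when both of the following hold:
- The effects agree. An empty toleration effect matches every effect.
- Either the operator is Exists and the keys agree, or the operator is Equal and both the key and the value agree.

```python
from dataclasses import dataclass


# Hypothetical model of upstream Kubernetes taint/toleration matching (simplified).
@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # Exists ignores the value; an empty key matches every taint.
        return tol.key in ("", taint.key)
    # Equal requires both the key and the value to match.
    return tol.key == taint.key and tol.value == taint.value


def placement(node_taints: list[Taint], tolerations: list[Toleration]) -> dict:
    """Summarize how a node's taints affect a pod with the given tolerations."""
    untolerated_effects = {
        t.effect
        for t in node_taints
        if not any(tolerates(tol, t) for tol in tolerations)
    }
    return {
        # Hard effects block new placement on this node.
        "can_schedule": not (untolerated_effects & {"NoSchedule", "NoExecute"}),
        # Soft effect: the scheduler prefers other nodes but may still use this one.
        "scheduler_avoids": "PreferNoSchedule" in untolerated_effects,
        # Only NoExecute removes pods that are already running.
        "evict_if_running": "NoExecute" in untolerated_effects,
    }
```

Here is an example with the taint used later in this article. `placement([Taint("sku", "gpu", "NoSchedule")], [])` reports `can_schedule` as False. Adding `Toleration(key="sku", value="gpu", effect="NoSchedule")` makes it True.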
Before you begin
- This article assumes you have an existing AKS cluster. If you need an AKS cluster, you can create one using Azure CLI, Azure PowerShell, or the Azure portal.
- When you create a node pool, you can add taints to it. When you add a taint, all nodes within that node pool also get that taint.
Important
You should add taints or labels to nodes for the entire node pool using az aksarc nodepool. We don't recommend using kubectl to apply taints or labels to individual nodes in a node pool.
Set node pool taints
Create a node pool with a taint using the az aksarc nodepool add command. Specify the name taintnp and use the --node-taints parameter to specify sku=gpu:NoSchedule for the taint:
az aksarc nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name taintnp \
--node-count 1 \
--node-taints sku=gpu:NoSchedule \
--no-wait
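The --node-taints value packs three parts into one string: key=value:Effect. The following Python sketch splits such a string and rejects unknown effects. This can catch typos such as NoSchedul before you run the command. Some of its behavior is assumed:
- The helper names parse_taint and parse_taints are hypothetical.
- Accepting key:Effect with no value follows the upstream Kubernetes taint syntax.
- Splitting several taints on commas is an assumption about how you might pass more than one, not documented az aksarc behavior.

```python
VALID_EFFECTS = ("NoSchedule", "PreferNoSchedule", "NoExecute")


def parse_taint(spec: str) -> tuple[str, str, str]:
    """Split 'key=value:Effect' (or 'key:Effect') into (key, value, effect)."""
    key_value, sep, effect = spec.strip().rpartition(":")
    if not sep or not key_value:
        raise ValueError(f"missing ':Effect' in taint {spec!r}")
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unknown effect {effect!r} in taint {spec!r}")
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing key in taint {spec!r}")
    return key, value, effect


def parse_taints(arg: str) -> list[tuple[str, str, str]]:
    """Parse a comma-separated list of taints (assumed format, see lead-in)."""
    return [parse_taint(part) for part in arg.split(",") if part.strip()]
```

For the taint in this article, `parse_taint("sku=gpu:NoSchedule")` returns `("sku", "gpu", "NoSchedule")`.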
Check the status of the node pool using the az aksarc nodepool list command:
az aksarc nodepool list -g myResourceGroup --cluster-name myAKSCluster
The following example output shows that the taintnp node pool creates nodes with the specified nodeTaints:
[
{
...
"count": 1,
...
"name": "taintnp",
...
"provisioningState": "Succeeded",
...
"nodeTaints": [
"sku=gpu:NoSchedule"
],
...
},
...
]
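You can save the command output as JSON, for example by adding the global --output json argument. A short Python script can then confirm which node pools carry a given taint. This sketch assumes the top-level list shape and the name, provisioningState, and nodeTaints fields shown in the example output above. tainted_pools is a hypothetical helper name.

```python
import json


def tainted_pools(nodepool_json: str, taint: str) -> dict[str, str]:
    """Map each node pool that carries `taint` to its provisioningState.

    Assumes the JSON is a list of node pool objects with the name,
    provisioningState, and nodeTaints fields shown in the example output.
    """
    pools = json.loads(nodepool_json)
    return {
        pool["name"]: pool.get("provisioningState", "Unknown")
        for pool in pools
        # nodeTaints may be missing or null on pools without taints.
        if taint in (pool.get("nodeTaints") or [])
    }
```

For example, first save the list with `az aksarc nodepool list -g myResourceGroup --cluster-name myAKSCluster --output json > pools.json`. Then `tainted_pools(open("pools.json").read(), "sku=gpu:NoSchedule")` should include taintnp once it's provisioned.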
The taint is also visible in Kubernetes, where the scheduler uses taints and tolerations to restrict which workloads can run on which nodes:
- A taint is applied to a node to indicate that only specific pods can be scheduled on it.
- A toleration is applied to a pod to allow it to "tolerate" a node's taint.
Set node pool tolerations
In the previous step, you applied the sku=gpu:NoSchedule taint when you created the node pool. The following example YAML manifest uses a toleration to allow the Kubernetes scheduler to run an NGINX pod on a node in that node pool.
Create a file named nginx-toleration.yaml and copy the following example YAML into it:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - image: mcr.microsoft.com/oss/nginx/nginx:1.15.9-alpine
    name: mypod
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 1
        memory: 2G
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
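The toleration has to mirror the taint field for field, so you might generate the manifest instead of writing it by hand. The following Python sketch builds the same Pod as a dictionary and prints it as JSON, which kubectl apply -f also accepts. The helper names toleration_for and nginx_pod, and the file name in the usage note, are hypothetical.

```python
import json


def toleration_for(key: str, value: str, effect: str) -> dict:
    """Build a toleration that mirrors one node taint (hypothetical helper)."""
    if not value:
        # For a taint without a value, Exists matches on the key alone.
        return {"key": key, "operator": "Exists", "effect": effect}
    return {"key": key, "operator": "Equal", "value": value, "effect": effect}


def nginx_pod(name: str, tolerations: list) -> dict:
    """Return the same Pod as nginx-toleration.yaml, as a Python dict."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "image": "mcr.microsoft.com/oss/nginx/nginx:1.15.9-alpine",
                "name": name,
                "resources": {
                    "requests": {"cpu": "100m", "memory": "128Mi"},
                    "limits": {"cpu": "1", "memory": "2G"},
                },
            }],
            "tolerations": tolerations,
        },
    }


manifest = nginx_pod("mypod", [toleration_for("sku", "gpu", "NoSchedule")])
print(json.dumps(manifest, indent=2))
```

To use it, save the printed output as nginx-toleration.json and apply it with `kubectl apply -f nginx-toleration.json`.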
Schedule the pod using the kubectl apply command:
kubectl apply -f nginx-toleration.yaml
It takes a few seconds to schedule the pod and pull the NGINX image.
Check the status using the kubectl describe pod command:
kubectl describe pod mypod
The following condensed example output shows that the sku=gpu:NoSchedule toleration is applied. In the Events section, the scheduler assigned the pod to the moc-lbeof1gn6x3 node:
[...]
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                 sku=gpu:NoSchedule
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  54s   default-scheduler  Successfully assigned default/mypod to moc-lbeof1gn6x3
  Normal  Pulling    53s   kubelet            Pulling image "mcr.microsoft.com/oss/nginx/nginx:1.15.9-alpine"
  Normal  Pulled     48s   kubelet            Successfully pulled image "mcr.microsoft.com/oss/nginx/nginx:1.15.9-alpine" in 3.025148695s (3.025157609s including waiting)
  Normal  Created    48s   kubelet            Created container
  Normal  Started    48s   kubelet            Started container
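Only pods that have this toleration applied can be scheduled on nodes in taintnp. Pods without it are scheduled on nodes in other node pools, such as nodepool1. However, a toleration permits placement on a tainted node but doesn't require it, so a pod with this toleration can still be scheduled in nodepool1. To keep such a pod in taintnp, also give it a node selector or node affinity rule. If you create more node pools, you can use taints and tolerations to limit what pods can be scheduled on those node resources.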
Update a cluster node pool to add a node taint
To add a node taint to an existing node pool, use the az aksarc update command with the --node-taints parameter. This example specifies sku=gpu:NoSchedule for the taint. The new value replaces all existing taints on the node pool, so any existing taint that you don't include is deleted:
az aksarc update -g myResourceGroup --cluster-name myAKSCluster --name taintnp --node-taints "sku=gpu:NoSchedule"
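Because the update replaces the whole taint list rather than merging into it, any taint you want to keep must appear in the new value. The following Python sketch models only that replace behavior. It doesn't call az aksarc, and apply_taint_update is a hypothetical name.

```python
def apply_taint_update(current: list[str], requested: list[str]) -> dict:
    """Model a replace-style update: the requested list becomes the full set.

    Hypothetical helper; it mirrors the documented replace semantics only.
    """
    result = list(dict.fromkeys(requested))  # keep order, drop duplicates
    return {
        "result": result,
        "removed": [t for t in current if t not in result],
        "added": [t for t in result if t not in current],
    }
```

For example, suppose a node pool currently has dedicated=infra:NoExecute and you update it with only sku=gpu:NoSchedule. The model then reports dedicated=infra:NoExecute as removed, which matches the behavior described above.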