Use Azure Container Storage with local NVMe

Azure Container Storage is a cloud-based volume management, deployment, and orchestration service built natively for containers. This article shows you how to configure Azure Container Storage to use Ephemeral Disk with local NVMe as back-end storage for your Kubernetes workloads. At the end, you'll have a pod that's using local NVMe as its storage.

What is Ephemeral Disk?

When your application needs sub-millisecond storage latency and doesn't require data durability, you can use Ephemeral Disk with Azure Container Storage to meet your performance requirements. Ephemeral means that the disks are deployed on the local virtual machine (VM) hosting the AKS cluster and not saved to an Azure storage service. Data will be lost on these disks if you stop/deallocate your VM.

There are two types of Ephemeral Disk available: local NVMe and temp SSD. NVMe is designed for high-speed data transfer between storage and CPU. Choose NVMe when your application needs higher IOPS or throughput than temp SSD can provide, or when it requires more storage space. Be aware that Azure Container Storage supports synchronous data replication for local NVMe only.

Due to the ephemeral nature of these disks, Azure Container Storage uses generic ephemeral volumes by default with ephemeral disk. However, certain use cases might call for persistent volumes even though the data isn't durable; for example, you might want to use existing YAML files or deployment templates that are hard-coded to use persistent volumes, and your workload might provide application-level replication for durability. In such cases, you can update your Azure Container Storage installation and add the annotation acstor.azure.com/accept-ephemeral-storage=true to your persistent volume claim definition to support the creation of persistent volumes from ephemeral disk storage pools.
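
For example, the annotation sits in the PVC metadata. The following is a minimal sketch with a hypothetical claim name; a complete, step-by-step PVC example appears later in this article:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim                               # hypothetical name, for illustration only
  annotations:
    acstor.azure.com/accept-ephemeral-storage: "true"
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: acstor-ephemeraldisk-nvme  # created for you when you create the storage pool
  resources:
    requests:
      storage: 1Gi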

Prerequisites

  • If you don't have an Azure subscription, create a free account before you begin.

  • This article requires Azure CLI version 2.35.0 or later. See How to install the Azure CLI. If you're using the Bash environment in Azure Cloud Shell, the latest version is already installed; a quick way to check your version is shown after this list. If you plan to run the commands locally instead of in Azure Cloud Shell, be sure to run them with administrative privileges. For more information, see Get started with Azure Cloud Shell.

  • You'll need the Kubernetes command-line client, kubectl. It's already installed if you're using Azure Cloud Shell, or you can install it locally by running the az aks install-cli command.

  • If you haven't already installed Azure Container Storage, follow the instructions in Use Azure Container Storage with Azure Kubernetes Service.

  • Check if your target region is supported in Azure Container Storage regions.
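
As a quick sanity check, you can confirm both command-line tools before you begin. This is a minimal sketch; exact output formats vary by version:

# Confirm the Azure CLI version (2.35.0 or later is required).
az version --query '"azure-cli"' --output tsv

# Confirm the kubectl client is installed.
kubectl version --client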

Choose a VM type that supports local NVMe

Local NVMe Disk is only available in certain types of VMs, for example, Storage optimized VM SKUs or GPU accelerated VM SKUs. If you plan to use local NVMe capacity, choose one of these VM SKUs.
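
For example, to check which storage optimized (L-series) VM sizes are available in your region, you can run a query along these lines (eastus is a placeholder; substitute your target region):

az vm list-skus --location eastus --size Standard_L --resource-type virtualMachines --output table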

Run the following command to get the VM type that's used with your node pool. Replace <resource group> and <cluster name> with your own values. PoolName and VmSize aren't placeholders; they're output column labels, so keep the query exactly as shown.

az aks nodepool list --resource-group <resource group> --cluster-name <cluster name> --query "[].{PoolName:name, VmSize:vmSize}" -o table

Here's an example of the output:

PoolName    VmSize
----------  ---------------
nodepool1   standard_l8s_v3

We recommend that each VM have a minimum of four virtual CPUs (vCPUs), and each node pool have at least three nodes.
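
If none of your existing node pools uses a supported SKU, you can add one that does. The following is a sketch based on the node pool expansion command shown later in this article; it uses Standard_L8s_v3 with three nodes to match the recommendations above. Replace the placeholders with your own values:

az aks nodepool add --cluster-name <cluster name> --resource-group <resource group> --name <nodepool name> --node-vm-size Standard_L8s_v3 --node-count 3 --labels acstor.azure.com/io-engine=acstor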

Create and attach generic ephemeral volumes

Follow these steps to create and attach a generic ephemeral volume.

1. Create a storage pool

First, create a storage pool, which is a logical grouping of storage for your Kubernetes cluster, by defining it in a YAML manifest file.

If you enabled Azure Container Storage using az aks create or az aks update commands, you might already have a storage pool. Use kubectl get sp -n acstor to get the list of storage pools. If you have a storage pool already available that you want to use, you can skip this section and proceed to Display the available storage classes.

Follow these steps to create a storage pool using local NVMe.

  1. Use your favorite text editor to create a YAML manifest file, such as acstor-storagepool.yaml.

  2. Paste in the following code and save the file. The storage pool name value can be whatever you want.

    apiVersion: containerstorage.azure.com/v1
    kind: StoragePool
    metadata:
      name: ephemeraldisk-nvme
      namespace: acstor
    spec:
      poolType:
        ephemeralDisk:
          diskType: nvme
    
  3. Apply the YAML manifest file to create the storage pool.

    kubectl apply -f acstor-storagepool.yaml 
    

    When storage pool creation is complete, you'll see a message like:

    storagepool.containerstorage.azure.com/ephemeraldisk-nvme created
    

    You can also run this command to check the status of the storage pool. Replace <storage-pool-name> with your storage pool name value. For this example, the value would be ephemeraldisk-nvme.

    kubectl describe sp <storage-pool-name> -n acstor
    

When the storage pool is created, Azure Container Storage will create a storage class on your behalf, using the naming convention acstor-<storage-pool-name>.

2. Display the available storage classes

When the storage pool is ready to use, you must select a storage class, which defines how storage is dynamically created when you create and deploy volumes.

Run kubectl get sc to display the available storage classes. You should see a storage class called acstor-<storage-pool-name>.

$ kubectl get sc | grep "^acstor-"
acstor-azuredisk-internal   disk.csi.azure.com               Retain          WaitForFirstConsumer   true                   65m
acstor-ephemeraldisk-nvme   containerstorage.csi.azure.com   Delete          WaitForFirstConsumer   true                   2m27s

Important

Don't use the storage class that's marked internal. It's reserved for Azure Container Storage's own operation.

3. Deploy a pod with a generic ephemeral volume

Create a pod that uses a generic ephemeral volume and runs Fio (Flexible I/O Tester) for benchmarking and workload simulation.

  1. Use your favorite text editor to create a YAML manifest file, such as acstor-pod.yaml.

  2. Paste in the following code and save the file.

    kind: Pod
    apiVersion: v1
    metadata:
      name: fiopod
    spec:
      nodeSelector:
        acstor.azure.com/io-engine: acstor
      containers:
        - name: fio
          image: nixery.dev/shell/fio
          args:
            - sleep
            - "1000000"
          volumeMounts:
            - mountPath: "/volume"
              name: ephemeralvolume
      volumes:
        - name: ephemeralvolume
          ephemeral:
            volumeClaimTemplate:
              metadata:
                labels:
                  type: my-ephemeral-volume
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: acstor-ephemeraldisk-nvme # replace with the name of your storage class if different
                resources:
                  requests:
                    storage: 1Gi
    

    When you change the storage size of your volumes, make sure the size is less than the available capacity of a single node's ephemeral disk. See Check node ephemeral disk capacity.

  3. Apply the YAML manifest file to deploy the pod.

    kubectl apply -f acstor-pod.yaml
    

    You should see output similar to the following:

    pod/fiopod created
    
  4. Check that the pod is running and that the ephemeral volume claim has been bound successfully to the pod:

    kubectl describe pod fiopod
    kubectl describe pvc fiopod-ephemeralvolume
    
  5. Run a sample fio test to benchmark the volume:

    kubectl exec -it fiopod -- fio --name=benchtest --size=800m --filename=/volume/test --direct=1 --rw=randrw --ioengine=libaio --bs=4k --iodepth=16 --numjobs=8 --time_based --runtime=60
    

You've now deployed a pod that's using local NVMe as its storage, and you can use it for your Kubernetes workloads.
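
When you're done experimenting, you can delete the pod. Because the volume is a generic ephemeral volume, Kubernetes removes its volume claim, and the data on it, along with the pod:

kubectl delete pod fiopod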

Create and attach persistent volumes

To create a persistent volume from an ephemeral disk storage pool, you must include an annotation in your persistent volume claims (PVCs) as a safeguard, confirming that you intend to use persistent volumes even though the data is ephemeral. Additionally, before creating your persistent volume claims, you need to set the --ephemeral-disk-volume-type flag to PersistentVolumeWithAnnotation on your cluster.

Follow these steps to create and attach a persistent volume.

1. Update your Azure Container Storage installation

Run the following command to update your Azure Container Storage installation to allow the creation of persistent volumes from ephemeral disk storage pools.

az aks update -n <cluster-name> -g <resource-group> --enable-azure-container-storage ephemeralDisk --storage-pool-option NVMe --ephemeral-disk-volume-type PersistentVolumeWithAnnotation 

2. Create a storage pool

Create a storage pool, which is a logical grouping of storage for your Kubernetes cluster, by defining it in a YAML manifest file.

If you enabled Azure Container Storage using az aks create or az aks update commands, you might already have a storage pool. Use kubectl get sp -n acstor to get the list of storage pools. If you have a storage pool already available that you want to use, you can skip this section and proceed to Display the available storage classes.

Follow these steps to create a storage pool using local NVMe.

  1. Use your favorite text editor to create a YAML manifest file, such as acstor-storagepool.yaml.

  2. Paste in the following code and save the file. The storage pool name value can be whatever you want.

    apiVersion: containerstorage.azure.com/v1
    kind: StoragePool
    metadata:
      name: ephemeraldisk-nvme
      namespace: acstor
    spec:
      poolType:
        ephemeralDisk:
          diskType: nvme
    
  3. Apply the YAML manifest file to create the storage pool.

    kubectl apply -f acstor-storagepool.yaml 
    

    When storage pool creation is complete, you'll see a message like:

    storagepool.containerstorage.azure.com/ephemeraldisk-nvme created
    

    You can also run this command to check the status of the storage pool. Replace <storage-pool-name> with your storage pool name value. For this example, the value would be ephemeraldisk-nvme.

    kubectl describe sp <storage-pool-name> -n acstor
    

When the storage pool is created, Azure Container Storage will create a storage class on your behalf, using the naming convention acstor-<storage-pool-name>.

3. Display the available storage classes

When the storage pool is ready to use, you must select a storage class, which defines how storage is dynamically created when you create and deploy volumes.

Run kubectl get sc to display the available storage classes. You should see a storage class called acstor-<storage-pool-name>.

$ kubectl get sc | grep "^acstor-"
acstor-azuredisk-internal   disk.csi.azure.com               Retain          WaitForFirstConsumer   true                   65m
acstor-ephemeraldisk-nvme   containerstorage.csi.azure.com   Delete          WaitForFirstConsumer   true                   2m27s

Important

Don't use the storage class that's marked internal. It's reserved for Azure Container Storage's own operation.

4. Create a persistent volume claim

A persistent volume claim is used to automatically provision storage based on a storage class. Follow these steps to create a PVC using the new storage class.

  1. Use your favorite text editor to create a YAML manifest file, such as acstor-pvc.yaml.

  2. Paste in the following code and save the file. The PVC name value can be whatever you want.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: ephemeralpvc
      annotations:
        acstor.azure.com/accept-ephemeral-storage: "true"
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: acstor-ephemeraldisk-nvme # replace with the name of your storage class if different
      resources:
        requests:
          storage: 100Gi
    

    When you change the storage size of your volumes, make sure the size is less than the available capacity of a single node's ephemeral disk. See Check node ephemeral disk capacity.

  3. Apply the YAML manifest file to create the PVC.

    kubectl apply -f acstor-pvc.yaml
    

    You should see output similar to:

    persistentvolumeclaim/ephemeralpvc created
    

    You can verify the status of the PVC by running the following command:

    kubectl describe pvc ephemeralpvc
    

Once the PVC is created, it's ready for use by a pod.

5. Deploy a pod and attach a persistent volume

Create a pod using Fio (Flexible I/O Tester) for benchmarking and workload simulation, and specify a mount path for the persistent volume. For claimName, use the name value that you used when creating the persistent volume claim.

  1. Use your favorite text editor to create a YAML manifest file, such as acstor-pod.yaml.

  2. Paste in the following code and save the file.

    kind: Pod
    apiVersion: v1
    metadata:
      name: fiopod
    spec:
      nodeSelector:
        acstor.azure.com/io-engine: acstor
      volumes:
        - name: ephemeralpv
          persistentVolumeClaim:
            claimName: ephemeralpvc
      containers:
        - name: fio
          image: nixery.dev/shell/fio
          args:
            - sleep
            - "1000000"
          volumeMounts:
            - mountPath: "/volume"
              name: ephemeralpv
    
  3. Apply the YAML manifest file to deploy the pod.

    kubectl apply -f acstor-pod.yaml
    

    You should see output similar to the following:

    pod/fiopod created
    
  4. Check that the pod is running and that the persistent volume claim has been bound successfully to the pod:

    kubectl describe pod fiopod
    kubectl describe pvc ephemeralpvc
    
  5. Run a sample fio test to benchmark the volume:

    kubectl exec -it fiopod -- fio --name=benchtest --size=800m --filename=/volume/test --direct=1 --rw=randrw --ioengine=libaio --bs=4k --iodepth=16 --numjobs=8 --time_based --runtime=60
    

You've now deployed a pod that's using local NVMe as its storage, and you can use it for your Kubernetes workloads.
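
When you're done, you can delete the pod and the PVC. Because the storage class uses the Delete reclaim policy, deleting the PVC also removes the underlying volume and any data on it:

kubectl delete pod fiopod
kubectl delete pvc ephemeralpvc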

Manage volumes and storage pools

In this section, you'll learn how to check the available capacity of ephemeral disk for a single node, how to expand or delete a storage pool, and how to optimize performance.

Check node ephemeral disk capacity

An ephemeral volume is allocated on a single node. When you configure the size of your ephemeral volumes, the size should be less than the available capacity of the single node's ephemeral disk.

Run the following command to check the available capacity of ephemeral disk for a single node.

$ kubectl get diskpool -n acstor
NAME                                CAPACITY      AVAILABLE     USED        RESERVED    READY   AGE
ephemeraldisk-nvme-diskpool-jaxwb   75660001280   75031990272   628011008   560902144   True    21h
ephemeraldisk-nvme-diskpool-wzixx   75660001280   75031990272   628011008   560902144   True    21h
ephemeraldisk-nvme-diskpool-xbtlj   75660001280   75031990272   628011008   560902144   True    21h

In this example, the available capacity of ephemeral disk for a single node is 75031990272 bytes or 69 GiB.
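
The GiB figure is simply the byte count divided by 1024³. For example, using shell arithmetic (integer division, so the result rounds down):

echo $(( 75031990272 / 1024 / 1024 / 1024 ))   # prints 69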

Expand a storage pool

You can expand storage pools backed by local NVMe to scale up quickly and without downtime. Shrinking storage pools isn't currently supported.

Because a storage pool backed by Ephemeral Disk uses local storage resources on the AKS cluster nodes (VMs), expanding the storage pool requires adding another node to the cluster. Follow these instructions to expand the storage pool.

  1. Run the following command to add a node to the AKS cluster. Replace <cluster name>, <nodepool name>, and <resource group> with your own values. To get the name of your node pool, run kubectl get nodes.

    az aks nodepool add --cluster-name <cluster name> --name <nodepool name> --resource-group <resource group> --node-vm-size Standard_L8s_v3 --node-count 1 --labels acstor.azure.com/io-engine=acstor
    
  2. Run kubectl get nodes and you'll see that a node has been added to the cluster.

  3. Run kubectl get sp -A and you should see that the capacity of the storage pool has increased.

Delete a storage pool

If you want to delete a storage pool, run the following command. Replace <storage-pool-name> with the storage pool name.

kubectl delete sp -n acstor <storage-pool-name>

Optimize performance when using local NVMe

Depending on your workload's performance requirements, you can choose from three performance tiers: Basic, Standard, and Premium. Your selection affects the number of vCPUs that Azure Container Storage components consume on the nodes where they're installed. Standard is the default configuration if you don't update the performance tier.

Each tier offers a different range of IOPS. The following table contains guidance on what you can expect from each tier. We used FIO, a popular benchmarking tool, to achieve these numbers with the following configuration:

  • AKS: Node SKU - Standard_L16s_v3
  • FIO: Block size - 4KB; Queue depth - 32; Numjobs - number of cores assigned to the container storage components; Access pattern - random; Working set size - 32G
Tier                 Number of vCPUs           100% Read IOPS   100% Write IOPS
Basic                12.5% of total VM cores   Up to 120,000    Up to 90,000
Standard (default)   25% of total VM cores     Up to 220,000    Up to 180,000
Premium              50% of total VM cores     Up to 550,000    Up to 360,000
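
For reference, the 100% read runs above correspond roughly to an fio invocation like the following. This is a sketch rather than the exact benchmark harness: --numjobs is a placeholder that depends on the cores assigned to the container storage components, and the write runs use --rw=randwrite instead:

fio --name=benchtest --filename=/volume/test --size=32g --direct=1 --rw=randread --ioengine=libaio --bs=4k --iodepth=32 --numjobs=<number of cores>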

Note

RAM and hugepages consumption will stay consistent across all tiers: 1 GiB of RAM and 2 GiB of hugepages.

Once you've identified the performance tier that best aligns with your needs, you can run the following command to update the performance tier of your Azure Container Storage installation. Replace <performance-tier> with basic, standard, or premium, and <storage-pool-type> with the storage pool type you enabled (for example, ephemeralDisk).

az aks update -n <cluster-name> -g <resource-group> --enable-azure-container-storage <storage-pool-type> --ephemeral-disk-nvme-perf-tier <performance-tier>
