GPU-enabled compute

Note

Some GPU-enabled instance types are in Beta and are marked as such in the drop-down list when you select the driver and worker types during compute creation.

Overview

Azure Databricks supports compute accelerated with graphics processing units (GPUs). This article describes how to create compute with GPU-enabled instances and describes the GPU drivers and libraries installed on those instances.

To learn more about deep learning on GPU-enabled compute, see Deep learning.

Create a GPU compute

Creating GPU compute is similar to creating any other compute. Keep the following in mind (a Clusters API sketch follows this list):

  • The Databricks Runtime Version must be a GPU-enabled version, such as Runtime 13.3 LTS ML (GPU, Scala 2.12.15, Spark 3.4.1).
  • The Worker Type and Driver Type must be GPU instance types.
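
For example, here is a minimal sketch of creating GPU-enabled compute through the Clusters API. The workspace URL, token, runtime version string, and instance type are placeholders; substitute values that are valid for your workspace and region.

```python
# A minimal sketch of creating GPU-enabled compute via the Clusters API.
# All values below are placeholders, not recommendations.
import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                               # placeholder

payload = {
    "cluster_name": "gpu-example",
    "spark_version": "13.3.x-gpu-ml-scala2.12",     # a GPU-enabled runtime version
    "node_type_id": "Standard_NC4as_T4_v3",         # GPU worker instance type
    "driver_node_type_id": "Standard_NC4as_T4_v3",  # GPU driver instance type
    "num_workers": 2,
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```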

Supported instance types

Warning

Azure Databricks will no longer support spinning up compute using the NC v3 instance type series, because Azure is deprecating NC24rs_v3 by March 31, 2025, and NC6s_v3, NC12s_v3, and NC24s_v3 by September 30, 2025.

Azure Databricks supports the following instance types:

NCads_H100_v5

  • GPU Type: NVIDIA H100 NVL GPU
| Instance Name | Number of GPUs | GPU Memory | vCPUs | CPU Memory |
| --- | --- | --- | --- | --- |
| Standard_NC40ads_H100_v5 | 1 | 94GB | 40 | 320GB |
| Standard_NC80adis_H100_v5 | 2 | 94GB x 2 | 80 | 640GB |

ND_H100_v5

  • GPU Type: NVIDIA H100 Tensor Core GPU
| Instance Name | Number of GPUs | GPU Memory | vCPUs | CPU Memory |
| --- | --- | --- | --- | --- |
| Standard_ND96isr_H100_v5 | 8 | 80GB x 8 | 96 | 1900GB |

NC_A100_v4

  • GPU Type: NVIDIA A100 PCIe GPU
| Instance Name | Number of GPUs | GPU Memory | vCPUs | CPU Memory |
| --- | --- | --- | --- | --- |
| Standard_NC24ads_A100_v4 | 1 | 80GB | 24 | 220GB |
| Standard_NC48ads_A100_v4 | 2 | 80GB x 2 | 48 | 440GB |
| Standard_NC96ads_A100_v4 | 4 | 80GB x 4 | 96 | 880GB |

NDasrA100_v4

  • GPU Type: NVIDIA Ampere A100 40GB Tensor Core GPU
| Instance Name | Number of GPUs | GPU Memory | vCPUs | CPU Memory |
| --- | --- | --- | --- | --- |
| Standard_ND96asr_v4 | 8 | 40GB x 8 | 96 | 900GB |

NVadsA10_v5

  • GPU Type: NVIDIA A10 GPU
| Instance Name | Number of GPUs | GPU Memory | vCPUs | CPU Memory |
| --- | --- | --- | --- | --- |
| Standard_NV36ads_A10_v5 | 1 | 24GB | 36 | 440GB |
| Standard_NV36adms_A10_v5 | 1 | 24GB | 36 | 880GB |
| Standard_NV72ads_A10_v5 | 2 | 24GB x 2 | 72 | 880GB |

NCasT4_v3

  • GPU Type: NVIDIA T4 GPU
| Instance Name | Number of GPUs | GPU Memory | vCPUs | CPU Memory |
| --- | --- | --- | --- | --- |
| Standard_NC4as_T4_v3 | 1 | 16GB | 4 | 28GB |
| Standard_NC8as_T4_v3 | 1 | 16GB | 8 | 56GB |
| Standard_NC16as_T4_v3 | 1 | 16GB | 16 | 110GB |
| Standard_NC64as_T4_v3 | 4 | 16GB x 4 | 64 | 440GB |

NC_v3

  • GPU Type: NVIDIA Tesla V100 GPU
| Instance Name | Number of GPUs | GPU Memory | vCPUs | CPU Memory |
| --- | --- | --- | --- | --- |
| Standard_NC6s_v3 | 1 | 16GB | 6 | 112GB |
| Standard_NC12s_v3 | 2 | 16GB x 2 | 12 | 224GB |
| Standard_NC24s_v3 | 4 | 16GB x 4 | 24 | 448GB |
| Standard_NC24rs_v3 | 4 | 16GB x 4 | 24 | 448GB |

See Azure Databricks Pricing for an up-to-date list of supported GPU instance types and their availability regions. Your Azure Databricks deployment must reside in a supported region to launch GPU-enabled compute.

GPU scheduling

GPU scheduling distributes Spark tasks efficiently across a large number of GPUs.

Databricks Runtime supports GPU-aware scheduling from Apache Spark 3.0. Azure Databricks preconfigures it on GPU compute.

Note

GPU scheduling is not enabled on single-node compute.

GPU scheduling for AI and ML

spark.task.resource.gpu.amount is the only Spark config related to GPU-aware scheduling that you might need to configure. The default configuration uses one GPU per task, which is a good baseline for distributed inference workloads, and for distributed training if you use all GPU nodes.

To reduce communication overhead during distributed training, Databricks recommends setting spark.task.resource.gpu.amount to the number of GPUs per worker node in the compute Spark configuration. This creates only one Spark task for each Spark worker and assigns all GPUs in that worker node to the same task.

To increase parallelization for distributed deep learning inference, you can set spark.task.resource.gpu.amount to fractional values such as 1/2, 1/3, 1/4, … 1/N. This creates more Spark tasks than there are GPUs, allowing more concurrent tasks to handle inference requests in parallel. For example, if you set spark.task.resource.gpu.amount to 0.5, 0.33, or 0.25, then the available GPUs will be split among double, triple, or quadruple the number of tasks.
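
As an illustration, assuming worker nodes with four GPUs each (for example, Standard_NC64as_T4_v3), the compute Spark configuration could contain one of the following entries, depending on the workload. The # lines are explanatory annotations, not part of the configuration; set only one value.

```
# Distributed training: one task per worker, with all four GPUs assigned to it
spark.task.resource.gpu.amount 4

# Distributed inference: four concurrent tasks share each GPU
spark.task.resource.gpu.amount 0.25
```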

GPU indices

For PySpark tasks, Azure Databricks automatically remaps assigned GPU(s) to zero-based indices. For the default configuration that uses one GPU per task, you can use the default GPU without checking which GPU is assigned to the task. If you set multiple GPUs per task, for example, 4, the indices of the assigned GPUs are always 0, 1, 2, and 3. If you do need the physical indices of the assigned GPUs, you can get them from the CUDA_VISIBLE_DEVICES environment variable.
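
The following is a minimal PySpark sketch of reading both index sets from inside a task, assuming a Databricks notebook where sc is the preconfigured SparkContext:

```python
import os

from pyspark import TaskContext

def gpu_assignments(_):
    ctx = TaskContext.get()
    # Remapped, zero-based GPU indices assigned to this task
    # (for the default one-GPU-per-task configuration, this is ['0']).
    logical = list(ctx.resources()["gpu"].addresses)
    # Physical indices of the assigned GPUs on the host.
    physical = os.environ.get("CUDA_VISIBLE_DEVICES")
    yield (logical, physical)

# Run one lightweight task per partition and collect the assignments.
print(sc.parallelize(range(2), 2).mapPartitions(gpu_assignments).collect())
```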

If you use Scala, you can get the indices of the GPUs assigned to the task from TaskContext.resources().get("gpu").

NVIDIA GPU driver, CUDA, and cuDNN

Azure Databricks installs the NVIDIA driver and libraries required to use GPUs on Spark driver and worker instances:

  • CUDA Toolkit, installed under /usr/local/cuda.
  • cuDNN: NVIDIA CUDA Deep Neural Network Library.
  • NCCL: NVIDIA Collective Communications Library.

The included NVIDIA driver version is 535.54.03, which supports CUDA 12.2. For the NVadsA10_v5 instance type series, the included NVIDIA driver version is 535.154.05.
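
To confirm the driver and toolkit versions on a running GPU node, a quick check from a notebook cell might look like the following. This is a sketch; it assumes nvidia-smi is on the PATH and that the toolkit installation under /usr/local/cuda includes the nvcc compiler.

```python
import subprocess

# Driver version and visible GPUs, as reported by the NVIDIA driver.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# CUDA toolkit version, from the toolkit installed under /usr/local/cuda.
print(subprocess.run(["/usr/local/cuda/bin/nvcc", "--version"],
                     capture_output=True, text=True).stdout)
```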

For the versions of the libraries included, see the release notes for the specific Databricks Runtime version you are using.

Note

This software contains source code provided by NVIDIA Corporation. Specifically, to support GPUs, Azure Databricks includes code from CUDA Samples.

NVIDIA End User License Agreement (EULA)

When you select a GPU-enabled “Databricks Runtime Version” in Azure Databricks, you implicitly agree to the terms and conditions outlined in the NVIDIA EULA with respect to the CUDA, cuDNN, and Tesla libraries, and the NVIDIA End User License Agreement (with NCCL Supplement) for the NCCL library.

Databricks Container Services on GPU compute

Important

This feature is in Public Preview.

You can use Databricks Container Services on compute with GPUs to create portable deep learning environments with customized libraries. See Customize containers with Databricks Container Service for instructions.

To create custom images for GPU compute, you must select a standard runtime version instead of Databricks Runtime ML for GPU. When you select Use your own Docker container, you can choose GPU compute with a standard runtime version. Custom images for GPU are based on the official CUDA containers, unlike Databricks Runtime ML for GPU.

When you create custom images for GPU compute, you cannot change the NVIDIA driver version because it must match the driver version on the host machine.

The databricksruntime Docker Hub contains example base images with GPU capability. The Dockerfiles used to generate these images are located in the example containers GitHub repository, which also has details on what the example images provide and how to customize them.
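
As a hedged sketch, a Clusters API payload for GPU compute running a custom container might look like the following. The image URL and credentials are placeholders, and the spark_version is a standard (non-ML) runtime, as required above.

```python
payload = {
    "cluster_name": "gpu-custom-container",
    "spark_version": "13.3.x-scala2.12",  # standard runtime, not Databricks Runtime ML
    "node_type_id": "Standard_NC4as_T4_v3",
    "num_workers": 1,
    "docker_image": {
        "url": "<registry>/<repository>:<tag>",  # your CUDA-based custom image
        "basic_auth": {"username": "<user>", "password": "<token>"},
    },
}
# POST this payload to {workspace-url}/api/2.0/clusters/create,
# as in the earlier example.
```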