NVIDIA GPU Driver Extension for Linux
This extension installs NVIDIA GPU drivers on Linux N-series virtual machines (VMs). Depending on the VM family, the extension installs CUDA or GRID drivers. When you install NVIDIA drivers by using this extension, you're accepting and agreeing to the terms of the NVIDIA End-User License Agreement. During the installation process, the VM might reboot to complete the driver setup.
Instructions on manual installation of the drivers and the current supported versions are available. An extension is also available to install NVIDIA GPU drivers on Windows N-series VMs.
Note
With Secure Boot enabled, all OS boot components (boot loader, kernel, kernel drivers) must be signed by trusted publishers (key trusted by the system). Secure Boot is not supported using Windows or Linux extensions. For more information on manually installing GPU drivers with Secure Boot enabled, see Azure N-series GPU driver setup for Linux.
Note
The GPU driver extensions do not automatically update the driver after the extension is installed. If you need to move to a newer driver version, either manually download and install the driver or remove and add the extension again.
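The remove-and-add approach from the preceding note can be sketched with the Azure CLI as follows. This is a minimal sketch, not an official procedure: `reinstall_gpu_extension` is a hypothetical helper name, and the resource group and VM names are placeholders.

```shell
# Sketch: move to a newer driver by removing and re-adding the extension.
# reinstall_gpu_extension is a hypothetical helper, not part of the Azure CLI.
reinstall_gpu_extension() {
  local rg="$1" vm="$2"

  # Remove the existing extension instance from the VM; stop if that fails.
  az vm extension delete \
    --resource-group "$rg" \
    --vm-name "$vm" \
    --name NvidiaGpuDriverLinux || return 1

  # Add the extension again with default settings.
  az vm extension set \
    --resource-group "$rg" \
    --vm-name "$vm" \
    --name NvidiaGpuDriverLinux \
    --publisher Microsoft.HpcCompute
}

# Usage (placeholder names):
#   reinstall_gpu_extension myResourceGroup myVM
```

The helper stops after a failed delete so that a second install isn't attempted on top of a partially removed extension.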
Prerequisites
Operating system
This extension supports the following OS distros, depending on driver support for the specific OS version:
Distribution | Version |
---|---|
Linux: Ubuntu | 20.04 LTS |
Linux: Red Hat Enterprise Linux | 7.9 |
Note
The latest supported CUDA drivers for NC-series VMs are currently 470.82.01. Later driver versions aren't supported on the K80 cards in NC. While the extension is being updated with this end of support for NC, install CUDA drivers manually for K80 cards on the NC-series.
Important
This document references a release version of Linux that is nearing or at End of Life (EOL). Consider updating to a more current version.
Internet connectivity
The Microsoft Azure Extension for NVIDIA GPU Drivers requires that the target VM is connected to the internet and has outbound access.
Extension schema
The following JSON shows the schema for the extension:
{
  "name": "<myExtensionName>",
  "type": "extensions",
  "apiVersion": "2015-06-15",
  "location": "<location>",
  "dependsOn": [
    "[concat('Microsoft.Compute/virtualMachines/', <myVM>)]"
  ],
  "properties": {
    "publisher": "Microsoft.HpcCompute",
    "type": "NvidiaGpuDriverLinux",
    "typeHandlerVersion": "1.6",
    "autoUpgradeMinorVersion": true,
    "settings": {
    }
  }
}
Properties
Name | Value/Example | Data type |
---|---|---|
apiVersion | 2015-06-15 | date |
publisher | Microsoft.HpcCompute | string |
type | NvidiaGpuDriverLinux | string |
typeHandlerVersion | 1.6 | string |
Settings
All settings are optional. By default, the extension doesn't update the kernel unless the driver installation requires it, and it installs the latest supported driver and the CUDA toolkit (as applicable).
Name | Description | Default value | Valid values | Data type |
---|---|---|---|---|
updateOS | Update the kernel even if not required for driver installation. | false | true, false | boolean |
driverVersion | NV: GRID driver version. NC/ND: CUDA toolkit version. The latest drivers for the chosen CUDA are installed automatically. | latest | List of supported driver versions | string |
installCUDA | Install CUDA toolkit. Only relevant for NC/ND series VMs. | true | true, false | boolean |
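As an illustration of how the three settings combine into one settings object, the following sketch prints the JSON for a given choice of values. `gpu_settings_json` is a hypothetical helper, and leaving out `driverVersion` when no version is given is an assumption made here so that the extension's default applies.

```shell
# Sketch: build the extension settings object from the table above.
# gpu_settings_json is a hypothetical helper name.
# Arguments: updateOS (true|false), installCUDA (true|false), driverVersion (optional).
gpu_settings_json() {
  local update_os="${1:-false}" install_cuda="${2:-true}" driver_version="${3:-}"
  local json

  # Both boolean settings only accept true or false.
  case "$update_os:$install_cuda" in
    true:true|true:false|false:true|false:false) ;;
    *) echo "updateOS and installCUDA must be true or false" >&2; return 1 ;;
  esac

  json=$(printf '"updateOS": %s, "installCUDA": %s' "$update_os" "$install_cuda")

  # Only emit driverVersion when a specific version was requested.
  if [ -n "$driver_version" ]; then
    json="$json, \"driverVersion\": \"$driver_version\""
  fi

  printf '{%s}\n' "$json"
}

# Usage (placeholder names):
#   az vm extension set --resource-group myResourceGroup --vm-name myVM \
#     --name NvidiaGpuDriverLinux --publisher Microsoft.HpcCompute \
#     --settings "$(gpu_settings_json true true 10.0.130)"
```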
Deployment
Azure portal
You can deploy Azure NVIDIA VM extensions in the Azure portal.
In a browser, go to the Azure portal.
Go to the virtual machine on which you want to install the driver.
On the left menu, select Extensions.
Select Add.
Scroll to find and select NVIDIA GPU Driver Extension, and then select Next.
Select Review + create, and select Create. Wait a few minutes for the driver to deploy.
Verify that the extension was added to the list of installed extensions.
Azure Resource Manager template
You can use Azure Resource Manager templates to deploy Azure VM extensions. Templates are ideal when you deploy one or more virtual machines that require post-deployment configuration.
The JSON configuration for a virtual machine extension can be nested inside the virtual machine resource or placed at the root or top level of a Resource Manager JSON template. The placement of the JSON configuration affects the value of the resource name and type. For more information, see Set name and type for child resources.
The following example assumes the extension is nested inside the virtual machine resource. When the extension resource is nested, the JSON is placed in the "resources": [] object of the virtual machine.
{
  "name": "myExtensionName",
  "type": "extensions",
  "location": "[resourceGroup().location]",
  "apiVersion": "2015-06-15",
  "dependsOn": [
    "[concat('Microsoft.Compute/virtualMachines/', 'myVM')]"
  ],
  "properties": {
    "publisher": "Microsoft.HpcCompute",
    "type": "NvidiaGpuDriverLinux",
    "typeHandlerVersion": "1.6",
    "autoUpgradeMinorVersion": true,
    "settings": {
    }
  }
}
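One way to deploy a template that contains this extension resource is with the Azure CLI deployment commands. The sketch below assumes the JSON above is part of a complete template saved to a local file. `deploy_gpu_template` is a hypothetical helper, and a template with required parameters would also need them passed to both commands.

```shell
# Sketch: validate, then deploy, a Resource Manager template file.
# deploy_gpu_template is a hypothetical helper name.
deploy_gpu_template() {
  local rg="$1" template="$2"

  # Fail early if the template file is missing.
  if [ ! -f "$template" ]; then
    echo "template not found: $template" >&2
    return 1
  fi

  # Check the template against the resource group before deploying it.
  az deployment group validate \
    --resource-group "$rg" \
    --template-file "$template" || return 1

  az deployment group create \
    --resource-group "$rg" \
    --template-file "$template"
}

# Usage (placeholder names):
#   deploy_gpu_template myResourceGroup ./azuredeploy.json
```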
PowerShell
Set-AzVMExtension `
    -ResourceGroupName "myResourceGroup" `
    -VMName "myVM" `
    -Location "southcentralus" `
    -Publisher "Microsoft.HpcCompute" `
    -ExtensionName "NvidiaGpuDriverLinux" `
    -ExtensionType "NvidiaGpuDriverLinux" `
    -TypeHandlerVersion 1.6 `
    -SettingString '{}'
Azure CLI
The following example mirrors the preceding Resource Manager and PowerShell examples:
az vm extension set \
  --resource-group myResourceGroup \
  --vm-name myVM \
  --name NvidiaGpuDriverLinux \
  --publisher Microsoft.HpcCompute \
  --version 1.6
The following example also adds two optional custom settings to show a nondefault driver installation. Specifically, it updates the OS kernel to the latest version and installs the driver for a specific CUDA toolkit version. The --settings values are optional; omit them to use the defaults. Updating the kernel might increase the extension installation time. Also, a specific (older) CUDA toolkit version might not always be compatible with newer kernels.
az vm extension set \
  --resource-group myResourceGroup \
  --vm-name myVM \
  --name NvidiaGpuDriverLinux \
  --publisher Microsoft.HpcCompute \
  --version 1.6 \
  --settings '{
    "updateOS": true,
    "driverVersion": "10.0.130"
  }'
Troubleshoot and support
Troubleshoot
You can retrieve data about the state of extension deployments from the Azure portal and by using Azure PowerShell and the Azure CLI. To see the deployment state of extensions for a given VM, run one of the following commands. The first uses Azure PowerShell and the second uses the Azure CLI:
Get-AzVMExtension -ResourceGroupName myResourceGroup -VMName myVM -Name myExtensionName
az vm extension list --resource-group myResourceGroup --vm-name myVM -o table
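For scripted checks, the provisioning state of a single extension can be read with `az vm extension show`. The following is a minimal sketch. The helper names are hypothetical, and treating any state other than `Succeeded` as a failure is an assumption that may be too strict while a deployment is still in progress.

```shell
# Sketch: read the provisioning state of the GPU driver extension.
# gpu_extension_state and gpu_extension_succeeded are hypothetical helper names.
gpu_extension_state() {
  local rg="$1" vm="$2" name="${3:-NvidiaGpuDriverLinux}"
  az vm extension show \
    --resource-group "$rg" \
    --vm-name "$vm" \
    --name "$name" \
    --query provisioningState \
    --output tsv
}

# Returns 0 only when the reported state is exactly "Succeeded".
gpu_extension_succeeded() {
  local state
  state=$(gpu_extension_state "$@") || return 1
  echo "provisioningState: $state"
  [ "$state" = "Succeeded" ]
}

# Usage (placeholder names):
#   gpu_extension_succeeded myResourceGroup myVM || echo "check the extension log"
```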
Extension execution output is logged to the following file. Refer to this file to track the status of any long-running installation and for troubleshooting any failures.
/var/log/azure/nvidia-vmext-status
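On the VM itself, the log can be inspected with standard tools. The sketch below prints the last lines of the file. `show_gpu_ext_log` is a hypothetical helper, and reading the log might require elevated permissions depending on the image.

```shell
# Sketch: print the tail of the extension log on the VM.
# show_gpu_ext_log is a hypothetical helper name.
# Arguments: log path (defaults to the documented file), number of lines (default 50).
show_gpu_ext_log() {
  local log="${1:-/var/log/azure/nvidia-vmext-status}" lines="${2:-50}"

  if [ ! -r "$log" ]; then
    echo "cannot read $log (try sudo, or the extension has not run yet)" >&2
    return 1
  fi

  tail -n "$lines" "$log"
}

# Usage:
#   show_gpu_ext_log                                     # last 50 lines
#   sudo tail -f /var/log/azure/nvidia-vmext-status      # follow a long-running install
```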
Exit codes
Exit code | Meaning | Possible action |
---|---|---|
0 | Operation successful | |
1 | Incorrect usage of extension | Check the execution output log. |
10 | Linux Integration Services for Hyper-V and Azure not available or installed | Check the output of lspci. |
11 | NVIDIA GPU not found on this VM size | Use a supported VM size and OS. |
12 | Image offer not supported | |
13 | VM size not supported | Use an N-series VM to deploy. |
14 | Operation unsuccessful | Check the execution output log. |
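When exit codes are handled in a script, the table above can be encoded as a small lookup. This is a sketch only: `describe_exit_code` is a hypothetical helper, and the messages are copied from the table rather than produced by the extension.

```shell
# Sketch: translate an extension exit code into the meaning and action from the table.
# describe_exit_code is a hypothetical helper name.
describe_exit_code() {
  case "$1" in
    0)  echo "Operation successful" ;;
    1)  echo "Incorrect usage of extension. Action: check the execution output log." ;;
    10) echo "Linux Integration Services for Hyper-V and Azure not available or installed. Action: check the output of lspci." ;;
    11) echo "NVIDIA GPU not found on this VM size. Action: use a supported VM size and OS." ;;
    12) echo "Image offer not supported" ;;
    13) echo "VM size not supported. Action: use an N-series VM to deploy." ;;
    14) echo "Operation unsuccessful. Action: check the execution output log." ;;
    *)  echo "Unknown exit code: $1"; return 1 ;;
  esac
}

# Usage:
#   describe_exit_code 13
```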
Known issues
- NvidiaGpuDriverLinux currently fails to install the latest 17.x GRID drivers because of certificate issues. While Azure is working to resolve this issue, use GRID driver 16.5 by passing a runtime setting to the extension.
az vm extension set --resource-group <rg-name> --vm-name <vm-name> --name NvidiaGpuDriverLinux --publisher Microsoft.HpcCompute --settings "{'driverVersion':'535.161'}"
{
  "name": "NvidiaGpuDriverLinux",
  "type": "extensions",
  "apiVersion": "2015-06-15",
  "location": "<location>",
  "dependsOn": [
    "[concat('Microsoft.Compute/virtualMachines/', <myVM>)]"
  ],
  "properties": {
    "publisher": "Microsoft.HpcCompute",
    "type": "NvidiaGpuDriverLinux",
    "typeHandlerVersion": "1.11",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "driverVersion": "535.161"
    }
  }
}
- GRID driver version 17.x is incompatible with NVv3 (NVIDIA Tesla M60). GRID drivers up to version 16.5 are supported. NvidiaGpuDriverLinux installs the latest drivers, which are incompatible with the NVv3 SKU. Instead, use the following runtime settings to force the extension to install an older version of the driver. For more information on driver versions, see NVIDIA GPU resources.
az vm extension set --resource-group <rg-name> --vm-name <vm-name> --name NvidiaGpuDriverLinux --publisher Microsoft.HpcCompute --settings "{'driverVersion':'535.161'}"
{
  "name": "NvidiaGpuDriverLinux",
  "type": "extensions",
  "apiVersion": "2015-06-15",
  "location": "<location>",
  "dependsOn": [
    "[concat('Microsoft.Compute/virtualMachines/', <myVM>)]"
  ],
  "properties": {
    "publisher": "Microsoft.HpcCompute",
    "type": "NvidiaGpuDriverLinux",
    "typeHandlerVersion": "1.11",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "driverVersion": "535.161"
    }
  }
}
Support
If you need more help at any point in this article, contact the Azure experts on the MSDN Azure and Stack Overflow forums. Alternatively, you can file an Azure support incident. Go to Azure support and select Get support. For information about using Azure support, read the Azure support FAQ.
Next steps
- For more information about extensions, see Virtual machine extensions and features for Linux.
- For more information about N-series VMs, see GPU optimized virtual machine sizes.