Communication issue between my Azure VM (NVv5 A10) and the GPU driver
Hi everyone!
My Azure VM (NVv5 A10) cannot communicate with the NVIDIA driver, so I cannot use the GPU. When I check whether the driver is working, I get the following error:
~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Moreover, I have installed and tested various drivers, but none of them solved the problem. I found an article stating: "We have licensing issues on the NVv5 A10 series and Azure is actively working with Nvidia to resolve it. Use versions lower than v17.x on NVv5 A10 series. The extension currently installs 16.5 GRID drivers." (https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup?source=recommendations) So that is exactly what I did, using the Azure NVIDIA GPU driver extension:
az vm extension set \
  --resource-group "resource group" \
  --vm-name "vm-tools" \
  --name NvidiaGpuDriverLinux \
  --publisher Microsoft.HpcCompute
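For reference, here is a minimal sketch for confirming that the extension itself reported success. It assumes the Azure CLI is installed and logged in. RG and VM are hypothetical variables holding the same placeholder names as in the command above, and the helper extension_succeeded is a name made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: replace with the real resource group and VM name.
RG="resource group"
VM="vm-tools"

# Hypothetical helper: succeeds only if the provisioning state means success.
extension_succeeded() {
  [ "$1" = "Succeeded" ]
}

if command -v az >/dev/null 2>&1; then
  # Ask Azure for the provisioning state of the NVIDIA driver extension.
  state=$(az vm extension show \
    --resource-group "$RG" \
    --vm-name "$VM" \
    --name NvidiaGpuDriverLinux \
    --query provisioningState \
    --output tsv 2>/dev/null)
  if extension_succeeded "$state"; then
    echo "NvidiaGpuDriverLinux provisioning state: $state"
  else
    echo "NvidiaGpuDriverLinux did not report success (state: ${state:-unknown})"
  fi
else
  echo "Azure CLI (az) not found; skipping extension check"
fi
```

A "Succeeded" state would only suggest that the installer finished without an error. It would not by itself show that the kernel module loads after the reboot.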
After rebooting, I still get the same error from "nvidia-smi", and "lsmod | grep nvidia" shows no modules.
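A minimal diagnostic sketch that gathers a few details often relevant when the nvidia module is missing. Each check is skipped if its tool is not installed. The helper has_nvidia_module is a hypothetical name made up for this example. The Secure Boot check is included as an assumption rather than a confirmed cause. With Secure Boot enabled, unsigned third-party kernel modules can be refused at load time:

```shell
#!/usr/bin/env bash
# Diagnostic sketch; each check is skipped if the needed tool is missing.

# Hypothetical helper: reads lsmod- or /proc/modules-style text on stdin and
# succeeds only if a module named exactly "nvidia" is listed.
has_nvidia_module() {
  awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

echo "Running kernel: $(uname -r)"

if [ -r /proc/modules ] && has_nvidia_module < /proc/modules; then
  echo "nvidia kernel module: loaded"
else
  echo "nvidia kernel module: not loaded"
fi

# For DKMS-based installs: was the module built for the running kernel?
if command -v dkms >/dev/null 2>&1; then
  dkms status
fi

# Secure Boot state (assumption: relevant if the module is unsigned).
if command -v mokutil >/dev/null 2>&1; then
  mokutil --sb-state
fi

# Does modinfo find any nvidia module for this kernel?
if command -v modinfo >/dev/null 2>&1; then
  modinfo -F version nvidia 2>/dev/null \
    || echo "modinfo: no nvidia module found for $(uname -r)"
fi
```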
I am really out of ideas right now and would appreciate any ideas or thoughts on what might be causing this issue.
Thanks a lot!
Petra