PyTorch not finding GPU when using Azure ML online endpoint

aot 66 Reputation points
2025-01-23T14:34:11.0533333+00:00

I'm trying to deploy an Azure ML managed online endpoint that will execute my model inference flow using PyTorch-based models. The endpoint is set up to use Standard_DS4_v2 compute, and uses an environment based on one of the slightly older curated acpt-pytorch environments available through Azure ML Studio.
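For reference, my deployment spec looks roughly like this (names and versions are placeholders; the relevant lines are `instance_type` and `environment`):

```yaml
# Hypothetical managed online deployment spec (names/versions are placeholders)
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model: azureml:my-model:1
environment: azureml:<curated-acpt-pytorch-env>:<version>
code_configuration:
  code: ./src
  scoring_script: score.py
instance_type: Standard_DS4_v2
instance_count: 1
```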

When I try to deploy the endpoint, the deployment fails while initializing my models, with the following error:

ERROR:root:Error initializing model: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver

If I deploy the endpoint with CUDA disabled and score on the CPU instead, the endpoint deploys as expected.
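For context, the CPU fallback in my scoring script is roughly the following (`pick_device` is a hypothetical helper; the real `init()` loads the models onto the returned device):

```python
# Hypothetical device-selection helper used in score.py's init().
# Falls back to CPU when torch is unavailable or no NVIDIA driver/GPU is visible.
def pick_device() -> str:
    try:
        import torch  # assumed to be present in the acpt-pytorch environment
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

# On a host without an NVIDIA driver this returns "cpu" instead of raising.
print(pick_device())
```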

I have no issues training the model, on the same type of compute as mentioned above, with full GPU support. Identical environments are used for training and for endpoint deployment.

Any suggestions as to why my endpoint cannot deploy when I want to use the GPU?

Azure Machine Learning