@Pavankumar Purilla Thank you for your reply and suggestions.
I misstated something in my initial question: I'm not using a compute instance, but a compute cluster. As I understand it, the compute cluster is spun up on demand when I deploy my endpoint, so I have no real way to log in beforehand and check whether an NVIDIA driver is installed on the cluster, correct?
According to your own documentation, the cluster should have both a GPU and full CUDA support.
It's the exact same type of compute cluster I use for my pipeline training, where I do not face this issue. It is only for the endpoint deployment that this is happening.
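Since the cluster cannot be inspected before deployment, the driver check could instead run from inside the deployment itself. Below is a minimal sketch, assuming the deployment uses a scoring script with an `init()` function and that printed output ends up in the deployment logs. The helper name `probe_nvidia_driver` is hypothetical, and it only looks for the `nvidia-smi` tool on the PATH, so it is a rough diagnostic rather than a definitive test.

```python
import shutil
import subprocess


def probe_nvidia_driver(timeout_s: float = 10.0) -> dict:
    """Hypothetical helper: report whether nvidia-smi is present and runs."""
    path = shutil.which("nvidia-smi")
    if path is None:
        # No nvidia-smi on PATH usually means no driver is visible in the container.
        return {"found": False, "ok": False, "output": "nvidia-smi not on PATH"}
    try:
        proc = subprocess.run(
            [path], capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"found": True, "ok": False, "output": repr(exc)}
    return {
        "found": True,
        "ok": proc.returncode == 0,
        "output": proc.stdout or proc.stderr,
    }


def init():
    # Assumption: init() runs once when the scoring container starts,
    # so the probe result is printed a single time at startup.
    print("GPU driver probe:", probe_nvidia_driver())
```

If the probe reports `found: False` in the endpoint logs while the same check passes in a training job, that would point to the deployment environment or image rather than the cluster hardware.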