Thanks for posting your question in the Microsoft Q&A forum.
You need to make a few adjustments to your configuration. The issue you're encountering is likely due to the way GPU support is implemented in Azure Batch for Linux containers.
- Ensure your pool is configured correctly: Use an NC-series VM size and set the virtual machine configuration to use a GPU-enabled image
- Instead of using the container option "--gpu all", you should specify the GPU requirements in the task's resource requirements (when creating your task, set the `container_settings` to include GPU resources)
- Make sure your custom image has the NVIDIA drivers and CUDA toolkit installed. Since you're using a custom image based on an Azure ML image, these should already be included
- In your Batch pool configuration, ensure that the `container_configuration` is set up correctly
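
To make the last two points more concrete, here is a rough sketch of how the pool's container configuration and the task's container settings relate to each other. It only builds plain dictionaries that mirror the JSON field names of the Batch REST API (`containerConfiguration`, `containerSettings`), so it does not call Azure at all. The pool ID, VM size, image name, custom image ID, and node agent SKU are placeholder assumptions, not values from your setup. The exact setting for the GPU requirements is not filled in here, so please check it against the Batch documentation for your SDK version.

```python
import json


def build_pool_spec(pool_id, vm_size, image_name, custom_image_id):
    """Build a pool body with a container configuration (plain dict, no Azure call)."""
    return {
        "id": pool_id,
        "vmSize": vm_size,  # assumption: an NC-series (GPU) size
        "virtualMachineConfiguration": {
            # Placeholder: resource ID of your custom GPU-enabled image.
            "imageReference": {"virtualMachineImageId": custom_image_id},
            # Assumption: must match the OS of the custom image.
            "nodeAgentSKUId": "batch.node.ubuntu 20.04",
            "containerConfiguration": {
                "type": "dockerCompatible",
                # Images listed here are pulled when the node joins the pool.
                "containerImageNames": [image_name],
            },
        },
    }


def build_task_spec(task_id, image_name, command_line):
    """Build a task body whose container settings point at the pool's image."""
    return {
        "id": task_id,
        "commandLine": command_line,
        # GPU-specific settings are intentionally left out of this sketch.
        "containerSettings": {"imageName": image_name},
    }


def image_is_prefetched(pool, task):
    """Return True if the task's image is listed in the pool's container configuration."""
    config = pool["virtualMachineConfiguration"]["containerConfiguration"]
    return task["containerSettings"]["imageName"] in config["containerImageNames"]


if __name__ == "__main__":
    image = "myregistry.azurecr.io/train:latest"  # hypothetical image name
    pool = build_pool_spec("gpu-pool", "Standard_NC6s_v3", image, "<custom-image-resource-id>")
    task = build_task_spec("gpu-task-1", image, "nvidia-smi")
    print(json.dumps(pool, indent=2))
    print(json.dumps(task, indent=2))
    print("image prefetched on pool:", image_is_prefetched(pool, task))
```

The `image_is_prefetched` helper is only a local sanity check: it confirms that the image the task runs is one the pool was told to pull, which is a common source of mismatches between pool and task configuration.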
Please don't forget to close the thread by upvoting and accepting this as the answer if it was helpful.