AzureMlCompute job fails with error AggregatedUnauthorizedAccessError: Failed to pull Docker image

PARUPALLY, ANUJA REDDY 0 Reputation points
2025-02-10T10:50:04.4066667+00:00

We have a machine learning workspace that was recently moved to a VNet. We created a private endpoint from the workspace to the ML subnet. The compute cluster was created with a user-assigned identity, and the AcrPull role is assigned to that identity. However, we see the following error related to AcrPull.

Can you please suggest what might be missing here?

AzureMLCompute job failed
AggregatedUnauthorizedAccessError: Failed to pull Docker image. This error may occur because the compute could not authenticate with the Docker registry to pull the image. If using ACR please ensure the ACR has Admin user enabled or a Managed Identity with `AcrPull` access to the ACR is assigned to the compute. If the ACR Admin user's password was changed recently it may be necessary to synchronize the workspace keys. 	: Anonymous
	: Request to https://xyz.workspace.centralus.api.azureml.ms/environment/v1.0/subscriptions/subscriptionid/resourceGroups/mlworkspace-lower-sub/providers/Microsoft.MachineLearningServices/workspaces/occ-ml-lower-sub/environments/azureml%3A%2F%2Flocations%2Fcentralus%2Fworkspaces%2F%2Fenvironments%2FCliV2AnonymousEnvironment%2Fversions%/image/feed?environmentVersion=&secrets=true failed due to DNS errors, verify if custom network settings, such as VNet and Custom DNS, are correctly defined (visit https://docs.microsoft.com/en-us/azure/machine-learning/how-to-network-security-overview for more information) 	: Identity (MSI) not found on the compute, if the intention is to authenticate with identity ensure that a Managed Identity with `AcrPull` access to the ACR is assigned to the compute
	: {"Code":"DockerUnauthorizedAccessError","Category":"UserError","Message":"Failed to pull Docker image occcrlowersub.azurecr.io/azureml/azureml_e69:latest with authentication mode Anonymous due to: Docker responded with status code 500: {\"message\":\"Head \\\"https://occcrlowersub.azurecr.io/v2/azureml/azureml_121/manifests/latest\\\": unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information. CorrelationId: 8e9\"}\n. Compute could not authenticate with the Docker registry to pull the image.","Details":[],"Error":null}
	: Some(true)

Azure Machine Learning
An Azure machine learning service for building and deploying models.

3 answers

  1. Amira Bedhiafi 28,381 Reputation points
    2025-02-10T19:18:59.98+00:00

    From the message you are getting, I understand that the compute cluster is unable to pull the Docker image from the ACR due to authentication issues.

    You need to verify if the user-assigned managed identity (MSI) has the AcrPull role assigned to it on the ACR.

    If you are not using a managed identity, check that the Admin user is enabled on the ACR.

    If the ACR Admin user password was changed recently, synchronize the workspace keys:

    1. Navigate to your Azure Machine Learning workspace in the Azure portal.
    2. Go to Settings > Keys.
    3. Click on Sync keys.

    Also verify that the VNet and custom DNS settings are configured so that the compute cluster can resolve and access the ACR.

    The compute cluster must be able to reach the ACR endpoint, either over the internet or through a private link, so check that no NSGs or firewalls are blocking access to the ACR.
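
    As a rough way to test the reachability point above, here is a minimal Python sketch that tries a TCP connection to the registry on port 443. It only tells you something if you run it from a machine inside the same VNet/subnet as the compute cluster (for example a compute instance). The function name can_reach and the default timeout are my own choices for illustration. A successful connection does not prove that authentication will work, only that nothing on the network path is dropping the traffic.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    DNS failures, refused connections, and timeouts all surface as OSError
    subclasses, so each of them is reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the registry login server from the error message:
# print(can_reach("occcrlowersub.azurecr.io"))
```

    If this returns False from inside the VNet, look at NSG rules, firewall rules, and DNS before looking at role assignments. Note that image layers may be served from a separate data endpoint of the registry, so it can be worth testing that host name as well if you know it.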

  2. Suwarna S Kale 471 Reputation points
    2025-02-10T21:12:02.24+00:00

    Hello PARUPALLY, ANUJA REDDY,

    Thank you for posting your question in the Microsoft Q&A forum.

    The error you are encountering indicates that the compute cluster is unable to authenticate with the Azure Container Registry (ACR) to pull the Docker image required for the Azure Machine Learning (AML) job. This is a common issue when moving an AML workspace to a Virtual Network (VNet) and configuring private endpoints.

    • Ensure the ACR has a private endpoint configured within the same VNet as the AML workspace.
    • Verify that the private endpoint is correctly linked to the ACR and that DNS resolution is working. This link may be useful - Configure private endpoint for ACR
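
    To check the DNS part of the bullets above, a small Python sketch like the following can be run from a machine inside the VNet. It resolves the registry host name and reports whether every returned address is in a private range. The assumption here is that, with a working ACR private endpoint and private DNS zone, the name should resolve to a private IP in your VNet; a public IP would suggest that lookups are not going through the private zone. The helper names resolve and all_private are made up for this example.

```python
import ipaddress
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the local resolver gives for `hostname`."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True when there is at least one address and all of them are private."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


# Example call, using the registry login server from the error message:
# addrs = resolve("occcrlowersub.azurecr.io")
# print(addrs, "private endpoint DNS looks OK" if all_private(addrs) else "resolves to a public address")
```

    The same check can be applied to the workspace API host name from the error message, since that request is the one reported as failing due to DNS errors.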

    If above answer helped, please do not forget to "Accept Answer" as this may help other community members to refer the info if facing similar issue.

  3. Pavankumar Purilla 3,235 Reputation points Microsoft Vendor
    2025-02-10T21:34:20.68+00:00

    Hi PARUPALLY, ANUJA REDDY,
    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!
    It sounds like you're encountering an authentication issue when your AzureMLCompute job tries to pull a Docker image from the Azure Container Registry (ACR). Here are a few steps you can take to troubleshoot and resolve this issue:

    • Ensure that the compute cluster has a Managed Identity assigned to it. You can check this in the Azure portal under the compute cluster's identity settings.
    • Confirm that the Managed Identity assigned to the compute cluster has the AcrPull role on the ACR. You can verify this in the Azure portal under the ACR's Access Control (IAM) settings.
    • If you are not using a Managed Identity to pull images, make sure the ACR Admin user is enabled. This can be done in the Azure portal under the ACR's Access keys settings.
    • If the ACR Admin user's password was changed recently, you may need to synchronize the workspace keys. You can do this using the Azure CLI (ml extension v2 syntax shown; the older v1 extension used -w for the workspace name):

    az ml workspace sync-keys --name <workspace-name> --resource-group <resource-group-name>

    • Ensure that your custom network settings, such as VNet and Custom DNS, are correctly defined. This includes verifying that the private endpoint from the workspace to the ML subnet is properly configured.
    • The error message reports both a DNS error and that no Managed Identity (MSI) was found on the compute. Double-check the network configuration and confirm that the identity is actually attached to the compute cluster, not only granted a role on the ACR.
      Please refer to the following: Failed to pull Docker image from ACR to Azure ML
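
    For the AcrPull check in the bullets above, here is a minimal Python sketch that shells out to the Azure CLI and looks for an AcrPull assignment for the identity at the registry scope. It assumes az is installed and logged in; principal_id (the object ID of the user-assigned identity) and acr_resource_id (the full resource ID of the registry) are placeholders you have to supply, and the helper names are made up for this example. It only matches the role named AcrPull, so a broader role that also allows pulling would not be detected.

```python
import json
import subprocess


def list_role_assignments(principal_id: str, acr_resource_id: str) -> list[dict]:
    """Return role assignments for the identity at the ACR scope, including inherited ones."""
    result = subprocess.run(
        [
            "az", "role", "assignment", "list",
            "--assignee", principal_id,
            "--scope", acr_resource_id,
            "--include-inherited",
            "--output", "json",
        ],
        check=True,           # raise if the CLI call fails
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def has_acr_pull(assignments: list[dict]) -> bool:
    """True if any assignment in the CLI output is for the AcrPull role."""
    return any(a.get("roleDefinitionName") == "AcrPull" for a in assignments)


# Example call (placeholders, fill in your own values):
# assignments = list_role_assignments("<identity-object-id>", "<acr-resource-id>")
# print("AcrPull assigned:", has_acr_pull(assignments))
```

    Even if this prints True, the job can still fail with "Identity (MSI) not found on the compute" when the identity is not attached to the cluster itself, so check the cluster's identity settings as well.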

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful.
