How do you create an ML Environment in an AML registry using a custom, pre-built Docker image residing in a private ACR?

Chris Musselle 0 Reputation points
2025-02-17T15:48:10.01+00:00

I am trying to create a reusable ML Environment for use by multiple AML workspaces by creating it in an AML registry. However, this Environment needs to be based on a custom Docker image that I build in a previous step, because building the image requires build arguments and build secrets that are passed as part of the docker CLI command. The documentation says that it should be possible to create such an environment from an existing Docker image, but I can't get this working when the image is in a private ACR.

https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?view=azureml-api-2&tabs=python#create-an-environment-from-a-docker-image

Things I have tried:

  • Using a Docker build context and building the image as an ACR task. This works with a minimal test image, but I am unable to pass in additional build arguments or secrets, so I can't use this approach with my custom image.
  • Running the example in the docs using the az CLI, but referencing a private ACR in the image URL. The command appeared to look up the image on Docker Hub, as it gave the following error:
    • cli.azure.cli.core.azclierror: (UserError) Authentication failed for container registry docker.io
  • Running the example in the docs using the Python SDK, but referencing a private ACR in the image URL. This looked like it tried to connect to the private ACR but could not authenticate:
    • HttpResponseError: (UserError) Authentication failed for container registry myprivateacr.azurecr.io
  • Retrying both of the above after giving the AML registry identity the AcrPull role on the private ACR. This did not resolve the issue.
  • Treating the AML registry like a regular AML workspace and trying to add a connection object to the registry using the admin username and password for the ACR. This did not work, as I don't think AML registries have the concept of connections. I received the error below during the request serialisation step:
    • ValueError: No value for given attribute

Version info:

  • Python: 3.12
  • azure.ai.ml: 1.25.0
  • azure cli: 2.69.0
  • azure cli ml extension: 2.33.1

Any help getting this working would be greatly appreciated.

Azure Machine Learning

1 answer

  1. Vikram Singh 1,980 Reputation points Microsoft Employee
    2025-02-21T10:02:07.95+00:00

    Hi Chris Musselle,

    I'm sorry to hear that you're running into this issue. Based on the error messages, authentication to your private Azure Container Registry (ACR) is failing. I know you have already tried several troubleshooting steps. Can you confirm the following:

    1. Verify Role Assignment: Ensure that the Azure Machine Learning (AML) registry identity has the AcrPull role assigned on the ACR. If the assignment is missing, you can create it using the Azure CLI:
    az role assignment create --assignee <AML-Registry-Identity-ID> --role AcrPull --scope /subscriptions/<Subscription-ID>/resourceGroups/<Resource-Group-Name>/providers/Microsoft.ContainerRegistry/registries/<ACR-Name>
    
    2. Check Managed Identity: Make sure that the managed identity used by the AML registry has the necessary permissions. If you are using a system-assigned managed identity, ensure it has the AcrPull role on the ACR.
    3. Update Authentication Method: If you are using the Azure ML Python SDK, ensure that you are using DefaultAzureCredential for authentication. This credential will automatically handle the authentication process for you. Here is an example:
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential
    from azure.ai.ml.entities import Environment
    
    credential = DefaultAzureCredential()
    
    ml_client = MLClient(
        credential=credential,
        subscription_id="<Subscription-ID>",
        resource_group_name="<Resource-Group-Name>",
        registry_name="<AML-Registry-Name>"
    )
    
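    env_name = "my-custom-env"
    # Reference the pre-built image in the private ACR by its full registry path.
    env = Environment(
        name=env_name,
        description="Environment with custom Docker image",
        image="<ACR-Name>.azurecr.io/<Image-Name>:<Tag>",
        # Optional: if given, a conda environment is built on top of the image.
        # Omit it to use the pre-built image as-is.
        conda_file="path/to/conda.yml"
    )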
    
    ml_client.environments.create_or_update(env)
    
    4. Check Docker Image URL: Verify that the Docker image URL is correctly formatted and accessible. The URL should be in the format <ACR-Name>.azurecr.io/<Image-Name>:<Tag>.
    5. Disable Local Auth: If you have local authentication enabled for your ACR, consider disabling it and using managed identities instead. This can help avoid issues related to credential management.
    6. Firewall and Network Configuration: Ensure that your ACR is not behind a firewall or virtual network that might be blocking access from the AML registry. If it is, you need to configure the firewall settings to allow access from the AML workspace.
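
The image URL check can be done offline before calling Azure at all. The sketch below follows Docker's reference convention, in which a reference whose first path component contains no `.` or `:` and is not `localhost` is treated as a Docker Hub (`docker.io`) image. The helper names `registry_host` and `is_acr_reference` are hypothetical and not part of any Azure SDK, and the sketch assumes a well-formed reference without a URL scheme. It cannot confirm why the CLI reported `docker.io` in the question, but it does rule out a reference that is missing its registry host.

```python
def registry_host(image_ref: str) -> str:
    """Return the registry host a (well-formed) image reference resolves to.

    Mirrors Docker's convention: the first path component is a registry host
    only if it contains '.' or ':' or equals 'localhost'; otherwise the
    reference is treated as a Docker Hub image.
    """
    first, sep, _rest = image_ref.strip().partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first.lower()
    return "docker.io"


def is_acr_reference(image_ref: str, acr_name: str) -> bool:
    """Check that the reference points at <acr_name>.azurecr.io."""
    return registry_host(image_ref) == f"{acr_name.lower()}.azurecr.io"
```

For example, `registry_host("myprivateacr.azurecr.io/my-image:1.0")` returns `"myprivateacr.azurecr.io"`, while `registry_host("my-image:1.0")` returns `"docker.io"` because no registry host is present.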

    If you have followed these steps and are still encountering issues, please provide more details about your setup, including any additional error messages or logs. This will help in diagnosing the problem further.
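
When gathering those details, it can help to report whether the AcrPull assignment is actually visible at the ACR scope. The following is a minimal sketch that builds the scope string used in the role assignment command above and queries it with `az role assignment list`. The function names are hypothetical, and the sketch assumes the Azure CLI is installed, logged in, and on the PATH as `az` (on Windows the executable may need to be invoked as `az.cmd`).

```python
import json
import subprocess


def acr_scope(subscription_id: str, resource_group: str, acr_name: str) -> str:
    """Build the ARM resource ID of the container registry."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ContainerRegistry/registries/{acr_name}"
    )


def acr_pull_list_command(assignee: str, scope: str) -> list[str]:
    """Return the argv for listing AcrPull assignments of one identity at a scope."""
    return [
        "az", "role", "assignment", "list",
        "--assignee", assignee,
        "--role", "AcrPull",
        "--scope", scope,
        "--output", "json",
    ]


def has_acr_pull(assignee: str, scope: str) -> bool:
    """Run the Azure CLI and report whether at least one assignment is returned."""
    result = subprocess.run(
        acr_pull_list_command(assignee, scope),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI call fails
    )
    return len(json.loads(result.stdout)) > 0
```

Calling `has_acr_pull("<AML-Registry-Identity-ID>", acr_scope("<Subscription-ID>", "<Resource-Group-Name>", "<ACR-Name>"))` returns `True` only for assignments made directly at the ACR scope; assignments inherited from the resource group or subscription would need the CLI's `--include-inherited` flag added to the command.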

    For more detailed information, you can refer to the official documentation

    Thanks

