Use private Python packages with Azure Machine Learning

APPLIES TO: Python SDK azureml v1

In this article, learn how to use private Python packages securely within Azure Machine Learning. Use cases for private Python packages include:

  • You've developed a private package that you don't want to share publicly.
  • You want to use a curated repository of packages stored within an enterprise firewall.

The recommended approach depends on whether you have a few packages for a single Azure Machine Learning workspace, or an entire repository of packages for all workspaces within an organization.

Private packages are used through the Environment class. Within an environment, you declare which Python packages to use, including private ones. To learn about environments in Azure Machine Learning in general, see How to use environments.

Prerequisites

Use a small number of packages for development and testing

For a few private packages for a single workspace, use the static Environment.add_private_pip_wheel() method. This approach allows you to quickly add a private package to the workspace, and is well suited for development and testing purposes.

Point the file_path argument to a local wheel file and call the add_private_pip_wheel method. The method returns a URL used to track the location of the package within your workspace. Capture the storage URL and pass it to the add_pip_package() method.

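from azureml.core import Environment, Workspace
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()
# Upload the local wheel; the returned URL tracks its location in the workspace.
whl_url = Environment.add_private_pip_wheel(workspace=ws, file_path="my-custom.whl")
myenv = Environment(name="myenv")
conda_dep = CondaDependencies()
conda_dep.add_pip_package(whl_url)
myenv.python.conda_dependencies = conda_dep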

Internally, the Azure Machine Learning service replaces the URL with a secure SAS URL, so your wheel file is kept private and secure.
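
If you have more than one local wheel, the same pattern can be repeated per file. The following is a minimal sketch, not part of the SDK: register_wheels and its upload parameter are hypothetical names introduced here for illustration. The upload callable is assumed to take a local file path and return the URL of the uploaded wheel.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def register_wheels(wheel_paths: Iterable[str], upload: Callable[[str], str]) -> List[str]:
    """Upload each local wheel and return the resulting URLs in input order.

    `upload` is a hypothetical callable (an assumption of this sketch) that
    takes a local file path and returns the URL of the uploaded wheel.
    """
    urls = []
    for path in wheel_paths:
        # Reject anything that is not a wheel before attempting an upload.
        if Path(path).suffix != ".whl":
            raise ValueError(f"not a wheel file: {path}")
        urls.append(upload(path))
    return urls
```

With the SDK, upload could be a small wrapper such as lambda path: Environment.add_private_pip_wheel(workspace=ws, file_path=path), and each returned URL would then be passed to add_pip_package() as shown above.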

Use a repository of packages from Azure DevOps feed

If you're actively developing Python packages for your machine learning application, you can host them in an Azure DevOps repository as artifacts and publish them as a feed. This approach allows you to integrate the DevOps workflow for building packages with your Azure Machine Learning workspace. To learn how to set up Python feeds using Azure DevOps, read Get Started with Python Packages in Azure Artifacts.

This approach uses a Personal Access Token to authenticate against the repository. The same approach is applicable to other repositories with token-based authentication, such as private GitHub repositories.

  1. Create a Personal Access Token (PAT) for your Azure DevOps instance. Set the scope of the token to Packaging > Read.

  2. Add the Azure DevOps URL and PAT as workspace properties, using the Workspace.set_connection method.

    from getpass import getpass

    from azureml.core import Workspace

    # getpass prompts without echoing the token to the terminal.
    pat_token = getpass("Enter secret token: ")
    ws = Workspace.from_config()
    ws.set_connection(name="connection-1",
                      category="PythonFeed",
                      target="https://pkgs.dev.azure.com/<MY-ORG>",
                      authType="PAT",
                      value=pat_token)
    
  3. Create an Azure Machine Learning environment and add Python packages from the feed.

    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies
    
    env = Environment(name="my-env")
    cd = CondaDependencies()
    cd.add_pip_package("<my-package>")
    cd.set_pip_option("--extra-index-url https://pkgs.dev.azure.com/<MY-ORG>/_packaging/<MY-FEED>/pypi/simple")
    env.python.conda_dependencies = cd
    

The environment is now ready to be used in training runs or web service endpoint deployments. When building the environment, Azure Machine Learning service uses the PAT to authenticate against the feed with the matching base URL.
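
The connection target and the feed index URL in the steps above share the same base URL. The sketch below builds both strings from an organization and feed name so they stay consistent. The function names are hypothetical and introduced only for illustration. The prefix check at the end is an assumption about what "matching base URL" means; the exact matching rule used by the service isn't described here.

```python
def connection_target(organization: str) -> str:
    """Base URL used as the `target` of the workspace connection."""
    return f"https://pkgs.dev.azure.com/{organization}"


def feed_index_url(organization: str, feed: str) -> str:
    """Index URL of the feed, in the form used by the pip option in step 3."""
    return f"{connection_target(organization)}/_packaging/{feed}/pypi/simple"


def extra_index_option(organization: str, feed: str) -> str:
    """Pip option string suitable for CondaDependencies.set_pip_option()."""
    return f"--extra-index-url {feed_index_url(organization, feed)}"


# Illustration only: "contoso" and "ml-feed" are placeholder names.
option = extra_index_option("contoso", "ml-feed")
# Assumed meaning of "matching base URL": the feed URL starts with the target.
shares_base = feed_index_url("contoso", "ml-feed").startswith(connection_target("contoso"))
```

Building both values from one place reduces the chance that a typo in the organization name leaves the feed without a matching connection.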

Use a repository of packages from private storage

You can consume packages from an Azure storage account within your organization's firewall. The storage account can hold a curated set of packages or an internal mirror of publicly available packages.

To set up such private storage, see Secure an Azure Machine Learning workspace and associated resources. You must also place the Azure Container Registry (ACR) behind the virtual network.

Important

You must complete this step to be able to train or deploy models using the private package repository.

After completing these configurations, you can reference the packages in the Azure Machine Learning environment definition by their full URL in Azure blob storage.
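
As a rough illustration of what a full blob URL looks like, the sketch below assembles one from a storage account name, a container name, and a blob name. The function name and the example values are hypothetical; the URL layout follows the standard public-cloud blob endpoint, https://<account>.blob.core.windows.net/<container>/<blob>, which may differ in sovereign clouds or with custom domains.

```python
from urllib.parse import quote


def blob_package_url(account: str, container: str, blob_name: str) -> str:
    """Return the full URL of a package file stored in Azure blob storage.

    Assumes the default public-cloud endpoint suffix. Slashes in the blob
    name are kept as path separators; other unsafe characters are encoded.
    """
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


# Placeholder names for illustration only.
url = blob_package_url("mystorage", "packages", "wheels/my_pkg-1.0.0-py3-none-any.whl")
```

The resulting string could then be added to the environment's pip dependencies, for example with add_pip_package(), in the same way as the wheel URL earlier in this article.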

Next steps