Vulnerability management for Azure AI Foundry

Article
11/21/2024

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Vulnerability management involves detecting, assessing, mitigating, and reporting on any security vulnerabilities that exist in an organization's systems and software. Vulnerability management is a shared responsibility between you and Microsoft.

This article discusses these responsibilities and outlines the vulnerability management controls that Azure AI Foundry provides. You learn how to keep your service instance and applications up to date with the latest security updates, and how to minimize the window of opportunity for attackers.

Microsoft-managed VM images

Microsoft manages host OS virtual machine (VM) images for compute instances and serverless compute clusters. The update frequency is monthly and includes the following details:

For each new VM image version, the latest updates are sourced from the original publisher of the OS. Using the latest updates helps ensure that you get all applicable OS-related patches. For Azure AI Foundry, the publisher is Canonical for all the Ubuntu images.
VM images are updated monthly.
In addition to patches that the original publisher applies, Microsoft updates system packages when updates are available.
Microsoft checks and validates any machine learning packages that might require an upgrade. In most circumstances, new VM images contain the latest package versions.
All VM images are built on secure subscriptions that run vulnerability scanning regularly. Microsoft flags any unaddressed vulnerabilities and fixes them within the next release.
The frequency is a monthly interval for most images. For compute instances, the image release is aligned with the release cadence of the Azure Machine Learning SDK that's preinstalled in the environment.

In addition to the regular release cadence, Microsoft applies hotfixes if vulnerabilities surface. Microsoft rolls out hotfixes within 72 hours for serverless compute clusters and within a week for compute instances.

Note

The host OS is not the OS version that you might specify for an environment when you're training or deploying a model. Environments run inside Docker. Docker runs on the host OS.

Microsoft-managed container images

Base docker images that Microsoft maintains for Azure AI Foundry get security patches frequently to address newly discovered vulnerabilities.

Microsoft releases updates for supported images every two weeks to address vulnerabilities. As a commitment, we aim to have no vulnerabilities older than 30 days in the latest version of supported images.

Patched images are released under a new immutable tag and an updated :latest tag. Using the :latest tag or pinning to a particular image version might be a tradeoff between security and environment reproducibility for your machine learning job.

Managing environments and container images

In Azure AI Foundry portal, Docker images are used to provide a runtime environment for prompt flow deployments. The images are built from a base image that Azure AI Foundry provides.

Although Microsoft patches base images with each release, whether you use the latest image might be tradeoff between reproducibility and vulnerability management. It's your responsibility to choose the environment version that you use for your jobs or model deployments.

By default, dependencies are layered on top of base images when you're building an image. After you install more dependencies on top of the Microsoft-provided images, vulnerability management becomes your responsibility.

Associated with your Azure AI Foundry hub is an Azure Container Registry instance that functions as a cache for container images. Any image that materializes is pushed to the container registry. The workspace uses it when deployment is triggered for the corresponding environment.

The hub doesn't delete any image from your container registry. You're responsible for evaluating the need for an image over time. To monitor and maintain environment hygiene, you can use Microsoft Defender for Container Registry to help scan your images for vulnerabilities. To automate your processes based on triggers from Microsoft Defender, see Automate remediation responses.

Vulnerability management on compute hosts

Managed compute nodes in Azure AI Foundry portal use Microsoft-managed OS VM images. When you provision a node, it pulls the latest updated VM image. This behavior applies to compute instance, serverless compute cluster, and managed inference compute options.

Although OS VM images are regularly patched, Microsoft doesn't actively scan compute nodes for vulnerabilities while they're in use. For an extra layer of protection, consider network isolation of your computes.

Ensuring that your environment is up to date and that compute nodes use the latest OS version is a shared responsibility between you and Microsoft. Nodes that aren't idle can't be updated to the latest VM image. Considerations are slightly different for each compute type, as listed in the following sections.

Compute instance

Compute instances get the latest VM images at the time of provisioning. Microsoft releases new VM images on a monthly basis. After you deploy a compute instance, it isn't actively updated. To keep current with the latest software updates and security patches, you can use one of these methods:

Re-create a compute instance to get the latest OS image (recommended).

If you use this method, you'll lose data and customizations (such as installed packages) that are stored on the instance's OS and temporary disks.

For more information about image releases, see the Azure Machine Learning compute instance image release notes.
Regularly update OS and Python packages.
- Use Linux package management tools to update the package list with the latest versions:
```
sudo apt-get update
```
- Use Linux package management tools to upgrade packages to the latest versions. Package conflicts might occur when you use this approach.
```
sudo apt-get upgrade
```
- Use Python package management tools to upgrade packages and check for updates:
```
pip list --outdated
```

You can install and run additional scanning software on the compute instance to scan for security issues:

Use Trivy to discover OS and Python package-level vulnerabilities.
Use ClamAV to discover malware. It comes preinstalled on compute instances.

Microsoft Defender for Servers agent installation is currently not supported.

Endpoints

Endpoints automatically receive OS host image updates that include vulnerability fixes. The update frequency of images is at least once a month.

Compute nodes are automatically upgraded to the latest VM image version when that version is released. You don't need to take any action.

Share via