az ml data
Note
This reference is part of the ml extension for the Azure CLI (version 2.15.0 or higher). The extension will automatically install the first time you run an az ml data command. Learn more about extensions.
Manage Azure ML data assets.
Azure ML data assets are references to file(s) in your storage services or public URLs along with any corresponding metadata. They are not copies of your data. You can use these data assets to access relevant data during model training and mount or download the referenced data to your compute target.
Commands
Name | Description | Type | Status |
---|---|---|---|
az ml data archive | Archive a data asset. | Extension | GA |
az ml data create | Create a data asset in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option. | Extension | GA |
az ml data import | Import data and create a data asset. | Extension | Preview |
az ml data list | List data assets in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option. | Extension | GA |
az ml data list-materialization-status | Show the status of the data import materialization jobs that create versions of a data asset. | Extension | Preview |
az ml data mount | Mount a specific data asset to a local path. For now only Linux is supported. | Extension | Preview |
az ml data restore | Restore an archived data asset. | Extension | GA |
az ml data share | Share a specific data asset from workspace to registry. | Extension | Preview |
az ml data show | Show details for a data asset in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option. | Extension | GA |
az ml data update | Update a data asset. | Extension | GA |
az ml data archive
Archive a data asset.
Archiving a data asset will hide it by default from list queries (az ml data list). You can still continue to reference and use an archived data asset in your workflows. You can archive either a data asset container or a specific data asset version. Archiving a data asset container will archive all versions of the data asset under that given name. You can restore an archived data asset using az ml data restore. If the entire data asset container is archived, you cannot restore individual versions of the data asset; you will need to restore the data asset container.
az ml data archive --name
[--label]
[--resource-group]
[--version]
[--workspace-name]
Examples
Archive a data asset container (archives all versions of that data asset)
az ml data archive --name my-data --resource-group my-resource-group --workspace-name my-workspace
Archive a specific data asset version
az ml data archive --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace
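The archive behavior described above can be checked end to end: archive one version, list with archived entries included, then restore it. The following is a minimal sketch, assuming a data asset named my-data with version 1 already exists. The run helper and the DRY_RUN variable are hypothetical conveniences, not part of the Azure CLI; by default the script only prints the commands, and setting DRY_RUN=0 executes them.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Placeholder names (assumptions); replace with your own.
NAME="my-data"
RG="my-resource-group"
WS="my-workspace"

# Archive version 1 only; the container and other versions are left alone.
run az ml data archive --name "$NAME" --version 1 --resource-group "$RG" --workspace-name "$WS"

# Archived versions are hidden by default; --include-archived lists them too.
run az ml data list --name "$NAME" --include-archived --resource-group "$RG" --workspace-name "$WS"

# Restore the same version so it shows up in default list queries again.
run az ml data restore --name "$NAME" --version 1 --resource-group "$RG" --workspace-name "$WS"
```

Because only a single version is archived here, that version can be restored on its own. If the whole container had been archived (no --version), the restore would need to target the container as well.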
Required Parameters
--name: Name of the data asset.
Optional Parameters
--label: Label of the data asset. Mutually exclusive with version.
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--version: Version of the data asset. Mutually exclusive with label.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
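The resource group and workspace descriptions above mention az configure --defaults. As a minimal sketch of how that shortens later commands, the script below sets both defaults once and then omits --resource-group and --workspace-name. The run helper and DRY_RUN variable are hypothetical, and the names are placeholders; commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Store the default resource group and workspace in the CLI configuration.
run az configure --defaults group=my-resource-group workspace=my-workspace

# With defaults configured, the two options can be left off.
run az ml data list
run az ml data archive --name my-data --version 1
```

Explicit --resource-group or --workspace-name values on a command still take precedence over the configured defaults.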
az ml data create
Create a data asset in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option.
Data assets can be defined from files on your local machine or as references to files in cloud storage. The created data asset will be tracked in the workspace/registry under the specified name and version.
To create a data asset from file(s) on your local machine, specify the 'path' field in your YAML config. Azure ML will upload these file(s) to the blob container that backs the workspace's default datastore (named 'workspaceblobstore'). The created data asset will then point to that uploaded data.
To create a data asset that references file(s) in cloud storage, specify the 'path' to the file(s) in storage in your YAML config.
You can also create a data asset directly from a storage URL or public URL. To do so, specify the URL in the 'path' field in your YAML config.
az ml data create [--datastore]
[--description]
[--file]
[--name]
[--no-wait]
[--path]
[--registry-name]
[--resource-group]
[--set]
[--skip-validation]
[--type {mltable, uri_file, uri_folder}]
[--version]
[--workspace-name]
Examples
Create a data asset from a YAML specification file in a workspace
az ml data create --file data.yml --resource-group my-resource-group --workspace-name my-workspace
Create a data asset from a YAML specification file in a registry
az ml data create --file data.yml --registry-name my-registry-name
Create a data asset without using a YAML specification file in a workspace
az ml data create --name my-data --version 1 --path ./my-data.csv --resource-group my-resource-group --workspace-name my-workspace
Create a data asset without using a YAML specification file in a registry
az ml data create --name my-data --version 1 --path ./my-data.csv --registry-name my-registry-name
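The YAML-based examples above rely on a data.yml file that is not shown. The sketch below writes one possible minimal specification and prints the matching create command. The field names and the $schema URL are assumptions based on the YAML reference linked from this page (https://aka.ms/ml-cli-v2-data-yaml-reference) and should be checked against it; ./my-data.csv is a placeholder for a file assumed to exist locally. The run helper and DRY_RUN variable are hypothetical, and the command is only executed when DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Write a minimal data asset specification (all values are placeholders).
# The quoted 'EOF' keeps the shell from expanding $schema.
cat > data.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: my-data
version: "1"
description: Example data asset created from a local CSV file.
type: uri_file
path: ./my-data.csv
EOF

# Create the data asset in a workspace from the specification file.
run az ml data create --file data.yml --resource-group my-resource-group --workspace-name my-workspace
```

Because path points at a local file, the file would be uploaded to the workspace's default datastore as described above. Pointing path at a storage or public URL instead would register a reference without uploading.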
Optional Parameters
--datastore: The datastore to upload the local artifact to.
--description: Description of the data asset.
--file: Local path to the YAML file containing the Azure ML data specification. The YAML reference docs for data can be found at: https://aka.ms/ml-cli-v2-data-yaml-reference.
--name: Name of the data asset. Required if --registry-name is provided.
--no-wait: Do not wait for the long-running operation to finish. Default is False.
--path: Path to the data asset, can be local or remote.
--registry-name: If provided, the command will target the registry instead of a workspace, so resource group and workspace won't be required. Must be provided if --workspace-name and --resource-group are not provided.
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--set: Update an object by specifying a property path and value to set. Example: --set property1.property2=value.
--skip-validation: Skip validation of MLTable metadata when type is MLTable.
--type: Type of the data asset. Accepted values: mltable, uri_file, uri_folder.
--version: Version of the data asset. Required if --registry-name is provided.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
az ml data import
This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Import data and create a data asset.
A data asset can be created by first importing data from a database or file system into cloud storage. The created data asset will be tracked in the workspace under the specified name and version.
In your YAML config, specify either the 'query' field (for a database table) or the 'path' field (for a file system). Azure ML will first run a job to copy the data to cloud storage.
az ml data import --resource-group
--workspace-name
[--datastore]
[--description]
[--file]
[--name]
[--path]
[--set]
[--skip-validation]
[--type {mltable, uri_file, uri_folder}]
[--version]
Examples
Import a data asset from a YAML specification file
az ml data import --file dataimport.yml --resource-group my-resource-group --workspace-name my-workspace
Required Parameters
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Optional Parameters
--datastore: The datastore to upload the local artifact to.
--description: Description of the data asset.
--file: Local path to the YAML file containing the Azure ML data specification. The YAML reference docs for data can be found at: https://aka.ms/ml-cli-v2-data-yaml-reference.
--name: Name of the data asset.
--path: Path to the data asset on cloud storage.
--set: Update an object by specifying a property path and value to set. Example: --set property1.property2=value.
--skip-validation: Skip validation of the compute resource referenced by the underlying data import materialization job.
--type: Type of the data asset. Accepted values: mltable, uri_file, uri_folder.
--version: Version of the data asset.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
az ml data list
List data assets in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option.
az ml data list [--archived-only]
[--include-archived]
[--max-results]
[--name]
[--registry-name]
[--resource-group]
[--workspace-name]
Examples
List all the data assets in a workspace
az ml data list --resource-group my-resource-group --workspace-name my-workspace
List all the data asset versions for the specified name in a workspace
az ml data list --name my-data --resource-group my-resource-group --workspace-name my-workspace
List all the data assets in a workspace, using the --query argument to run a JMESPath query on the command's results
az ml data list --query "[].{Name:name}" --output table --resource-group my-resource-group --workspace-name my-workspace
List all the data assets in a registry
az ml data list --registry-name my-registry-name
List all the data asset versions for the specified name in a registry
az ml data list --name my-data --registry-name my-registry-name
Optional Parameters
--archived-only: List archived data assets only.
--include-archived: List archived data assets and active data assets.
--max-results: Max number of results to return.
--name: Name of the data asset. If provided, all the data versions under this name will be returned.
--registry-name: If provided, the command will target the registry instead of a workspace, so resource group and workspace won't be required. Must be provided if --workspace-name and --resource-group are not provided.
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
az ml data list-materialization-status
This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Show the status of the data import materialization jobs that create versions of a data asset.
az ml data list-materialization-status --resource-group
--workspace-name
[--all-results {false, true}]
[--archived-only]
[--include-archived]
[--max-results]
[--name]
Examples
Show the materialization status of a data asset by name
az ml data list-materialization-status --name asset-name --resource-group my-resource-group --workspace-name my-workspace
Required Parameters
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Optional Parameters
--all-results: Returns all results. Accepted values: false, true.
--archived-only: List archived jobs only.
--include-archived: List archived jobs and active jobs.
--max-results: Max number of results to return. Default is 50.
--name: Name of the asset. Will list all materialization jobs that create versions of the asset matching the given name.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
az ml data mount
This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Mount a specific data asset to a local path. For now only Linux is supported.
az ml data mount --path
[--mode]
[--mount-point]
[--persistent]
[--resource-group]
[--workspace-name]
Examples
Mount a data asset version with Named Asset URI
az ml data mount --mount-point /mnt/my-data --mode ro_mount --path azureml:my_urifolder:1
Mount a data asset version with AzureML full URI
az ml data mount --mount-point /mnt/my-data --mode ro_mount --path azureml://subscriptions/my-sub-id/resourcegroups/my-rg/workspaces/myworkspace/data/some_data/versions/5
Mount all versions of a data asset with Named Asset URI
az ml data mount --mount-point /mnt/my-data --mode ro_mount --path azureml:my_urifolder
Mount all versions of a data asset with AzureML full URI
az ml data mount --mount-point /mnt/my-data --mode ro_mount --path azureml://subscriptions/my-sub-id/resourcegroups/my-rg/workspaces/myworkspace/data/some_data
Mount data on public HTTP(s) server by URL
az ml data mount --mount-point /mnt/my-data --mode ro_mount --path https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv
Mount data on Azure by Azure Blob Storage URL
az ml data mount --mount-point /mnt/my-data --mode ro_mount --path https://<account_name>.blob.core.windows.net/<container_name>/<path>
Mount data on Azure by Azure Data Lake Storage Gen 2 URL
az ml data mount --mount-point /mnt/my-data --mode ro_mount --path abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>
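Since mounting is currently supported on Linux only, a script that mounts data may want to check the platform first. The following is a minimal sketch of such a guard; the run helper, the DRY_RUN variable, and the MOUNT_POINT variable are hypothetical, and the asset name is taken from the examples above. The mount command is only executed when DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Placeholder mount point; override by exporting MOUNT_POINT.
MOUNT_POINT="${MOUNT_POINT:-/mnt/my-data}"

if [ "$(uname -s)" != "Linux" ]; then
  # The command reference states that only Linux is supported for now.
  echo "az ml data mount is supported on Linux only; skipping." >&2
else
  # Mount version 1 of the data asset read-only at the chosen mount point.
  run az ml data mount --mount-point "$MOUNT_POINT" --mode ro_mount --path azureml:my_urifolder:1
fi
```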
Required Parameters
--path: The data asset path to mount, in the form of azureml:<name> or azureml:<name>:<version>.
Optional Parameters
--mode: Mount mode. Only ro_mount (read-only) is supported for data asset mount.
--mount-point: A local path used as mount point.
--persistent: Make mount persist across reboots. Supported only on Compute Instance.
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
az ml data restore
Restore an archived data asset.
When an archived data asset is restored, it will no longer be hidden from list queries (az ml data list). If an entire data asset container is archived, you can restore that archived container. This will restore all versions of the data asset under that given name. You cannot restore only a specific data asset version if the entire data asset container is archived; you will need to restore the entire container. If only an individual data asset version was archived, you can restore that specific version.
az ml data restore --name
[--label]
[--resource-group]
[--version]
[--workspace-name]
Examples
Restore an archived data asset container (restores all versions of that data asset)
az ml data restore --name my-data --resource-group my-resource-group --workspace-name my-workspace
Restore a specific archived data asset version
az ml data restore --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace
Required Parameters
--name: Name of the data asset.
Optional Parameters
--label: Label of the data asset. Mutually exclusive with version.
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--version: Version of the data asset. Mutually exclusive with label.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
az ml data share
This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Share a specific data asset from workspace to registry.
Copy an existing data asset from a workspace to a registry for cross-workspace reuse.
az ml data share --name
--registry-name
--resource-group
--share-with-name
--share-with-version
--version
--workspace-name
Examples
Share an existing data asset from workspace to registry
az ml data share --name my-data --version my-version --resource-group my-resource-group --workspace-name my-workspace --share-with-name new-name-in-registry --share-with-version new-version-in-registry --registry-name my-registry
Required Parameters
--name: Name of the data asset.
--registry-name: Destination registry.
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--share-with-name: Name to give the data asset created in the registry.
--share-with-version: Version to give the data asset created in the registry.
--version: Version of the data asset.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
az ml data show
Show details for a data asset in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option.
az ml data show --name
[--label]
[--registry-name]
[--resource-group]
[--version]
[--workspace-name]
Examples
Show details for a data asset with the specified name and version in a workspace
az ml data show --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace
Show details for a data asset with the specified name and label
az ml data show --name my-data --label latest --resource-group my-resource-group --workspace-name my-workspace
Show details for a data asset with the specified name and version in a registry
az ml data show --name my-data --version 1 --registry-name my-registry-name
Required Parameters
--name: Name of the data asset.
Optional Parameters
--label: Label of the data asset. Must be provided if version is not provided. Mutually exclusive with version.
--registry-name: If provided, the command will target the registry instead of a workspace, so resource group and workspace won't be required. Must be provided if --workspace-name and --resource-group are not provided.
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--version: Version of the data asset. Must be provided if label is not provided. Mutually exclusive with label.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.
az ml data update
Update a data asset.
Only the 'description' and 'tags' properties can be updated.
az ml data update --name
--resource-group
--workspace-name
[--add]
[--force-string]
[--label]
[--registry-name]
[--remove]
[--set]
[--version]
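This command has no examples in the reference, so the following is a minimal sketch of updating the two mutable properties with the generic --set option. The tag key stage, the property values, and the asset name are placeholders, and the tags.<key>=<value> path is an assumption based on the --set property1.property2=<value> pattern described for this command. The run helper and DRY_RUN variable are hypothetical; commands are only executed when DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Update the description of version 1 of the data asset.
run az ml data update --name my-data --version 1 \
  --set description="Curated training data" \
  --resource-group my-resource-group --workspace-name my-workspace

# Add or overwrite a single tag (placeholder key "stage") on the same version.
run az ml data update --name my-data --version 1 \
  --set tags.stage=curated \
  --resource-group my-resource-group --workspace-name my-workspace
```

Either --version or --label must be given so the command knows which version to update.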
Required Parameters
--name: Name of the data asset.
--resource-group: Name of resource group. You can configure the default group using az configure --defaults group=<name>.
--workspace-name: Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
Optional Parameters
--add: Add an object to a list of objects by specifying a path and key value pairs. Example: --add property.listProperty <key=value, string or JSON string>.
--force-string: When using 'set' or 'add', preserve string literals instead of attempting to convert to JSON.
--label: Label of the data asset. Must be provided if version is not provided. Mutually exclusive with version.
--registry-name: If provided, the command will target the registry instead of a workspace, so resource group and workspace won't be required. Must be provided if --workspace-name and --resource-group are not provided.
--remove: Remove a property or an element from a list. Example: --remove property.list <indexToRemove> OR --remove propertyToRemove.
--set: Update an object by specifying a property path and value to set. Example: --set property1.property2=<value>.
--version: Version of the data asset. Must be provided if label is not provided. Mutually exclusive with label.
Global Parameters
--debug: Increase logging verbosity to show all debug logs.
--help: Show this help message and exit.
--only-show-errors: Only show errors, suppressing warnings.
--output: Output format.
--query: JMESPath query string. See http://jmespath.org/ for more information and examples.
--subscription: Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.
--verbose: Increase logging verbosity. Use --debug for full debug logs.