az ml data

Note

This reference is part of the ml extension for the Azure CLI (version 2.15.0 or higher). The extension will automatically install the first time you run an az ml data command. Learn more about extensions.

Manage Azure ML data assets.

Azure ML data assets are references to file(s) in your storage services or public URLs along with any corresponding metadata. They are not copies of your data. You can use these data assets to access relevant data during model training and mount or download the referenced data to your compute target.

Commands

| Name | Description | Type | Status |
|---|---|---|---|
| az ml data archive | Archive a data asset. | Extension | GA |
| az ml data create | Create a data asset in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option. | Extension | GA |
| az ml data import | Import data and create a data asset. | Extension | Preview |
| az ml data list | List data assets in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option. | Extension | GA |
| az ml data list-materialization-status | Show the status of the data import materialization jobs that create versions of a data asset. | Extension | Preview |
| az ml data mount | Mount a specific data asset to a local path. Currently, only Linux is supported. | Extension | Preview |
| az ml data restore | Restore an archived data asset. | Extension | GA |
| az ml data share | Share a specific data asset from workspace to registry. | Extension | Preview |
| az ml data show | Show details for a data asset in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option. | Extension | GA |
| az ml data update | Update a data asset. | Extension | GA |

az ml data archive

Archive a data asset.

Archiving a data asset hides it by default from list queries (az ml data list). You can still reference and use an archived data asset in your workflows. You can archive either a data asset container or a specific data asset version. Archiving a data asset container archives all versions of the data asset under that name. You can restore an archived data asset using az ml data restore. If the entire data asset container is archived, you cannot restore individual versions of the data asset; you must restore the container instead.

az ml data archive --name
                   [--label]
                   [--resource-group]
                   [--version]
                   [--workspace-name]

Examples

Archive a data asset container (archives all versions of that data asset)

az ml data archive --name my-data --resource-group my-resource-group --workspace-name my-workspace

Archive a specific data asset version

az ml data archive --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace
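
Archiving and restoring are usually used as a pair. The following is a minimal sketch of that round trip for a single version, built only from options listed on this page. The run helper is a hypothetical convenience that prints each command unless DRY_RUN=0 is set, and the names my-data, my-resource-group, and my-workspace are placeholders.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder names; replace with your own.
NAME=my-data
RG=my-resource-group
WS=my-workspace

# Archive version 1 only; other versions of the asset stay active.
run az ml data archive --name "$NAME" --version 1 --resource-group "$RG" --workspace-name "$WS"

# Archived versions are hidden from the default listing, so request them explicitly.
run az ml data list --name "$NAME" --archived-only --resource-group "$RG" --workspace-name "$WS"

# Restore the same version so it appears in default list queries again.
run az ml data restore --name "$NAME" --version 1 --resource-group "$RG" --workspace-name "$WS"
```

With the default dry-run setting, the script only prints the three commands, which makes it easy to review them before running against a real workspace.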

Required Parameters

--name -n

Name of the data asset.

Optional Parameters

--label -l

Label of the data asset. Mutually exclusive with version.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--version -v

Version of the data asset. Mutually exclusive with label.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.
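
If every command targets the same resource group and workspace, the defaults mentioned in the two option descriptions above can be stored once. A minimal sketch, assuming the placeholder names my-resource-group and my-workspace; the run helper is hypothetical and prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Store the default resource group and workspace in the Azure CLI configuration.
run az configure --defaults group=my-resource-group workspace=my-workspace

# Later commands can then omit --resource-group and --workspace-name.
run az ml data archive --name my-data --version 1
```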

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data create

Create a data asset in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option.

Data assets can be defined from files on your local machine or as references to files in cloud storage. The created data asset will be tracked in the workspace/registry under the specified name and version.

To create a data asset from file(s) on your local machine, specify the 'path' field in your YAML config. Azure ML will upload these file(s) to the blob container that backs the workspace's default datastore (named 'workspaceblobstore'). The created data asset will then point to that uploaded data.

To create a data asset that references file(s) in cloud storage, specify the 'path' to the file(s) in storage in your YAML config.

You can also create a data asset directly from a storage URL or public URL. To do so, specify the URL in the 'path' field of your YAML config.

az ml data create [--datastore]
                  [--description]
                  [--file]
                  [--name]
                  [--no-wait]
                  [--path]
                  [--registry-name]
                  [--resource-group]
                  [--set]
                  [--skip-validation]
                  [--type {mltable, uri_file, uri_folder}]
                  [--version]
                  [--workspace-name]

Examples

Create a data asset from a YAML specification file in a workspace

az ml data create --file data.yml --resource-group my-resource-group --workspace-name my-workspace

Create a data asset from a YAML specification file in a registry

az ml data create --file data.yml --registry-name my-registry-name

Create a data asset without using a YAML specification file in a workspace

az ml data create --name my-data --version 1 --path ./my-data.csv --resource-group my-resource-group --workspace-name my-workspace

Create a data asset without using a YAML specification file in a registry

az ml data create --name my-data --version 1 --path ./my-data.csv --registry-name my-registry-name
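
The YAML-based examples above assume a data.yml file already exists. The sketch below writes a small one and then passes it to az ml data create. The field values (name, version, description, and the local CSV path) are placeholders, and the schema URL is the one commonly used for data assets; check the YAML reference linked under --file for the authoritative field list. The run helper is hypothetical and prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Write a minimal data asset specification; all values are placeholders.
cat > data.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: my-data
version: "1"
description: Example CSV registered as a uri_file data asset.
type: uri_file
path: ./my-data.csv
EOF

# Create the data asset in a workspace from the specification file.
run az ml data create --file data.yml --resource-group my-resource-group --workspace-name my-workspace
```

Because 'path' points at a local file here, the file would be uploaded to the workspace's default datastore as described above; pointing 'path' at a storage or public URL would register a reference instead.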

Optional Parameters

--datastore

The datastore to upload the local artifact to.

--description -d

Description of the data asset.

--file -f

Local path to the YAML file containing the Azure ML data specification. The YAML reference docs for data can be found at: https://aka.ms/ml-cli-v2-data-yaml-reference.

--name -n

Name of the data asset. Required if --registry-name is provided.

--no-wait

Do not wait for the long-running operation to finish.

Default value: False
--path -p

Path to the data asset, can be local or remote.

--registry-name

If provided, the command will target the registry instead of a workspace. Hence resource group and workspace won't be required. Must be provided if --workspace-name and --resource-group are not provided.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--set

Update an object by specifying a property path and value to set. Example: --set property1.property2=value.

--skip-validation

Skip validation of MLTable metadata when type is MLTable.

Default value: False
--type -t

Type of the data asset.

Accepted values: mltable, uri_file, uri_folder
--version -v

Version of the data asset. Required if --registry-name is provided.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data import

Preview

This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus

Import data and create a data asset.

A data asset can be created by first importing data from a database or file system into cloud storage. The created data asset will be tracked in the workspace under the specified name and version.

In your YAML config, specify either the 'query' field (for a database table) or the 'path' field (for a file system). Azure ML first runs a job that copies the data to cloud storage.

az ml data import --resource-group
                  --workspace-name
                  [--datastore]
                  [--description]
                  [--file]
                  [--name]
                  [--path]
                  [--set]
                  [--skip-validation]
                  [--type {mltable, uri_file, uri_folder}]
                  [--version]

Examples

Import a data asset from a YAML specification file

az ml data import --file dataimport.yml --resource-group my-resource-group --workspace-name my-workspace

Required Parameters

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Optional Parameters

--datastore

The datastore to upload the local artifact to.

--description -d

Description of the data asset.

--file -f

Local path to the YAML file containing the Azure ML data specification. The YAML reference docs for data can be found at: https://aka.ms/ml-cli-v2-data-yaml-reference.

--name -n

Name of the data asset.

--path -p

Path to the data asset on cloud storage.

--set

Update an object by specifying a property path and value to set. Example: --set property1.property2=value.

--skip-validation

Skip validation of compute resource referenced by underlying data import materialization job.

Default value: False
--type -t

Type of the data asset.

Accepted values: mltable, uri_file, uri_folder
--version -v

Version of the data asset.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data list

List data assets in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option.

az ml data list [--archived-only]
                [--include-archived]
                [--max-results]
                [--name]
                [--registry-name]
                [--resource-group]
                [--workspace-name]

Examples

List all the data assets in a workspace

az ml data list --resource-group my-resource-group --workspace-name my-workspace

List all the data asset versions for the specified name in a workspace

az ml data list --name my-data --resource-group my-resource-group --workspace-name my-workspace

List all the data assets in a workspace, using the --query argument to run a JMESPath query on the command's results

az ml data list --query "[].{Name:name}" --output table --resource-group my-resource-group --workspace-name my-workspace

List all the data assets in a registry

az ml data list --registry-name my-registry-name

List all the data asset versions for the specified name in a registry

az ml data list --name my-data --registry-name my-registry-name
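
The listing options can be combined. The sketch below lists every version of one asset, including archived ones, and separately caps the number of results. It assumes that each returned item exposes name and version fields, which is not stated on this page; adjust the JMESPath expression to the actual output. The run helper is hypothetical and prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List active and archived versions of one asset as a table.
# Assumption: items in the result have 'name' and 'version' fields.
run az ml data list --name my-data --include-archived \
  --query "[].{Name:name, Version:version}" --output table \
  --resource-group my-resource-group --workspace-name my-workspace

# Cap the number of results returned from a large workspace.
run az ml data list --max-results 10 \
  --resource-group my-resource-group --workspace-name my-workspace
```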

Optional Parameters

--archived-only

List archived data assets only.

Default value: False
--include-archived

List archived data assets and active data assets.

Default value: False
--max-results -r

Max number of results to return.

--name -n

Name of the data asset. If provided, all the data versions under this name will be returned.

--registry-name

If provided, the command will target the registry instead of a workspace. Hence resource group and workspace won't be required. Must be provided if --workspace-name and --resource-group are not provided.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data list-materialization-status

Preview

This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus

Show the status of the data import materialization jobs that create versions of a data asset.

az ml data list-materialization-status --resource-group
                                       --workspace-name
                                       [--all-results {false, true}]
                                       [--archived-only]
                                       [--include-archived]
                                       [--max-results]
                                       [--name]

Examples

Show the materialization status of a data asset by name

az ml data list-materialization-status --name asset-name --resource-group my-resource-group --workspace-name my-workspace

Required Parameters

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Optional Parameters

--all-results

Returns all results.

Accepted values: false, true
Default value: False
--archived-only

List archived jobs only.

Default value: False
--include-archived

List archived jobs and active jobs.

Default value: False
--max-results -r

Max number of results to return. Default is 50.

Default value: 50
--name -p

Name of the asset. Will list all materialization jobs that create versions of the asset matching the given name.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data mount

Preview

This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus

Mount a specific data asset to a local path. Currently, only Linux is supported.

az ml data mount --path
                 [--mode]
                 [--mount-point]
                 [--persistent]
                 [--resource-group]
                 [--workspace-name]

Examples

Mount a data asset version with Named Asset URI

az ml data mount --mount-point /mnt/my-data --mode ro_mount --path azureml:my_urifolder:1

Mount a data asset version with AzureML full URI

az ml data mount --mount-point /mnt/my-data --mode ro_mount --path azureml://subscriptions/my-sub-id/resourcegroups/my-rg/workspaces/myworkspace/data/some_data/versions/5

Mount all versions of a data asset with Named Asset URI

az ml data mount --mount-point /mnt/my-data --mode ro_mount --path azureml:my_urifolder

Mount all versions of a data asset with AzureML full URI

az ml data mount --mount-point /mnt/my-data --mode ro_mount --path azureml://subscriptions/my-sub-id/resourcegroups/my-rg/workspaces/myworkspace/data/some_data

Mount data on public HTTP(s) server by URL

az ml data mount --mount-point /mnt/my-data --mode ro_mount --path https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv

Mount data on Azure by Azure Blob Storage URL

az ml data mount --mount-point /mnt/my-data --mode ro_mount --path https://<account_name>.blob.core.windows.net/<container_name>/<path>

Mount data on Azure by Azure Data Lake Storage Gen 2 URL

az ml data mount --mount-point /mnt/my-data --mode ro_mount --path abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>
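
Because mounting is currently limited to Linux, a script that may run on other platforms can check the operating system first. A minimal sketch, assuming the placeholder asset my_urifolder and placeholder resource names; the run helper is hypothetical and prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# az ml data mount is documented as Linux-only, so check the platform first.
if [ "$(uname -s)" = "Linux" ]; then
  run az ml data mount --mount-point /mnt/my-data --mode ro_mount \
    --path azureml:my_urifolder:1 \
    --resource-group my-resource-group --workspace-name my-workspace
else
  echo "Skipping: az ml data mount currently supports only Linux." >&2
fi
```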

Required Parameters

--path

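The data asset path to mount, in the form azureml:<name> or azureml:<name>:<version>. The examples above also pass full azureml:// URIs, public HTTP(S) URLs, Azure Blob Storage URLs, and abfss:// URLs.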

Optional Parameters

--mode

Mount mode. Only ro_mount (read-only) is supported for data asset mount.

Default value: ro_mount
--mount-point

A local path used as mount point.

Default value: /home/azureuser/mount/data
--persistent

Make mount persist across reboots. Supported only on Compute Instance.

Default value: False
--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data restore

Restore an archived data asset.

When an archived data asset is restored, it will no longer be hidden from list queries (az ml data list). If an entire data asset container is archived, you can restore that archived container. This will restore all versions of the data asset under that given name. You cannot restore only a specific data asset version if the entire data asset container is archived - you will need to restore the entire container. If only an individual data asset version was archived, you can restore that specific version.

az ml data restore --name
                   [--label]
                   [--resource-group]
                   [--version]
                   [--workspace-name]

Examples

Restore an archived data asset container (restores all versions of that data asset)

az ml data restore --name my-data --resource-group my-resource-group --workspace-name my-workspace

Restore a specific archived data asset version

az ml data restore --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace

Required Parameters

--name -n

Name of the data asset.

Optional Parameters

--label -l

Label of the data asset. Mutually exclusive with version.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--version -v

Version of the data asset. Mutually exclusive with label.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data share

Preview

This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus

Share a specific data asset from workspace to registry.

Copy an existing data asset from a workspace to a registry for cross-workspace reuse.

az ml data share --name
                 --registry-name
                 --resource-group
                 --share-with-name
                 --share-with-version
                 --version
                 --workspace-name

Examples

Share an existing data asset from workspace to registry

az ml data share --name my-data --version my-version --resource-group my-resource-group --workspace-name my-workspace --share-with-name new-name-in-registry --share-with-version new-version-in-registry --registry-name my-registry

Required Parameters

--name -n

Name of the data asset.

--registry-name

Destination registry.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--share-with-name

Name to assign to the data asset created in the registry.

--share-with-version

Version to assign to the data asset created in the registry.

--version -v

Version of the data asset.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data show

Show details for a data asset in a workspace/registry. If you are using a registry, replace --workspace-name my-workspace with the --registry-name <registry-name> option.

az ml data show --name
                [--label]
                [--registry-name]
                [--resource-group]
                [--version]
                [--workspace-name]

Examples

Show details for a data asset with the specified name and version in a workspace

az ml data show --name my-data --version 1 --resource-group my-resource-group --workspace-name my-workspace

Show details for a data asset with the specified name and label

az ml data show --name my-data --label latest --resource-group my-resource-group --workspace-name my-workspace

Show details for a data asset with the specified name and version in a registry

az ml data show --name my-data --version 1 --registry-name my-registry-name
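
The global --query and --output options can narrow the output of az ml data show to a single value, which is convenient in scripts. A minimal sketch, assuming the returned object has path and version fields and using placeholder resource names; the run helper is hypothetical and prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Print only the storage path that version 1 points to.
run az ml data show --name my-data --version 1 --query path --output tsv \
  --resource-group my-resource-group --workspace-name my-workspace

# Print the version number that the 'latest' label currently resolves to.
run az ml data show --name my-data --label latest --query version --output tsv \
  --resource-group my-resource-group --workspace-name my-workspace
```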

Required Parameters

--name -n

Name of the data asset.

Optional Parameters

--label -l

Label of the data asset. Must be provided if version is not provided. Mutually exclusive with version.

--registry-name

If provided, the command will target the registry instead of a workspace. Hence resource group and workspace won't be required. Must be provided if --workspace-name and --resource-group are not provided.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--version -v

Version of the data asset. Must be provided if label is not provided. Mutually exclusive with label.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.

az ml data update

Update a data asset.

Only the 'description' and 'tags' properties can be updated.

az ml data update --name
                  --resource-group
                  --workspace-name
                  [--add]
                  [--force-string]
                  [--label]
                  [--registry-name]
                  [--remove]
                  [--set]
                  [--version]

Required Parameters

--name -n

Name of the data asset.

--resource-group -g

Name of resource group. You can configure the default group using az configure --defaults group=<name>.

--workspace-name -w

Name of the Azure ML workspace. You can configure the default workspace using az configure --defaults workspace=<name>.

Optional Parameters

--add

Add an object to a list of objects by specifying a path and key value pairs. Example: --add property.listProperty <key=value, string or JSON string>.

Default value: []
--force-string

When using 'set' or 'add', preserve string literals instead of attempting to convert to JSON.

Default value: False
--label -l

Label of the data asset. Must be provided if version is not provided. Mutually exclusive with version.

--registry-name

If provided, the command will target the registry instead of a workspace. Hence resource group and workspace won't be required. Must be provided if --workspace-name and --resource-group are not provided.

--remove

Remove a property or an element from a list. Example: --remove property.list <indexToRemove> OR --remove propertyToRemove.

Default value: []
--set

Update an object by specifying a property path and value to set. Example: --set property1.property2=<value>.

Default value: []
--version -v

Version of the data asset. Must be provided if label is not provided. Mutually exclusive with label.

Global Parameters
--debug

Increase logging verbosity to show all debug logs.

--help -h

Show this help message and exit.

--only-show-errors

Only show errors, suppressing warnings.

--output -o

Output format.

Accepted values: json, jsonc, none, table, tsv, yaml, yamlc
Default value: json
--query

JMESPath query string. See http://jmespath.org/ for more information and examples.

--subscription

Name or ID of subscription. You can configure the default subscription using az account set -s NAME_OR_ID.

--verbose

Increase logging verbosity. Use --debug for full debug logs.