Supported metrics for Microsoft.MachineLearningServices/workspaces

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.

Table headings

Metric - The metric display name as it appears in the Azure portal.
Name in REST API - Metric name as referred to in the REST API.
Unit - Unit of measure.
Aggregation - The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
Dimensions - Dimensions available for the metric.
Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
DS Export - Whether the metric is exportable to Azure Monitor Logs via Diagnostic Settings.
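The Time Grains values are ISO 8601 durations. As a minimal sketch, the helper below converts the simple day/hour/minute/second forms (such as PT1M or PT1H) into a Python timedelta. The function name parse_time_grain is hypothetical and is not part of any Azure SDK; it covers only this subset of the ISO 8601 duration syntax.

```python
import re
from datetime import timedelta

# Matches the simple subset of ISO 8601 durations used for time grains,
# for example PT1M, PT30M, PT1H, or P1D. Months and years are not handled.
_GRAIN = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_time_grain(grain: str) -> timedelta:
    """Convert a time grain such as 'PT1M' into a timedelta (illustrative helper)."""
    match = _GRAIN.match(grain)
    # Reject strings that do not match or that carry no numeric component.
    if not match or not any(match.groups()):
        raise ValueError(f"Unsupported time grain: {grain!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
```

For example, parse_time_grain("PT1M") yields a one-minute interval, which is the time grain that every metric in the table below reports.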

For information on exporting metrics, see Metrics export using data collection rules and Create diagnostic settings in Azure Monitor.

For information on metric retention, see Azure Monitor Metrics overview.

For a list of supported logs, see Supported log categories - Microsoft.MachineLearningServices/workspaces.

Category Metric Name in REST API Unit Aggregation Dimensions Time Grains DS Export
Quota Active Cores

Number of active cores
Active Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Active Nodes

Number of active nodes. These are the nodes that are actively running a job.
Active Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Run Cancel Requested Runs

Number of runs where cancel was requested for this workspace. Count is updated when a cancellation request has been received for a run.
Cancel Requested Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Cancelled Runs

Number of runs cancelled for this workspace. Count is updated when a run is successfully cancelled.
Cancelled Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Completed Runs

Number of runs completed successfully for this workspace. Count is updated when a run has completed and output has been collected.
Completed Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Resource CpuCapacityMillicores

Maximum capacity of a CPU node in millicores. Capacity is aggregated in one minute intervals.
CpuCapacityMillicores Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuMemoryCapacityMegabytes

Maximum memory capacity of a CPU node in megabytes. Capacity is aggregated in one minute intervals.
CpuMemoryCapacityMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuMemoryUtilizationMegabytes

Memory utilization of a CPU node in megabytes. Utilization is aggregated in one minute intervals.
CpuMemoryUtilizationMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuMemoryUtilizationPercentage

Memory utilization percentage of a CPU node. Utilization is aggregated in one minute intervals.
CpuMemoryUtilizationPercentage Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuUtilization

Percentage of utilization on a CPU node. Utilization is reported at one minute intervals.
CpuUtilization Count Average, Maximum, Minimum, Total (Sum) Scenario, runId, NodeId, ClusterName PT1M Yes
Resource CpuUtilizationMillicores

Utilization of a CPU node in millicores. Utilization is aggregated in one minute intervals.
CpuUtilizationMillicores Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuUtilizationPercentage

Utilization percentage of a CPU node. Utilization is aggregated in one minute intervals.
CpuUtilizationPercentage Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource DiskAvailMegabytes

Available disk space in megabytes. Metrics are aggregated in one minute intervals.
DiskAvailMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource DiskReadMegabytes

Data read from disk in megabytes. Metrics are aggregated in one minute intervals.
DiskReadMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource DiskUsedMegabytes

Used disk space in megabytes. Metrics are aggregated in one minute intervals.
DiskUsedMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource DiskWriteMegabytes

Data written to disk in megabytes. Metrics are aggregated in one minute intervals.
DiskWriteMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Run Errors

Number of run errors in this workspace. Count is updated whenever a run encounters an error.
Errors Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
Run Failed Runs

Number of runs failed for this workspace. Count is updated when a run fails.
Failed Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Finalizing Runs

Number of runs that entered the Finalizing state for this workspace. Count is updated when a run has completed but output collection is still in progress.
Finalizing Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Resource GpuCapacityMilliGPUs

Maximum capacity of a GPU device in milli-GPUs. Capacity is aggregated in one minute intervals.
GpuCapacityMilliGPUs Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuEnergyJoules

Interval energy in Joules on a GPU node. Energy is reported at one minute intervals.
GpuEnergyJoules Count Average, Maximum, Minimum, Total (Sum) Scenario, runId, rootRunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuMemoryCapacityMegabytes

Maximum memory capacity of a GPU device in megabytes. Capacity is aggregated in one minute intervals.
GpuMemoryCapacityMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuMemoryUtilization

Percentage of memory utilization on a GPU node. Utilization is reported at one minute intervals.
GpuMemoryUtilization Count Average, Maximum, Minimum, Total (Sum) Scenario, runId, NodeId, DeviceId, ClusterName PT1M Yes
Resource GpuMemoryUtilizationMegabytes

Memory utilization of a GPU device in megabytes. Utilization is aggregated in one minute intervals.
GpuMemoryUtilizationMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuMemoryUtilizationPercentage

Memory utilization percentage of a GPU device. Utilization is aggregated in one minute intervals.
GpuMemoryUtilizationPercentage Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuUtilization

Percentage of utilization on a GPU node. Utilization is reported at one minute intervals.
GpuUtilization Count Average, Maximum, Minimum, Total (Sum) Scenario, runId, NodeId, DeviceId, ClusterName PT1M Yes
Resource GpuUtilizationMilliGPUs

Utilization of a GPU device in milli-GPUs. Utilization is aggregated in one minute intervals.
GpuUtilizationMilliGPUs Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuUtilizationPercentage

Utilization percentage of a GPU device. Utilization is aggregated in one minute intervals.
GpuUtilizationPercentage Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource IBReceiveMegabytes

Network data received over InfiniBand in megabytes. Metrics are aggregated in one minute intervals.
IBReceiveMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName, DeviceId PT1M Yes
Resource IBTransmitMegabytes

Network data sent over InfiniBand in megabytes. Metrics are aggregated in one minute intervals.
IBTransmitMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName, DeviceId PT1M Yes
Quota Idle Cores

Number of idle cores
Idle Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Idle Nodes

Number of idle nodes. Idle nodes are nodes that are not running any jobs but can accept a new job if one is available.
Idle Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Leaving Cores

Number of leaving cores
Leaving Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Leaving Nodes

Number of leaving nodes. Leaving nodes are nodes that just finished processing a job and will go to the Idle state.
Leaving Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Model Model Deploy Failed

Number of model deployments that failed in this workspace
Model Deploy Failed Count Total (Sum), Average, Minimum, Maximum, Count Scenario, StatusCode PT1M Yes
Model Model Deploy Started

Number of model deployments started in this workspace
Model Deploy Started Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
Model Model Deploy Succeeded

Number of model deployments that succeeded in this workspace
Model Deploy Succeeded Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
Model Model Register Failed

Number of model registrations that failed in this workspace
Model Register Failed Count Total (Sum), Average, Minimum, Maximum, Count Scenario, StatusCode PT1M Yes
Model Model Register Succeeded

Number of model registrations that succeeded in this workspace
Model Register Succeeded Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
Resource NetworkInputMegabytes

Network data received in megabytes. Metrics are aggregated in one minute intervals.
NetworkInputMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName, DeviceId PT1M Yes
Resource NetworkOutputMegabytes

Network data sent in megabytes. Metrics are aggregated in one minute intervals.
NetworkOutputMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName, DeviceId PT1M Yes
Run Not Responding Runs

Number of runs not responding for this workspace. Count is updated when a run enters Not Responding state.
Not Responding Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Not Started Runs

Number of runs in Not Started state for this workspace. Count is updated when a request is received to create a run but run information has not yet been populated.
Not Started Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Quota Preempted Cores

Number of preempted cores
Preempted Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Preempted Nodes

Number of preempted nodes. These are low-priority nodes that are taken away from the available node pool.
Preempted Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Run Preparing Runs

Number of runs that are preparing for this workspace. Count is updated when a run enters Preparing state while the run environment is being prepared.
Preparing Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Provisioning Runs

Number of runs that are provisioning for this workspace. Count is updated when a run is waiting on compute target creation or provisioning.
Provisioning Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Queued Runs

Number of runs that are queued for this workspace. Count is updated when a run is queued in the compute target. This can occur when waiting for the required compute nodes to be ready.
Queued Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Quota Quota Utilization Percentage

Percent of quota utilized
Quota Utilization Percentage Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName, VmFamilyName, VmPriority PT1M Yes
Run Started Runs

Number of runs running for this workspace. Count is updated when a run starts running on the required resources.
Started Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Starting Runs

Number of runs started for this workspace. Count is updated after the request to create a run has been received and run information, such as the Run Id, has been populated.
Starting Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Resource StorageAPIFailureCount

Number of failed Azure Blob Storage API calls.
StorageAPIFailureCount Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource StorageAPISuccessCount

Number of successful Azure Blob Storage API calls.
StorageAPISuccessCount Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Quota Total Cores

Number of total cores
Total Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Total Nodes

Number of total nodes. This total includes Active Nodes, Idle Nodes, Unusable Nodes, Preempted Nodes, and Leaving Nodes.
Total Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Unusable Cores

Number of unusable cores
Unusable Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Unusable Nodes

Number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes.
Unusable Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Run Warnings

Number of run warnings in this workspace. Count is updated whenever a run encounters a warning.
Warnings Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
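The values in the Name in REST API column are the names passed when querying metrics through the Azure Monitor REST API. The sketch below only assembles a request URL for the Metrics List operation. It does not send the request or handle authentication. The helper name build_metrics_url, the placeholder resource ID, and the api-version value are assumptions for illustration; check the Azure Monitor REST API reference for the versions and parameters that apply to your environment.

```python
from urllib.parse import quote, urlencode


def build_metrics_url(resource_id, metric_names, timespan,
                      interval="PT1M", aggregation="Average",
                      api_version="2018-01-01"):
    """Assemble a metrics query URL (illustrative helper, not an SDK function)."""
    query = urlencode(
        {
            "api-version": api_version,             # assumed API version
            "metricnames": ",".join(metric_names),  # values from "Name in REST API"
            "timespan": timespan,                   # ISO 8601 start/end interval
            "interval": interval,                   # a supported time grain
            "aggregation": aggregation,             # for example Average or Total
        },
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return (
        "https://management.azure.com"
        f"{resource_id}/providers/Microsoft.Insights/metrics?{query}"
    )


# Hypothetical workspace resource ID; replace every segment with real values.
EXAMPLE_RESOURCE_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg"
    "/providers/Microsoft.MachineLearningServices/workspaces/example-ws"
)

example_url = build_metrics_url(
    EXAMPLE_RESOURCE_ID,
    ["Active Cores", "CpuUtilization"],
    "2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
)
```

Metric names that contain spaces, such as Active Cores, are percent-encoded in the query string, so the names can be passed exactly as they appear in the table.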

Next steps