Supported metrics for Microsoft.MachineLearningServices/workspaces
The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.
Table headings
Metric - The metric display name as it appears in the Azure portal.
Name in REST API - Metric name as referred to in the REST API.
Unit - Unit of measure.
Aggregation - The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
Dimensions - Dimensions available for the metric.
Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
DS Export - Whether the metric is exportable to Azure Monitor Logs via Diagnostic Settings.
For information on exporting metrics, see Metrics export using data collection rules and Create diagnostic settings in Azure Monitor.
For information on metric retention, see Azure Monitor Metrics overview.
For a list of supported logs, see Supported log categories - Microsoft.MachineLearningServices/workspaces
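The time grain values described above (PT1M, PT30M, PT1H) follow the ISO 8601 duration notation. As a minimal sketch, the hypothetical helper `parse_time_grain` below converts such a value into a Python `timedelta`. It assumes only day, hour, minute, and second components appear, which covers the examples on this page but not every ISO 8601 duration.

```python
import re
from datetime import timedelta

# Pattern for the subset of ISO 8601 durations assumed here:
# an optional day part, then an optional time part (hours, minutes, seconds).
_GRAIN_PATTERN = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_time_grain(grain: str) -> timedelta:
    """Convert a time grain such as 'PT1M' into a timedelta."""
    match = _GRAIN_PATTERN.match(grain)
    if match is None:
        raise ValueError(f"Unsupported time grain: {grain!r}")
    # Keep only the components that are present in the string.
    parts = {name: int(value) for name, value in match.groupdict().items() if value is not None}
    if not parts:
        # Strings like 'P' or 'PT' match the pattern but carry no duration.
        raise ValueError(f"Empty time grain: {grain!r}")
    return timedelta(**parts)
```

With this helper, `parse_time_grain("PT1M")` yields a one-minute interval and `parse_time_grain("PT1H")` a one-hour interval, which can be handy when bucketing exported samples.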
Category | Metric | Description | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
---|---|---|---|---|---|---|---|---|
Quota | Active Cores | Number of active cores. | Active Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Active Nodes | Number of active nodes. These are the nodes that are actively running a job. | Active Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Run | Cancel Requested Runs | Number of runs where cancel was requested for this workspace. Count is updated when a cancellation request has been received for a run. | Cancel Requested Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Cancelled Runs | Number of runs cancelled for this workspace. Count is updated when a run is successfully cancelled. | Cancelled Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Completed Runs | Number of runs completed successfully for this workspace. Count is updated when a run has completed and output has been collected. | Completed Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Resource | CpuCapacityMillicores | Maximum capacity of a CPU node in millicores. Capacity is aggregated in one minute intervals. | CpuCapacityMillicores | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuMemoryCapacityMegabytes | Maximum memory capacity of a CPU node in megabytes. Capacity is aggregated in one minute intervals. | CpuMemoryCapacityMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuMemoryUtilizationMegabytes | Memory utilization of a CPU node in megabytes. Utilization is aggregated in one minute intervals. | CpuMemoryUtilizationMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuMemoryUtilizationPercentage | Memory utilization percentage of a CPU node. Utilization is aggregated in one minute intervals. | CpuMemoryUtilizationPercentage | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuUtilization | Percentage of utilization on a CPU node. Utilization is reported at one minute intervals. | CpuUtilization | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, runId, NodeId, ClusterName | PT1M | Yes |
Resource | CpuUtilizationMillicores | Utilization of a CPU node in millicores. Utilization is aggregated in one minute intervals. | CpuUtilizationMillicores | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuUtilizationPercentage | Utilization percentage of a CPU node. Utilization is aggregated in one minute intervals. | CpuUtilizationPercentage | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | DiskAvailMegabytes | Available disk space in megabytes. Metrics are aggregated in one minute intervals. | DiskAvailMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | DiskReadMegabytes | Data read from disk in megabytes. Metrics are aggregated in one minute intervals. | DiskReadMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | DiskUsedMegabytes | Used disk space in megabytes. Metrics are aggregated in one minute intervals. | DiskUsedMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | DiskWriteMegabytes | Data written to disk in megabytes. Metrics are aggregated in one minute intervals. | DiskWriteMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Run | Errors | Number of run errors in this workspace. Count is updated whenever a run encounters an error. | Errors | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
Run | Failed Runs | Number of runs failed for this workspace. Count is updated when a run fails. | Failed Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Finalizing Runs | Number of runs that entered finalizing state for this workspace. Count is updated when a run has completed but output collection is still in progress. | Finalizing Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Resource | GpuCapacityMilliGPUs | Maximum capacity of a GPU device in milli-GPUs. Capacity is aggregated in one minute intervals. | GpuCapacityMilliGPUs | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuEnergyJoules | Interval energy in Joules on a GPU node. Energy is reported at one minute intervals. | GpuEnergyJoules | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, runId, rootRunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuMemoryCapacityMegabytes | Maximum memory capacity of a GPU device in megabytes. Capacity is aggregated in one minute intervals. | GpuMemoryCapacityMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuMemoryUtilization | Percentage of memory utilization on a GPU node. Utilization is reported at one minute intervals. | GpuMemoryUtilization | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, runId, NodeId, DeviceId, ClusterName | PT1M | Yes |
Resource | GpuMemoryUtilizationMegabytes | Memory utilization of a GPU device in megabytes. Utilization is aggregated in one minute intervals. | GpuMemoryUtilizationMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuMemoryUtilizationPercentage | Memory utilization percentage of a GPU device. Utilization is aggregated in one minute intervals. | GpuMemoryUtilizationPercentage | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuUtilization | Percentage of utilization on a GPU node. Utilization is reported at one minute intervals. | GpuUtilization | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, runId, NodeId, DeviceId, ClusterName | PT1M | Yes |
Resource | GpuUtilizationMilliGPUs | Utilization of a GPU device in milli-GPUs. Utilization is aggregated in one minute intervals. | GpuUtilizationMilliGPUs | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuUtilizationPercentage | Utilization percentage of a GPU device. Utilization is aggregated in one minute intervals. | GpuUtilizationPercentage | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | IBReceiveMegabytes | Network data received over InfiniBand in megabytes. Metrics are aggregated in one minute intervals. | IBReceiveMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName, DeviceId | PT1M | Yes |
Resource | IBTransmitMegabytes | Network data sent over InfiniBand in megabytes. Metrics are aggregated in one minute intervals. | IBTransmitMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName, DeviceId | PT1M | Yes |
Quota | Idle Cores | Number of idle cores. | Idle Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Idle Nodes | Number of idle nodes. Idle nodes are the nodes that are not running any jobs but can accept a new job if available. | Idle Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Leaving Cores | Number of leaving cores. | Leaving Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Leaving Nodes | Number of leaving nodes. Leaving nodes are the nodes that just finished processing a job and will go to Idle state. | Leaving Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Model | Model Deploy Failed | Number of model deployments that failed in this workspace. | Model Deploy Failed | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, StatusCode | PT1M | Yes |
Model | Model Deploy Started | Number of model deployments started in this workspace. | Model Deploy Started | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
Model | Model Deploy Succeeded | Number of model deployments that succeeded in this workspace. | Model Deploy Succeeded | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
Model | Model Register Failed | Number of model registrations that failed in this workspace. | Model Register Failed | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, StatusCode | PT1M | Yes |
Model | Model Register Succeeded | Number of model registrations that succeeded in this workspace. | Model Register Succeeded | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
Resource | NetworkInputMegabytes | Network data received in megabytes. Metrics are aggregated in one minute intervals. | NetworkInputMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName, DeviceId | PT1M | Yes |
Resource | NetworkOutputMegabytes | Network data sent in megabytes. Metrics are aggregated in one minute intervals. | NetworkOutputMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName, DeviceId | PT1M | Yes |
Run | Not Responding Runs | Number of runs not responding for this workspace. Count is updated when a run enters Not Responding state. | Not Responding Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Not Started Runs | Number of runs in Not Started state for this workspace. Count is updated when a request is received to create a run but run information has not yet been populated. | Not Started Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Quota | Preempted Cores | Number of preempted cores. | Preempted Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Preempted Nodes | Number of preempted nodes. These are the low-priority nodes that are taken away from the available node pool. | Preempted Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Run | Preparing Runs | Number of runs that are preparing for this workspace. Count is updated when a run enters Preparing state while the run environment is being prepared. | Preparing Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Provisioning Runs | Number of runs that are provisioning for this workspace. Count is updated when a run is waiting on compute target creation or provisioning. | Provisioning Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Queued Runs | Number of runs that are queued for this workspace. Count is updated when a run is queued in the compute target. Can occur when waiting for required compute nodes to be ready. | Queued Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Quota | Quota Utilization Percentage | Percent of quota utilized. | Quota Utilization Percentage | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName, VmFamilyName, VmPriority | PT1M | Yes |
Run | Started Runs | Number of runs running for this workspace. Count is updated when a run starts running on required resources. | Started Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Starting Runs | Number of runs started for this workspace. Count is updated after the request to create a run has been received and run information, such as the Run Id, has been populated. | Starting Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Resource | StorageAPIFailureCount | Azure Blob Storage API calls failure count. | StorageAPIFailureCount | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | StorageAPISuccessCount | Azure Blob Storage API calls success count. | StorageAPISuccessCount | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Quota | Total Cores | Number of total cores. | Total Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Total Nodes | Number of total nodes. This total includes Active Nodes, Idle Nodes, Unusable Nodes, Preempted Nodes, and Leaving Nodes. | Total Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Unusable Cores | Number of unusable cores. | Unusable Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Unusable Nodes | Number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes. | Unusable Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Run | Warnings | Number of run warnings in this workspace. Count is updated whenever a run encounters a warning. | Warnings | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
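The values in the Name in REST API column are what a client passes as the metric name when querying Azure Monitor. The sketch below shows one way to assemble such a request URL. It assumes the Azure Monitor metrics list endpoint (`/providers/Microsoft.Insights/metrics`) with `api-version=2018-01-01`; the function name `build_metrics_url` and the resource ID used in the usage note are hypothetical placeholders. The code only builds the URL and sends no request; authentication is out of scope.

```python
from urllib.parse import quote, urlencode

# Assumed management endpoint for the public Azure cloud.
MANAGEMENT_ENDPOINT = "https://management.azure.com"


def build_metrics_url(
    resource_id: str,
    metric_name: str,
    aggregation: str = "Average",
    interval: str = "PT1M",
    api_version: str = "2018-01-01",  # assumption: adjust to the version you target
) -> str:
    """Build a metrics query URL for one metric of one workspace."""
    query = urlencode(
        {
            "api-version": api_version,
            "metricnames": metric_name,  # value from the Name in REST API column
            "aggregation": aggregation,  # for example Average, Minimum, Maximum, Total, Count
            "interval": interval,        # time grain, for example PT1M
        },
        quote_via=quote,  # encode spaces as %20, since many metric names contain spaces
    )
    return f"{MANAGEMENT_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{query}"
```

For example, calling `build_metrics_url("/subscriptions/<subscription-id>/resourceGroups/<group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace>", "Active Cores")` produces a URL whose query string contains `metricnames=Active%20Cores` and `interval=PT1M`.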