Azure Machine Learning monitoring data reference

Article
08/28/2024

This article contains all the monitoring reference information for this service.

See Monitor Machine Learning for details on the data you can collect for Azure Machine Learning and how to use it.

Metrics

This section lists all the automatically collected platform metrics for this service. These metrics are also part of the global list of all platform metrics supported in Azure Monitor.

For information on metric retention, see Azure Monitor Metrics overview.

The resource provider for these metrics is Microsoft.MachineLearningServices/workspaces.

The metrics categories are Model, Quota, Resource, Run, and Traffic. Quota information is for Machine Learning compute only. Run provides information on training runs for the workspace.

Supported metrics for Microsoft.MachineLearningServices/workspaces

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.

All columns might not be present in every table.

Table headings

Category - The metrics group or classification.
Metric - The metric display name as it appears in the Azure portal.
Name in REST API - The metric name as referred to in the REST API.
Unit - Unit of measure.
Aggregation - The default aggregation type. Valid values: Average (Avg), Minimum (Min), Maximum (Max), Total (Sum), Count.
Dimensions - Dimensions available for the metric.
Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
DS Export - Whether the metric is exportable to Azure Monitor Logs via diagnostic settings. For information on exporting metrics, see Create diagnostic settings in Azure Monitor.
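
The Time Grains values described above are ISO 8601 durations. As a minimal sketch, the following Python helper (a hypothetical `parse_time_grain` function, not part of any Azure SDK) converts the time-only forms mentioned here, such as PT1M, PT30M, and PT1H, into `timedelta` objects. It deliberately rejects anything outside that subset, including day-based durations.

```python
import re
from datetime import timedelta

# Matches only the time part of an ISO 8601 duration, e.g. PT1M, PT30M, PT1H.
_GRAIN = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_time_grain(grain: str) -> timedelta:
    """Convert a time grain such as 'PT1M' into a timedelta."""
    match = _GRAIN.match(grain)
    # A bare 'PT' matches the pattern but carries no value, so reject it too.
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported time grain: {grain!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    for grain in ("PT1M", "PT30M", "PT1H"):
        print(grain, "->", parse_time_grain(grain))
```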

Category	Metric	Name in REST API	Unit	Aggregation	Dimensions	Time Grains	DS Export
Quota	Active Cores Number of active cores	`Active Cores`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Quota	Active Nodes Number of active nodes. These are the nodes that are actively running a job.	`Active Nodes`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Run	Cancel Requested Runs Number of runs where cancel was requested for this workspace. Count is updated when cancellation request has been received for a run.	`Cancel Requested Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Run	Cancelled Runs Number of runs cancelled for this workspace. Count is updated when a run is successfully cancelled.	`Cancelled Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Run	Completed Runs Number of runs completed successfully for this workspace. Count is updated when a run has completed and output has been collected.	`Completed Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Resource	CpuCapacityMillicores Maximum capacity of a CPU node in millicores. Capacity is aggregated in one minute intervals.	`CpuCapacityMillicores`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	CpuMemoryCapacityMegabytes Maximum memory utilization of a CPU node in megabytes. Utilization is aggregated in one minute intervals.	`CpuMemoryCapacityMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	CpuMemoryUtilizationMegabytes Memory utilization of a CPU node in megabytes. Utilization is aggregated in one minute intervals.	`CpuMemoryUtilizationMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	CpuMemoryUtilizationPercentage Memory utilization percentage of a CPU node. Utilization is aggregated in one minute intervals.	`CpuMemoryUtilizationPercentage`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	CpuUtilization Percentage of utilization on a CPU node. Utilization is reported at one minute intervals.	`CpuUtilization`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `runId`, `NodeId`, `ClusterName`	PT1M	Yes
Resource	CpuUtilizationMillicores Utilization of a CPU node in millicores. Utilization is aggregated in one minute intervals.	`CpuUtilizationMillicores`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	CpuUtilizationPercentage Utilization percentage of a CPU node. Utilization is aggregated in one minute intervals.	`CpuUtilizationPercentage`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	DiskAvailMegabytes Available disk space in megabytes. Metrics are aggregated in one minute intervals.	`DiskAvailMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	DiskReadMegabytes Data read from disk in megabytes. Metrics are aggregated in one minute intervals.	`DiskReadMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	DiskUsedMegabytes Used disk space in megabytes. Metrics are aggregated in one minute intervals.	`DiskUsedMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	DiskWriteMegabytes Data written into disk in megabytes. Metrics are aggregated in one minute intervals.	`DiskWriteMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Run	Errors Number of run errors in this workspace. Count is updated whenever a run encounters an error.	`Errors`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`	PT1M	Yes
Run	Failed Runs Number of runs failed for this workspace. Count is updated when a run fails.	`Failed Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Run	Finalizing Runs Number of runs that entered the finalizing state for this workspace. Count is updated when a run has completed but output collection is still in progress.	`Finalizing Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Resource	GpuCapacityMilliGPUs Maximum capacity of a GPU device in milli-GPUs. Capacity is aggregated in one minute intervals.	`GpuCapacityMilliGPUs`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `DeviceId`, `ComputeName`	PT1M	Yes
Resource	GpuEnergyJoules Interval energy in Joules on a GPU node. Energy is reported at one minute intervals.	`GpuEnergyJoules`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `runId`, `rootRunId`, `InstanceId`, `DeviceId`, `ComputeName`	PT1M	Yes
Resource	GpuMemoryCapacityMegabytes Maximum memory capacity of a GPU device in megabytes. Capacity is aggregated in one minute intervals.	`GpuMemoryCapacityMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `DeviceId`, `ComputeName`	PT1M	Yes
Resource	GpuMemoryUtilization Percentage of memory utilization on a GPU node. Utilization is reported at one minute intervals.	`GpuMemoryUtilization`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `runId`, `NodeId`, `DeviceId`, `ClusterName`	PT1M	Yes
Resource	GpuMemoryUtilizationMegabytes Memory utilization of a GPU device in megabytes. Utilization is aggregated in one minute intervals.	`GpuMemoryUtilizationMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `DeviceId`, `ComputeName`	PT1M	Yes
Resource	GpuMemoryUtilizationPercentage Memory utilization percentage of a GPU device. Utilization is aggregated in one minute intervals.	`GpuMemoryUtilizationPercentage`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `DeviceId`, `ComputeName`	PT1M	Yes
Resource	GpuUtilization Percentage of utilization on a GPU node. Utilization is reported at one minute intervals.	`GpuUtilization`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `runId`, `NodeId`, `DeviceId`, `ClusterName`	PT1M	Yes
Resource	GpuUtilizationMilliGPUs Utilization of a GPU device in milli-GPUs. Utilization is aggregated in one minute intervals.	`GpuUtilizationMilliGPUs`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `DeviceId`, `ComputeName`	PT1M	Yes
Resource	GpuUtilizationPercentage Utilization percentage of a GPU device. Utilization is aggregated in one minute intervals.	`GpuUtilizationPercentage`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `DeviceId`, `ComputeName`	PT1M	Yes
Resource	IBReceiveMegabytes Network data received over InfiniBand in megabytes. Metrics are aggregated in one minute intervals.	`IBReceiveMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`, `DeviceId`	PT1M	Yes
Resource	IBTransmitMegabytes Network data sent over InfiniBand in megabytes. Metrics are aggregated in one minute intervals.	`IBTransmitMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`, `DeviceId`	PT1M	Yes
Quota	Idle Cores Number of idle cores	`Idle Cores`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Quota	Idle Nodes Number of idle nodes. Idle nodes are the nodes that are not running any jobs but can accept a new job if one is available.	`Idle Nodes`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Quota	Leaving Cores Number of leaving cores	`Leaving Cores`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Quota	Leaving Nodes Number of leaving nodes. Leaving nodes are the nodes which just finished processing a job and will go to Idle state.	`Leaving Nodes`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Model	Model Deploy Failed Number of model deployments that failed in this workspace	`Model Deploy Failed`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `StatusCode`	PT1M	Yes
Model	Model Deploy Started Number of model deployments started in this workspace	`Model Deploy Started`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`	PT1M	Yes
Model	Model Deploy Succeeded Number of model deployments that succeeded in this workspace	`Model Deploy Succeeded`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`	PT1M	Yes
Model	Model Register Failed Number of model registrations that failed in this workspace	`Model Register Failed`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `StatusCode`	PT1M	Yes
Model	Model Register Succeeded Number of model registrations that succeeded in this workspace	`Model Register Succeeded`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`	PT1M	Yes
Resource	NetworkInputMegabytes Network data received in megabytes. Metrics are aggregated in one minute intervals.	`NetworkInputMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`, `DeviceId`	PT1M	Yes
Resource	NetworkOutputMegabytes Network data sent in megabytes. Metrics are aggregated in one minute intervals.	`NetworkOutputMegabytes`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`, `DeviceId`	PT1M	Yes
Run	Not Responding Runs Number of runs not responding for this workspace. Count is updated when a run enters Not Responding state.	`Not Responding Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Run	Not Started Runs Number of runs in Not Started state for this workspace. Count is updated when a request is received to create a run but run information has not yet been populated.	`Not Started Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Quota	Preempted Cores Number of preempted cores	`Preempted Cores`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Quota	Preempted Nodes Number of preempted nodes. These nodes are the low priority nodes which are taken away from the available node pool.	`Preempted Nodes`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Run	Preparing Runs Number of runs that are preparing for this workspace. Count is updated when a run enters Preparing state while the run environment is being prepared.	`Preparing Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Run	Provisioning Runs Number of runs that are provisioning for this workspace. Count is updated when a run is waiting on compute target creation or provisioning.	`Provisioning Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Run	Queued Runs Number of runs that are queued for this workspace. Count is updated when a run is queued in the compute target. This can occur when waiting for required compute nodes to be ready.	`Queued Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Quota	Quota Utilization Percentage Percent of quota utilized	`Quota Utilization Percentage`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`, `VmFamilyName`, `VmPriority`	PT1M	Yes
Run	Started Runs Number of runs running for this workspace. Count is updated when run starts running on required resources.	`Started Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Run	Starting Runs Number of runs started for this workspace. Count is updated after the request to create a run has been received and run information, such as the Run Id, has been populated.	`Starting Runs`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`, `RunType`, `PublishedPipelineId`, `ComputeType`, `PipelineStepType`, `ExperimentName`	PT1M	Yes
Resource	StorageAPIFailureCount Azure Blob Storage API calls failure count.	`StorageAPIFailureCount`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Resource	StorageAPISuccessCount Azure Blob Storage API calls success count.	`StorageAPISuccessCount`	Count	Average, Maximum, Minimum, Total (Sum)	`RunId`, `InstanceId`, `ComputeName`	PT1M	Yes
Quota	Total Cores Number of total cores	`Total Cores`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Quota	Total Nodes Number of total nodes. This total includes some of Active Nodes, Idle Nodes, Unusable Nodes, Preempted Nodes, and Leaving Nodes.	`Total Nodes`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Quota	Unusable Cores Number of unusable cores	`Unusable Cores`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Quota	Unusable Nodes Number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes.	`Unusable Nodes`	Count	Average, Maximum, Minimum, Total (Sum)	`Scenario`, `ClusterName`	PT1M	Yes
Run	Warnings Number of run warnings in this workspace. Count is updated whenever a run encounters a warning.	`Warnings`	Count	Total (Sum), Average, Minimum, Maximum, Count	`Scenario`	PT1M	Yes
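
As an illustration of how the "Name in REST API" and "Dimensions" columns might be used, the following sketch builds a request URL for the Azure Monitor metrics REST endpoint. The function name `build_metrics_url`, the placeholder subscription, resource group, and workspace values, and the choice of `api-version=2018-01-01` are assumptions made for this example; authentication and the HTTP call itself are left out.

```python
from urllib.parse import quote, urlencode


def build_metrics_url(subscription_id, resource_group, workspace, metric,
                      aggregation="Average", interval="PT1M", split_by=None):
    """Build a metrics query URL for a Machine Learning workspace (sketch)."""
    resource_id = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace}"
    )
    params = {
        "api-version": "2018-01-01",   # assumed API version
        "metricnames": metric,         # value from the "Name in REST API" column
        "aggregation": aggregation,    # one of the aggregations listed for the metric
        "interval": interval,          # value from the "Time Grains" column
    }
    if split_by:
        # Request one time series per value of the given dimension.
        params["$filter"] = f"{split_by} eq '*'"
    query = urlencode(params, quote_via=quote)
    return (
        f"https://management.azure.com{resource_id}"
        f"/providers/Microsoft.Insights/metrics?{query}"
    )


if __name__ == "__main__":
    # Placeholder identifiers; replace with real values.
    print(build_metrics_url("<subscription-id>", "my-rg", "my-workspace",
                            "Active Nodes", split_by="ClusterName"))
```

Metric names that contain spaces, such as `Active Nodes`, need to be percent-encoded, which is why the sketch passes `quote_via=quote` rather than relying on the default `+` encoding.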

Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints resource type.

Category	Metric	Name in REST API	Unit	Aggregation	Dimensions	Time Grains	DS Export
Traffic	Connections Active The total number of concurrent TCP connections active from clients.	`ConnectionsActive`	Count	Average	<none>	PT1M	No
Traffic	Data Collection Errors Per Minute The number of data collection events dropped per minute.	`DataCollectionErrorsPerMinute`	Count	Minimum, Maximum, Average	`deployment`, `reason`, `type`	PT1M	No
Traffic	Data Collection Events Per Minute The number of data collection events processed per minute.	`DataCollectionEventsPerMinute`	Count	Minimum, Maximum, Average	`deployment`, `type`	PT1M	No
Traffic	Network Bytes The bytes per second served for the endpoint.	`NetworkBytes`	BytesPerSecond	Average	<none>	PT1M	No
Traffic	New Connections Per Second The average number of new TCP connections per second established from clients.	`NewConnectionsPerSecond`	CountPerSecond	Average	<none>	PT1M	No
Traffic	Request Latency The average complete interval of time taken for a request to be responded to, in milliseconds.	`RequestLatency`	Milliseconds	Average	`deployment`	PT1M	Yes
Traffic	Request Latency P50 The average P50 request latency aggregated by all request latency values collected over the selected time period	`RequestLatency_P50`	Milliseconds	Average	`deployment`	PT1M	Yes
Traffic	Request Latency P90 The average P90 request latency aggregated by all request latency values collected over the selected time period	`RequestLatency_P90`	Milliseconds	Average	`deployment`	PT1M	Yes
Traffic	Request Latency P95 The average P95 request latency aggregated by all request latency values collected over the selected time period	`RequestLatency_P95`	Milliseconds	Average	`deployment`	PT1M	Yes
Traffic	Request Latency P99 The average P99 request latency aggregated by all request latency values collected over the selected time period	`RequestLatency_P99`	Milliseconds	Average	`deployment`	PT1M	Yes
Traffic	Requests Per Minute The number of requests sent to online endpoint within a minute	`RequestsPerMinute`	Count	Average	`deployment`, `statusCode`, `statusCodeClass`, `modelStatusCode`	PT1M	No
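
Because `RequestsPerMinute` can be split by the `statusCodeClass` dimension, one possible use is to estimate the share of requests that did not succeed. The sketch below works on hand-written sample records; the record layout, the `requests` key, and class labels such as `2xx` and `5xx` are assumptions for illustration, not a documented response format.

```python
from collections import defaultdict


def requests_by_status_class(samples):
    """Sum request counts per statusCodeClass value."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample["statusCodeClass"]] += sample["requests"]
    return dict(totals)


def failure_ratio(samples, ok_class="2xx"):
    """Return the fraction of requests outside the successful class."""
    totals = requests_by_status_class(samples)
    total = sum(totals.values())
    if total == 0:
        return 0.0  # no traffic, so no failures to report
    return (total - totals.get(ok_class, 0.0)) / total


if __name__ == "__main__":
    # Hypothetical one-minute samples for a single endpoint.
    samples = [
        {"statusCodeClass": "2xx", "requests": 90},
        {"statusCodeClass": "4xx", "requests": 4},
        {"statusCodeClass": "5xx", "requests": 6},
    ]
    print(requests_by_status_class(samples))
    print(failure_ratio(samples))
```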

Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource type.

Category	Metric	Name in REST API	Unit	Aggregation	Dimensions	Time Grains	DS Export
Resource	CPU Memory Utilization Percentage Percentage of memory utilization on an instance. Utilization is reported at one minute intervals.	`CpuMemoryUtilizationPercentage`	Percent	Minimum, Maximum, Average	`instanceId`	PT1M	Yes
Resource	CPU Utilization Percentage Percentage of CPU utilization on an instance. Utilization is reported at one minute intervals.	`CpuUtilizationPercentage`	Percent	Minimum, Maximum, Average	`instanceId`	PT1M	Yes
Resource	Data Collection Errors Per Minute The number of data collection events dropped per minute.	`DataCollectionErrorsPerMinute`	Count	Minimum, Maximum, Average	`instanceId`, `reason`, `type`	PT1M	No
Resource	Data Collection Events Per Minute The number of data collection events processed per minute.	`DataCollectionEventsPerMinute`	Count	Minimum, Maximum, Average	`instanceId`, `type`	PT1M	No
Resource	Deployment Capacity The number of instances in the deployment.	`DeploymentCapacity`	Count	Minimum, Maximum, Average	`instanceId`, `State`	PT1M	No
Resource	Disk Utilization Percentage of disk utilization on an instance. Utilization is reported at one minute intervals.	`DiskUtilization`	Percent	Minimum, Maximum, Average	`instanceId`, `disk`	PT1M	Yes
Resource	GPU Energy in Joules Interval energy in Joules on a GPU node. Energy is reported at one minute intervals.	`GpuEnergyJoules`	Count	Minimum, Maximum, Average	`instanceId`	PT1M	No
Resource	GPU Memory Utilization Percentage Percentage of GPU memory utilization on an instance. Utilization is reported at one minute intervals.	`GpuMemoryUtilizationPercentage`	Percent	Minimum, Maximum, Average	`instanceId`	PT1M	Yes
Resource	GPU Utilization Percentage Percentage of GPU utilization on an instance. Utilization is reported at one minute intervals.	`GpuUtilizationPercentage`	Percent	Minimum, Maximum, Average	`instanceId`	PT1M	Yes
Traffic	Request Latency P50 The average P50 request latency aggregated by all request latency values collected over the selected time period	`RequestLatency_P50`	Milliseconds	Average	<none>	PT1M	Yes
Traffic	Request Latency P90 The average P90 request latency aggregated by all request latency values collected over the selected time period	`RequestLatency_P90`	Milliseconds	Average	<none>	PT1M	Yes
Traffic	Request Latency P95 The average P95 request latency aggregated by all request latency values collected over the selected time period	`RequestLatency_P95`	Milliseconds	Average	<none>	PT1M	Yes
Traffic	Request Latency P99 The average P99 request latency aggregated by all request latency values collected over the selected time period	`RequestLatency_P99`	Milliseconds	Average	<none>	PT1M	Yes
Traffic	Requests Per Minute The number of requests sent to online deployment within a minute	`RequestsPerMinute`	Count	Average	`envoy_response_code`	PT1M	No

Metric dimensions

For information about what metric dimensions are, see Multi-dimensional metrics.

This service has the following dimensions associated with its metrics.

Dimension	Description
Cluster Name	The name of the compute cluster resource. Available for all quota metrics.
Vm Family Name	The name of the VM family used by the cluster. Available for quota utilization percentage.
Vm Priority	The priority of the VM. Available for quota utilization percentage.
CreatedTime	Only available for CpuUtilization and GpuUtilization.
DeviceId	ID of the device (GPU). Only available for GpuUtilization.
NodeId	ID of the node created where job is running. Only available for CpuUtilization and GpuUtilization.
RunId	ID of the run/job. Only available for CpuUtilization and GpuUtilization.
ComputeType	The compute type that the run used. Only available for Completed runs, Failed runs, and Started runs.
PipelineStepType	The type of PipelineStep used in the run. Only available for Completed runs, Failed runs, and Started runs.
PublishedPipelineId	The ID of the published pipeline used in the run. Only available for Completed runs, Failed runs, and Started runs.
RunType	The type of run. Only available for Completed runs, Failed runs, and Started runs.

The valid values for the RunType dimension are:

Value	Description
Experiment	Non-pipeline runs.
PipelineRun	A pipeline run, which is the parent of a StepRun.
StepRun	A run for a pipeline step.
ReusedStepRun	A run for a pipeline step that reuses a previous run.
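
When post-processing metric data split by RunType, it can help to model the four values explicitly. The following is a minimal sketch; the `RunType` enum and the helper functions are hypothetical names for this example, and the pipeline or non-pipeline split follows the descriptions in the table above.

```python
from enum import Enum


class RunType(Enum):
    EXPERIMENT = "Experiment"           # non-pipeline runs
    PIPELINE_RUN = "PipelineRun"        # parent of a StepRun
    STEP_RUN = "StepRun"                # run for a pipeline step
    REUSED_STEP_RUN = "ReusedStepRun"   # step that reuses a previous run


def is_pipeline_related(run_type: RunType) -> bool:
    """Every value except Experiment belongs to a pipeline."""
    return run_type is not RunType.EXPERIMENT


def count_by_run_type(values):
    """Count dimension values, raising ValueError on an unknown RunType."""
    counts = {run_type: 0 for run_type in RunType}
    for value in values:
        counts[RunType(value)] += 1
    return counts


if __name__ == "__main__":
    observed = ["Experiment", "PipelineRun", "StepRun", "StepRun"]
    for run_type, count in count_by_run_type(observed).items():
        print(run_type.value, count, is_pipeline_related(run_type))
```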

Resource logs

This section lists the types of resource logs you can collect for this service. The section pulls from the list of all resource logs category types supported in Azure Monitor.

Supported resource logs for Microsoft.MachineLearningServices/registries

Category	Category display name	Log table	Supports basic log plan	Supports ingestion-time transformation	Example queries	Costs to export
`RegistryAssetReadEvent`	Registry Asset Read Event		No	No		Yes
`RegistryAssetWriteEvent`	Registry Asset Write Event	AmlRegistryWriteEventsLog Azure ML Registry Write events log. It keeps records of Write operations with registries data access (data plane), including user identity, asset name and version for each access event.	No	No	Queries	Yes

Supported resource logs for Microsoft.MachineLearningServices/workspaces

Category	Category display name	Log table	Supports basic log plan	Supports ingestion-time transformation	Example queries	Costs to export
`AmlComputeClusterEvent`	AmlComputeClusterEvent	AmlComputeClusterEvent AmlCompute Cluster events	No	Yes	Queries	No
`AmlComputeClusterNodeEvent`	AmlComputeClusterNodeEvent		No	No		Yes
`AmlComputeCpuGpuUtilization`	AmlComputeCpuGpuUtilization	AmlComputeCpuGpuUtilization Azure Machine Learning services CPU and GPU utilization logs.	No	Yes	Queries	No
`AmlComputeJobEvent`	AmlComputeJobEvent	AmlComputeJobEvent AmlCompute Job events	No	Yes	Queries	No
`AmlRunStatusChangedEvent`	AmlRunStatusChangedEvent	AmlRunStatusChangedEvent Azure Machine Learning services run status event logs.	No	Yes		No
`ComputeInstanceEvent`	ComputeInstanceEvent	AmlComputeInstanceEvent Events when ML Compute Instance is accessed (read/write).	No	Yes		Yes
`DataLabelChangeEvent`	DataLabelChangeEvent	AmlDataLabelEvent Events when data labels or their projects are accessed (read, created, or deleted).	No	Yes		Yes
`DataLabelReadEvent`	DataLabelReadEvent	AmlDataLabelEvent Events when data labels or their projects are accessed (read, created, or deleted).	No	Yes		Yes
`DataSetChangeEvent`	DataSetChangeEvent	AmlDataSetEvent Events when a registered or unregistered ML datastore is accessed (read, created, or deleted).	No	Yes	Queries	Yes
`DataSetReadEvent`	DataSetReadEvent	AmlDataSetEvent Events when a registered or unregistered ML datastore is accessed (read, created, or deleted).	No	Yes	Queries	Yes
`DataStoreChangeEvent`	DataStoreChangeEvent	AmlDataStoreEvent Events when ML datastore is accessed (read, created, or deleted).	No	Yes		Yes
`DataStoreReadEvent`	DataStoreReadEvent	AmlDataStoreEvent Events when ML datastore is accessed (read, created, or deleted).	No	Yes		Yes
`DeploymentEventACI`	DeploymentEventACI	AmlDeploymentEvent Events when a model deployment happens on ACI or AKS.	No	Yes		Yes
`DeploymentEventAKS`	DeploymentEventAKS	AmlDeploymentEvent Events when a model deployment happens on ACI or AKS.	No	Yes		Yes
`DeploymentReadEvent`	DeploymentReadEvent	AmlDeploymentEvent Events when a model deployment happens on ACI or AKS.	No	Yes		Yes
`EnvironmentChangeEvent`	EnvironmentChangeEvent	AmlEnvironmentEvent Events when ML environments are accessed (read, created, or deleted).	No	Yes	Queries	Yes
`EnvironmentReadEvent`	EnvironmentReadEvent	AmlEnvironmentEvent Events when ML environments are accessed (read, created, or deleted).	No	Yes	Queries	Yes
`InferencingOperationACI`	InferencingOperationACI		No	No		Yes
`InferencingOperationAKS`	InferencingOperationAKS	AmlInferencingEvent Events for inference or related operation on AKS or ACI compute type.	No	Yes		Yes
`ModelsActionEvent`	ModelsActionEvent	AmlModelsEvent Events when an ML model is accessed (read, created, or deleted). Includes events when models and assets are packaged into ready-to-build packages.	No	Yes	Queries	Yes
`ModelsChangeEvent`	ModelsChangeEvent	AmlModelsEvent Events when an ML model is accessed (read, created, or deleted). Includes events when models and assets are packaged into ready-to-build packages.	No	Yes	Queries	Yes
`ModelsReadEvent`	ModelsReadEvent	AmlModelsEvent Events when an ML model is accessed (read, created, or deleted). Includes events when models and assets are packaged into ready-to-build packages.	No	Yes	Queries	Yes
`PipelineChangeEvent`	PipelineChangeEvent	AmlPipelineEvent Events when ML pipeline draft or endpoint or module are accessed (read, created, or deleted).	No	Yes		Yes
`PipelineReadEvent`	PipelineReadEvent	AmlPipelineEvent Events when ML pipeline draft or endpoint or module are accessed (read, created, or deleted).	No	Yes		Yes
`RunEvent`	RunEvent	AmlRunEvent Events when ML experiments are accessed (read, created, or deleted).	No	Yes		Yes
`RunReadEvent`	RunReadEvent	AmlRunEvent Events when ML experiments are accessed (read, created, or deleted).	No	Yes		Yes

Supported resource logs for Microsoft.MachineLearningServices/workspaces/onlineEndpoints

Category	Category display name	Log table	Supports basic log plan	Supports ingestion-time transformation	Example queries	Costs to export
`AmlOnlineEndpointConsoleLog`	AmlOnlineEndpointConsoleLog	AmlOnlineEndpointConsoleLog Azure ML online endpoints console logs. It provides console logs output from user containers.	No	Yes	Queries	Yes
`AmlOnlineEndpointEventLog`	AmlOnlineEndpointEventLog	AmlOnlineEndpointEventLog Azure ML online endpoints event logs. It provides event logs regarding the inference-server container's life cycle.	No	No	Queries	Yes
`AmlOnlineEndpointTrafficLog`	AmlOnlineEndpointTrafficLog	AmlOnlineEndpointTrafficLog Traffic logs for AzureML (machine learning) online endpoints. The table could be used to check the detailed information of the request to an online endpoint. For example, you could use it to check the request duration, the request failure reason, etc.	No	No	Queries	Yes

Azure Monitor Logs tables

This section lists the Azure Monitor Logs tables relevant to this service, which are available for query by Log Analytics using Kusto queries. The tables contain resource log data and possibly more depending on what is collected and routed to them.

Machine Learning

Microsoft.MachineLearningServices/workspaces

Microsoft.MachineLearningServices/registries

Activity log

The linked table lists the operations that can be recorded in the activity log for this service. These operations are a subset of all the possible resource provider operations in the activity log.

For more information on the schema of activity log entries, see Activity Log schema.

The following table lists some operations related to Machine Learning that may be created in the activity log. For a complete listing of Microsoft.MachineLearningServices operations, see Microsoft.MachineLearningServices resource provider operations.

Operation	Description
Creates or updates a Machine Learning workspace	A workspace was created or updated
CheckComputeNameAvailability	Check if a compute name is already in use
Creates or updates the compute resources	A compute resource was created or updated
Deletes the compute resources	A compute resource was deleted
List secrets	An operation listed secrets for a Machine Learning workspace

Log schemas

Azure Machine Learning uses the following schemas.

AmlComputeJobEvent table

Property	Description
TimeGenerated	Time when the log entry was generated
OperationName	Name of the operation associated with the log event
Category	Name of the log event
JobId	ID of the Job submitted
ExperimentId	ID of the Experiment
ExperimentName	Name of the Experiment
CustomerSubscriptionId	Subscription ID where the Experiment and Job were submitted
WorkspaceName	Name of the machine learning workspace
ClusterName	Name of the Cluster
ProvisioningState	State of the Job submission
ResourceGroupName	Name of the resource group
JobName	Name of the Job
ClusterId	ID of the cluster
EventType	Type of the Job event. For example, JobSubmitted, JobRunning, JobFailed, JobSucceeded.
ExecutionState	State of the job (the Run). For example, Queued, Running, Succeeded, Failed
ErrorDetails	Details of job error
CreationApiVersion	API version used to create the job
ClusterResourceGroupName	Resource group name of the cluster
TFWorkerCount	Count of TF workers
TFParameterServerCount	Count of TF parameter servers
ToolType	Type of tool used
RunInContainer	Flag describing if job should be run inside a container
JobErrorMessage	Detailed message of the Job error
NodeId	ID of the node created where job is running
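To show how the columns above might be used together, here is a minimal Python sketch that counts JobFailed events per cluster. The sample rows and the function name failed_jobs_per_cluster are hypothetical; real rows would come from a query against the AmlComputeJobEvent table, and only the column names and the EventType values are taken from the schema.

```python
from collections import Counter

# Hypothetical sample rows; values are made up for illustration.
sample_rows = [
    {"JobId": "job-1", "ClusterName": "cpu-cluster", "EventType": "JobSubmitted"},
    {"JobId": "job-1", "ClusterName": "cpu-cluster", "EventType": "JobFailed",
     "ErrorDetails": "example error"},
    {"JobId": "job-2", "ClusterName": "gpu-cluster", "EventType": "JobSucceeded"},
    {"JobId": "job-3", "ClusterName": "cpu-cluster", "EventType": "JobFailed",
     "ErrorDetails": "example error"},
]


def failed_jobs_per_cluster(rows):
    """Count rows whose EventType is JobFailed, grouped by ClusterName."""
    return Counter(
        row["ClusterName"] for row in rows if row.get("EventType") == "JobFailed"
    )


failures = failed_jobs_per_cluster(sample_rows)
print(dict(failures))
```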

AmlComputeClusterEvent table

Property	Description
TimeGenerated	Time when the log entry was generated
OperationName	Name of the operation associated with the log event
Category	Name of the log event
ProvisioningState	Provisioning state of the cluster
ClusterName	Name of the cluster
ClusterType	Type of the cluster
CreatedBy	User who created the cluster
CoreCount	Count of the cores in the cluster
VmSize	VM size of the cluster
VmPriority	Priority of the nodes created inside a cluster (Dedicated or LowPriority)
ScalingType	Type of cluster scaling (manual or auto)
InitialNodeCount	Initial node count of the cluster
MinimumNodeCount	Minimum node count of the cluster
MaximumNodeCount	Maximum node count of the cluster
NodeDeallocationOption	How the node should be deallocated
Publisher	Publisher of the cluster type
Offer	Offer with which the cluster is created
Sku	SKU of the node or VM created inside the cluster
Version	Version of the image used while Node/VM is created
SubnetId	SubnetId of the cluster
AllocationState	Cluster allocation state
CurrentNodeCount	Current node count of the cluster
TargetNodeCount	Target node count of the cluster while scaling up/down
EventType	Type of event during cluster creation.
NodeIdleTimeSecondsBeforeScaleDown	Idle time in seconds before cluster is scaled down
PreemptedNodeCount	Preempted node count of the cluster
IsResizeGrow	Flag indicating that cluster is scaling up
VmFamilyName	Name of the VM family of the nodes that can be created inside cluster
LeavingNodeCount	Leaving node count of the cluster
UnusableNodeCount	Unusable node count of the cluster
IdleNodeCount	Idle node count of the cluster
RunningNodeCount	Running node count of the cluster
PreparingNodeCount	Preparing node count of the cluster
QuotaAllocated	Allocated quota to the cluster
QuotaUtilized	Utilized quota of the cluster
AllocationStateTransitionTime	Transition time from one state to another
ClusterErrorCodes	Error code received during cluster creation or scaling
CreationApiVersion	API version used while creating the cluster
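As one possible use of these columns, the sketch below compares CurrentNodeCount with TargetNodeCount to decide whether a cluster event describes scaling up, scaling down, or a steady state. The function name scaling_direction and the sample row are hypothetical, and the interpretation of the two counts is an assumption based only on the column descriptions above.

```python
def scaling_direction(row):
    """Classify a cluster event row as 'up', 'down', or 'steady'.

    Assumes CurrentNodeCount and TargetNodeCount hold integer-like values.
    """
    current = int(row["CurrentNodeCount"])
    target = int(row["TargetNodeCount"])
    if target > current:
        return "up"
    if target < current:
        return "down"
    return "steady"


# Hypothetical sample row; values are made up for illustration.
sample_row = {"ClusterName": "cpu-cluster", "CurrentNodeCount": 2, "TargetNodeCount": 4}
direction = scaling_direction(sample_row)
print(sample_row["ClusterName"], direction)
```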

AmlComputeInstanceEvent table

Property	Description
Type	Name of the log event, AmlComputeInstanceEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
CorrelationId	A GUID used to group together a set of related events, when applicable.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
AmlComputeInstanceName	The name of the compute instance associated with the log entry.

AmlDataLabelEvent table

Property	Description
Type	Name of the log event, AmlDataLabelEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
CorrelationId	A GUID used to group together a set of related events, when applicable.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
AmlProjectId	The unique identifier of the Azure Machine Learning project.
AmlProjectName	The name of the Azure Machine Learning project.
AmlLabelNames	The label class names which are created for the project.
AmlDataStoreName	The name of the data store where the project's data is stored.

AmlDataSetEvent table

Property	Description
Type	Name of the log event, AmlDataSetEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId	A GUID and unique ID of the Azure Machine Learning workspace.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
AmlDatasetId	The ID of the Azure Machine Learning Data Set.
AmlDatasetName	The name of the Azure Machine Learning Data Set.

AmlDataStoreEvent table

Property	Description
Type	Name of the log event, AmlDataStoreEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId	A GUID and unique ID of the Azure Machine Learning workspace.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
AmlDatastoreName	The name of the Azure Machine Learning Data Store.

AmlDeploymentEvent table

Property	Description
Type	Name of the log event, AmlDeploymentEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
AmlServiceName	The name of the Azure Machine Learning Service.

AmlInferencingEvent table

Property	Description
Type	Name of the log event, AmlInferencingEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
AmlServiceName	The name of the Azure Machine Learning Service.

AmlModelsEvent table

Property	Description
Type	Name of the log event, AmlModelsEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
ResultSignature	The HTTP status code of the event. Typical values include 200, 201, and 202.
AmlModelName	The name of the Azure Machine Learning Model.

AmlPipelineEvent table

Property	Description
Type	Name of the log event, AmlPipelineEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId	A GUID and unique ID of the Azure Machine Learning workspace.
AmlWorkspaceId	The name of the Azure Machine Learning workspace.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
AmlModuleId	A GUID and unique ID of the module.
AmlModelName	The name of the Azure Machine Learning Model.
AmlPipelineId	The ID of the Azure Machine Learning pipeline.
AmlParentPipelineId	The ID of the parent Azure Machine Learning pipeline (in the case of cloning).
AmlPipelineDraftId	The ID of the Azure Machine Learning pipeline draft.
AmlPipelineDraftName	The name of the Azure Machine Learning pipeline draft.
AmlPipelineEndpointId	The ID of the Azure Machine Learning pipeline endpoint.
AmlPipelineEndpointName	The name of the Azure Machine Learning pipeline endpoint.

AmlRunEvent table

Property	Description
Type	Name of the log event, AmlRunEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType	The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName	The name of the operation associated with the log entry
AmlWorkspaceId	A GUID and unique ID of the Azure Machine Learning workspace.
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
RunId	The unique ID of the run.
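Several of the event tables in this article share the Level column with the same four allowed values. The following Python sketch tallies rows by Level and sets aside any row whose value falls outside that list. The function name count_by_level and the sample rows are hypothetical; only the column names and the allowed Level values come from the schemas.

```python
from collections import Counter

# Allowed values as listed in the Level column description.
ALLOWED_LEVELS = {"Informational", "Warning", "Error", "Critical"}


def count_by_level(rows):
    """Return (counts per allowed Level, rows with a missing or unexpected Level)."""
    counts = Counter()
    unexpected = []
    for row in rows:
        level = row.get("Level")
        if level in ALLOWED_LEVELS:
            counts[level] += 1
        else:
            unexpected.append(row)
    return counts, unexpected


# Hypothetical sample rows; values are made up for illustration.
sample_rows = [
    {"RunId": "run-1", "Level": "Informational", "ResultType": "Started"},
    {"RunId": "run-1", "Level": "Error", "ResultType": "Failed"},
    {"RunId": "run-2", "Level": "Informational", "ResultType": "Succeeded"},
    {"RunId": "run-3", "Level": "Verbose", "ResultType": "Started"},
]

level_counts, unexpected_rows = count_by_level(sample_rows)
print(dict(level_counts), len(unexpected_rows))
```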

AmlEnvironmentEvent table

Property	Description
Type	Name of the log event, AmlEnvironmentEvent
TimeGenerated	Time (UTC) when the log entry was generated
Level	The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
OperationName	The name of the operation associated with the log entry
Identity	The identity of the user or application that performed the operation.
AadTenantId	The Microsoft Entra tenant ID the operation was submitted for.
AmlEnvironmentName	The name of the Azure Machine Learning environment configuration.
AmlEnvironmentVersion	The version of the Azure Machine Learning environment configuration.

AmlOnlineEndpointTrafficLog table (preview)

Property	Description
Method	The requested method from client.
Path	The requested path from client.
SubscriptionId	The machine learning subscription ID of the online endpoint.
AzureMLWorkspaceId	The machine learning workspace ID of the online endpoint.
AzureMLWorkspaceName	The machine learning workspace name of the online endpoint.
EndpointName	The name of the online endpoint.
DeploymentName	The name of the online deployment.
Protocol	The protocol of the request.
ResponseCode	The final response code returned to the client.
ResponseCodeReason	The final response code reason returned to the client.
ModelStatusCode	The response status code from model.
ModelStatusReason	The response status reason from model.
RequestPayloadSize	The total bytes received from the client.
ResponsePayloadSize	The total bytes sent back to the client.
UserAgent	The user-agent header of the request, including comments but truncated to a max of 70 characters.
XRequestId	The request ID generated by Azure Machine Learning for internal tracing.
XMSClientRequestId	The tracking ID generated by the client.
TotalDurationMs	Duration in milliseconds from the request start time to the last response byte sent back to the client. If the client disconnected, it measures from the start time to client disconnect time.
RequestDurationMs	Duration in milliseconds from the request start time to the last byte of the request received from the client.
ResponseDurationMs	Duration in milliseconds from the request start time to the first response byte read from the model.
RequestThrottlingDelayMs	Delay in milliseconds in request data transfer due to network throttling.
ResponseThrottlingDelayMs	Delay in milliseconds in response data transfer due to network throttling.

For more information on this log, see Monitor online endpoints.
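The traffic log columns lend themselves to simple per-deployment summaries. The sketch below is a minimal Python example that computes a request count, an error rate, and a mean TotalDurationMs for each DeploymentName. It also derives the interval between the last request byte received and the first response byte read from the model, which follows from the definitions of RequestDurationMs and ResponseDurationMs because both are measured from the request start time. The function names and sample rows are hypothetical, and treating a ResponseCode of 400 or higher as an error is an assumption that fits HTTP traffic.

```python
def summarize_traffic(rows):
    """Summarize requests, error rate, and mean total duration per deployment."""
    totals = {}
    for row in rows:
        entry = totals.setdefault(
            row["DeploymentName"], {"requests": 0, "errors": 0, "total_ms": 0.0}
        )
        entry["requests"] += 1
        # Assumption: response codes of 400 and above count as errors.
        if int(row["ResponseCode"]) >= 400:
            entry["errors"] += 1
        entry["total_ms"] += float(row["TotalDurationMs"])
    return {
        name: {
            "requests": entry["requests"],
            "error_rate": entry["errors"] / entry["requests"],
            "mean_total_ms": entry["total_ms"] / entry["requests"],
        }
        for name, entry in totals.items()
    }


def model_wait_ms(row):
    """Time from the last request byte received to the first model response byte."""
    return float(row["ResponseDurationMs"]) - float(row["RequestDurationMs"])


# Hypothetical sample rows; values are made up for illustration.
sample_rows = [
    {"DeploymentName": "blue", "ResponseCode": 200, "TotalDurationMs": 120,
     "RequestDurationMs": 5, "ResponseDurationMs": 45},
    {"DeploymentName": "blue", "ResponseCode": 500, "TotalDurationMs": 80,
     "RequestDurationMs": 4, "ResponseDurationMs": 60},
    {"DeploymentName": "green", "ResponseCode": 200, "TotalDurationMs": 100,
     "RequestDurationMs": 6, "ResponseDurationMs": 30},
]

summary = summarize_traffic(sample_rows)
wait_ms = model_wait_ms(sample_rows[0])
print(summary)
print(wait_ms)
```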

AmlOnlineEndpointConsoleLog table

Property	Description
TimeGenerated	The timestamp (UTC) of when the log was generated.
OperationName	The operation associated with the log record.
InstanceId	The ID of the instance that generated this log record.
DeploymentName	The name of the deployment associated with the log record.
ContainerName	The name of the container where the log was generated.
Message	The content of the log.

For more information on this log, see Monitor online endpoints.

AmlOnlineEndpointEventLog table (preview)

Property	Description
TimeGenerated	The timestamp (UTC) of when the log was generated.
OperationName	The operation associated with the log record.
InstanceId	The ID of the instance that generated this log record.
DeploymentName	The name of the deployment associated with the log record.
Name	The name of the event.
Message	The content of the event.

For more information on this log, see Monitor online endpoints.

See Monitor Machine Learning for a description of monitoring Machine Learning.
See Monitor Azure resources with Azure Monitor for details on monitoring Azure resources.
