Monitor Azure Kubernetes Service (AKS) control plane metrics (Preview)

In this article, you learn how to use the control plane metrics (preview) feature to monitor the Azure Kubernetes Service (AKS) control plane.

The control plane metrics feature is fully compatible with Prometheus and Grafana. It provides more visibility into the availability and performance of control plane components such as the API server, etcd, scheduler, cluster autoscaler, and controller manager. You can use these metrics to improve overall observability and maintain operational excellence for your AKS cluster.

Prerequisites and limitations

Install the aks-preview extension

Important

AKS preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren't meant for production use. For more information, see the following support articles:

  • Install or update the aks-preview Azure CLI extension using the az extension add or az extension update command.

    # Install the aks-preview extension
    az extension add --name aks-preview
    
    # Update the aks-preview extension
    az extension update --name aks-preview
    

Register the AzureMonitorMetricsControlPlanePreview flag

  1. Register the AzureMonitorMetricsControlPlanePreview feature flag using the az feature register command.

    az feature register --namespace "Microsoft.ContainerService" --name "AzureMonitorMetricsControlPlanePreview"
    

    It takes a few minutes for the status to show Registered.

  2. Verify the registration status using the az feature show command.

    az feature show --namespace "Microsoft.ContainerService" --name "AzureMonitorMetricsControlPlanePreview"
    
  3. When the status reflects Registered, refresh the registration of the Microsoft.ContainerService resource provider using the az provider register command.

    az provider register --namespace "Microsoft.ContainerService"
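
The registration steps above can be sketched as a single polling helper. This is a minimal sketch, not an official script: the function name and the MAX_ATTEMPTS and SLEEP_SECONDS variables are hypothetical, and it assumes the feature state is exposed as properties.state in the az feature show output.

```shell
# Sketch: poll the feature flag until it reports "Registered",
# then refresh the resource provider registration.
# MAX_ATTEMPTS and SLEEP_SECONDS are illustrative, not documented defaults.
wait_for_feature_registered() {
  feature_name="${1:-AzureMonitorMetricsControlPlanePreview}"
  max_attempts="${MAX_ATTEMPTS:-30}"
  sleep_seconds="${SLEEP_SECONDS:-20}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Read only the registration state as plain text.
    state="$(az feature show \
      --namespace "Microsoft.ContainerService" \
      --name "$feature_name" \
      --query "properties.state" \
      --output tsv)"
    echo "Attempt $attempt: state is '$state'"
    if [ "$state" = "Registered" ]; then
      # Propagate the registration to the resource provider.
      az provider register --namespace "Microsoft.ContainerService"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$sleep_seconds"
  done
  echo "Feature '$feature_name' was not registered after $max_attempts attempts" >&2
  return 1
}
```

After signing in with the Azure CLI, calling wait_for_feature_registered with no arguments checks the AzureMonitorMetricsControlPlanePreview flag and returns a nonzero status if the flag never reaches Registered.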
    

Enable control plane metrics on your AKS cluster

You can enable control plane metrics with the Azure Monitor managed service for Prometheus add-on when creating a new cluster or updating an existing cluster.

Note

Unlike the metrics collected from cluster nodes, control plane metrics are collected by a component that isn't part of the ama-metrics add-on. Enabling the AzureMonitorMetricsControlPlanePreview feature flag and the managed Prometheus add-on ensures control plane metrics are collected. After enabling metric collection, it can take several minutes for the data to appear in the workspace.

Enable control plane metrics on a new AKS cluster
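
As a rough illustration, creating a cluster with the managed Prometheus add-on enabled might look like the following. This is a sketch under assumptions: the function name is hypothetical, CLUSTER_NAME and RESOURCE_GROUP are assumed to be set in your shell, and your environment might need more az aks create options (for example, a specific Azure Monitor workspace).

```shell
# Sketch: create a new AKS cluster with the managed Prometheus add-on enabled.
# Assumes CLUSTER_NAME and RESOURCE_GROUP are already exported.
create_cluster_with_metrics() {
  az aks create \
    --name "$CLUSTER_NAME" \
    --resource-group "$RESOURCE_GROUP" \
    --enable-azure-monitor-metrics \
    --generate-ssh-keys
}
```

With the feature flag registered, the control plane metrics should start flowing a few minutes after the cluster is created.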

Enable control plane metrics on an existing AKS cluster

  • If your cluster already has the Prometheus add-on, update the cluster to ensure it starts collecting control plane metrics using the az aks update command.

    az aks update --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP
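
If you aren't sure whether the add-on is already present, a small wrapper can check first and choose the matching update. This is a hedged sketch: the function name is hypothetical, and it assumes the add-on state appears as azureMonitorProfile.metrics.enabled in the az aks show output.

```shell
# Sketch: enable control plane metrics on an existing cluster.
# Assumes CLUSTER_NAME and RESOURCE_GROUP are already exported.
enable_metrics_on_existing_cluster() {
  enabled="$(az aks show \
    --name "$CLUSTER_NAME" \
    --resource-group "$RESOURCE_GROUP" \
    --query "azureMonitorProfile.metrics.enabled" \
    --output tsv)"
  case "$enabled" in
    true|True)
      # Add-on already present: a plain update is enough.
      az aks update --name "$CLUSTER_NAME" --resource-group "$RESOURCE_GROUP"
      ;;
    *)
      # Add-on missing: enable managed Prometheus as part of the update.
      az aks update --enable-azure-monitor-metrics \
        --name "$CLUSTER_NAME" --resource-group "$RESOURCE_GROUP"
      ;;
  esac
}
```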
    

Query control plane metrics

Control plane metrics are stored in an Azure Monitor workspace in the cluster's region. You can query the metrics directly from the workspace or through the Azure managed Grafana instance connected to the workspace.

  1. In the Azure portal, navigate to your AKS cluster resource.

  2. From the service menu, under Monitoring, select Insights.

    Screenshot of the Azure Monitor workspace.
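
Besides the portal, you can query the workspace over its Prometheus-compatible HTTP API. The following is a minimal sketch, assuming that AMW_QUERY_ENDPOINT (a hypothetical variable name) holds the query endpoint shown on your Azure Monitor workspace overview page and that your identity has permission to read data from the workspace.

```shell
# Sketch: run a PromQL query against the Azure Monitor workspace.
# AMW_QUERY_ENDPOINT is an assumed variable, for example the value of
# "Query endpoint" on the workspace overview page.
query_control_plane_metric() {
  promql="$1"
  # Get a token for the managed Prometheus query audience.
  token="$(az account get-access-token \
    --resource "https://prometheus.monitor.azure.com" \
    --query accessToken \
    --output tsv)"
  curl --silent --fail \
    --header "Authorization: Bearer ${token}" \
    --data-urlencode "query=${promql}" \
    "${AMW_QUERY_ENDPOINT}/api/v1/query"
}
```

For example, query_control_plane_metric 'sum(rate(apiserver_request_total[5m])) by (code)' returns the API server request rate by response code as JSON, provided that metric is being ingested for your cluster.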

Note

AKS provides dashboard templates to help you view and analyze your control plane telemetry data in real time. If you're using Azure managed Grafana to visualize the data, you can import the following dashboards:

Customize control plane metrics

Warning

A bug prevents customization of metrics for AKS control plane components. Changes to the ConfigMap don't take effect. Follow this issue for updates.

AKS includes a preconfigured set of metrics to collect and store for each component. The API server and etcd targets are enabled by default. You can customize this list through the ama-metrics-settings-configmap ConfigMap.

The default targets include the following values:

controlplane-apiserver = true
controlplane-cluster-autoscaler = false
controlplane-kube-scheduler = false
controlplane-kube-controller-manager = false
controlplane-etcd = true

All ConfigMaps should be applied to the kube-system namespace for any cluster.
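
To see which control plane targets are currently switched on in a cluster, you can filter the applied ConfigMap. This is a sketch with a hypothetical function name. It assumes the ConfigMap has already been applied to kube-system. If it hasn't, kubectl reports that the ConfigMap isn't found and the defaults listed above apply.

```shell
# Sketch: print the control plane scrape toggles from the applied ConfigMap.
show_control_plane_targets() {
  kubectl get configmap ama-metrics-settings-configmap \
    --namespace kube-system \
    --output yaml \
    | grep -E 'controlplane-[a-z-]+[[:space:]]*=[[:space:]]*(true|false)' \
    | sed 's/^[[:space:]]*//'
}
```

Only lines whose value is true or false are printed, so keep-list entries for the same targets are left out.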

Customize ingestion profile

For more information about minimal-ingestion profile metrics, see Minimal ingestion profile for control plane metrics in managed Prometheus.

Ingest only minimal metrics from default targets

  • Under default-targets-metrics-keep-list, set minimalingestionprofile = true, which ingests only the minimal set of metrics for each of the default targets: controlplane-apiserver and controlplane-etcd.

Ingest all metrics from all targets

  1. Download the ConfigMap file ama-metrics-settings-configmap.yaml and rename it to configmap-controlplane.yaml.

  2. Set minimalingestionprofile = false.

  3. Under default-scrape-settings-enabled, verify that the targets you want to scrape are set to true. The only targets you can specify are: controlplane-apiserver, controlplane-cluster-autoscaler, controlplane-kube-scheduler, controlplane-kube-controller-manager, and controlplane-etcd.

  4. Apply the ConfigMap using the kubectl apply command.

    kubectl apply -f configmap-controlplane.yaml
    

    After applying the configuration, it takes several minutes for the metrics from the specified targets scraped from the control plane to appear in the Azure Monitor workspace.
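
Step 2 can also be scripted instead of edited by hand. The following is a minimal sketch, assuming the downloaded file contains a line of the form minimalingestionprofile = true (or false); the function name is hypothetical.

```shell
# Sketch: set the minimalingestionprofile value in a local ConfigMap file.
# Usage: set_minimal_ingestion_profile <file> <true|false>
set_minimal_ingestion_profile() {
  file="$1"
  value="$2"
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 2 ;;
  esac
  tmp="${file}.tmp"
  # Rewrite only the value, keeping the original indentation and spacing.
  sed -E "s/^([[:space:]]*minimalingestionprofile[[:space:]]*=[[:space:]]*).*/\1${value}/" \
    "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, set_minimal_ingestion_profile configmap-controlplane.yaml false updates the file in place, after which you can apply it with kubectl apply as in step 4.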

Ingest a few other metrics in addition to minimal metrics

The minimal ingestion profile setting helps reduce the ingestion volume of metrics, because it collects only the metrics used by default dashboards, default recording rules, and default alerts.

  1. Download the ConfigMap file ama-metrics-settings-configmap.yaml and rename it to configmap-controlplane.yaml.

  2. Set minimalingestionprofile = true.

  3. Under default-scrape-settings-enabled, verify that the targets you want to scrape are set to true. The only targets you can specify are: controlplane-apiserver, controlplane-cluster-autoscaler, controlplane-kube-scheduler, controlplane-kube-controller-manager, and controlplane-etcd.

  4. Under default-targets-metrics-keep-list, specify the list of metrics for the true targets. For example:

    controlplane-apiserver = "apiserver_admission_webhook_admission_duration_seconds|apiserver_longrunning_requests"
    
  5. Apply the ConfigMap using the kubectl apply command.

    kubectl apply -f configmap-controlplane.yaml
    

    After applying the configuration, it takes several minutes for the metrics from the specified targets scraped from the control plane to appear in the Azure Monitor workspace.
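
A keep-list value is a single pattern of metric names separated by |. As an illustration, a small helper (hypothetical name) can build that line from a list of names. It joins the names without surrounding whitespace, on the assumption that the value is treated as a regular expression and stray spaces would become part of the pattern.

```shell
# Sketch: build a keep-list line for one target from metric names.
# Usage: build_keep_list <target> <metric> [<metric> ...]
build_keep_list() {
  target="$1"
  shift
  regex=""
  for metric in "$@"; do
    if [ -z "$regex" ]; then
      regex="$metric"
    else
      regex="${regex}|${metric}"
    fi
  done
  printf '%s = "%s"\n' "$target" "$regex"
}
```

Calling build_keep_list controlplane-apiserver apiserver_admission_webhook_admission_duration_seconds apiserver_longrunning_requests prints a line you can paste under default-targets-metrics-keep-list.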

Ingest only specific metrics from some targets

  1. Download the ConfigMap file ama-metrics-settings-configmap.yaml and rename it to configmap-controlplane.yaml.

  2. Set minimalingestionprofile = false.

  3. Under default-scrape-settings-enabled, verify that the targets you want to scrape are set to true. The only targets you can specify are: controlplane-apiserver, controlplane-cluster-autoscaler, controlplane-kube-scheduler, controlplane-kube-controller-manager, and controlplane-etcd.

  4. Under default-targets-metrics-keep-list, specify the list of metrics for the true targets. For example:

    controlplane-apiserver = "apiserver_admission_webhook_admission_duration_seconds|apiserver_longrunning_requests"
    
  5. Apply the ConfigMap using the kubectl apply command.

    kubectl apply -f configmap-controlplane.yaml
    

    After applying the configuration, it takes several minutes for the metrics from the specified targets scraped from the control plane to appear in the Azure Monitor workspace.

Troubleshoot control plane metrics issues

Make sure the feature flag AzureMonitorMetricsControlPlanePreview is enabled and the ama-metrics pods are running.

Note

The troubleshooting methods for Azure Monitor managed service for Prometheus don't directly translate here, because the components that scrape the control plane aren't part of the managed Prometheus add-on.

  • ConfigMap formatting: Make sure you're using proper formatting in the ConfigMap and that the fields, specifically default-targets-metrics-keep-list, minimalingestionprofile, and default-scrape-settings-enabled, are correctly populated with their intended values.
  • Isolate control plane from data plane: Start by setting some of the node-related metrics to true and verify that those metrics are forwarded to the workspace. This helps determine whether the issue is specific to scraping control plane metrics.
  • Events ingested: Once you apply the changes, you can open metrics explorer from the Azure Monitor overview page or from the Monitoring section of the selected cluster and check for an increase or decrease in the number of events ingested per minute. It should help you determine if a specific metric is missing or if all metrics are missing.
  • Specific metric isn't exposed: There are cases where metrics are documented, but aren't exposed from the target and aren't forwarded to the Azure Monitor workspace. In this case, it's necessary to verify other metrics are being forwarded to the workspace.
  • No access to the Azure Monitor workspace: When you enable the add-on, you might specify an existing workspace that you don't have access to. In that case, it might look like the metrics aren't being collected and forwarded. Make sure that you create a new workspace while enabling the add-on or while creating the cluster.
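
The first two checks (feature flag state and ama-metrics pods) can be sketched as one helper. This is an illustrative sketch with a hypothetical function name. It assumes kubectl is pointed at the affected cluster and that the add-on pods have ama-metrics in their names.

```shell
# Sketch: check the feature flag and the ama-metrics pods in kube-system.
# Returns 0 only if the flag is Registered and all matching pods are Running.
check_control_plane_metrics_prereqs() {
  status=0
  state="$(az feature show \
    --namespace "Microsoft.ContainerService" \
    --name "AzureMonitorMetricsControlPlanePreview" \
    --query "properties.state" \
    --output tsv)"
  echo "Feature flag state: ${state:-unknown}"
  [ "$state" = "Registered" ] || status=1

  # Columns without headers are: NAME READY STATUS RESTARTS AGE.
  pods="$(kubectl get pods --namespace kube-system --no-headers \
    | grep 'ama-metrics' || true)"
  if [ -z "$pods" ]; then
    echo "No ama-metrics pods found in kube-system"
    status=1
  else
    not_running="$(printf '%s\n' "$pods" | awk '$3 != "Running" {print $1}')"
    if [ -n "$not_running" ]; then
      echo "Pods not in Running state:"
      echo "$not_running"
      status=1
    else
      echo "All ama-metrics pods are Running"
    fi
  fi
  return "$status"
}
```

A nonzero return status points at the flag or the add-on pods. If both look healthy, continue with the ConfigMap and workspace checks in the list above.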

Disable control plane metrics on your AKS cluster

You can disable control plane metrics at any time by disabling the managed Prometheus add-on and unregistering the AzureMonitorMetricsControlPlanePreview feature flag.

  1. Remove the metrics add-on that scrapes Prometheus metrics using the az aks update command.

    az aks update --disable-azure-monitor-metrics --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP
    
  2. Disable scraping of control plane metrics on the AKS cluster by unregistering the AzureMonitorMetricsControlPlanePreview feature flag using the az feature unregister command.

    az feature unregister --namespace "Microsoft.ContainerService" --name "AzureMonitorMetricsControlPlanePreview"
    

FAQ

Can I scrape control plane metrics with self-hosted Prometheus?

No, you currently can't scrape control plane metrics with self-hosted Prometheus. Self-hosted Prometheus can scrape only the single instance that the load balancer routes it to. Those metrics aren't reliable, because there are often multiple replicas of the control plane. Control plane metrics are visible only through managed Prometheus.

Why isn't the user agent available through the control plane metrics?

Control plane metrics in Kubernetes don't have the user agent. The user agent is only available through the control plane logs available in the diagnostic settings.

Next steps

For more information about monitoring AKS, see Monitor Azure Kubernetes Service (AKS).