Monitoring and logging data
Applies to: AKS on Azure Stack HCI 22H2, AKS on Windows Server
This article describes how to monitor your Azure Kubernetes Service (AKS) deployment and collect logging data in AKS enabled by Azure Arc. You learn how to set up and access on-premises monitoring using Prometheus and Grafana, and how to collect and view logs using Elasticsearch, Fluent Bit, and Kibana (EFK).
Two types of monitoring and logging solutions are available, as described in the following table:
Solution | Azure connectivity | Support and service | Cost | Deployment |
---|---|---|---|---|
Azure Monitor | Requires connecting the Kubernetes cluster to Azure using Azure Arc for Kubernetes. | Full support and servicing from Microsoft. | Requires signing up for the Azure Monitor service. | Use Azure Arc for monitoring clusters. |
On-premises monitoring and logging | Doesn't require Azure connectivity. | Supported as open-source software by Microsoft (with no support agreement or SLAs), the community, and/or external vendors. | Vendor-dependent. | Customer-driven. See Monitor clusters using on-premises monitoring. |
To use Azure Monitor with Kubernetes clusters, see the Azure Monitor overview.
Use on-premises monitoring
It's crucial that you monitor the health, performance, and resource usage of the control plane nodes and workloads on your cluster when running apps in production. The recommended monitoring solution includes the following two tools:
- Prometheus is a monitoring and alerting toolkit that you can use for monitoring containerized workloads. Prometheus works with different types of collectors and agents to collect metrics and store them in a database where you can query the data and view reports. AKS Arc makes it easy to deploy Prometheus, which is described later in this article.
- Grafana is a tool used to view, query, and visualize metrics on Grafana dashboards. You can configure Grafana to use Prometheus as its data source. To use Grafana with AKS Arc, you must have your own licensed copy.
Monitoring solution overview
As part of the Prometheus solution in AKS enabled by Arc, the following components are deployed and automatically configured:
- Prometheus operator
- Prometheus
- Kube state metrics
- Node exporter
- Windows exporter
The deployment is based on the publicly available kube-prometheus-stack Helm chart, which is extended to support the Windows exporter and to secure metrics scraping between Prometheus and its agents. Once the Prometheus solution is deployed, the Node exporter runs on each Linux node, and the Windows exporter runs on each Windows node.
Note
Since the Prometheus operator, Prometheus, and Kube state metrics components are only supported on Linux, you must provision at least one Linux node in your AKS cluster to deploy this solution.
The objects and endpoints that the Prometheus solution scrapes include the following items:
- Kube state metrics to collect various metrics provided by Kubernetes
- Kubernetes API server
- Kubelet
- Node exporter to collect metrics for Linux nodes
- Windows exporter to collect metrics for Windows nodes
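One way to check that each of these endpoints is actually being scraped is to ask Prometheus for its active targets through the standard Prometheus HTTP API (`GET /api/v1/targets`). The following is a minimal sketch that summarizes such a response by job and health. The sample payload, including the job names and instance addresses, is a hypothetical placeholder and not output captured from an AKS Arc cluster.

```python
import json
from collections import Counter


def summarize_targets(payload: dict) -> dict:
    """Count scrape targets per job, split by health ("up", "down", "unknown")."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected API status: {payload.get('status')!r}")
    summary: dict = {}
    for target in payload["data"]["activeTargets"]:
        job = target.get("labels", {}).get("job", "<no job label>")
        health = target.get("health", "unknown")
        summary.setdefault(job, Counter())[health] += 1
    return {job: dict(counts) for job, counts in summary.items()}


# Hypothetical, trimmed response body in the shape returned by /api/v1/targets.
# Job names and addresses are illustrative assumptions only.
SAMPLE = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "node-exporter", "instance": "10.0.0.4:9100"}, "health": "up"},
      {"labels": {"job": "node-exporter", "instance": "10.0.0.5:9100"}, "health": "down"},
      {"labels": {"job": "windows-exporter", "instance": "10.0.0.6:9182"}, "health": "up"}
    ]
  }
}
""")

if __name__ == "__main__":
    for job, counts in sorted(summarize_targets(SAMPLE).items()):
        print(job, counts)
```

A job that reports any `down` targets is a reasonable first place to look when a dashboard shows gaps for a particular node type.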
To view the Grafana dashboards available in AKS Arc, see Grafana dashboards available in AKS Arc.
Deploy monitoring solution using PowerShell
This section describes the two options you can use to deploy monitoring on a workload cluster.
Option 1: Deploy the monitoring solution when creating the workload cluster
To enable monitoring, provide the -enableMonitoring parameter when you use New-AksHciCluster to create the workload cluster, as shown in the following example:
New-AksHciCluster -name mynewcluster -enableMonitoring
Monitoring is installed with the following default configuration:
- The size of the persistent volume that's provisioned to store metrics (storageSizeGB) is 100 GB.
- The retention time for collected metrics (retentionTimeHours) is 240 hours (or 10 days).
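To judge whether these defaults fit a given cluster, the Prometheus documentation offers a rough capacity formula: needed disk space is approximately retention time in seconds, times ingested samples per second, times bytes per sample, with 1 to 2 bytes per sample being typical. The sketch below inverts that formula to estimate the ingestion rate a volume can sustain. The 2-bytes-per-sample figure and the treatment of 1 GB as 10^9 bytes are assumptions, and the function names are hypothetical helpers, not part of any AKS Arc tooling.

```python
def retention_days(retention_time_hours: int) -> float:
    """Convert a retention window in hours to days."""
    return retention_time_hours / 24


def max_samples_per_second(
    storage_size_gb: float,
    retention_time_hours: int,
    bytes_per_sample: float = 2.0,  # assumption: upper end of the typical range
) -> float:
    """Estimate the sustained ingestion rate that fits in the volume.

    Rearranges: disk_bytes ~= retention_seconds * samples_per_second * bytes_per_sample
    """
    retention_seconds = retention_time_hours * 3600
    disk_bytes = storage_size_gb * 1e9  # assumption: 1 GB = 10^9 bytes
    return disk_bytes / (retention_seconds * bytes_per_sample)


if __name__ == "__main__":
    print(retention_days(240))
    print(round(max_samples_per_second(100, 240)))
```

With the default values, this works out to 10 days of retention and an upper bound of roughly 57,870 samples per second under the 2-byte assumption. Treat the result as an order-of-magnitude guide, because part of the volume is also consumed by the write-ahead log and other overhead.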
Option 2: Deploy the monitoring solution on an existing workload cluster
Run the Install-AksHciMonitoring command to deploy the monitoring solution on an existing workload cluster, as follows:
Install-AksHciMonitoring -Name mycluster -storageSizeGB 100 -retentionTimeHours 240
The -storageSizeGB parameter sets the size of the persistent volume that's provisioned to store metrics, and the -retentionTimeHours parameter sets the amount of time the collected metrics are retained.
The monitoring solution is installed in a separate namespace called monitoring and uses a StorageClass called monitoring-sc. Prometheus is exposed on an internal endpoint that is accessible only within the cluster at http://akshci-monitoring-prometheus-svc.monitoring:9090.
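Because the endpoint is reachable only from inside the cluster, any client that queries it has to run in a pod (or go through a port-forward). As a minimal sketch, the following Python helpers build and send an instant query using the standard Prometheus HTTP API path `/api/v1/query`. The helper names are hypothetical, and the code assumes the endpoint accepts plain, unauthenticated HTTP from the caller's location, which may not hold in every deployment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Internal service endpoint for Prometheus in the monitoring namespace.
PROMETHEUS_URL = "http://akshci-monitoring-prometheus-svc.monitoring:9090"


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL for a Prometheus instant query, URL-encoding the PromQL."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def run_instant_query(promql: str, base_url: str = PROMETHEUS_URL, timeout: float = 10.0) -> list:
    """Send the query and return the result list.

    Must run where the service DNS name resolves, that is, inside the cluster.
    """
    with urlopen(instant_query_url(promql, base_url), timeout=timeout) as response:
        body = json.load(response)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    return body["data"]["result"]
```

For example, `run_instant_query("up")` returns one series per scrape target, whose value is 1 when the last scrape succeeded and 0 when it failed.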
Uninstall monitoring solution using PowerShell
Run the Uninstall-AksHciMonitoring PowerShell command to uninstall the AKS Arc monitoring solution, as follows:
Uninstall-AksHciMonitoring -Name <target cluster name>
The uninstall process removes everything, including the namespace, the StorageClass, and the actual data and metrics of the persistent volume.
Deploy Grafana, and configure it to use Prometheus
You can follow any publicly available guidance for deploying Grafana. You can also use Microsoft's deployment guidance for Grafana, which details how to deploy and configure Grafana and connect it to an AKS Prometheus instance. That guidance, hosted on GitHub, also describes how to add the Grafana dashboards that Microsoft makes available for AKS enabled by Arc.
On-premises logging
Logging is crucial for troubleshooting and diagnostics. The logging solution in AKS Arc is based on Elasticsearch, Fluent Bit, and Kibana (EFK). These components are all deployed as containers:
- Fluent Bit is the log processor and forwarder that collects data and logs from different sources. It then formats, unifies, and stores them in Elasticsearch.
- Elasticsearch is a distributed search and analytics engine capable of centrally storing the logs for fast searches and data analytics.
- Kibana provides interactive visualizations on a web dashboard. This tool lets you view and query logs stored in Elasticsearch, and then you can visualize them through graphs and dashboards.
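To make the division of labor concrete, the following sketch builds the kind of Elasticsearch Query DSL search body that a Kibana search ultimately resolves to: recent log lines from one namespace that contain a given term. The field names `log`, `kubernetes.namespace_name`, and `@timestamp` are assumptions based on Fluent Bit's Kubernetes filter and Elasticsearch output defaults; the actual field names and index mapping in an AKS Arc logging deployment may differ, and the helper names are hypothetical.

```python
import json


def build_log_query(namespace: str, text: str, minutes: int = 15, size: int = 50) -> dict:
    """Build a search body: newest-first log lines from one namespace matching `text`."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # assumed timestamp field
        "query": {
            "bool": {
                # Full-text match on the raw log line (assumed field name "log").
                "must": [{"match": {"log": text}}],
                "filter": [
                    # Assumed field added by Fluent Bit's Kubernetes filter.
                    {"match_phrase": {"kubernetes.namespace_name": namespace}},
                    # Restrict to the most recent `minutes` minutes.
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ],
            }
        },
    }


def to_request_body(query: dict) -> str:
    """Serialize the query as the JSON body of a _search request."""
    return json.dumps(query)


if __name__ == "__main__":
    print(to_request_body(build_log_query("monitoring", "error")))
```

The resulting JSON can be pasted into the Kibana Dev Tools console as the body of a `_search` request against whichever index pattern the logging deployment writes to.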
To set up an on-premises logging solution, see the steps to set up logging to access Kibana. That article covers all the components required to collect, aggregate, and query container logs across the cluster.
For advanced configuration steps, see Windows logging.