Monitor a cloud environment

You need observability of your cloud environment to help ensure that your workloads run smoothly, whether you're a business owner, platform owner, or application owner. You need to know if:

  • Your applications are available and perform to your customers' expectations.
  • You have any security threats that require investigation.
  • Your consumption costs are within the expected range.

Monitoring is the process of collecting, analyzing, and acting on telemetry that indicates the health of your platform, resources, and applications. An effective monitoring environment includes your entire cloud estate, which might include resources across multiple clouds and on-premises environments.

Observability is a property of a system that measures how well its internal states can be inferred from its external outputs. You need to deploy services and processes to monitor your cloud environment, and you need the ability to observe and understand the behavior of the services that run in it.

Benefits of monitoring

Invest in your monitoring environment to get the following benefits across multiple aspects of your cloud:

  • Availability and performance: Monitor resources to help ensure that your cloud services and applications are available and perform as expected. To identify and respond to problems before they affect users, track key metrics and configure alert rules.

  • Cost optimization: Use monitoring to track resource usage and scale resources according to demand. This approach helps prevent overprovisioned and underused resources, which optimizes cost. Monitoring can also identify and alert you to any cost overruns or unexpected spikes in usage.

  • Compliance: Use monitoring to maintain logs and records of activities, which help ensure that cloud services comply with policies and regulations. Reports that use this data can assist with regular audits and compliance checks.

  • Security: Implement continuous monitoring to help detect security threats and vulnerabilities so that you can immediately act to protect data and resources. You can also analyze collected data for threat detection and response.
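
The "track key metrics and configure alert rules" idea from the availability and performance item can be sketched in a few lines. This is a minimal illustration, not how any particular monitoring service implements alerting. The `AlertRule` class, the `evaluate` function, the `cpu_percent` metric name, and the 80 percent threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AlertRule:
    """Hypothetical alert rule: fire when a metric's recent average is too high."""
    metric: str        # illustrative metric name, not a real service metric
    threshold: float   # average value above which the rule fires
    window: int        # number of most recent samples to average


def evaluate(rule, samples):
    """Return True when the mean of the last `window` samples exceeds the threshold."""
    recent = samples[-rule.window:]
    if len(recent) < rule.window:
        return False  # not enough data yet to make a decision
    return mean(recent) > rule.threshold


cpu_rule = AlertRule(metric="cpu_percent", threshold=80.0, window=3)

print(evaluate(cpu_rule, [40, 55, 85, 90, 92]))  # True: last three samples average 89
print(evaluate(cpu_rule, [40, 55, 85, 60, 70]))  # False: last three samples average about 71.7
```

Averaging over a window instead of reacting to a single sample is one common way to reduce noisy alerts. A real alert rule would typically also define a severity and a notification target.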

Monitoring platforms

An effective monitoring strategy includes all platforms in your computing environment. In addition to Azure, you might have on-premises, multicloud, and edge resources. Each resource requires the same levels of monitoring. Follow the Cloud Adoption Framework for Azure guidance, and include monitoring in your unified operations strategy. In this strategy, the primary cloud hosts your monitoring tools and other management tools. The monitoring tools monitor all resources across all platforms.

Conceptual diagram that shows the unified operations strategy.

Types of monitoring

Monitoring is a multifaceted discipline that requires a combination of tools, processes, and practices. The following table breaks down various types of monitoring. Different services and features might provide different combinations of these monitoring types. But a comprehensive monitoring environment includes all of these monitoring types across each of the platforms in your computing environment.

Type Description
Infrastructure Infrastructure monitoring includes the performance and availability of cloud resources, such as virtual machines, storage resources, and networks. This type of monitoring helps ensure that the underlying infrastructure functions optimally, which helps maintain the availability and performance of the applications that rely on it.
Application performance monitoring (APM) APM monitors the performance and availability of applications that run in the cloud. It tracks metrics such as response times, error rates, and transaction volumes. APM identifies performance bottlenecks and helps ensure that applications meet user expectations.
Database Database monitoring tracks the performance, availability, and resource consumption of cloud databases. Key metrics include query performance, index usage, and lock status.
Network Network monitoring tracks the performance and availability of network components in the cloud environment. Metrics include bandwidth usage, latency, and packet loss.
Security Security monitoring tracks and analyzes security events and vulnerabilities within the cloud environment, including unauthorized access, malware, and compliance violations. Effective security monitoring helps protect sensitive data, ensure compliance with regulatory requirements, and prevent costly security breaches.
Compliance Compliance monitoring helps ensure that the cloud environment adheres to regulatory and industry standards. It tracks configurations, access controls, and data-handling practices to help ensure compliance with relevant regulations.
Cost Cost monitoring tracks cloud spending and resource usage to identify cost-saving opportunities and prevent budget overruns. It monitors resource usage, identifies underused resources, and optimizes resource configurations to help reduce costs.
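
As a rough illustration of the cost monitoring row, the following sketch flags resources whose average utilization falls below a cutoff. The function name `find_underused`, the resource names, the sample values, and the 10 percent cutoff are assumptions made for the example. A real environment would define its own thresholds and pull utilization from its monitoring data.

```python
from statistics import mean


def find_underused(utilization, threshold=10.0):
    """Return the names of resources whose average utilization is below `threshold`.

    `utilization` maps a resource name to a list of utilization samples (percent).
    Resources with no samples are skipped instead of being treated as idle.
    """
    return sorted(
        name
        for name, samples in utilization.items()
        if samples and mean(samples) < threshold
    )


# Hypothetical CPU utilization samples, in percent.
usage = {
    "vm-web": [45, 60, 52],
    "vm-batch": [3, 5, 4],
    "vm-test": [8, 12, 7],
}

print(find_underused(usage))  # ['vm-batch', 'vm-test']
```

The output is a candidate list for review, not a decision. A resource that looks idle might still be required for failover or periodic jobs.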

Shared responsibilities

In an on-premises environment, you're responsible for all aspects of monitoring because you own and manage all computing resources. In the cloud, you share this responsibility with your cloud provider. Depending on the type of deployment model that you choose, the responsibilities for monitoring various layers of the cloud stack might transfer from you to your cloud provider.

In an infrastructure as a service (IaaS) deployment, the cloud provider monitors the underlying cloud platform, such as the physical infrastructure and virtualization layer. And you monitor the operating system, applications, and data that run on the virtual machines that you deploy to the cloud platform. When the deployment model moves up the stack, the cloud provider takes on more responsibility to monitor the environment. This responsibility culminates in a software as a service (SaaS) deployment because you transfer monitoring responsibility to the cloud provider for the entire stack, including the application and data.

Diagram that shows shared responsibilities for monitoring in the cloud.
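
The split described above can be expressed as a small lookup. This sketch covers only the deployment models that the text spells out (on-premises, IaaS, and SaaS). The layer names are simplified from the paragraph, and the `LAYERS`, `PROVIDER_LAYERS`, and `monitoring_owner` names are hypothetical. Models between IaaS and SaaS are omitted because their exact boundary isn't stated here.

```python
# Stack layers, ordered from the bottom of the stack to the top.
LAYERS = [
    "physical infrastructure",
    "virtualization",
    "operating system",
    "application",
    "data",
]

# Number of layers, counted from the bottom, that the cloud provider monitors.
PROVIDER_LAYERS = {
    "on-premises": 0,  # you monitor everything
    "iaas": 2,         # provider monitors physical infrastructure and virtualization
    "saas": 5,         # provider monitors the entire stack
}


def monitoring_owner(model):
    """Map each layer to 'provider' or 'customer' for the given deployment model."""
    provider_count = PROVIDER_LAYERS[model]
    return {
        layer: "provider" if index < provider_count else "customer"
        for index, layer in enumerate(LAYERS)
    }


for layer, owner in monitoring_owner("iaas").items():
    print(f"{layer}: {owner}")
```

For `"iaas"`, the first two layers map to the provider and the remaining three map to the customer, which matches the IaaS description in the preceding paragraph.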

You might use monitoring tools from the cloud provider to monitor your layers of the stack, but you're responsible for configuring these tools and analyzing the data that they collect. You need to grant access to various members of your organization and create dashboards and alerts to help them distinguish critical information. You might also need to integrate these components with other tools and ticketing systems that your organization uses.

The cloud provider must perform the same types of service for their layers of the stack that you provide to your internal customers. They must continuously monitor the health and performance of the platform that they contract to you. They provide you with dashboards and alerts to proactively notify you of any service problems. Much like your internal customers, you don't need visibility into the intricacies of how the cloud provider monitors their platform, only that they meet the service-level agreements that you contract with them.

Roles and responsibilities

Most enterprise organizations have a centralized operations team that monitors the overall health and performance of the cloud environment.

This team typically:

  • Sets the strategies for the overall company.
  • Performs centralized configuration of the monitoring environment.
  • Delegates permissions to stakeholders in your organization that require access to the monitoring data that's related to their applications and services.

Organizations have multiple roles that maintain the monitoring environment and that require access to monitoring data to perform their job functions. Each role has different requirements for monitoring data based on its particular responsibilities. Depending on the size of your organization, you might have multiple individuals who fill each role, or you might have one individual who fills multiple roles.

Individual organizations might distribute responsibilities differently. The following table shows an example of the roles and responsibilities for a typical organization.

Role Description
Cloud architect The cloud architect designs and oversees the cloud infrastructure to help ensure that it meets the organization's business goals. The cloud architect focuses on reliability, security, and scalability of the cloud architecture. They require high-level telemetry to get a holistic view of the digital estate. This telemetry includes resource usage metrics, APM metrics, cost and billing insights, and compliance reports.
Platform engineer The platform engineer builds and manages the platform that developers use to deploy their applications. The platform engineer might create continuous integration and continuous delivery (CI/CD) pipelines, manage cloud infrastructure as code (IaC), and ensure the scalability and reliability of the platform. The platform engineer requires telemetry about the platform's operational status. This telemetry includes container performance metrics, orchestration logs, IaC validation, and service availability.
System administrator The system administrator manages and maintains servers, operating systems, and other infrastructure components in the cloud. They perform backups, troubleshoot problems, and ensure that systems are up to date. The system administrator requires server and OS-level telemetry, including CPU, memory, and disk usage, network performance, and system logs.
Security engineer The security engineer implements and manages security measures to help protect data and applications from threats. The security engineer handles everything from identity management to threat detection and response. They use telemetry about security events, including access logs, threat-detection alerts, vulnerability assessments, and compliance metrics.
Network administrator The network administrator manages and maintains the cloud network to help ensure that data flows securely and efficiently between servers, applications, and users. The network administrator handles network configurations, monitors performance, and implements security measures. They require network-centric telemetry, including network traffic analysis, latency measurements, packet loss, and firewall logs.
Database administrator (DBA) The DBA manages and maintains databases to help ensure data integrity, performance, and availability. The DBA handles database backups and recovery and optimizes queries for efficiency. They use telemetry about database performance and integrity, including query performance metrics, database response times, transaction logs, and backup or recovery status.
Developer The developer designs, writes, tests, and maintains the software that runs on cloud platforms. The developer creates features and fixes bugs to help ensure that the application remains secure and performs well. They require application-specific telemetry, including error rates, latency, response times, user behavior analytics, and feature usage metrics.

Azure facilitation

Azure has many services that support the different types of monitoring that you need in your cloud environment. Each service targets one or more roles. Combine services to provide the features that you need for a comprehensive monitoring environment.

Service Description Type Roles
Azure Monitor Azure Monitor is at the center of the Azure monitoring ecosystem. It's a comprehensive monitoring solution that you can use to collect, analyze, and respond to monitoring data from your cloud and on-premises environments. Azure Monitor provides complete monitoring of your infrastructure, network, and applications. It also provides a data platform and core features, such as data analysis, visualization, and alerting for other services. Infrastructure, database, compliance Cloud architect, platform engineer, system administrator, DBA
Application Insights Application Insights is a feature of Azure Monitor that provides APM for your cloud applications. APM Developer
Azure Network Watcher Network Watcher provides monitoring and visualization capabilities for network resources in Azure. Use this service to monitor, diagnose, and view metrics. You can also enable or disable logs for resources in an Azure virtual network. Network Network administrator
Microsoft Sentinel Microsoft Sentinel is a cloud-native security information and event management (SIEM) and security orchestration, automation, and response (SOAR) solution. It ingests security telemetry from your Azure resources and other components to provide cyber-threat detection, investigation, response, and proactive hunting. Security Security engineer
Microsoft Defender XDR Defender XDR includes Microsoft security solutions that are native to the Azure platform, client and server Microsoft operating systems, and applications including Office 365, Exchange Online, and SharePoint in Microsoft 365. Each security solution uses AI and machine learning to correlate telemetry and determine if investigations are necessary. When they detect unacceptable behavior, they take action to prevent disruption. Security Security engineer
Microsoft Cost Management Cost Management is a suite of tools that you can use to analyze, monitor, and optimize your Microsoft Cloud costs. Cost Management is available to anyone that has access to a billing account, subscription, resource group, or management group. Cost Cloud architect
Azure Service Health Service Health provides a health status of the services that your Azure resources rely on. It can inform you of any service outages and provide a personalized view of the health of your Azure services and regions. Infrastructure Cloud provider
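
One way to apply the guidance to combine services is to check which monitoring types a chosen set of services leaves uncovered. The sketch below transcribes the Type column of the preceding table into a dictionary. The `missing_types` function and the lowercase type labels are hypothetical helpers for illustration, and the mapping reflects only what the table lists, not the full capabilities of each service.

```python
# Monitoring types from the "Types of monitoring" table.
MONITORING_TYPES = {
    "infrastructure", "apm", "database", "network",
    "security", "compliance", "cost",
}

# Service-to-type mapping transcribed from the Type column of the services table.
SERVICE_TYPES = {
    "Azure Monitor": {"infrastructure", "database", "compliance"},
    "Application Insights": {"apm"},
    "Azure Network Watcher": {"network"},
    "Microsoft Sentinel": {"security"},
    "Microsoft Defender XDR": {"security"},
    "Microsoft Cost Management": {"cost"},
    "Azure Service Health": {"infrastructure"},
}


def missing_types(selected_services):
    """Return the monitoring types that none of the selected services cover."""
    covered = set()
    for service in selected_services:
        covered |= SERVICE_TYPES[service]
    return sorted(MONITORING_TYPES - covered)


print(missing_types(["Azure Monitor", "Application Insights"]))
# ['cost', 'network', 'security']
```

An empty result means that every monitoring type in the table has at least one selected service behind it.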

Reference for monitoring Azure services

The following table provides links to the monitoring guidance for nearly every Azure service.

Azure service Guidance to monitor the Azure service Monitoring data definitions (where available) Monitor deployments (select services)
Azure AI Agent service Management center overview How to enable tracing in Azure AI Agents
Azure AI Foundry Management center overview Monitor your Generative AI Applications

Evaluation and monitoring metrics for generative AI

Use Risks & Safety monitoring

Monitor prompt flow deployments

Run evaluations online

View evaluation results in Azure AI Foundry portal

Visualize your traces
Azure AI Search Monitor Azure AI Search Monitor queries

Analyze performance in Azure AI Search

Collect telemetry data for search traffic analytics

Visualize Azure AI Search Logs and Metrics with Power BI
Azure AI services Enable diagnostic logging Azure AI services
Azure AI Video Indexer Monitor Azure AI Video Indexer Monitoring Azure AI Video Indexer data reference
Azure Analysis Services Monitor Azure Analysis Services Monitoring data reference for Azure Analysis Services
Azure API for FHIR Enable diagnostic logging in Azure API for FHIR
Azure API Management Monitor Azure API Management Monitoring data reference for Azure API Management Observability in Azure API Management

Monitor published APIs in Azure API Management

Integrate Azure API Management with Application Insights

Monitor your APIs
Azure App Configuration Monitor Azure App Configuration Monitoring Azure App Configuration data reference
Azure App Service Monitor Azure App Service Azure App Service monitoring data reference Enable application monitoring

Monitor App Service instances by using Health Check

Azure App Service diagnostics overview
Azure Application Gateway Monitor Azure Application Gateway Monitoring data reference for Azure Application Gateway Application Gateway health probes overview

Backend health

Azure Monitor metrics for Application Gateway
Azure Arc resource bridge Azure Arc resource bridge maintenance operations
Azure Arc site manager How to configure Azure Monitor alerts for a site
Azure Arc-enabled data services Azure Data Studio dashboards for Azure Arc
Azure Arc-enabled Kubernetes Enable monitoring for Azure Kubernetes Service (AKS) cluster
Azure Arc-enabled servers Monitor a hybrid machine with Azure Monitor VM insights
Azure Archive Storage Monitor Azure Blob Storage Monitoring data reference for Azure Blob Storage
Azure Automation Forward Azure Automation job data to Azure Monitor logs
Azure Backup Monitoring and reporting solutions for Azure Backup
Azure Batch Monitor Azure Batch Monitoring data reference for Azure Batch
Azure Blob Storage Monitor Azure Blob Storage Monitoring data reference for Azure Blob Storage Best practices for monitoring Azure Blob Storage

Monitoring your storage service with Azure Monitor Storage insights
Azure Cache for Redis Monitor Azure Cache for Redis Monitoring data reference for Azure Cache for Redis Azure Monitor insights for Azure Cache for Redis
Azure Chaos Studio Set up Azure monitor for a Chaos Studio experiment
Azure Communication Services Monitor SMS
Monitor Voice and video
Monitor Chat
Monitor Phone calling
Monitor Email
Azure confidential ledger Verify Azure Confidential Ledger write transaction receipts
Azure Container Apps Monitor Azure Container Apps Monitor Azure Container Apps metrics Application logging in Azure Container Apps

Health probes in Azure Container Apps
Azure Container Instances Monitor Azure Container Instances Monitoring data reference for Container Instances Configure liveness probes
Azure Container Registry Monitor Azure Container Registry Monitoring Data Reference for Azure Container Registry
Azure Container Storage Enable monitoring for Azure Container Storage
Azure Cosmos DB Monitor Azure Cosmos DB Monitoring data reference for Azure Cosmos DB Azure Cosmos DB insights

Monitor and debug with insights in Azure Cosmos DB

Monitor throughput or request unit usage of an operation in Azure Cosmos DB

Query execution metrics

Index metrics

Explore Azure Monitor in vCore-based Azure Cosmos DB for MongoDB (vCore)
Azure CycleCloud Monitor Azure CycleCloud
Azure Data Box Track and log Azure Data Box, Azure Data Box Heavy events for import order
Azure Data Explorer Monitor Azure Data Explorer Monitoring data reference for Azure Data Explorer
Azure Data Lake Storage Monitor Azure Blob Storage Monitoring data reference for Azure Blob Storage
Azure Database for MySQL Monitoring - Azure Database for MySQL Best practices for monitoring
Azure Database for PostgreSQL Monitoring and metrics - Azure Database for PostgreSQL Intelligent tuning Monitor performance with query store
Azure Databricks Configure diagnostic log delivery for Azure Databricks
Azure DDoS Protection Monitor Azure DDoS Protection Monitor Azure DDoS Protection monitoring data reference
Azure Dedicated Host Monitor Azure Virtual Machines Monitoring data reference for Azure Virtual Machines
Azure Dedicated HSM Monitoring options - Azure Dedicated HSM
Azure DevTest Labs Activity logs - Azure DevTest Labs
Azure Digital Twins Monitor your instance in Azure Digital Twins
Azure Disk Storage Monitor Azure Virtual Machines Monitoring data reference for Azure Virtual Machines
Azure DNS Metrics and alerts
Azure Elastic SAN Metrics for Azure Elastic SAN
Azure Event Grid Enable diagnostic logs for Event Grid resources Monitor data reference (push delivery)
Azure Event Hubs Monitor Azure Event Hubs Monitoring data reference for Azure Event Hubs
Azure ExpressRoute Monitor Azure ExpressRoute Monitoring data reference for Azure ExpressRoute
Azure Files Monitor Azure Files using Azure Monitor Monitoring data reference for Azure Files
Azure Firewall Monitor Azure Firewall Monitoring data reference for Azure Firewall
Azure Front Door Monitor Azure Front Door Monitoring data reference for Azure Front Door Azure Front Door reports

Health probes
Azure Functions Monitor Azure Functions Monitoring data reference for Azure Functions Monitor Azure Functions with Azure Monitor Application Insights

Monitor executions in Azure Functions
Azure FXT Edge Filer Monitor the Azure FXT Edge Filer
Azure HDInsight Monitor Azure HDInsight Monitoring data reference for Azure HDInsight
Azure Health Data Services Logging for Azure Health Data Services
Azure HPC Cache Azure HPC Cache metrics and monitoring
Azure IoT Manage your IoT solution
Azure IoT Central Manage and monitor IoT Central
Azure IoT Edge Tutorial - Azure Monitor workbooks for IoT Edge
Azure IoT Hub Monitor Azure IoT Hub Monitoring data reference for Azure IoT Hub
Azure IoT Operations Deploy observability resources
Azure Key Vault Monitor Azure Key Vault Monitoring data reference for Azure Key Vault Monitoring your key vault service with Key Vault insights

Configure Azure Key Vault alerts

Azure Key Vault monitoring data reference

Azure Key Vault logging

Enable Key Vault logging

Monitoring Key Vault with Azure Event Grid
Azure Kubernetes Service (AKS) Monitor Azure Kubernetes Service (AKS) Monitoring data reference for Azure Kubernetes Service Zero instrumentation application monitoring for Kubernetes

Full stack monitoring

Best practices for monitoring Kubernetes with Azure Monitor
Azure Lab Services Track usage of a lab in Azure Lab Services
Azure Lighthouse Monitor delegated resources at scale
Azure Load Balancer Monitor Azure Load Balancer Monitoring data reference for Azure Load Balancer
Azure Load Testing Monitoring Azure Load Testing Monitor Azure Load Testing data reference
Azure Local Overview of Azure Local monitoring
Azure Logic Apps Monitor Azure Logic Apps Monitoring data reference for Azure Logic Apps
Azure Machine Learning Monitor Azure Machine Learning Azure Machine Learning monitoring data reference Azure Machine Learning model monitoring

Monitor performance of models deployed to production

Monitor online endpoints
Azure Managed Grafana Monitor an Azure Managed Grafana instance with logs
Azure Managed Instance for Apache Cassandra Monitor Azure Managed Instance for Apache Cassandra
Azure Managed Lustre Monitor Azure Managed Lustre Monitoring data reference for Azure Managed Lustre
Azure NAT Gateway Monitor Azure NAT Gateway Monitoring data reference for Azure NAT Gateway
Azure NetApp Files Ways to monitor Azure NetApp Files Metrics for Azure NetApp Files
Azure Notification Hubs Monitor Azure Notification Hubs Monitoring data reference for Azure Notification Hubs
Azure OpenAI Service Monitor Azure OpenAI Service

Use Risks & Safety monitoring
Azure Operator Nexus Azure Operator Nexus: observability using Azure Monitor
Azure Private 5G Core Monitor Azure Private 5G Core with Azure Monitor platform metrics

Monitor with correlated metrics in Azure portal
Azure Private Link Monitor Azure Private Link Monitoring data reference for Azure Private Link
Azure Power BI Embedded Monitor Power BI Embedded Monitoring data reference for Power BI Embedded
Azure Queue Storage Monitor Azure Queue Storage Monitoring data reference for Azure Queue Storage
Azure Red Hat OpenShift Monitor Azure Red Hat OpenShift
Azure Service Bus Monitor Azure Service Bus Monitoring data reference for Azure Service Bus Azure Monitor - Service Bus insights
Azure Service Fabric Monitor Azure Service Fabric Monitoring data reference for Azure Service Fabric
Azure SignalR Service Monitor Azure SignalR Service Monitoring data reference for Azure SignalR Service
Azure Site Recovery Monitor Azure Site Recovery Monitoring data reference for Azure Site Recovery
Azure Sphere Overview Monitor Azure Sphere resources Monitor Azure Sphere data reference
Azure Spot Virtual Machines Monitor Azure Virtual Machines Monitoring data reference for Azure Virtual Machines
Azure SQL Database Monitor Azure SQL Database Monitoring data reference for Azure SQL Database Monitor Azure SQL workloads with database watcher

Tune applications and databases for performance in Azure SQL Database
Azure SQL Edge Troubleshoot Azure SQL Edge deployments
Azure SQL Managed Instance Monitor Azure SQL Managed Instance Tune applications and databases for performance in Azure SQL Managed Instance
Azure Stack Edge Enable Azure Monitor on Azure Stack Edge Pro GPU device
Azure Stack Hub Monitor health and alerts in Azure Stack Hub
Azure Static Web Apps Monitor Azure Static Web Apps Supported metrics for managed Functions in Azure Static Web Apps
Azure Synapse Analytics Monitor Azure Synapse Analytics Monitoring data reference for Azure Synapse Analytics
Azure Table Storage Monitor Azure Table Storage Monitoring data reference for Azure Table Storage
Azure Update Manager Create alerts in Azure Update Manager
Azure Virtual Machine Scale Sets Monitor Azure Virtual Machines Application Insights for Azure VMs and virtual machine scale sets
Azure Virtual Network Monitor Azure Virtual Network Monitoring data reference for Azure Virtual Network
Azure Virtual WAN Monitor Azure Virtual WAN Monitoring data reference for Azure Virtual WAN
Azure VMware Solution Monitor and protect VMs with Azure native services
Azure Virtual Desktop Monitor Azure Virtual Desktop
Azure Virtual Machines Monitor Azure Virtual Machines Monitoring data reference for Azure Virtual Machines Application Insights for Azure VMs and virtual machine scale sets

VM Watch

Availability monitoring
Azure VPN Gateway Monitor Azure VPN Gateway Monitoring data reference for Azure VPN Gateway
Azure Web Application Firewall Web Application Firewall + Azure Front Door

Web Application Firewall + Application Gateway
Resource logs for Azure Web Application Firewall
Azure Web PubSub Monitor Azure Web PubSub Monitoring Azure Web PubSub data reference
Data Factory in Microsoft Fabric Monitor Data Factory
Microsoft Dev Box Monitoring Microsoft Dev Box data reference
Microsoft Entra Domain Services Check the health of Microsoft Entra Domain Services
Microsoft Entra External ID Azure Monitor in external tenants
Microsoft Entra ID What is Microsoft Entra monitoring and health?
Microsoft Sentinel Auditing and health monitoring in Microsoft Sentinel
Multicloud connector enabled by Azure Arc View multicloud inventory with the multicloud connector enabled by Azure Arc