Monitor your Azure cloud estate
This article explains how to plan, configure, and optimize monitoring across Azure while integrating data from other clouds, on-premises, and edge environments. Monitoring your Azure cloud estate involves continuously observing and analyzing the performance, health, and security of your cloud resources and applications. A strong monitoring strategy includes proactive monitoring to catch problems early and reactive monitoring to trigger alerts and automate responses when unexpected events occur.
Understand your monitoring scope
Your monitoring scope defines your monitoring responsibilities. In a cloud environment, you share monitoring responsibilities with your cloud provider, and the split differs by workload type. Understand your responsibilities so that you cover every monitoring area for each workload. The following table shows what you must monitor for each workload type. Infrastructure services (IaaS) and platform services (PaaS) run in a cloud environment like Azure. Software services (SaaS) refer to solutions such as Microsoft 365.
Monitoring areas | On-premises monitoring | IaaS monitoring | PaaS monitoring | SaaS monitoring |
---|---|---|---|---|
Service health | X | X | X | X |
Security | X | X | X | X |
Compliance | X | X | X | X |
Cost | X | X | X | X |
Data | X | X | X | X |
Code and runtime | X | X | X | |
Cloud resources | X | X | X | |
Operating system | X | X | | |
Virtualization layer | X | X | | |
Physical hardware | X | | | |
Plan your monitoring strategy
A monitoring strategy outlines your oversight requirements across every environment. You need a clear plan to unify visibility and support operational maturity. You need to detect, diagnose, and prevent issues with structured insight into your entire system. Here's how:
Establish your monitoring roadmap. Create a roadmap that addresses three progressive levels of operational maturity: detect and respond to issues in real time, diagnose current or past issues, and predict and prevent future issues. This roadmap clarifies how you should grow your monitoring capabilities so you can prioritize improvements, allocate resources effectively, and maintain consistent reliability.
Identify what you need to monitor. Take a thorough inventory of your entire environment, including Azure, other clouds, edge deployments, and on-premises systems. Use Azure Resource Graph Explorer to locate all Azure resources. Start with the sample queries to gather a baseline resource list. This comprehensive approach helps you detect coverage gaps and ensures that you capture critical data from all relevant sources. Use Azure Arc to bring monitoring data from on-premises, other clouds, or edge locations into Azure.
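The following Python sketch illustrates the inventory step. It assumes you exported Resource Graph results (for example, from a query such as `Resources | project id, type, location`) into a list of dictionaries. The helper `summarize_inventory` and the `monitored_types` set are hypothetical. The sketch only counts resources by type and lists the types that your monitoring plan doesn't cover yet.

```python
from collections import Counter

# Example Resource Graph query whose results this sketch assumes you exported.
BASELINE_QUERY = "Resources | project id, type, location"


def summarize_inventory(resources, monitored_types):
    """Count resources by type and list types missing from the monitoring plan.

    resources: iterable of dicts with at least a "type" key (hypothetical export format).
    monitored_types: resource types that your monitoring plan already covers.
    """
    # Resource Graph usually returns types in lowercase; normalize both sides anyway.
    counts = Counter(r["type"].lower() for r in resources)
    covered = {t.lower() for t in monitored_types}
    gaps = sorted(t for t in counts if t not in covered)
    return dict(counts), gaps


if __name__ == "__main__":
    sample = [
        {"type": "Microsoft.Compute/virtualMachines"},
        {"type": "microsoft.compute/virtualmachines"},
        {"type": "Microsoft.KeyVault/vaults"},
    ]
    counts, gaps = summarize_inventory(sample, {"Microsoft.Compute/virtualMachines"})
    print(counts)  # {'microsoft.compute/virtualmachines': 2, 'microsoft.keyvault/vaults': 1}
    print(gaps)    # ['microsoft.keyvault/vaults']
```

Each entry in `gaps` is a resource type that exists in your estate but has no monitoring coverage defined yet.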
Define reliability targets. Establish uptime service level objectives (SLOs), service level indicators (SLIs), and error budgets for each workload. Include nonfunctional requirements such as recovery time objective (RTO) and recovery point objective (RPO). Clear targets provide benchmarks for measuring operational success and guiding improvement efforts.
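An error budget follows directly from an availability SLO. It's the share of the measurement window during which the service can be unavailable without missing the target. The Python sketch below assumes a 30-day window and measures availability as time-based uptime. A request-based SLI would count failed requests instead.

```python
from datetime import timedelta


def error_budget(slo_percent, window=timedelta(days=30)):
    """Return the allowed unavailability for an availability SLO over a window."""
    if not 0 < slo_percent < 100:
        raise ValueError("SLO must be strictly between 0 and 100 percent")
    return window * (1 - slo_percent / 100)


def budget_remaining(slo_percent, downtime, window=timedelta(days=30)):
    """Return the fraction of the error budget left after the observed downtime."""
    budget = error_budget(slo_percent, window)
    return max(0.0, 1 - downtime / budget)


if __name__ == "__main__":
    print(error_budget(99.9))                                         # 0:43:12
    print(budget_remaining(99.9, timedelta(minutes=21, seconds=36)))  # 0.5
```

For example, a 99.9 percent SLO over 30 days allows about 43.2 minutes of downtime. After 21.6 minutes of outages, half of that budget remains.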
Define data collection requirements. Determine which metrics and logs you must collect for compliance, security, and effective issue diagnosis. Start with regulatory compliance requirements, and then add the requirements from your internal governance rules. Collecting the right data helps you audit effectively, maintain security, and keep systems running optimally. If you don't know what to collect, gather all available logs and metrics to avoid data gaps, and then optimize for cost later. Refer to the complete list of Azure monitoring documentation links for guidance on every Azure service.
Define data retention requirements. Decide how long you must keep monitoring data to meet auditing and compliance needs. Abide by internal governance policies to store logs for the necessary duration. Proper retention policies enable historical analyses, support regulatory compliance, and preserve data for security investigations.
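When several requirements apply to the same data, a common approach is to keep that data for the longest required period. The Python sketch below assumes that each requirement source maps a Log Analytics table name to a retention period in days. The table names and day counts are examples only. Check the retention limits of your chosen destination before you apply the result.

```python
def merge_retention_requirements(*sources):
    """Combine retention requirements (in days) per table, keeping the longest.

    Each source is a dict such as {"SigninLogs": 365}; names are illustrative.
    """
    merged = {}
    for source in sources:
        for table, days in source.items():
            if days <= 0:
                raise ValueError(f"retention for {table} must be positive")
            # The strictest (longest) requirement wins.
            merged[table] = max(days, merged.get(table, 0))
    return merged


if __name__ == "__main__":
    regulatory = {"AuditLogs": 2555, "AppTraces": 30}  # for example, a 7-year audit rule
    internal = {"AuditLogs": 365, "AppTraces": 90, "Perf": 31}
    print(merge_retention_requirements(regulatory, internal))
    # {'AuditLogs': 2555, 'AppTraces': 90, 'Perf': 31}
```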
Define alert requirements. Determine which critical events must trigger alerts, such as resource outages, performance threshold breaches, or security anomalies. Categorize alerts by severity, outline response actions, and specify escalation paths so urgent events reach the right teams. Use Azure Monitor alerts to configure alert rules, notifications, and action groups. Proactive alerts ensure fast responses and minimize downtime.
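Recording the alert requirements as data makes gaps easy to spot. The Python sketch below uses the Azure Monitor severity scale, from Sev0 (Critical) through Sev4 (Verbose). The `AlertRequirement` fields are assumptions made for illustration, and so is the rule that Sev0 and Sev1 alerts need an escalation path. Neither is an Azure requirement.

```python
from dataclasses import dataclass, field

# Azure Monitor alert severity levels.
SEVERITY_NAMES = {0: "Critical", 1: "Error", 2: "Warning", 3: "Informational", 4: "Verbose"}


@dataclass
class AlertRequirement:
    name: str
    trigger: str       # plain-language condition, such as "VM unavailable"
    severity: int      # 0 (Critical) through 4 (Verbose)
    response: str      # first response action
    escalation: list = field(default_factory=list)  # ordered teams to escalate to


def validate(requirements):
    """Return a list of problems found in an alert requirements catalog."""
    problems = []
    for req in requirements:
        if req.severity not in SEVERITY_NAMES:
            problems.append(f"{req.name}: severity must be 0-4")
        if not req.response:
            problems.append(f"{req.name}: missing response action")
        # Assumption of this sketch: urgent alerts must name an escalation path.
        if req.severity in (0, 1) and not req.escalation:
            problems.append(f"{req.name}: Sev{req.severity} alert needs an escalation path")
    return problems


if __name__ == "__main__":
    catalog = [
        AlertRequirement("vm-down", "VM unavailable", 0, "Fail over", ["ops", "platform"]),
        AlertRequirement("cpu-high", "CPU above 90 percent", 2, "Investigate load"),
        AlertRequirement("db-errors", "Database error spike", 1, "Check recent deployments"),
    ]
    print(validate(catalog))  # ['db-errors: Sev1 alert needs an escalation path']
```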
Assign monitoring responsibilities. You have two primary responsibilities: monitor your entire cloud estate and monitor each workload. Define baseline monitoring requirements, specify the data you must capture, and clarify who owns each monitoring task. These steps help you avoid overlooked issues, streamline response efforts, and foster consistent practices across your organization.
Test and refine your monitoring approach. Verify that you capture the correct data and that alerts trigger at the right thresholds. Adjust data collection and reliability targets based on new findings. Iterative improvements help you adapt to shifting business needs, close monitoring gaps, and maintain optimal system performance.
Design a monitoring solution
Designing a monitoring solution refers to creating a system for collecting and storing logs, metrics, and insights. A well-designed solution helps satisfy operational, security, and compliance needs. Here's how:
Consolidate your monitoring solutions. Use one platform to monitor your Azure, other public cloud, on-premises, and edge environments. This consolidated approach streamlines your operations, reduces tool switching, and helps your team detect and resolve issues quickly. Start with Azure Monitor as your main monitoring solution. Use Azure Arc to collect data from other clouds, on-premises, and edge deployments. Send data from other Azure monitoring tools to Azure Monitor for centralized visibility.
Aim to centralize monitoring data. Prefer fewer locations for storing logs and metrics. Fewer locations make it easier to manage and correlate data. There are reasons to have multiple locations to store and analyze monitoring data. For example, security operations, data residency, data resiliency, and number of Azure tenants are all factors that could require you to store your monitoring data in multiple locations. For more information, see Design a Log Analytics workspace architecture.
Understand where to send monitoring data. Collect logs and metrics and store them in destinations that match your operational needs. Choose from these primary Azure destinations: Azure Log Analytics workspace (interactive analysis and long-term storage), Azure Storage account (low-cost long-term storage), Azure Event Hubs (streaming to third-party SIEM and other external tools), Azure Data Explorer, and partner solutions. Where data collection rules are generally available, use them to configure central monitoring data collection. Otherwise, use diagnostic settings.
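A diagnostic setting is where the choice of destination becomes concrete. The Python sketch below builds the `properties` body of a `Microsoft.Insights/diagnosticSettings` resource that sends all resource logs (the `allLogs` category group) and all platform metrics to the destinations you pass in. The function name is hypothetical. Not every service supports category groups or metrics export, and you still need to submit the body through ARM, Bicep, the Azure CLI, or an SDK.

```python
import json


def diagnostic_setting_properties(workspace_id=None, storage_account_id=None,
                                  event_hub_rule_id=None, event_hub_name=None):
    """Build the 'properties' body of a Microsoft.Insights/diagnosticSettings resource."""
    if event_hub_name and not event_hub_rule_id:
        raise ValueError("an event hub name requires an authorization rule ID")
    props = {
        # 'allLogs' collects every resource log category that the service exposes.
        "logs": [{"categoryGroup": "allLogs", "enabled": True}],
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    }
    if workspace_id:
        props["workspaceId"] = workspace_id                      # interactive analysis
    if storage_account_id:
        props["storageAccountId"] = storage_account_id           # long-term archive
    if event_hub_rule_id:
        props["eventHubAuthorizationRuleId"] = event_hub_rule_id  # stream to a SIEM
        if event_hub_name:
            props["eventHubName"] = event_hub_name
    destinations = ("workspaceId", "storageAccountId", "eventHubAuthorizationRuleId")
    if not any(key in props for key in destinations):
        raise ValueError("a diagnostic setting needs at least one destination")
    return props


if __name__ == "__main__":
    print(json.dumps(diagnostic_setting_properties(workspace_id="<workspace-resource-id>"), indent=2))
```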
Automate monitoring. You want an automated way to enforce your monitoring policies in larger environments.
Use Azure Policy. Enforce what you collect and where you send it with Azure Policy. You can start with built-in monitoring policies to enforce diagnostic settings. You can build custom policies as needed. You can also use Azure Policy to manage data collection rules and install the Azure Monitor Agent on virtual machines. Use Azure Policy to define your Azure Monitor alert baseline in an Azure landing zone.
Use infrastructure as code (IaC). Use IaC to configure and deploy Azure Monitor resources at scale. This approach keeps your monitoring configuration consistent, repeatable, and version-controlled across environments.
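For the Azure Policy approach described earlier, start with built-in policies where they exist. If you need a custom definition, its policy rule typically pairs an `if` condition on the resource type with an `AuditIfNotExists` (or `DeployIfNotExists`) effect that checks for a related `Microsoft.Insights/diagnosticSettings` resource. The Python sketch below builds only the `policyRule` portion, which you could store alongside your other IaC files. The `logAnalytics` parameter name is a hypothetical choice. A complete definition also needs `mode` and a `parameters` section.

```python
import json


def audit_diagnostics_policy_rule(resource_type, effect="AuditIfNotExists"):
    """Return a policyRule that flags resources of one type lacking a diagnostic
    setting that sends data to the workspace in the 'logAnalytics' parameter."""
    if effect not in ("AuditIfNotExists", "Disabled"):
        raise ValueError("this sketch supports only AuditIfNotExists or Disabled")
    return {
        "if": {"field": "type", "equals": resource_type},
        "then": {
            "effect": effect,
            "details": {
                # Look for a related diagnostic setting on each matching resource.
                "type": "Microsoft.Insights/diagnosticSettings",
                "existenceCondition": {
                    "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
                    # 'logAnalytics' is a hypothetical parameter name.
                    "equals": "[parameters('logAnalytics')]",
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(audit_diagnostics_policy_rule("Microsoft.KeyVault/vaults"), indent=2))
```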
Optimize monitoring spend. First, estimate the cost of your monitoring solution. After you collect enough data, use the Azure Pricing Calculator to estimate the long-term cost of collection, and adjust the collection settings to meet your budget. Over time, regularly review the monitoring data you collect and store. What you collect, where you store it, and how long you store it all affect cost. Shorten retention periods to reduce cost without stopping the collection of specific monitoring data. To optimize costs further, stop collecting logs that aren't useful. For more cost optimization tips, see Cost optimization in Azure Monitor.
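Cost depends on what you collect and how long you keep it, so a rough model can show the effect of a retention change before you make it. The Python sketch below uses placeholder prices, not actual Azure rates. It also uses a simplified steady-state model in which the retained volume equals daily ingestion multiplied by the number of days kept beyond the included retention period. Use the Azure Pricing Calculator for real figures.

```python
def estimate_monthly_log_cost(daily_gb, ingest_price_per_gb, retention_days,
                              included_retention_days, retention_price_per_gb_month,
                              days_in_month=30):
    """Rough steady-state monthly cost of one log stream (placeholder prices)."""
    ingestion = daily_gb * days_in_month * ingest_price_per_gb
    # At steady state, data older than the included period equals the
    # daily volume multiplied by the number of extra days retained.
    extra_days = max(0, retention_days - included_retention_days)
    retained_gb = daily_gb * extra_days
    retention = retained_gb * retention_price_per_gb_month
    return round(ingestion + retention, 2)


if __name__ == "__main__":
    # Hypothetical prices: 2.00 per GB ingested, 0.10 per GB-month retained.
    print(estimate_monthly_log_cost(10, 2.0, 90, 31, 0.1))  # 659.0
    print(estimate_monthly_log_cost(10, 2.0, 31, 31, 0.1))  # 600.0
```

In this example, cutting retention from 90 days back to the included 31 days removes the retention charge while collection continues unchanged.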
Configure monitoring
Configuring monitoring involves setting up the tools and parameters for collecting insights across your Azure environment. Proper configuration provides proactive issue detection and alignment with prescriptive governance in your cloud estate. Here's how:
Monitor service health
Monitoring service health focuses on detecting service outages, disruptions, and resource issues in your cloud environment. You want real-time visibility into potential problems to maintain consistent operations. Monitoring service health is the bare minimum for monitoring your cloud estate. Here's how:
Monitor underlying service health. You need to be aware of any outages that affect the cloud services and regions you use. Use Azure Service Health to receive free alerts about service issues, planned maintenance, and other changes that affect your Azure services and regions.
Monitor underlying resource health. You need a way to diagnose and resolve underlying issues in your cloud resources. You also need a history of these outages so you can report any service level agreement (SLA) breaches. Use Azure Resource Health to monitor the health of your individual cloud resources.
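Service Health alerts are activity log alerts that filter on the `ServiceHealth` category. The Python sketch below builds the `properties` body of such a `Microsoft.Insights/activityLogAlerts` resource for one subscription. The helper name and the default incident types are assumptions, and you deploy the alert resource itself with the location `Global`. Verify the condition schema against the activity log alert reference for the API version you use.

```python
def service_health_alert_properties(subscription_id, action_group_id,
                                    incident_types=("Incident", "Maintenance")):
    """Build the properties of an activity log alert for Service Health events."""
    if not incident_types:
        raise ValueError("specify at least one incident type")
    return {
        "scopes": [f"/subscriptions/{subscription_id}"],
        "condition": {
            "allOf": [
                {"field": "category", "equals": "ServiceHealth"},
                # Fire on any of the selected Service Health event types.
                {"anyOf": [{"field": "properties.incidentType", "equals": t}
                           for t in incident_types]},
            ]
        },
        "actions": {"actionGroups": [{"actionGroupId": action_group_id}]},
        "enabled": True,
    }


if __name__ == "__main__":
    print(service_health_alert_properties("<subscription-id>", "<action-group-resource-id>"))
```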
Monitor security
Monitoring security involves tracking identity interactions, vulnerabilities, and network activity to protect your Azure estate. You need continuous security monitoring to safeguard data and maintain compliance within your cloud environments. Here's how:
Monitor identity. You need to understand user interactions, detect risky sign-ins, troubleshoot sign-in issues, and audit identity changes to ensure the security and health of your environment. Configure Microsoft Entra monitoring and collect the logs you need to meet your security and compliance requirements.
Monitor security vulnerabilities. You need a single security monitoring solution to detect security vulnerabilities across your environments. For example, use Microsoft Defender for Cloud to monitor security vulnerabilities in Azure, other public clouds, edge devices, and on-premises private networks. Use Microsoft Sentinel for security information and event management (SIEM) and security orchestration, automation, and response (SOAR). Microsoft Sentinel stores its data in a Log Analytics workspace, so you can correlate security data with the rest of your Azure Monitor logs.
Monitor network activity. You need to monitor network traffic within your cloud and in networks outside it. Network monitoring helps you troubleshoot performance issues and maintain network security. Use Network Watcher to monitor Azure virtual networks with flow logs and traffic analytics. Use Connection Monitor for multicloud and on-premises network monitoring.
Monitor workload security. For workload security monitoring, see the Well-Architected Framework’s Recommendations for monitoring and threat detection.
Monitor compliance
Monitoring compliance verifies alignment with governance requirements and industry regulations. You must track compliance to reduce risks and follow prescriptive standards for a well-managed Azure estate. Here's how:
Monitor configuration compliance. You need ways to align your environments with your governance policies. Use Azure Policy to automate auditing and enforcement of specific policies and to monitor compliance against them. Azure Policy is free and offers built-in policies that align with many regulatory standards, such as ISO 27001, NIST SP 800-53, PCI DSS, and the EU General Data Protection Regulation (GDPR).
Monitor data compliance. You need to automatically assess and manage compliance across your multicloud environment, simplifying compliance and reducing risk. Use Microsoft Purview Compliance Manager to assess and manage compliance across multicloud environments.
Monitor workload compliance. For workload compliance monitoring, see the Well-Architected Framework’s recommendations for establishing a security baseline.
Monitor costs
Monitoring costs refers to tracking and controlling your cloud spending across Azure and other environments. You want cost transparency to optimize resource usage and follow prescriptive guidance for financial governance. Here's how:
Understand service pricing. Make sure you understand the pricing of the services and features you use so that you avoid surprises at the end of the billing period. Review the Azure pricing information for each service.
Monitor cloud spend. You should use the available tools to monitor costs across your environments. For Azure spend, use Azure Cost Management to set budgets, get cost optimization recommendations, trigger alerts for cost anomalies, and analyze costs.
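Azure Cost Management budgets evaluate their thresholds for you. The Python sketch below only illustrates the underlying logic of alerting on both actual and forecasted spend. The 50, 80, and 100 percent thresholds and the function name are example choices.

```python
def crossed_thresholds(budget, actual_spend, forecast_spend, thresholds=(50, 80, 100)):
    """Return the percentage thresholds that actual and forecasted spend have reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    actual_pct = actual_spend * 100 / budget
    forecast_pct = forecast_spend * 100 / budget
    return {
        "actual": [t for t in thresholds if actual_pct >= t],
        "forecast": [t for t in thresholds if forecast_pct >= t],
    }


if __name__ == "__main__":
    print(crossed_thresholds(1000, 820, 1100))
    # {'actual': [50, 80], 'forecast': [50, 80, 100]}
```

A forecast that crosses 100 percent while actual spend is still below it gives you time to adjust resource usage before the budget is exceeded.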
Review cloud spend regularly. Incorporate cost reviews into your regular operational cadence. Regular assessments allow for the timely identification of spending patterns and the opportunity to adjust resource usage to optimize costs.
Monitor workload costs. For workload cost monitoring, see the Well-Architected Framework recommendations for collecting and reviewing cost data and optimizing component costs.
Monitor data
Monitoring data means overseeing data governance, protection, and usage across Azure, on-premises, multicloud, and SaaS environments. You need data visibility and security to maintain compliance and preserve business continuity in your Azure estate. Here's how:
Monitor enterprise data. You need a way to govern and secure your business data across all environments. Use Microsoft Purview to provide data visibility, security, and compliance across these environments.
Monitor workload data. For workload data monitoring, see the Well-Architected Framework recommendations for data classification, optimizing data costs, and optimizing data performance.
Monitor code and runtime
At the workload level, you need to gather telemetry (application logs, metrics, and traces) on your application code and execution to identify issues and optimize performance. Real-time insights into application behavior enable prescriptive troubleshooting and refinement.
For workloads in Azure, use Application Insights to collect runtime telemetry through instrumentation so that you can identify performance bottlenecks and errors. Application Insights lets you monitor live web applications, detect performance anomalies, and understand user interactions, which helps you continuously improve performance and usability. For workload-specific code and runtime monitoring guidance, see the Well-Architected Framework:
Workload monitoring area | Well-Architected Framework guidance |
---|---|
Operational excellence | Instrument an application |
Performance efficiency | Prioritize the performance of critical flows; Recommendations for optimizing code and infrastructure |
Cost optimization | Optimize code costs; Recommendations for optimizing environment costs; Optimize flow costs |
Health modeling | Health modeling for workloads |
Monitor cloud resources
Monitoring cloud resources covers watching control-plane activity, resource logs, and performance metrics across Azure. You want detailed visibility into resource usage and changes to maintain security, compliance, and operational excellence. Here's how:
Monitor control plane activities. You need to know who created, updated, and deleted resources in your cloud environment. In Azure, monitor control plane activities across all your subscriptions. Azure automatically captures control-plane events for each subscription in Azure Activity Logs and keeps them for 90 days. Create a diagnostic setting to send these Activity Logs to the right destination for longer retention and analysis.
Collect cloud resource logs. You need to collect log data for each cloud resource to assess its health and troubleshoot effectively. Different services produce different types of logs. In Azure, resource logs aren't collected until you route them to a destination, so configure collection of Azure resource logs for every service. If you don't know what to collect, gather all available logs and metrics to avoid data gaps, and then optimize cost later. To optimize cost, shorten the retention period and stop collecting logs you don't need. The logs you collect and how long you keep them should balance cost with compliance, security, and business continuity needs such as root cause analysis. For more information, see Azure Monitor cost optimization best practices.
Collect resource metrics. You need visibility into the health and performance of your cloud resources, including time-series data that shows how a resource behaved at a specific point in time when you troubleshoot issues. In Azure, services automatically emit platform metrics to Azure Monitor Metrics. Analyze these metrics in metrics explorer and set up alert rules against them. Azure Monitor Metrics retains platform metrics for 93 days. If you need to retain metrics longer, create a diagnostic setting that sends them to a Log Analytics workspace for analysis and correlation with log data. Where generally available, use data collection rules to configure central monitoring data collection.
Monitor workload resources. For workload-specific cloud resource monitoring guidance, see the Well-Architected Framework:
Workload monitoring area | Well-Architected Framework guidance |
---|---|
Azure services monitoring | Azure service guides (start with the Operational Excellence section) |
Reliability | Recommendations for designing a reliable monitoring and alerting strategy |
Performance efficiency | Recommendations for defining performance targets; Collect workload performance data |
Configure alerting
Configuring alerting means setting up notifications based on performance thresholds or operational conditions. You need timely alerts to respond quickly and follow prescriptive guidance for incident management. Here's how:
Proactively identify health issues. You need to define thresholds for key performance indicators to monitor resource health. This proactive approach ensures timely detection of potential issues and allows swifter remediation. Use Azure Monitor alerts. If you're unsure about the thresholds to use in your alerts, create a metric alert with dynamic thresholds. Use Azure Monitor Baseline Alerts as a starting point.
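Dynamic thresholds in Azure Monitor use machine learning to learn a metric's normal range. The Python sketch below is a much simpler stand-in and is not the Azure algorithm. It treats values outside the mean plus or minus a multiple of the standard deviation of recent history as anomalous. It fires only when enough recent points breach that range, which is similar in spirit to setting the number of violations required to trigger an alert. The sensitivity multiplier and the violation count are assumptions.

```python
from statistics import mean, stdev


def baseline_bounds(history, sensitivity=3.0):
    """Return (lower, upper) as mean +/- sensitivity * sample standard deviation."""
    if len(history) < 2:
        raise ValueError("need at least two data points")
    m, s = mean(history), stdev(history)
    return m - sensitivity * s, m + sensitivity * s


def is_anomalous(value, history, sensitivity=3.0):
    """Return True if a single value falls outside the baseline range."""
    lower, upper = baseline_bounds(history, sensitivity)
    return value < lower or value > upper


def should_fire(recent, history, min_violations=3, sensitivity=3.0):
    """Fire only when at least min_violations recent points breach the range."""
    lower, upper = baseline_bounds(history, sensitivity)
    violations = sum(1 for v in recent if v < lower or v > upper)
    return violations >= min_violations


if __name__ == "__main__":
    cpu_history = [50, 52, 48, 51, 49]  # hypothetical CPU percentages
    print(should_fire([90, 91, 51, 92], cpu_history))  # True
    print(should_fire([90, 51, 50, 49], cpu_history))  # False
```

Requiring several violations reduces noise from a single spike, at the cost of slightly slower detection.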
Define the severity of the alert. Have a system in place to categorize the severity of each alert. Apply a higher severity to resources that are critical to business operations, such as shared services and line of business workloads. Use a lower severity for other resources.
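One way to keep severity consistent is to derive it from a workload's business criticality. The Python sketch below maps hypothetical criticality tiers to Azure Monitor severities, where 0 is Critical and 4 is Verbose. It also lowers the urgency by one level when no users are affected. Both the tiers and that adjustment are assumptions.

```python
# Hypothetical criticality tiers mapped to Azure Monitor severities (0 = Critical, 4 = Verbose).
CRITICALITY_BASE_SEVERITY = {
    "mission-critical": 0,   # for example, shared services and line-of-business workloads
    "business-critical": 1,
    "standard": 3,
}


def assign_severity(criticality, user_impacting):
    """Pick a Sev0-Sev4 value from workload criticality and user impact."""
    # Unknown tiers default to the lowest urgency.
    base = CRITICALITY_BASE_SEVERITY.get(criticality, 4)
    # Lower the urgency (raise the number) by one level when users aren't affected.
    return base if user_impacting else min(base + 1, 4)


if __name__ == "__main__":
    print(assign_severity("mission-critical", True))   # 0
    print(assign_severity("standard", False))          # 4
```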
Notify stakeholders. Identify who should receive a notification when an alert triggers. A decentralized approach routes relevant alerts to the right people. Start with a flexible method that alerts stakeholders when a resource starts to behave anomalously. Configure at least one action group for each subscription so that relevant personnel receive alerts, and include an email notification channel as a minimum requirement. Notify operations teams about lower-severity alerts and notify management about high-severity alerts. For more information, see customize alerts with Azure Logic Apps and integrate with IT service management (ITSM) products.
Select notification channels. Effective notification strategies enhance response times and mitigate potential impacts. Use email notification as a baseline and add SMS or integrate with incident management systems, as needed.
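You can express this notification guidance as a small routing table. In the Python sketch below, the tier boundary (Sev0 and Sev1 count as high severity), the recipient names, and the channel names are hypothetical. The sketch always keeps email as the minimum channel and, as an assumption, sends high-severity alerts to operations as well as management. In Azure, you implement these routes as action groups.

```python
DEFAULT_ROUTES = {
    # Hypothetical recipients and channels for each severity tier.
    "high": {"recipients": ["operations", "management"], "channels": ["email", "sms", "itsm"]},
    "low": {"recipients": ["operations"], "channels": ["email"]},
}


def route_alert(severity, routes=None):
    """Return the recipients and channels for an alert severity (0-4)."""
    routes = routes or DEFAULT_ROUTES
    tier = "high" if severity <= 1 else "low"
    route = routes[tier]
    channels = list(route["channels"])
    # Email is the minimum notification channel for every alert.
    if "email" not in channels:
        channels.insert(0, "email")
    return {"recipients": list(route["recipients"]), "channels": channels}


if __name__ == "__main__":
    print(route_alert(0))  # high severity: operations and management
    print(route_alert(3))  # lower severity: operations only
```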
Visualize monitoring data
Visualizing monitoring data refers to creating dashboards and reports that present critical metrics in an accessible format. Clear visualization supports informed decision-making and aligns with prescriptive approaches to managing your Azure estate. Here's how:
Develop monitoring dashboards. Use Azure Monitor workbooks and create Azure portal dashboards. Dashboards present quick insights at a glance. Workbooks let you dive deeper into data with custom queries and analysis. Use dashboards for broad overviews. Use workbooks for detailed troubleshooting or advanced monitoring. If you use Grafana, use Azure Managed Grafana.
Tailor visualizations. Customize charts and reports to different audiences, whether segmented by team (enterprises) or by overall business impact (startups).
Azure monitoring tools
Here's a table of all the Azure services and tools referenced in this article.
Category | Tool | Description |
---|---|---|
Multi-environment monitoring | Azure Monitor | Serves as the central platform that collects telemetry from cloud and on‑premises environments. It monitors resource performance and operational state. |
Multi-environment extension | Azure Arc | Extends Azure management, including monitoring and governance, to on‑premises, multicloud, and edge environments. |
Service health monitoring | Azure Service Health | Provides real‑time status and personalized information about service issues, planned maintenance, and other changes affecting your Azure services and regions. |
Service health monitoring | Azure Resource Health | Tracks the health of individual cloud resources and records issues over time for troubleshooting and reporting. |
Security monitoring | Microsoft Entra monitoring | Tracks identity interactions, sign‑in health, and audits changes to user accounts to safeguard access. |
Security monitoring | Microsoft Defender for Cloud | Protects your cloud resources with threat detection, vulnerability assessments, and security recommendations. |
Security monitoring | Microsoft Sentinel | Acts as a cloud‑native SIEM and SOAR solution that analyzes security telemetry and automates responses to threats. |
Compliance monitoring | Azure Policy | Enforces organizational standards and audits resource compliance at scale through automated assessments. |
Compliance monitoring | Microsoft Purview Compliance Manager | Assesses regulatory compliance and provides insights and recommendations to reduce risk. |
Cost monitoring | Azure Pricing Calculator | Estimates the cost of Azure services and helps plan and optimize your monitoring spend. |
Cost monitoring | Azure Cost Management | Monitors and manages cloud spending while providing insights to optimize resource usage and costs. |
Data monitoring | Microsoft Purview | Governs and protects enterprise data by offering discovery, classification, and risk management capabilities. |
Code and runtime monitoring | Application Insights | Monitors application performance with telemetry on code execution, performance, and usage to pinpoint issues. |
Cloud resource monitoring | Azure Resource Graph Explorer | Enables querying and exploration of your Azure resources, offering visibility across your cloud estate. |
Cloud resource monitoring | Network Watcher | Monitors and diagnoses network performance and connectivity for Azure virtual networks and related resources. |
Cloud resource monitoring | Connection Monitor | Provides insights into connectivity across Azure, on‑premises, and multicloud environments. |
Cloud resource monitoring | Azure Monitor Agent | Installs on virtual machines to collect telemetry from operating systems and applications. |
Cloud resource monitoring | Azure Activity Logs | Records control‑plane operations such as resource creation, updates, or deletions across Azure subscriptions. |
Cloud resource monitoring | Azure Resource Logs | Captures diagnostic data from individual Azure services for troubleshooting and performance analysis. |
Cloud resource monitoring | Azure Monitor Metrics | Collects time‑series performance data from Azure services to track resource health and performance. |
Cloud resource monitoring | Metrics explorer | Visualizes and analyzes collected metrics data, supporting trend analysis and troubleshooting. |
Monitoring data storage | Azure Log Analytics workspace | Stores and enables querying of collected log data for detailed analysis and long‑term retention. |
Monitoring data storage | Azure Storage account | Provides secure, scalable storage used for long‑term retention of logs and monitoring data. |
Monitoring data storage | Azure Event Hubs | Ingests large volumes of telemetry and event data, supporting integration with SIEM and other analytics platforms. |
Monitoring data storage | Azure Data Explorer | Offers fast, interactive analysis of large volumes of telemetry data, supporting real‑time analytics. |
Monitoring data configuration | Infrastructure as Code for Azure Monitor | Deploys and manages Azure Monitor resources at scale using code, ensuring consistent configuration across environments. |
Monitoring data configuration | Diagnostic settings in Azure Monitor | Routes monitoring data (logs and metrics) to destinations like Log Analytics, storage accounts, or Event Hubs. |
Monitoring data configuration | Data collection rules | Standardizes the collection and ingestion of monitoring data across your environment. |
Alerting | Azure Monitor alerts | Notifies you when defined thresholds for metrics or log data are breached, allowing you to react promptly to issues. |
Visualization | Azure Monitor workbooks | Enables creation of interactive reports and custom dashboards to analyze monitoring data in detail. |
Visualization | Azure portal dashboards | Displays key monitoring data in customizable dashboards for at‑a‑glance insights. |
Visualization | Managed Grafana | Offers hosted Grafana for visualizing monitoring data, integrating with Azure Monitor for custom dashboards. |
Azure services monitoring documentation
The table provides a near-complete list of the monitoring articles for every Azure service, in alphabetical order.