Azure Well-Architected Framework perspective on Log Analytics

The functionality and performance of a Well-Architected Framework workload must be monitored in diverse ways and for diverse reasons. Azure Monitor Log Analytics workspaces are the primary log and metric sink for a large portion of the monitoring data. Workspaces support multiple features in Azure Monitor, including ad-hoc queries, visualizations, and alerts. For general monitoring principles, see Monitoring and diagnostics guidance. That guidance identifies the different types of data, the analysis that Azure Monitor supports, and the data stored in the workspace that enables the analysis.

This article assumes that you understand system design principles. You also need a working knowledge of Log Analytics workspaces and features in Azure Monitor that populate operational workload data. For more information, see Log Analytics workspace overview.

Important

How to use this guide

Each section has a design checklist that presents architectural areas of concern along with design strategies localized to the technology scope.

Also included are recommendations on the technology capabilities or deployment topologies that can help materialize those strategies. The recommendations don't represent an exhaustive list of all configurations available for Log Analytics workspaces and its related Azure Monitor resources. Instead, they list the key recommendations mapped to the design perspectives. Use the recommendations to build your proof-of-concept, design your workload monitoring environment, or optimize your existing workload monitoring solution.

Technology scope

This guide focuses on the interrelated decisions for the following Azure resources.

  • Log Analytics workspaces
  • Workload operational log data
  • Diagnostic settings on Azure resources in your workload

Reliability

The purpose of the Reliability pillar is to provide continued functionality by building enough resilience and the ability to recover fast from failures.

The Reliability design principles provide a high-level design strategy applied for individual components, system flows, and the system as a whole.

The reliability situations to consider for Log Analytics workspaces are:

  • Availability of the workspace.
  • Protection of collected data in the rare case of an Azure datacenter or region failure.

There's currently no standard feature for failover between workspaces in different regions, but there are strategies to use if you have particular requirements for availability or compliance.

Design checklist for Reliability

Start your design strategy based on the design review checklist for Reliability and determine its relevance to your business requirements while keeping in mind the features of Log Analytics workspaces and their dependencies. Extend the strategy to include more approaches as needed.

  • Review service limits for Log Analytics workspaces. The service limits section helps you understand restrictions on data collection and retention, and other aspects of the service. These limits help you determine how to properly design your workload observability strategy. Be sure to review Azure Monitor service limits since many of the functions discussed therein, like queries, work hand-in-hand with Log Analytics workspaces.
  • Plan for workspace resilience and recovery. Log Analytics workspaces are regional, with no built-in support for cross-regional redundancy or replication. Also, availability zone redundancy options are limited. As such, you should determine the reliability requirements of your workspaces and strategize to meet those targets. Your requirements might stipulate that your workspace must be resilient to datacenter failures or regional failures, or they might stipulate that you must be able to recover your data to a new workspace in a failover region. Each of these scenarios requires additional resources and processes to be put in place to be successful, so carefully balance your reliability targets against cost and complexity.
  • Choose the right deployment regions to meet your reliability requirements. Deploy your Log Analytics workspace and data collection endpoints (DCEs) co-located with the workload components emitting operational data. Your choice of the appropriate region in which to deploy your workspace and your DCEs should be informed by where you deploy your workload. You might need to weigh the regional availability of certain Log Analytics functionality, like dedicated clusters, against other factors more central to your workload's reliability, cost, and performance requirements.
  • Ensure that your observability systems are healthy. Like any other component of your workload, ensure that your monitoring and logging systems are functioning properly. To accomplish this, enable features that send health data signals to your operations teams. Set up health data signals specific to your Log Analytics workspaces and associated resources.
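One health signal worth watching is a log source that goes quiet. The following Python sketch flags sources whose most recent record is older than a threshold. The function name, the 15-minute default, and the sample source names are hypothetical, and how you obtain the last-seen timestamps (for example, from a scheduled query) is outside this sketch.

```python
from datetime import datetime, timedelta, timezone


def find_silent_sources(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return the sorted names of sources that have been silent too long.

    last_seen maps a source name to the timestamp of its most recent record.
    """
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )


# Hypothetical sample data for illustration only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = {
    "vm-web-01": now - timedelta(minutes=2),
    "vm-web-02": now - timedelta(minutes=45),
    "aks-node-07": now - timedelta(hours=3),
}
print(find_silent_sources(sample, now))
```

A check like this is only a complement to the built-in health features of the service; it shows the kind of signal your operations team might want surfaced.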

Configuration recommendations for Reliability

Recommendation Benefit
Don't include your Log Analytics workspaces in your workload's critical path. Your workspaces are important to a functioning observability system, but the functionality of your workload shouldn't depend on them. Keeping your workspaces and associated functions out of your workload's critical path minimizes the risk that issues in your observability system affect the runtime execution of your workload.
To support high durability of workspace data, deploy Log Analytics workspaces into a region that supports data resilience. Data resilience is only possible through linking of the workspace to a dedicated cluster in the same region. When you use a dedicated cluster, it lets you spread the associated workspaces across availability zones, which offer protection against datacenter outages. If you don't collect enough data now to justify a dedicated cluster, this preemptive regional choice supports future growth.
Choose your workspace deployment based on proximity to your workload.

Use data collection endpoints (DCE) in the same region as the Log Analytics workspace.
Deploy your workspace in the same region as the instances of your workload. Having your workspace and DCEs in the same region as your workload mitigates the risk of impacts by outages in other regions.

DCEs are used by the Azure Monitor agent and the Logs Ingestion API to send workload operational data to a Log Analytics workspace. You might need multiple DCEs even though your deployment only has a single workspace. For more information on how to configure DCEs for your particular environment, see How to set up data collection endpoints based on your deployment.
If your workload is deployed in an active-active design, consider using multiple workspaces and DCEs spread across the regions in which your workload is deployed.

Deploying workspaces in multiple regions adds complexity to your environment. Balance the criteria detailed in Design a Log Analytics workspace architecture with your availability requirements.
If you require the workspace to be available in a region failure, or you don't collect enough data for a dedicated cluster, configure data collection to send critical data to multiple workspaces in different regions. This practice is also known as log multicasting.

For example, configure DCRs for multiple workspaces for Azure Monitor agent running on VMs. Configure multiple diagnostic settings to collect resource logs from Azure resources and send the logs to multiple workspaces.
In this way, workload operational data is available in the alternate workspace if there's a regional failure. But know that resources that rely on the data such as alerts and workbooks wouldn't automatically be replicated to the other regions. Consider storing Azure Resource Manager (ARM) templates for critical alerting resources with configuration for the alternate workspace or deploying them in all regions but disabling them to prevent redundant alerts. Both options support quick enablement in a regional failure.

Tradeoff: This configuration results in duplicate ingestion and retention charges so only use it for critical data.
If you require data to be protected in a datacenter or region failure, configure data export from the workspace to save data in an alternate location.

This option is similar to the previous option of multicasting the data to different workspaces. But this option costs less because the extra data is written to storage.

Use Azure Storage redundancy options, including geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS), to further replicate this data to other regions.

Data export doesn't provide resiliency against incidents impacting the regional ingestion pipeline.
While the historic operational log data might not be readily queryable in the exported state, exporting ensures that the data survives a prolonged regional outage and can be accessed and retained for an extended period.

If you require the export of tables not supported by data export, you can use other methods of exporting data, including Logic Apps, to protect your data.

For this strategy to work as a viable recovery plan, you must have processes in place to reconfigure diagnostic settings for your resources in Azure and on all agents that provide data. You must also plan to manually rehydrate your exported data into a new workspace. As with the previously described option, you also need to define processes for those resources that rely on the data like alerts and workbooks.
For mission-critical workloads requiring high availability, consider implementing a federated workspace model that uses multiple workspaces to provide high availability if there's a regional failure. Mission-critical provides prescriptive best practice guidance for designing highly reliable applications on Azure. The design methodology includes a federated workspace model with multiple Log Analytics workspaces to deliver high availability if there are multiple failures, including the failure of an Azure region.

This strategy eliminates egress costs across regions and remains operational with a region failure. But it adds complexity that you must manage with configuration and processes described in Health modeling and observability of mission-critical workloads on Azure.
Use infrastructure as code (IaC) to deploy and manage your workspaces and associated functions. When you automate as much of your deployment and your mechanisms for resilience and recovery as practical, it ensures that these operations are reliable. You save critical time in your operations processes and minimize the risk of human error.

Ensure that functions like saved log queries are also defined through your IaC to recover them to a new region if recovery is required.
Design DCRs with a single responsibility principle to keep DCR rules simple.

While one DCR could be loaded with all the input, rules, and destinations for the source systems, it's preferable to design narrowly focused rules that rely on fewer data sources. Use composition of rule assignments to arrive at the desired observability scope for the logical target.

Also, minimize transformation in DCRs.
When you use narrowly focused DCRs, it minimizes the risk of a rule misconfiguration having a broader effect. It limits the effect to only the scope for which the DCR was built. For more information, see Best practices for data collection rule creation and management in Azure Monitor.

While transformation can be powerful and necessary in some situations, it can be challenging to test and troubleshoot the Kusto Query Language (KQL) work being done. When possible, minimize the risk of data loss by ingesting the data raw and handling transformations downstream at query time.
When setting a daily cap or a retention policy, be sure you're maintaining your reliability requirements by ingesting and retaining the logs that you need. A daily cap stops the collection of data for a workspace once a specified amount is reached, which helps you maintain control over your ingestion volume. But only use this feature after careful planning. Ensure that your daily cap isn't being hit with regularity. If that happens, your cap is set too restrictively. You need to reconfigure the daily cap so you don't miss critical signals coming from your workload.

Likewise, be sure to carefully and thoughtfully approach the lowering of your data retention policy to ensure that you don't inadvertently lose critical data.
Use Log Analytics workspace insights to track ingestion volume, ingested data versus your data cap, unresponsive log sources, and failed queries among other data. Create health status alerts to proactively notify you if a workspace becomes unavailable because of a datacenter or regional failure. This strategy ensures that you're able to successfully monitor the health of your workspaces and proactively act if the health is at risk of degrading. Like any other component of your workload, it's critical that you're aware of health metrics and can identify trends to improve your reliability over time.
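The log multicasting recommendation in the table above calls for multiple diagnostic settings so that resource logs reach more than one workspace. The following Python sketch builds one diagnostic-setting body per destination workspace. The function name, the `multicast-N` naming scheme, and the placeholder workspace IDs are hypothetical, and the body shape (`properties.workspaceId`, `properties.logs`) is an assumption to verify against the diagnostic settings schema before use in a template.

```python
import json


def multicast_diagnostic_settings(workspace_ids, log_categories):
    """Build one diagnostic-setting body per destination workspace.

    This sketch assumes each setting targets a single workspace, so sending
    the same logs to several workspaces needs one setting per destination.
    """
    settings = []
    for index, workspace_id in enumerate(workspace_ids, start=1):
        settings.append(
            {
                "name": f"multicast-{index}",  # hypothetical naming scheme
                "properties": {
                    "workspaceId": workspace_id,
                    "logs": [
                        {"category": category, "enabled": True}
                        for category in log_categories
                    ],
                },
            }
        )
    return settings


# Placeholder IDs; substitute the resource IDs of your own workspaces.
bodies = multicast_diagnostic_settings(
    ["<primary-workspace-resource-id>", "<secondary-workspace-resource-id>"],
    ["AuditEvent"],
)
print(json.dumps(bodies, indent=2))
```

Generating the settings from one list of categories keeps the two destinations in step, which matters because only critical data should be duplicated given the extra ingestion and retention charges.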

Azure Policy

Azure offers no policies related to reliability of Log Analytics workspaces. You can create custom policies to build compliance guardrails around your workspace deployments, such as ensuring workspaces are associated to a dedicated cluster.

While not directly related to the reliability of Log Analytics workspaces, there are Azure policies for nearly every service available. The policies ensure that diagnostics settings are enabled for that service and validate that the service's log data is flowing into a Log Analytics workspace. All services in workload architecture should be sending their log data to a Log Analytics workspace for their own reliability needs, and the policies can help enforce it. Likewise, policies exist to ensure agent-based platforms, such as VMs and Kubernetes, have the agent installed.

Azure Advisor

Azure offers no Azure Advisor recommendations related to the reliability of Log Analytics workspaces.

Security

The purpose of the Security pillar is to provide confidentiality, integrity, and availability guarantees to the workload.

The Security design principles provide a high-level design strategy for achieving these goals by applying approaches to the technical design around your monitoring and logging solution.

Design checklist for Security

Start your design strategy based on the design review checklist for Security and identify vulnerabilities and controls to improve the security posture. Extend the strategy to include more approaches as needed.

  • Review the Azure Monitor security baseline and Manage access to Log Analytics workspaces topics. These topics provide guidance on security best practices.
  • Deploy your workspaces with segmentation as a cornerstone principle. Implement segmentation at the networking, data, and access levels. Segmentation helps ensure that your workspaces are isolated to the appropriate degree and are better protected from unauthorized access to the highest degree possible, while still meeting your business requirements for reliability, cost optimization, operational excellence, and performance efficiency.
  • Ensure that you can audit workspace read and write activities and associated identities. Attackers can benefit from viewing operational logs. A compromised identity can lead to log injection attacks. Enable auditing of operations run from the Azure portal or through API interactions and the associated users. If you're not set up to audit your workspace, you might be putting your organization at risk of being in breach of compliance requirements.
  • Implement robust network controls. Robust network controls help secure access to your workspace and your logs through network isolation and firewall functions. Insufficiently configured network controls might put your workspace at risk of being accessed by unauthorized or malicious actors.
  • Determine what types of data need immutability or long-term retention. Your log data should be treated with the same rigor as workload data inside production systems. Include log data in your data classification practices to ensure that you're successfully storing sensitive log data according to its compliance requirements.
  • Protect log data at rest through encryption. Segmentation alone won't completely protect confidentiality of your log data. If unauthorized raw access happens, having the log data encrypted at rest helps prevent bad actors from using that data outside of your workspace.
  • Protect sensitive log data through obfuscation. Just like workload data residing in production systems, you must take extra measures to ensure confidentiality is retained for sensitive information that might be intentionally or unintentionally present in operational logs. When you use obfuscation methods, it helps you hide sensitive log data from unauthorized eyes.
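As one illustration of obfuscation, the following Python sketch replaces sensitive fields in a log record with a keyed hash before the record is emitted. It assumes the obfuscation happens in application code, which is only one possible place to do it. The function name, field names, and key are hypothetical, and truncating the digest to 16 characters is an arbitrary choice for readability.

```python
import hashlib
import hmac


def obfuscate_record(record, sensitive_fields, key):
    """Return a copy of record with sensitive fields replaced by keyed hashes.

    The same input and key always give the same token, so records can still
    be correlated without exposing the original value.
    """
    cleaned = dict(record)
    for field in sensitive_fields:
        value = cleaned.get(field)
        if value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()[:16]  # shortened for readability
    return cleaned


# Hypothetical record and key for illustration only.
record = {"user": "alice@contoso.com", "action": "login", "status": 200}
print(obfuscate_record(record, ["user"], key=b"replace-with-a-secret"))
```

A keyed hash is used rather than a plain hash so that someone without the key can't confirm a guessed value by hashing it themselves.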

Configuration recommendations for Security

Recommendation Benefit
Use customer-managed keys if you require your own encryption key to protect data and saved queries in your workspaces.

Azure Monitor ensures that all data and saved queries are encrypted at rest using Microsoft-managed keys (MMK). If you require your own encryption key and collect enough data for a dedicated cluster, use a customer-managed key. You can encrypt data by using your own key in Azure Key Vault, for control over the key lifecycle and the ability to revoke access to your data.

If you use Microsoft Sentinel, make sure that you're familiar with the considerations at Set up Microsoft Sentinel customer-managed key.
This strategy lets you encrypt data by using your own key in Azure Key Vault, for control over the key lifecycle, and ability to revoke access to your data.
Configure Log query auditing to track which users are running queries.

Configure the audit logs for each workspace to be sent to the local workspace or consolidate in a dedicated security workspace if you separate your operational and security data. Use Log Analytics workspace insights to periodically review this data. Consider creating log query alert rules to proactively notify you if unauthorized users are attempting to run queries.
Log query auditing records the details for each query run in a workspace. Treat this audit data as security data and secure the LAQueryLogs table appropriately. This strategy bolsters your security posture by helping to ensure that unauthorized access is caught immediately if it ever happens.
Help secure your workspace through private networking and segmentation measures.

Use private link functionality to limit communications between log sources and your workspaces to private networking.
When you use private link, it also lets you control which virtual networks can access a given workspace, further bolstering your security through segmentation.
Use Microsoft Entra ID instead of API keys for workspace API access where available. API key-based access to the query APIs doesn't leave a per-client audit trail. Use sufficiently scoped Entra ID-based access so that you can properly audit programmatic access.
Configure access for different types of data in the workspace required for different roles in your organization.

Set the access control mode for the workspace to Use resource or workspace permissions. This access control lets resource owners use resource-context to access their data without being granted explicit access to the workspace.

Use table level RBAC for users who require access to a set of tables across multiple resources.
This setting simplifies your workspace configuration and helps to ensure users can't access operational data they shouldn't.

Assign the appropriate built-in role to grant workspace permissions to administrators at either the subscription, resource group, or workspace level depending on their scope of responsibilities.

Users with table permissions have access to all the data in the table regardless of their resource permissions.

See Manage access to Log Analytics workspaces for details on the different options for granting access to data in the workspace.
Export logs that require long-term retention or immutability.

Use data export to send data to an Azure Storage account with immutability policies to help protect against data tampering. Not every type of log has the same relevance for compliance, auditing, or security, so determine the specific data types that should be exported.
You might collect audit data in your workspace that's subject to regulations requiring its long-term retention. Data in a Log Analytics workspace can't be altered, but it can be purged. Exporting a copy of the operational data for retention purposes lets you build a solution that meets your compliance requirements.
Determine a strategy to filter or obfuscate sensitive data in your workspace.

You might be collecting data that includes sensitive information. Filter records that shouldn't be collected by using the configuration for the particular data source. Use a transformation if only particular columns in the data should be removed or obfuscated.

If you have standards that require the original data to be unmodified, you can use the 'h' literal in KQL queries to obfuscate query results displayed in workbooks.
Obfuscating or filtering out sensitive data in your workspace helps ensure you maintain confidentiality on sensitive information. In many cases, compliance requirements dictate the ways that you can handle sensitive information. This strategy helps you comply with the requirements proactively.
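The log query auditing recommendation in the table above suggests reviewing who runs queries. The following Python sketch checks exported audit rows against a list of approved users. The function name and sample rows are hypothetical, and the keys `AADEmail` and `QueryText` are assumed to mirror column names in the LAQueryLogs table, so verify them against your own data.

```python
def find_unapproved_queries(audit_rows, approved_users):
    """Return audit rows whose user isn't on the approved list.

    Rows with a missing or empty user are returned too, so that unattributed
    access is reviewed rather than silently ignored.
    """
    approved = {user.lower() for user in approved_users}
    return [
        row
        for row in audit_rows
        if (row.get("AADEmail") or "").lower() not in approved
    ]


# Hypothetical sample rows for illustration only.
rows = [
    {"AADEmail": "ops@contoso.com", "QueryText": "Heartbeat | take 10"},
    {"AADEmail": "intern@contoso.com", "QueryText": "SigninLogs | take 100"},
    {"AADEmail": None, "QueryText": "AzureActivity | count"},
]
flagged = find_unapproved_queries(rows, ["Ops@contoso.com"])
print([row["QueryText"] for row in flagged])
```

In practice a log query alert rule would do this check inside the workspace; the sketch only shows the comparison logic.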

Azure Policy

Azure offers policies related to the security of Log Analytics workspaces to help enforce your desired security posture. Examples of such policies are:

Azure also offers numerous policies to help enforce private link configuration, such as Log Analytics workspaces should block log ingestion and querying from public networks or even configuring the solution through DINE policies such as Configure Azure Monitor Private Link Scope to use private DNS zones.

Azure Advisor

Azure offers no Azure Advisor recommendations related to the security of Log Analytics workspaces.

Cost Optimization

Cost Optimization focuses on detecting spend patterns, prioritizing investments in critical areas, and optimizing in others to meet the organization's budget while meeting business requirements.

The Cost Optimization design principles provide a high-level design strategy for achieving those business goals. They also help you make tradeoffs as necessary in the technical design related to your monitoring and logging solution.

For more information on how data charges are calculated for your Log Analytics workspaces, see Azure Monitor Logs cost calculations and options.

Design checklist for Cost Optimization

Start your design strategy based on the design review checklist for Cost Optimization for investments and fine-tune the design so that the workload is aligned with the budget allocated for the workload. Your design should use the right Azure capabilities, monitor investments, and find opportunities to optimize over time.

  • Perform cost modeling exercises. These exercises help you understand your current workspace costs and forecast your costs relative to workspace growth. Analyze your growth trends in your workload and ensure that you understand plans for workload expansion to properly forecast your future operational logging costs.
  • Choose the right billing model. Use your cost model to determine the best billing model for your scenario. How you use your workspaces currently, and how you plan to use them as your workload evolves, determines whether a pay-as-you-go or a commitment tier model is the best fit for your scenario.

    Remember that you can choose different billing models for each workspace, and you can combine workspace costs in certain cases, so you can be granular in your analysis and decision-making.
  • Collect just the right amount of log data. Perform regularly scheduled analysis of your diagnostic settings on your resources, data collection rule configuration, and custom application code logging to ensure that you aren't collecting unnecessary log data.
  • Treat nonproduction environments differently than production. Review your nonproduction environments to ensure that you have configured your diagnostic settings and retention policies appropriately. These can often be significantly less robust than production, especially for dev/test or sandbox environments.
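The cost modeling and billing model items in this checklist can be explored with a small model. The following Python sketch projects daily ingestion under compounding growth and compares pay-as-you-go with commitment tiers. All prices, tier sizes, growth rates, and function names are hypothetical placeholders, not published Azure rates. The overage rule (volume above a tier billed at that tier's effective per-GB rate) is an assumption to check against Azure Monitor Logs cost calculations and options.

```python
def project_daily_gb(current_gb, monthly_growth, months):
    """Project daily ingestion after a number of months of compounding growth."""
    return current_gb * (1 + monthly_growth) ** months


def pay_as_you_go_cost(daily_gb, price_per_gb):
    """Daily cost when every GB is billed at a flat rate."""
    return daily_gb * price_per_gb


def commitment_tier_cost(daily_gb, tier_gb, tier_daily_price):
    """Daily cost on a commitment tier.

    Assumes the full tier price is charged even when usage is below the tier,
    and that usage above the tier is billed at the tier's effective rate.
    """
    effective_rate = tier_daily_price / tier_gb
    overage_gb = max(0.0, daily_gb - tier_gb)
    return tier_daily_price + overage_gb * effective_rate


def cheapest_option(daily_gb, price_per_gb, tiers):
    """Return (name, daily cost) of the cheapest option.

    tiers maps a tier size in GB per day to its daily price.
    """
    options = {"pay-as-you-go": pay_as_you_go_cost(daily_gb, price_per_gb)}
    for tier_gb, tier_price in tiers.items():
        options[f"{tier_gb} GB/day tier"] = commitment_tier_cost(
            daily_gb, tier_gb, tier_price
        )
    name = min(options, key=options.get)
    return name, options[name]


# Hypothetical prices: 2.00 per GB pay-as-you-go, two made-up tiers.
tiers = {100: 150.0, 200: 280.0}
for months in (0, 6, 12):
    volume = project_daily_gb(80.0, 0.05, months)
    name, cost = cheapest_option(volume, 2.0, tiers)
    print(f"month {months}: {volume:.1f} GB/day -> {name} ({cost:.2f}/day)")
```

Running the comparison per workspace, rather than once for the whole environment, matches the point above that billing models can differ between workspaces.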

Configuration recommendations for Cost Optimization

Recommendation Benefit
Configure the pricing tier for the amount of data that each Log Analytics workspace typically collects. By default, Log Analytics workspaces use pay-as-you-go pricing with no minimum data volume. If you collect enough data, you can significantly decrease your cost by using a commitment tier, which lets you commit to a daily minimum of data collected in exchange for a lower rate. If you collect enough data across workspaces in a single region, you can link them to a dedicated cluster and combine their collected volume by using cluster pricing.

For more information on commitment tiers and guidance on determining what's most appropriate for your level of usage, see Azure Monitor Logs cost calculations and options. To view estimated costs for your usage at different pricing tiers, see Usage and estimated costs.
Configure data retention and archiving. There's a charge for retaining data in a Log Analytics workspace beyond the default of 31 days. The default is 90 days if Microsoft Sentinel is enabled on the workspace, and 90 days for Application Insights data. Consider your particular requirements for having data readily available for log queries. You can significantly reduce your cost by configuring archived logs. Archived logs let you retain data for up to seven years and still access it occasionally. You access the data by using search jobs or restoring a set of data to the workspace.
If you use Microsoft Sentinel to analyze security logs, consider employing a separate workspace to store those logs. When you use a dedicated workspace for log data that your SIEM uses, it can help you control costs. The workspaces that Microsoft Sentinel uses are subject to Microsoft Sentinel pricing. Your security requirements dictate the types of logs that are required to be included in your SIEM solution. You might be able to exclude operational logs, which would be charged at the standard Log Analytics pricing if they're in a separate workspace.
Configure tables used for debugging, troubleshooting, and auditing as Basic Logs. Tables in a Log Analytics workspace configured for Basic Logs have a lower ingestion cost in exchange for limited features and a charge for log queries. If you query these tables infrequently and don't use them for alerting, this query cost can be more than offset by the reduced ingestion cost.
Limit data collection from data sources for the workspace. The primary factor for the cost of Azure Monitor is the amount of data that you collect in your Log Analytics workspace. Be sure that you collect no more data than you require to assess the health and performance of your services and applications. For each resource, select the right categories for the diagnostic settings you configure to provide the amount of operational data you need. Doing so helps you manage your workload without paying to collect data that you ignore.

There might be a tradeoff between cost and your monitoring requirements. For example, you might be able to detect a performance issue more quickly with a high sample rate, but you might want a lower sample rate to save costs. Most environments have multiple data sources with different types of collection, so you need to balance your particular requirements with your cost targets for each. See Cost optimization in Azure Monitor for recommendations on configuring collection for different data sources.
Regularly analyze workspace usage data to identify trends and anomalies.

Use Log Analytics workspace insights to periodically review the amount of data collected in your workspace. Further analyze data collection by using methods in Analyze usage in Log Analytics workspace to determine if there are other configurations that can decrease your usage further.
Understanding the amount of data collected by different sources helps you identify anomalies and upward trends in data collection that could result in excess cost. This consideration is important when you add a new set of data sources to your workload, for example, when you add a new set of VMs, enable new Azure diagnostics settings on a service, or change log levels in your application.
Create an alert when data collection is high. To avoid unexpected bills, you should be proactively notified anytime you experience excessive usage. Notification lets you address any potential anomalies before the end of your billing period.
Consider a daily cap as a preventative measure to ensure that you don't exceed a particular budget. A daily cap disables data collection in a Log Analytics workspace for the rest of the day after your configured limit is reached. As described in When to use a daily cap, don't use this practice as a method to reduce costs. Use it instead to prevent runaway ingestion due to misconfiguration or abuse.

If you set a daily cap, create an alert when the cap is reached. Be sure to also create an alert rule when some percentage is reached. For example, you can set an alert rule for when 90 percent capacity is reached. This alert gives you an opportunity to investigate and address the cause of the increased data before the cap shuts off critical data collection from your workload.
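The two-stage alerting described for the daily cap comes down to a simple threshold check. The following Python sketch classifies current ingestion against the cap, using the 90 percent warning level from the example above as the default. The function name and status labels are hypothetical; in Azure Monitor you would express the same logic as alert rules.

```python
def daily_cap_status(ingested_gb, cap_gb, warn_fraction=0.9):
    """Classify today's ingestion relative to the configured daily cap."""
    if cap_gb <= 0:
        raise ValueError("cap_gb must be positive")
    used = ingested_gb / cap_gb
    if used >= 1.0:
        return "cap-reached"  # data collection stops for the rest of the day
    if used >= warn_fraction:
        return "warning"  # time to investigate before the cap is hit
    return "ok"


# Hypothetical volumes against a 50 GB daily cap.
for volume in (10.0, 45.0, 50.0):
    print(volume, daily_cap_status(volume, cap_gb=50.0))
```

If the warning state shows up regularly, that suggests the cap is set too restrictively for the workload rather than catching a genuine anomaly.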

Azure Policy

Azure offers no policies related to cost optimization of Log Analytics workspaces. You can create custom policies to build compliance guardrails around your workspace deployments, such as ensuring that your workspaces contain the right retention settings.

Azure Advisor

Azure Advisor makes recommendations to move specific tables in a workspace to the low-cost Basic Logs data plan for tables that receive relatively high ingestion volume. Understand the limitations of Basic Logs before switching. For more information, see When should I use Basic Logs?. Azure Advisor might also recommend changing the pricing commitment tier for the whole workspace based on overall usage volume.

Operational Excellence

Operational Excellence primarily focuses on procedures for development practices, observability, and release management.

The Operational Excellence design principles provide a high-level design strategy for achieving those goals towards the operational requirements of the workload.

Design checklist for Operational Excellence

Start your design strategy based on the design review checklist for Operational Excellence for defining processes for observability, testing, and deployment related to Log Analytics workspaces.

  • Use infrastructure as code (IaC) for all functions related to your workload's Log Analytics workspaces. Minimize the risk of human error that can occur with manually administering and operating your log collection, ingestion, storage, and querying functions, including saved queries and query packs, by automating as many of those functions as possible through code. Also, include in your IaC the alerts that report health status changes and the configuration of diagnostic settings for resources that send logs to your workspaces. Include the code with your other workload-related code to ensure that your safe deployment practices are maintained for the management of your workspaces.
  • Ensure that your workspaces are healthy, and you're notified when issues arise. Like any other component of your workload, your workspaces can encounter issues. The issues can cost valuable time and resources to troubleshoot and resolve, and potentially leave your team unaware of the production workload status. Being able to proactively monitor workspaces and mitigate potential issues helps your operations teams minimize the time they spend troubleshooting and fixing issues.
  • Separate your production from nonproduction workloads. Avoid unnecessary complexity that can cause extra work for an operations team by using different workspaces for your production environment than those used by nonproduction environments. Commingled data can also lead to confusion because testing activities might appear to be events in production.
  • Prefer built-in tools and functions over non-Microsoft solutions. Use built-in tools to extend the functionality of your monitoring and logging systems. You might need to put additional configurations in place to support requirements like recoverability or data sovereignty that aren't available out-of-the-box with Log Analytics workspaces. In these cases, whenever practical, use native Azure or Microsoft tools to keep the number of tools that your organization must support to a minimum.
  • Treat your workspaces as static rather than ephemeral components. Like other types of data stores, workspaces shouldn't be considered among the ephemeral components of your workload. The Well-Architected Framework generally favors immutable infrastructure and the ability to quickly and easily replace resources within your workload as part of your deployments. But the loss of workspace data can be catastrophic and irreversible. For this reason, leave workspaces out of deployment packages that replace infrastructure during updates, and only perform in-place upgrades on the workspaces.
  • Ensure that operations staff is trained on Kusto Query Language. Train staff to create or modify queries when needed. If operators can't write or modify queries, critical troubleshooting and other functions slow down because operators must rely on other teams to do that work.
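
As a minimal sketch of the IaC item in the checklist above, the following Python snippet generates an ARM template for a single workspace so that the definition can live in source control next to the workload code. The function name, the workspace name `law-prod`, the region, the `2022-10-01` API version, and the `PerGB2018` SKU are assumptions chosen for illustration; check them against the current ARM reference for `Microsoft.OperationalInsights/workspaces` before use.

```python
import json


def workspace_template(name, location, retention_days=30, sku="PerGB2018"):
    """Build a minimal ARM template (as a dict) for one Log Analytics workspace.

    The API version and SKU name are illustrative assumptions, not
    recommendations from the article.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be a positive number of days")
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.OperationalInsights/workspaces",
                "apiVersion": "2022-10-01",  # assumed API version
                "name": name,
                "location": location,
                "properties": {
                    "sku": {"name": sku},
                    "retentionInDays": retention_days,
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical production workspace; print the template for review or commit.
    print(json.dumps(workspace_template("law-prod", "westeurope", 90), indent=2))
```

Generating the template from code is only one option. The same definition can be written directly in an ARM template, Bicep, or Terraform; the point is that the workspace configuration is reviewed and deployed like any other workload code.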

Configuration recommendations for Operational Excellence

Recommendation Benefit
Design a workspace strategy to meet your business requirements.

See Design a Log Analytics workspace architecture for guidance on designing a strategy for your Log Analytics workspaces. Include how many to create and where to place them.

If your workload is required to use a centralized platform team offering, ensure that you set all necessary operational access. Also, construct alerts to ensure workload observability needs are met.
A single workspace, or at least a minimal number of workspaces, maximizes your workload's operational efficiency. It limits the distribution of your operational and security data, increases visibility into potential issues, makes patterns easier to identify, and minimizes your maintenance requirements.

You might have requirements for multiple workspaces such as multiple tenants, or you might need workspaces in multiple regions to support your availability requirements. So, ensure that you have appropriate processes in place to manage this increased complexity.
Use infrastructure as code (IaC) to deploy and manage your workspaces and associated functions.
Use infrastructure as code (IaC) to define the details of your workspaces in ARM templates, Bicep, or Terraform. It lets you use your existing DevOps processes to deploy new workspaces and Azure Policy to enforce their configuration.

Colocating all of your IaC code with your application code helps ensure that your safe deployment practices are maintained for all deployments.
Use Log Analytics workspace insights to track the health and performance of your Log Analytics workspaces, and create meaningful and actionable alerts to be proactively notified of operational issues.

Log Analytics workspace insights provides a unified view of the usage, performance, health, agents, queries, and change log for all your workspaces.

Each workspace has an operation table that logs important activities affecting the workspace.
Regularly review the information that Log Analytics workspace insights provides to track the health and operation of each of your workspaces. You can use this information to create easily understood visualizations, such as dashboards or reports, that operations staff and stakeholders can use to track the health of your workspaces.

Create alert rules based on the operation table to be proactively notified when an operational issue occurs. You can use recommended alerts for the workspace to simplify how you create the most critical alert rules.
Practice continuous improvement by frequently revisiting Azure diagnostic settings on your resources, data collection rules, and application log verbosity.

Ensure that you're optimizing your log collection strategy through frequent reviews of your resource settings. From an operational standpoint, look to reduce the noise in your logs by focusing on those logs that provide useful information about a resource's health status.
By optimizing in this manner, you enable operators to investigate and troubleshoot issues when they arise, or perform other routine, improvised, or emergency tasks.

When new diagnostic categories are made available for a resource type, review the types of logs that are emitted with this category to understand whether enabling them might help you optimize your collection strategy. For example, a new category might be a subset of a larger set of activities that are being captured. The new subset might let you reduce the volume of logs coming in by focusing on the activities that are important for your operations to track.
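
As a rough illustration of the operation-table alerting described in this table, the following Python sketch builds a Kusto query that could back a log search alert rule. It assumes that the workspace exposes the `_LogOperation` function with `TimeGenerated`, `Level`, `Category`, and `Operation` columns, as described in the Azure Monitor documentation on monitoring workspace health; the function name `build_operation_alert_query` and the default values are hypothetical. Verify the column names in your own workspace before relying on the query.

```python
def build_operation_alert_query(levels=("Warning", "Error"), lookback="1h"):
    """Return a KQL query that counts recent workspace operational issues.

    Assumes the _LogOperation function and its Level, Category, and
    Operation columns exist in the target workspace.
    """
    if not levels:
        raise ValueError("at least one level is required")
    level_list = ", ".join(f'"{level}"' for level in levels)
    return "\n".join(
        [
            "_LogOperation",
            f"| where TimeGenerated > ago({lookback})",
            f"| where Level in ({level_list})",
            "| summarize IssueCount = count() by Category, Operation, Level",
        ]
    )


if __name__ == "__main__":
    # Print the query so it can be pasted into Log Analytics or an alert rule.
    print(build_operation_alert_query())
```

An alert rule based on this query would typically fire when `IssueCount` is greater than zero. Keeping the query text in code also lets it be deployed alongside the rest of the workspace configuration.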

Azure Policy and Azure Advisor

Azure offers neither policies nor Azure Advisor recommendations related to the operational excellence of Log Analytics workspaces.

Performance Efficiency

Performance Efficiency is about maintaining user experience even when there's an increase in load by managing capacity. The strategy includes scaling resources, identifying and optimizing potential bottlenecks, and optimizing for peak performance.

The Performance Efficiency design principles provide a high-level design strategy for achieving those capacity goals against the expected usage.

Design checklist for Performance Efficiency

Start your design strategy based on the design review checklist for Performance Efficiency for defining a baseline for your Log Analytics workspaces and associated functions.

  • Be familiar with the fundamentals of log data ingestion latency in Azure Monitor. There are several factors that contribute to latency when ingesting logs into your workspaces. Many of these factors are inherent to the Azure Monitor platform. Understanding the factors and the normal latency behavior can help you set appropriate expectations within your workload operations teams.
  • Separate your nonproduction and production workloads. Production-specific workspaces mitigate any overhead that nonproduction systems might introduce. Separation also reduces the overall footprint of your production workspaces, so fewer resources are needed to handle log data processing.
  • Choose the right deployment regions to meet your performance requirements. Deploy your Log Analytics workspace and data collection endpoints (DCEs) close to your workload. Your choice of the appropriate region in which to deploy your workspace and your DCEs should be informed by where you deploy the workload. You might need to weigh the performance benefits of deploying your workspaces and DCEs in the same region as your workload against your reliability requirements if you have already deployed your workload into a region that cannot support those requirements for your log data.
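
To make the latency item in the checklist above concrete, the following Python sketch summarizes ingestion latency as the difference between when a record was created and when it became available in the workspace. In a workspace, this difference is commonly computed in a log query as `ingestion_time() - TimeGenerated`; here the timestamp pairs are hypothetical sample values, and the function names are illustrative only.

```python
import math
from datetime import datetime, timezone
from statistics import median


def ingestion_latencies(records):
    """Return the latency in seconds for each (generated, ingested) pair."""
    return [(ingested - generated).total_seconds() for generated, ingested in records]


def summarize(latencies):
    """Summarize latencies: median, nearest-rank 95th percentile, and maximum."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    ordered = sorted(latencies)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method, 1-based
    return {"median": median(ordered), "p95": ordered[rank - 1], "max": ordered[-1]}


def _ts(minute):
    """Helper for the hypothetical sample: a UTC timestamp at 12:<minute>."""
    return datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)


# Hypothetical sample: all records created at 12:00, ingested 1 to 15 minutes later.
SAMPLE = [(_ts(0), _ts(1)), (_ts(0), _ts(2)), (_ts(0), _ts(3)), (_ts(0), _ts(4)), (_ts(0), _ts(15))]

if __name__ == "__main__":
    print(summarize(ingestion_latencies(SAMPLE)))
```

Tracking a high percentile alongside the median helps operations teams distinguish normal platform latency from occasional outliers when they set expectations for alert timeliness.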

Configuration recommendations for Performance Efficiency

Recommendation Benefit
Configure log query auditing and use Log Analytics workspace insights to identify slow and inefficient queries.

Log query auditing stores the compute time required to run each query and the time until results are returned. Log Analytics workspace insights uses this data to list potentially inefficient queries in your workspace. Consider rewriting these queries to improve their performance. Refer to Optimize log queries in Azure Monitor for guidance on optimizing your log queries.
Optimized queries return results faster and use fewer resources on the back end, which makes the processes that rely on those queries more efficient as well.
Understand service limits for Log Analytics workspaces.

In certain high-traffic implementations, you might run into service limits that affect your performance and your workspace or workload design. For example, the query API limits the number of records and data volume returned by a query. The Logs Ingestion API limits the size of each API call.

For a complete list of Azure Monitor limits, including the limits specific to Log Analytics workspaces, see Azure Monitor service limits.
Understanding the limits that might affect the performance of your workspace helps you design appropriately to mitigate them. You might decide to use multiple workspaces to avoid hitting limits associated with a single workspace.

Weigh the design decisions to mitigate service limits against requirements and targets for other pillars.
Create DCRs specific to data source types inside one or more defined observability scopes. Create separate DCRs for performance and events to optimize the backend processing compute utilization.
Separate DCRs for performance and events help mitigate backend resource exhaustion. A DCR that combines performance counters and events forces every associated virtual machine to transfer, process, and run configurations that might not apply to its installed software. The resulting excessive compute consumption and configuration processing errors can cause the Azure Monitor Agent (AMA) to become unresponsive.

Azure Policy and Azure Advisor

Azure offers neither policies nor Azure Advisor recommendations related to the performance of Log Analytics workspaces.

Next step