Azure Well-Architected Framework perspective on Azure Local

Azure Local extends Azure to customer-owned infrastructure, enabling local execution of modern and traditional applications across distributed locations. This solution offers a unified management experience on a single control plane and supports a wide range of validated hardware from trusted Microsoft partners. You can use Azure Local and Azure Arc capabilities to keep business systems and application data on-premises to address data sovereignty, regulation and compliance, and latency requirements.

This article assumes you have an understanding of hybrid systems and have working knowledge of Azure Local. The guidance in this article provides architectural recommendations that are mapped to the principles of the Azure Well-Architected Framework pillars.

Important

How to use this guide

Each section has a design checklist that presents architectural areas of concern along with design strategies localized to the technology scope.

Also included are recommendations on the technology capabilities that can help materialize those strategies. The recommendations don't represent an exhaustive list of all configurations available for Azure Local and its dependencies. Instead, they list the key recommendations mapped to the design perspectives. Use the recommendations to build your proof-of-concept or optimize your existing environments.

Foundational architecture that demonstrates the key recommendations:
Azure Local baseline reference architecture.

Technology scope

This review focuses on the interrelated decisions for the following Azure resources:

  • Azure Local (platform), version 23H2 and later
  • Azure Arc VMs (workload)

Note

This article covers the preceding scope and provides checklists and recommendations that are organized by platform architecture and workload architecture. Platform concerns are the responsibility of the platform administrators. Workload concerns are the responsibility of the workload operator and application developers. These roles and responsibilities are distinct and can be owned by separate teams or individuals. Keep that distinction in mind when you apply the guidance.

This guidance doesn't focus on specific resource types that you can deploy on Azure Local, such as Azure Arc VMs, Azure Kubernetes Service (AKS), and Azure Virtual Desktop. When you deploy these resource types on Azure Local, refer to the respective workload guidance to design solutions that meet your business requirements.

Reliability

The purpose of the Reliability pillar is to provide continued functionality by building enough resilience and the ability to recover fast from failures.

The Reliability design principles provide a high-level design strategy applied for individual components, system flows, and the system as a whole.

In hybrid cloud deployments, the goal is to reduce the effects of one component failure. Use these design checklists and configuration suggestions to lessen the impact of a component failure for workloads that you deploy on Azure Local.

It's important to distinguish between platform reliability and workload reliability. Workload reliability has a dependency on the platform. Application owners or developers must design applications that can deliver the defined reliability targets.

Design checklist

Start your design strategy based on the design review checklist for Reliability. Determine its relevance to your business requirements while keeping in mind the performance of Azure Local. Extend the strategy to include more approaches as needed.

  • (Azure Local platform architecture and workload architecture) Define workload reliability targets.

    • Set your service-level objectives (SLOs) so that you can evaluate availability targets. Express SLOs as a percentage of workload uptime, such as 99.9%, 99.95%, or 99.995%. Keep in mind that this calculation isn't based only on the platform metrics that the Azure Local instance or workload emits. To get a comprehensive target measurement, also quantify factors such as expected downtime during releases, routine operations, supportability, and other workload-specific or organization-specific factors.

    • Microsoft-provided service-level agreements (SLAs) often influence SLO calculations. But Microsoft doesn't provide an SLA for the uptime and connectivity of Azure Local instances or the deployed workload, because Microsoft doesn't control the customer datacenter reliability (such as power and cooling) or the people and processes that administer the platform.

  • (Azure Local platform architecture) Consider how performance and operations affect reliability.

    Degraded performance of the instance or its dependencies can make the Azure Local platform unavailable. For example:

    • Without proper workload capacity planning, it's difficult to rightsize Azure Local instances in the design phase, and a rightsized instance is required for the workload to meet the desired reliability targets. Use the Azure Local sizer tool during instance design. Treat N+1 as the minimum number of machines if you require highly available VMs. For business-critical or mission-critical workloads, consider an N+2 number of machines for the instance size if resiliency is paramount.

    • The reliability of the platform depends on how well the critical platform dependencies, such as physical disk types, perform. You must choose the right disk types for your requirements. For workloads that need low-latency and high-throughput storage, we recommend an all-flash (NVMe/SSD only) storage configuration. For general purpose compute, a hybrid storage (NVMe or SSDs for cache and HDDs for capacity) configuration might provide more storage space. The tradeoff is that spinning disks perform significantly worse if your workload exceeds the cache working set, and HDDs have a much lower mean time between failures than NVMe drives or SSDs.

      Performance Efficiency describes these examples in more detail.

    Improper Azure Local operations can affect patching and upgrades, testing, and consistency of deployments. Here are some examples:

    • If the Azure Local platform doesn't evolve with the latest hardware original equipment manufacturer (OEM) firmware, drivers, and innovations, the platform might not take advantage of the latest resiliency features. Apply hardware OEM driver and firmware updates regularly. For more information, see Azure Local solution catalog.

    • You must test the target environment for connectivity, hardware, and identity and access management before your deployment. Otherwise, you might deploy the Azure Local solution to an unstable environment, which can create reliability problems. You can use the environmental checker tool in standalone mode to detect problems, even before the instance hardware is available.

      For operational guidance, see Operational Excellence.

  • (Azure Local platform architecture) Provide fault tolerance to the instance and its infrastructure dependencies.

    • Storage design choices. For most deployments, the default option to "automatically create workload and infrastructure volumes" is sufficient. If you select the advanced option: "create required infrastructure volumes only", configure the appropriate volume fault tolerance within Storage Spaces Direct based on your workload requirements. These decisions influence the performance, capacity, and resiliency capabilities of the volumes. For example, a three-way mirror increases reliability and performance for instances with three or more machines. For more information, see Fault tolerance for storage efficiency and Create Storage Spaces Direct virtual disks and volumes.

    • Network architecture. Use a validated network topology to deploy Azure Local. Multi-machine instances, with four or more physical machines, require the "storage switched" design. Instances with two or three machines can optionally use the "storage switchless" design. Regardless of the instance size, we recommend that you use dual top of rack (ToR) switches for the management and compute intents (north and south uplinks) to provide increased fault tolerance. The dual ToR topology also provides resiliency during switch servicing (firmware update) operations. For more information, see Validated network topologies.

  • (Workload architecture) Build redundancy to provide resiliency.

    • Consider a workload that you deploy on a single Azure Local instance as a locally redundant deployment. The instance provides high availability at the platform level, but you must remember that you deploy the instance "in a single rack". Therefore, for business-critical or mission-critical use cases, we recommend that you deploy multiple instances of a workload or service across two or more separate Azure Local instances, ideally in separate physical locations.

    • Use industry-standard high-availability patterns for workloads, for example a design that provides active/passive synchronous or asynchronous data replication (such as SQL Server Always On). Another example is an external network load balancing (NLB) technology that can route user requests across the multiple workload instances that run on Azure Local instances that you deploy in separate physical locations. Consider using a partner external NLB device. Or evaluate the load balancing options that support traffic routing for hybrid and on-premises services, such as an Azure Application Gateway instance that uses Azure ExpressRoute or a VPN tunnel to connect to an on-premises service.

      For more information, see Deploy workloads instances across multiple Azure Local instances.

  • (Workload architecture) Plan and test recoverability based on your workload recovery point objective (RPO) and recovery time objective (RTO) targets.

    Have a well-documented disaster recovery plan. Test the recovery steps regularly to ensure that your business continuity plans and processes are valid. Determine whether Azure Site Recovery is a viable choice for protecting VMs that run on Azure Local. For more information, see Protect VM workloads with Azure Site Recovery on Azure Local (preview).

  • (Workload architecture) Configure and regularly test workload backup and restore procedures.

    Business requirements for data recovery and retention drive the strategy for workload backups. A comprehensive strategy includes considerations for workload operating system (OS) and application persistent data, with the ability to restore individual (point-in-time) file-level and folder-level data. Configure the backup retention policies based on your data recovery and compliance requirements, which determine the number and age of available data recovery points. Explore Azure Backup as an option to enable host-level or VM guest-level backups for Azure Local. Review data protection solutions from Backup independent software vendor partners where relevant. For more information, see Azure Backup guidance and best practices and Azure Backup for Azure Local.
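The SLO percentages and the N+1 or N+2 sizing guidance in this checklist both reduce to simple arithmetic. The following minimal sketch, in Python, converts an SLO into a downtime budget and checks whether a workload still fits when spare machines are set aside. The function names, the 30-day period, and the memory-only capacity model are illustrative assumptions, not part of Azure Local or the sizer tool.

```python
# Hypothetical helpers for reliability planning; names and model are assumptions.
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(slo_percent: float, period_days: float = 30) -> float:
    """Return the minutes of downtime that an SLO permits over a period."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be greater than 0 and at most 100")
    unavailable_fraction = 1 - slo_percent / 100
    return unavailable_fraction * period_days * MINUTES_PER_DAY


def fits_with_spare_machines(machines: int, memory_per_machine_gb: float,
                             workload_memory_gb: float,
                             spare_machines: int = 1) -> bool:
    """Check N+1 (spare_machines=1) or N+2 sizing by memory only.

    Simplification: real sizing must also account for CPU, storage,
    and platform overhead.
    """
    if machines <= spare_machines:
        return False
    available_gb = (machines - spare_machines) * memory_per_machine_gb
    return workload_memory_gb <= available_gb


if __name__ == "__main__":
    for slo in (99.9, 99.95, 99.995):
        print(f"{slo}% -> {downtime_budget_minutes(slo):.1f} min per 30 days")
```

For example, a 99.9% SLO leaves roughly 43 minutes of downtime in a 30-day period, and that budget must also cover planned events such as releases and routine operations.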

Recommendations

Recommendation | Benefit
Reserve the equivalent of one capacity disk worth of space per machine within the Storage Spaces Direct storage pool. If you choose to create workload volumes after you deploy an Azure Local instance (Advanced option: "create required infrastructure volumes only"), we recommend that you leave 5% to 10% of the total pool capacity unallocated in the storage pool. | This reserved and unused, or free, capacity enables Storage Spaces Direct to repair "in-place" when a physical disk fails, which improves data resiliency and performance if a physical disk failure occurs.
Ensure that all physical machines have network access to the list of required outbound HTTPS endpoints for Azure Local and Azure Arc. | To reliably manage, monitor, and operate Azure Local instances or workload resources, the required outbound network endpoints must be reachable, either directly or through a proxy server. A temporary interruption doesn't affect the running status of the workload but might affect manageability.
If you opt to create workload volumes (virtual disks) manually, use the most appropriate resiliency type to maximize workload resiliency and performance. For any user volumes that you create manually after you deploy the instance, create a storage path for the volumes in Azure. | The volume can store workload VM configuration files, VM virtual hard disks (VHDs), and VM images via the storage path. For Azure Local instances with three or more machines, consider using a three-way mirror to provide the highest resiliency and performance capabilities. We recommend that you use mirrored volumes for business-critical or mission-critical workloads.
Consider implementing workload anti-affinity rules to ensure that the VMs that host multiple instances of the same service run on separate physical hosts. This concept is similar to "availability sets" in Azure. | Make all components redundant. For business-critical or mission-critical workloads, use multiple Azure Arc VMs or Kubernetes replica sets or pods to deploy multiple instances of your applications or services. This approach increases resiliency if an unplanned outage of a single physical machine occurs.
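The capacity reserve and mirror resiliency recommendations can be combined into a rough usable-capacity estimate. The following Python sketch assumes that every machine has identical capacity disks, that the reserve equals one capacity disk per machine, and that a mirror stores each byte two or three times. The names `PoolPlan` and `plan_pool` are hypothetical, and the result ignores infrastructure volumes and file system overhead.

```python
from dataclasses import dataclass

# Number of data copies kept by each mirror resiliency type.
MIRROR_COPIES = {"two-way": 2, "three-way": 3}


@dataclass
class PoolPlan:
    raw_tb: float       # total capacity of all capacity disks
    reserve_tb: float   # capacity left unallocated for in-place repair
    usable_tb: float    # approximate capacity available to mirrored volumes


def plan_pool(machines: int, capacity_disks_per_machine: int, disk_tb: float,
              resiliency: str = "three-way") -> PoolPlan:
    """Estimate usable capacity after the reserve and mirror overhead."""
    if resiliency not in MIRROR_COPIES:
        raise ValueError(f"unknown resiliency type: {resiliency}")
    if resiliency == "three-way" and machines < 3:
        raise ValueError("a three-way mirror needs three or more machines")
    raw_tb = machines * capacity_disks_per_machine * disk_tb
    reserve_tb = machines * disk_tb  # one capacity disk per machine
    usable_tb = (raw_tb - reserve_tb) / MIRROR_COPIES[resiliency]
    return PoolPlan(raw_tb, reserve_tb, usable_tb)


if __name__ == "__main__":
    plan = plan_pool(machines=4, capacity_disks_per_machine=6, disk_tb=4.0)
    print(f"raw={plan.raw_tb} TB, reserve={plan.reserve_tb} TB, "
          f"usable={plan.usable_tb:.1f} TB")
```

Treat the output as a starting point for a conversation with your hardware partner, not as a replacement for the Azure Local sizer tool.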

Security

The purpose of the Security pillar is to provide confidentiality, integrity, and availability guarantees to the workload.

The Security design principles provide a high-level design strategy for achieving those goals by applying approaches to the technical design of Azure Local.

Azure Local is a secure-by-default product that has more than 300 security settings enabled during the cloud deployment process. Default security settings provide a consistent security baseline to ensure that devices start in a known good state. And you can use drift protection controls to provide at-scale management.

Default security features in Azure Local include hardened OS security settings, Windows Defender Application Control, volume encryption via BitLocker, secret rotation, local built-in user accounts, and Microsoft Defender for Cloud. For more information, see Review security features.

Design checklist

Start your design strategy based on the design review checklist for Security. Identify vulnerabilities and controls to improve the security posture. Extend the strategy to include more approaches as needed.

  • (Azure Local platform architecture) Review the security baselines. Azure Local and security standards provide baseline guidance to strengthen the security posture of the platform and hosted workloads. If your workload needs to comply with specific regulatory requirements, factor in the relevant security standards, such as the Payment Card Industry Data Security Standard (PCI DSS) and Federal Information Processing Standard (FIPS) 140.

    Azure Local platform-provided default settings enable security features, including identity controls, network filtering, and encryption. These settings form a good security baseline for a newly provisioned Azure Local instance. You can customize each setting based on your organizational security requirements.

    Make sure that you detect and protect against undesired security configuration drift.

  • (Azure Local platform architecture) Detect, prevent, and respond to threats. Continuously monitor the Azure Local environment and protect against existing and evolving threats.

    We recommend that you enable Defender for Cloud on Azure Local. Enable the basic Defender for Cloud plan (free tier) by using Defender Cloud Security Posture Management to monitor and identify steps that you can take to secure your Azure Local platform, along with other Azure and Azure Arc resources.

    To benefit from the enhanced security features, including security alerts for individual servers and Azure Arc VMs, enable Microsoft Defender for Servers on your Azure Local instance machines and Azure Arc VMs.

    • Use Defender for Cloud to measure the security posture of Azure Local machines and workloads. Defender for Cloud provides a single pane of glass experience to help manage security compliance.

    • Use Defender for Servers to monitor the hosted VMs for threats and misconfigurations. You can also enable endpoint detection and response capabilities on Azure Local machines.

    • Consider aggregating security and threat intelligence data from all sources into a centralized security information and event management (SIEM) solution, such as Microsoft Sentinel.

  • (Azure Local platform architecture and workload architecture) Create segmentation to contain the blast radius. There are several strategies to attain segmentation.

    • Identity. Keep roles and responsibilities for the platform and workload separate. Allow only authorized identities to carry out the specific operations that align with their designated roles. Azure Local platform administrators use both Azure and local domain credentials to do platform duties. Workload operators and application developers manage workload security. To simplify delegating permissions, use Azure Local built-in role-based access control (RBAC) roles, such as 'Azure Local Administrator' for platform administrators and 'Azure Local VM Contributor' or 'Azure Local VM Reader' for workload operators. For more information about specific built-in role actions, see Azure RBAC documentation for hybrid and multicloud roles.

    • Network. Isolate networks if needed. For example, you can provision multiple logical networks that use separate virtual local area networks (VLANs) and network address ranges. When you use this approach, ensure that the management network can reach each logical network and VLAN so that Azure Local machines can communicate with the VLAN networks through the ToR switches or gateways. This configuration is required for availability management of the workload, such as allowing infrastructure management agents to communicate with the workload guest OS.

    • Review Recommendations for building a segmentation strategy for additional information.

  • (Azure Local platform architecture and workload architecture) Use a trusted identity provider to control access. We recommend Microsoft Entra ID for all authentication and authorization purposes. You can join a workload to an on-premises Windows Server Active Directory domain if required. Take advantage of features that support strong passwords, multifactor authentication, RBAC, and controls for the management of secrets.

  • (Azure Local platform architecture and workload architecture) Isolate, filter, and block network traffic. You might have a workload use case that requires virtual networks, microsegmentation via network security groups, network quality of service policies, or virtual appliance chaining so that you can bring in partner appliances for filtering. If you have such a workload, see software-defined network considerations for network reference patterns for a list of the supported features and capabilities that Network Controller provides.

  • (Workload architecture) Encrypt data to protect against tampering. Encrypt data in transit, data at rest, and data in use.

    • Data-at-rest encryption is enabled on data volumes that you create during deployment. These data volumes include both infrastructure volumes and workload volumes. For more information, see Manage BitLocker encryption.

    • Use trusted launch for Azure Arc VMs to improve security of Gen 2 VMs by using OS features of modern operating systems, such as Secure Boot, which can use a virtual Trusted Platform Module.

  • Operationalize secret management. Based on your organizational requirements, change the credentials that are associated with the deployment user identity for Azure Local. For more information, see Manage secrets rotation.

  • (Azure Local platform architecture) Enforce security controls. Use Azure Policy to audit and enforce built-in policies, such as "Application control policies should be consistently enforced" or "Encrypted volumes should be implemented". You can use these Azure policies to audit security settings and assess the compliance status of Azure Local. For examples of the available policies, see Azure policies.

  • (Workload architecture) Improve workload security posture with built-in policies. To assess Azure Arc VMs that run on Azure Local, you can apply built-in policies via the security benchmark, Azure Update Manager, or the Azure Policy guest configuration extension. You can use various policies to check the following conditions:

    • Log Analytics agent installation
    • System updates that are missing the latest security patches
    • Vulnerability assessment and potential mitigations
    • Use of secure communication protocols

Recommendations

Recommendation | Benefit
Use the security baseline and drift controls settings to apply and maintain security settings on instance machines. | These configurations help to protect against unwanted changes and drift because they automatically refresh security settings every 90 minutes to enforce the intended security posture of Azure Local.
Use Windows Defender Application Control in Azure Local. | Windows Defender Application Control reduces the attack surface of Azure Local. Use the Azure portal or PowerShell to view policy settings and control policy modes. Windows Defender Application Control policies help to control which drivers and apps are allowed to run on your system.
Enable volume encryption via BitLocker for data encryption-at-rest protection. | BitLocker protects OS and data volumes by encrypting the instance shared volumes that are created on Azure Local. BitLocker uses XTS-AES 256-bit encryption. We recommend that you keep the volume encryption default setting enabled during Azure Local cloud deployment for all data volumes.
Export BitLocker recovery keys to store them in a secure location that's external to the Azure Local instance. | You might need BitLocker keys during specific troubleshooting or recovery actions. We recommend that you export, save, and back up encryption keys for OS and data volumes from each Azure Local instance via the 'Get-AsRecoveryKeyInfo' PowerShell cmdlet. Save the keys in a secure external location, such as Azure Key Vault.
Use a SIEM solution to increase security monitoring and alerting capabilities. To do so, you can onboard Azure Arc-enabled servers (Azure Local platform machines) to Microsoft Sentinel. Alternatively, if you use a different SIEM solution, configure syslog forwarding of security events to the chosen solution. | Forward security event data by using Microsoft Sentinel or syslog forwarding to provide alerting and reporting capabilities through integration with a customer-managed SIEM solution.
Use Server Message Block (SMB) signing to enhance data-in-transit protection, which is enabled in the "default security settings." | SMB signing allows you to digitally sign SMB traffic between an Azure Local platform and systems external to the platform (north-south). Configure signing for external SMB traffic between the Azure Local platform and other systems to help prevent relay attacks.
Use the SMB encryption setting to enhance data-in-transit protection, which is enabled in the "default security settings." | The SMB encryption for in-instance traffic setting controls the encryption of traffic between physical machines in the Azure Local instance (east-west) on your storage network.

Cost Optimization

Cost Optimization focuses on detecting spend patterns, prioritizing investments in critical areas, and optimizing in others to meet the organization's budget while meeting business requirements.

The Cost Optimization design principles provide a high-level design strategy for achieving those goals and making tradeoffs as necessary in the technical design related to Azure Local and its environment.

Design checklist

Start your design strategy based on the design review checklist for Cost Optimization for investments. Fine-tune the design so that the workload is aligned with the budget that's allocated for the workload. Your design should use the right Azure capabilities, monitor investments, and find opportunities to optimize over time.

Azure Local incurs costs for hardware, software licensing, workloads, guest VM (Windows Server or Linux) licensing, and other integrated cloud services, such as Azure Monitor and Defender for Cloud.

  • (Azure Local platform architecture and workload architecture) Estimate realistic costs as part of cost modeling. Use the Azure pricing calculator to select and configure services like Azure Local, Azure Arc, and AKS on Azure Local. Experiment with various configurations and payment options to model costs.

  • (Azure Local platform architecture and workload architecture) Optimize the cost of Azure Local hardware. Choose a hardware OEM partner that aligns with your business and commercial requirements. To explore the certified list of validated machines, integrated systems, and premier solutions, see Azure Local solution catalog. Communicate your workload characteristics, size, quantity, and performance requirements to your hardware partner so that you can rightsize a cost-effective hardware solution for the Azure Local machine and instance size.

  • (Azure Local platform architecture) Optimize your licensing costs. Azure Local software is licensed and billed on a "per physical CPU core" basis. Use existing on-premises core licenses with Azure Hybrid Benefit to reduce licensing costs for Azure Local workloads, such as Azure Arc VMs that run Windows Server, SQL Server, or AKS and Azure Arc-enabled Azure SQL Managed Instance. For more information, see Azure Hybrid Benefit cost calculator.

  • (Azure Local platform architecture) Save on environment costs. Evaluate whether the following options can help optimize your resource usage.

    • Take advantage of discount programs that Microsoft offers. Consider using Azure Hybrid Benefit to reduce the cost to run Azure Local and Windows Server workloads. For more information, see Azure Hybrid Benefit for Azure Local.

    • Explore promotional offers. Take advantage of the Azure Local 60-day free trial after registration for initial proofs of concept or validations.

  • (Azure Local platform architecture) Save on operational costs.

    • Evaluate technology options for patching, updating, and other operations. Update Manager is free for Azure Local instances that have Azure Hybrid Benefit and Azure Arc VM management enabled. For more information, see Update Manager FAQ and Update Manager pricing.

    • Evaluate costs related to observability. Set up alert rules and data collection rules (DCRs) to meet your monitoring and auditing needs. The amount of data that your workload ingests, processes, and retains directly influences costs. Optimize by using smart retention policies, limiting the number and frequency of alerts, and choosing the right storage tier for storing logs. For more information, see Cost Optimization guidance for Log Analytics.

  • (Workload architecture) Evaluate density over isolation. Use AKS on Azure Local to improve density and simplify workload management so that you can enable containerized applications to scale across multiple datacenter or edge locations. For more information, see AKS on Azure Local pricing.
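Because Azure Local software is billed per physical CPU core, a first-pass cost model is a product of core count and rate. The following Python sketch shows that arithmetic. The function name, the parameter names, and the rates in the example are placeholders rather than real prices. Take actual rates from the Azure pricing calculator, and confirm how Azure Hybrid Benefit applies to your licenses before you treat the host fee as waived.

```python
def estimate_monthly_cost(machines: int, cores_per_machine: int,
                          host_rate_per_core: float,
                          addon_rate_per_core: float = 0.0,
                          host_fee_waived: bool = False) -> dict:
    """Estimate monthly per-core charges for one Azure Local instance.

    host_rate_per_core and addon_rate_per_core are placeholder inputs.
    addon_rate_per_core models an optional per-core guest licensing add-on.
    host_fee_waived models a benefit that removes the host fee (assumption).
    """
    if machines < 1 or cores_per_machine < 1:
        raise ValueError("machines and cores_per_machine must be positive")
    total_cores = machines * cores_per_machine
    host = 0.0 if host_fee_waived else total_cores * host_rate_per_core
    addon = total_cores * addon_rate_per_core
    return {"total_cores": total_cores, "host": host,
            "addon": addon, "total": host + addon}


if __name__ == "__main__":
    # Rates of 1.0 and 2.0 are arbitrary units, not real prices.
    print(estimate_monthly_cost(4, 16, host_rate_per_core=1.0,
                                addon_rate_per_core=2.0))
```

Running the model for several machine counts and core counts makes it easier to compare hardware options before you commit to an instance size.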

Recommendations

Recommendation | Benefit
Use Azure Hybrid Benefit for Azure Local if you have Windows Server Datacenter licenses with Software Assurance. | With Azure Hybrid Benefit for Azure Local, you can maximize the value of your on-premises licenses and modernize your existing infrastructure to Azure Local at no additional cost.
Choose either the Windows Server subscription add-on or bring your own license to license and activate the Windows Server VMs and use them on Azure Local. For more information, see License Windows Server VMs on Azure Local. | You can use any existing Windows Server licenses and activation methods. Optionally, you can enable the "Windows Server subscription add-on", which is available only for Azure Local, to subscribe to Windows Server guest licenses through Azure. The add-on is charged for the total number of physical cores in the Azure Local instance.
Use the Azure verification for VMs benefit extended to Azure Local so that supported Azure-exclusive workloads can work outside of the cloud. This benefit is enabled by default on Azure Local version 23H2 or later. | Use this benefit so that the VMs can operate in other Azure environments and workloads can benefit from offers that are available only in Azure, such as Extended Security Updates enabled by Azure Arc.

Operational Excellence

Operational Excellence primarily focuses on procedures for development practices, observability, and release management.

The Operational Excellence design principles provide a high-level design strategy for achieving those goals for the operational requirements of the workload.

Monitoring and diagnostics are crucial. You can use metrics to measure performance statistics and to troubleshoot and remediate problems quickly. For more information about how to troubleshoot problems, see Operational Excellence design principles and Collect diagnostic logs for Azure Local.

Design checklist

Start your design strategy based on the design review checklist for Operational Excellence for defining processes for observability, testing, and deployment related to Azure Local.

  • (Azure Local platform architecture) Increase supportability of Azure Local. Observability is enabled by default at the time of deployment. These capabilities enhance the supportability of the platform. Telemetry and diagnostic information is shared securely from the platform by using the AzureEdgeTelemetryAndDiagnostics extension, which is installed on all Azure Local machines by default. For more information, see Azure Local observability.

  • (Azure Local platform architecture) Use Azure services to reduce operational complexity and increase management scale. Azure Local is integrated with Azure to enable services such as Update Manager for patching the platform and Azure Monitor for monitoring and alerting. You can use Azure Arc and Azure Policy to manage security configuration and compliance auditing. Implement Defender for Cloud to help manage cyber threats and vulnerability. Use Azure as the control plane for these operational processes and procedures to help reduce complexity, improve efficiencies of scale, and improve management consistency.

  • (Workload architecture) Plan IP address network range requirements for workloads in advance. Azure Local provides a platform to deploy and manage virtualized or containerized workloads. Also consider the IP address requirements for logical networks that your workload uses. Review these resources:

  • (Workload configuration) Enable monitoring and alerting for workloads that you deploy on Azure Local. You can use Azure Monitor for virtual machines, VM Insights for Azure Arc VMs, or Container Insights and managed Prometheus for AKS clusters.

    Evaluate whether you should use a centralized Log Analytics workspace for your workload. For an example of a shared log sink (data location), see Workload management and monitoring recommendations.

  • (Azure Local platform architecture) Use proper validation techniques for a safe deployment. Use the environmental checker tool in standalone mode to assess the readiness of the target environment before you deploy an Azure Local solution. This tool validates the proper configuration of required connectivity, hardware, Windows Server Active Directory, networks, and Azure Arc integration prerequisites.

  • (Azure Local platform architecture) Get current and stay current. Use the Azure Local solution catalog to stay current with the latest hardware OEM innovations for Azure Local instance deployments. Consider using premier solutions to benefit from extra integration, turn-key deployment capabilities, and a simplified update experience.

    Use Update Manager to update the platform and manage the OS, core agents, and services, including solution extensions. Stay current, and consider using the "Enable automatic upgrade" setting where possible for extensions.

Recommendations

Recommendation Benefit
Enable Monitor Insights on Azure Local instances to enhance monitoring and alerting by using native Azure capabilities.

Insights can monitor key Azure Local features by using the instance performance counters and event log channels that are collected by the DCR.

For certain hardware infrastructure, such as Dell APEX, you can visualize hardware events in real time.

For more information, see Feature workbooks.
Azure manages Insights, so it's always up to date, it's scalable across multiple instances, and it's highly customizable.

Insights provides access to default workbooks with basic metrics, along with specialized workbooks that are created for monitoring key features of Azure Local. This feature provides near real-time monitoring. You can create graphs and customized visualization by using aggregation and the filter functionality. You can also configure custom alert rules.

The cost of Insights is based on the quantity of data ingested and the data retention settings of the Log Analytics workspace. When you enable Azure Local Insights, we recommend that you use the DCR created by the Insights creation experience. The prefix of the DCR name is AzureStackHCI-. It's configured to collect only the required data.
Set up alerts, and configure the alert processing rules based on your organizational requirements. Get notified of changes in health, metrics, logs, or other types of observability data.

- Health alerts
- Log alerts
- Metric alerts

For more information, see Recommended rules for metric alerts.
Integrate Monitor alerts with Azure Local to get several key benefits at no extra cost. Get near real-time monitoring and customize alerts to notify the right team or admin for remediation.

You can collect a comprehensive list of metrics for compute, storage, and network resources in Azure Local. Perform advanced logic operations on your log data and evaluate metrics of your Azure Local instance at regular intervals.
Use the update feature to integrate and manage various aspects of the Azure Local solution in one place. For more information, see About updates in Azure Local. The update orchestrator is installed during the initial Azure Local instance deployment. This feature automates updates and management operations. To keep Azure Local in a supported state, make sure that you update your instances on a regular cadence to move to new baseline builds when they become available. This method provides new capabilities and improvements to the platform.

For more information about release trains, the cadence of updates, and the support window of each baseline build, see Azure Local version 23H2 release information.
To help with hands-on skilling, labs, training events, product demos, or proof-of-concept projects, consider using Jumpstart HCIBox. Rapidly deploy Azure Local without the need for physical hardware by using a VM on Azure to deploy the solution. HCIBox supports Azure Local version 23H2 to enable rapid testing and evaluation of the latest capabilities of Azure edge products, such as native Azure Arc and AKS integration in a self-contained sandbox.

You can deploy this sandbox to an Azure subscription by using a VM that supports nested virtualization to emulate an Azure Local instance inside an Azure VM. Get Azure Local features like the new cloud deployment feature with minimal manual effort.

For more information, see Microsoft Tech Community blog.

Performance Efficiency

Performance Efficiency is about maintaining user experience even when there's an increase in load by managing capacity. The strategy includes scaling resources, identifying and optimizing potential bottlenecks, and optimizing for peak performance.

The Performance Efficiency design principles provide a high-level design strategy for achieving those capacity goals against the expected usage.

Design checklist

Start your design strategy based on the design review checklist for Performance Efficiency. Define a baseline that's based on key indicators for Azure Local.

  • (Azure Local platform architecture) Use the Azure Local-validated hardware or integrated systems from OEM partner offerings. Consider using the premium solution builders in the Azure Local catalog to optimize the performance of your Azure Local environment.

  • (Azure Local platform storage architecture) Choose the right physical disk types for the Azure Local machines based on your workload performance and capacity requirements. For high-performance workloads that require low latency and high-throughput storage, consider using an all-flash (NVMe/SSD only) storage configuration. For general purpose compute or large storage capacity requirements, consider using hybrid storage (SSD or NVMe for cache tier and HDDs for capacity tier), which might provide increased storage capacity.

  • (Azure Local platform architecture) Use the Azure Local sizer tool during the instance design (pre-deployment) phase. Azure Local instances should be sized appropriately by using the workload capacity, performance, and resiliency requirements as inputs. The size determines the maximum number of physical machines that can be offline simultaneously (cluster quorum) during planned (maintenance) or unplanned (power or hardware failure) events. For more information, see Cluster quorum overview.

  • (Azure Local platform architecture) Use all-flash (NVMe or SSD) based solutions for workloads that have high-performance or low-latency requirements. These workloads include but are not limited to highly transactional database technologies, production AKS clusters, or any mission-critical or business-critical workloads with low-latency or high-throughput storage requirements. Use all-flash deployments to maximize storage performance. All-NVMe or all-SSD configurations (especially at a very small scale) improve storage efficiency and maximize performance because no drives are used as a cache tier. For more information, see All-flash-based storage.

  • (Azure Local platform architecture) Establish a performance baseline for Azure Local instance storage before you deploy production workloads. Configure Monitor Azure Local features with Insights to monitor the performance of a single Azure Local instance or multiple instances simultaneously.

  • (Azure Local platform architecture) Consider using the Monitor for Resilient File System (ReFS) deduplication and compression feature after you enable Insights for the Azure Local instance. Determine whether you should use this feature based on your workload storage usage and capacity requirements. This feature provides monitoring for ReFS deduplication and compression savings, performance impact, and jobs. For more information, see Monitor ReFS deduplication and compression.

    As a minimum requirement, plan to reserve one physical machine's worth of capacity (N+1) across the instance to ensure that instance machines can be drained when they perform updates via Update Manager. Consider reserving two physical machines' worth of capacity (N+2) for business-critical or mission-critical use cases.
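The N+1 and N+2 reservation described above is simple arithmetic, and a short sketch can make its effect on usable capacity concrete. The following Python example is illustrative only: the function name, the four-machine instance, and the 512 GB per-machine memory figure are assumptions and aren't taken from Azure Local tooling.

```python
def usable_capacity(machines: int, per_machine: float, reserved_machines: int = 1) -> float:
    """Return the capacity left for workloads after whole machines are held in reserve.

    reserved_machines=1 models N+1; reserved_machines=2 models N+2.
    """
    if machines <= reserved_machines:
        raise ValueError("The instance must have more machines than the reserve.")
    return (machines - reserved_machines) * per_machine


# Hypothetical instance: 4 machines with 512 GB of memory each.
for reserve in (1, 2):
    usable = usable_capacity(4, 512, reserve)
    print(f"N+{reserve}: {usable:.0f} GB usable of {4 * 512} GB total")
```

In this hypothetical case, moving from N+1 to N+2 reduces the memory available to workloads from 1,536 GB to 1,024 GB, which is the cost side of the extra resiliency.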

Recommendations

Recommendation Benefit
If you select the advanced option to "create infrastructure volumes only" during Azure Local instance deployment, we recommend that you create the virtual disks by using mirroring when you create workload volumes for performance-intensive workloads. This recommendation benefits workloads that have strict latency requirements or that need high throughput with a mix of random read and write input/output operations per second (IOPS), such as SQL Server databases, Kubernetes clusters, or other performance-sensitive VMs. Deploy the workload VHDs on volumes that use mirroring to maximize performance and resiliency. Mirroring is faster than any other resiliency type.
Consider using DiskSpd to test workload storage performance capabilities of the Azure Local instance.

You can also use VMFleet to generate load and measure the performance of a storage subsystem. Evaluate whether you should use VMFleet for measuring storage subsystem performance.
Establish a baseline for Azure Local instance performance before you deploy production workloads. DiskSpd allows administrators to test the storage performance of the instance by using various command-line parameters. The main function of DiskSpd is to issue read and write operations and output performance metrics, such as latency, throughput, and IOPS.
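As a rough illustration of the DiskSpd command-line parameters mentioned above, the following Python sketch assembles one possible invocation. The helper name, the default values, and the target file path are hypothetical. The flags reflect common DiskSpd usage as an assumption, so verify them against the help output of your DiskSpd version before you run a test.

```python
def diskspd_args(target, *, file_size="10G", duration_s=60, block_size="8K",
                 threads=4, outstanding_io=32, write_pct=30):
    """Build an argument list for a random mixed read/write DiskSpd test."""
    return [
        "diskspd.exe",
        f"-c{file_size}",       # create a test file of this size
        f"-d{duration_s}",      # test duration in seconds
        "-r",                   # random I/O instead of sequential
        f"-w{write_pct}",       # percentage of operations that are writes
        f"-t{threads}",         # threads per target
        f"-o{outstanding_io}",  # outstanding I/O requests per thread
        f"-b{block_size}",      # block size
        "-Sh",                  # disable software and hardware write caching
        "-L",                   # capture latency statistics
        target,
    ]


# Hypothetical test file on a workload volume.
print(" ".join(diskspd_args(r"C:\ClusterStorage\Volume01\diskspd-test.dat")))
```

Building the command as a list keeps each parameter visible and makes it easy to record the exact settings next to the baseline results, so that later runs are comparable.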

Tradeoffs

There are design tradeoffs with the approaches described in the pillar checklists. Here are some examples of advantages and drawbacks.

Building redundancy increases costs

  • Understand your workload's requirements up front, such as the workload RTO and RPO targets and storage performance requirements (IOPS and throughput), when you design and procure the hardware for an Azure Local solution. To deploy highly available workloads, we recommend a minimum of a three-machine instance, which enables three-way mirroring for workload volumes and data. For the compute resources, ensure that you deploy a minimum of "N+1 number of physical machines", which reserves the capacity of a "single machine worth of space" in the instance at all times. For business-critical or mission-critical workloads, consider reserving "N+2 machines worth of capacity" to provide increased resiliency. For example, if two machines in the instance are offline, the workload can remain online. This approach covers scenarios such as a machine that runs a workload going offline during a planned update procedure, which results in two machines being offline simultaneously.

  • For business-critical or mission-critical workloads, we recommend that you deploy two or more separate Azure Local instances and deploy multiple instances of your workload services across the separate instances. Use a workload design pattern that takes advantage of data replication and application load balancing technologies. For example, SQL Server Always On availability groups use synchronous or asynchronous database replication to achieve low RTO and RPO targets across separate instances in different datacenters.

  • Consequently, increasing workload resiliency and decreasing RTO and RPO targets increase costs and require well-architected applications and operational rigor.

Providing scalability without effective workload planning increases costs

  • Incorrect instance sizing can lead to insufficient capacity or reduced return on investment (ROI) if the hardware is overprovisioned. Both scenarios affect costs.

  • Increased capacity equals higher costs. During the Azure Local instance design phase, adequate planning is required to rightsize the capabilities and number of instance machines based on workload capacity requirements. Therefore, you must understand the workload requirements (vCPUs, memory, storage, and X number of VMs) and allow for some extra headroom in addition to projected growth. You can perform an add-machine gesture when you use a "storage switched" design, but it can take a long time to get more hardware after your deployment. An add-machine gesture is also more complex than sizing the instance hardware and number of machines (maximum 16 machines) appropriately during the initial deployment.

  • There are disadvantages if you overprovision the machine hardware specification and select the incorrect number of machines (size of the instance). For example, if the workload requirements are much smaller than the instance's overall capacity and the hardware is underused throughout the hardware warranty period, the ROI value might decrease.
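The sizing considerations above (workload requirements, headroom, projected growth, and the 16-machine maximum) can be sketched as a simple calculation. The following Python example is a minimal sketch and not a substitute for the Azure Local sizer tool. The function name, the resource figures, and the headroom and growth percentages are assumptions. It also ignores vCPU oversubscription and storage resiliency overhead.

```python
import math

MAX_MACHINES = 16  # maximum number of machines per instance, as stated above


def machines_needed(required, per_machine, headroom=0.2, growth=0.0, reserved_machines=1):
    """Estimate the machine count for a workload, including an N+x reserve."""
    # Scale each requirement by headroom and projected growth.
    target = {k: v * (1 + headroom) * (1 + growth) for k, v in required.items()}
    # The most constrained resource determines the machine count for the workload.
    workload_machines = max(math.ceil(target[k] / per_machine[k]) for k in target)
    total = workload_machines + reserved_machines
    if total > MAX_MACHINES:
        raise ValueError(f"{total} machines exceeds the {MAX_MACHINES}-machine limit.")
    return total


# Hypothetical workload and machine specification.
required = {"vcpus": 200, "memory_gb": 1500}
per_machine = {"vcpus": 64, "memory_gb": 512}
print(machines_needed(required, per_machine, headroom=0.2, growth=0.25))
```

With these hypothetical inputs, the workload needs five machines after headroom and growth, and the N+1 reserve brings the estimate to six. Running the same calculation with different growth assumptions shows how quickly overprovisioning or underprovisioning can change the hardware cost.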

Azure policies

Azure provides an extensive set of built-in policies related to Azure Local and its dependencies. Some of the preceding recommendations can be audited through Azure Policy. For example, you can check whether:

  • Host and VM networking is protected.
  • Encrypted volumes are implemented.
  • Application control policies are consistently enforced.
  • Secured-core requirements are met.

Review the Azure Local built-in policies. Defender for Cloud has new recommendations that show the compliance state for the built-in policies. For more information, see Built-in policies for Azure Security Center.

If your workload runs on Azure Arc VMs that you deploy on Azure Local, consider built-in policies, such as denying the creation or modification of Extended Security Updates licenses. For more information, see Built-in policy definitions for Azure Arc-enabled workloads.

Consider creating custom policies to provide extra governance for both the Azure Local resources and Azure Arc VMs that you deploy on an Azure Local instance. For example:

  • Auditing Azure Local host registration with Azure
  • Ensuring that hosts run the latest OS version
  • Checking for required hardware components and network configurations
  • Verifying the enablement of necessary Azure services and security settings
  • Confirming the installation of required extensions
  • Assessing the deployment of Kubernetes clusters and AKS integration

Azure Advisor recommendations

Azure Advisor is a personalized cloud consultant that helps you follow best practices to optimize your Azure deployments. Here are some recommendations that can help you improve the reliability, security, cost effectiveness, performance, and operational excellence of your VMs.

Next steps