Azure Well-Architected Framework perspective on Azure Disk Storage
Azure managed disks are a type of Azure Disk Storage that simplify the management of storage for Azure Virtual Machines. Managed disks are block-level storage volumes that Azure manages. They're similar to physical disks in an on-premises server, but they operate in a virtual environment. When you use a managed disk, you must specify the disk size and type and configure the disk. After you configure the disk, Azure manages subsequent operations and maintenance tasks.
This article assumes that, as an architect, you've reviewed the storage options and chosen Azure Disk Storage as the storage service for your workload. The guidance in this article provides architectural recommendations that are mapped to the principles of the Well-Architected Framework pillars.
This guide focuses on how to make decisions about Azure managed disks. But managed disks are a critical dependency of Azure Virtual Machines. As a prerequisite, read and implement the recommendations in Azure Well-Architected Framework perspective on Virtual Machines and scale sets.
Important
How to use this guide
Each section has a design checklist that presents architectural areas of concern along with design strategies localized to the technology scope.
Also included are recommendations for the technology capabilities that can help materialize those strategies. The recommendations don't represent an exhaustive list of all configurations that are available for Azure Disk Storage and its dependencies. Instead, they list the key recommendations mapped to the design perspectives. Use the recommendations to build your proof-of-concept or to optimize your existing environments.
Foundational architecture that demonstrates the key recommendations: Azure Virtual Machines baseline architecture.
Technology scope
This review focuses on the interrelated decisions for the following Azure resources:
- Azure Disk Storage
Reliability
The purpose of the Reliability pillar is to provide continued functionality by building enough resilience and the ability to recover fast from failures.
Reliability design principles provide a high-level design strategy applied for individual components, system flows, and the system as a whole.
Design checklist
Start your design strategy based on the design review checklist for Reliability. Determine its relevance to your business requirements while keeping in mind the features and capabilities of Azure Disk Storage. Extend the strategy to include more approaches as needed.
Review best practices to achieve high availability with managed disks. Optimize your application for high availability by considering these recommendations and how they relate to the configuration of your managed disks and virtual machines (VMs).
Define reliability and recovery targets. Review the Azure service-level agreements (SLAs). The disk types that you attach to your VM affect the VM SLA. For the highest SLA, use Premium SSD for the OS disk and only Ultra Disk Storage, Premium SSD v2, or Premium SSD for data disks. Ultra Disk Storage and Premium SSD v2 disks can only be used as data disks. For guidance about calculating your reliability targets, see Recommendations for defining reliability targets.
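To make the disk-type rule concrete, here's a minimal Python sketch that checks a VM's disks by SKU name. It assumes the managed disk SKU strings that Azure reports in `sku.name` (`Premium_LRS`, `Premium_ZRS`, `PremiumV2_LRS`, `UltraSSD_LRS`); the function name is hypothetical, and the published SLA remains the source of truth.

```python
# Minimal sketch: check whether a single VM's disks follow the disk-type rule
# for the highest single-instance VM SLA. SKU strings follow Azure's managed
# disk naming; the function itself is hypothetical, not an Azure API.

PREMIUM_SSD = {"Premium_LRS", "Premium_ZRS"}
DATA_DISK_OK = PREMIUM_SSD | {"PremiumV2_LRS", "UltraSSD_LRS"}


def meets_highest_sla_disk_rule(os_disk_sku: str, data_disk_skus: list[str]) -> bool:
    """Return True if the OS disk is Premium SSD and every data disk is
    Premium SSD, Premium SSD v2, or Ultra Disk Storage."""
    if os_disk_sku not in PREMIUM_SSD:
        # Ultra Disk Storage and Premium SSD v2 can't be OS disks.
        return False
    return all(sku in DATA_DISK_OK for sku in data_disk_skus)


print(meets_highest_sla_disk_rule("Premium_LRS", ["PremiumV2_LRS", "UltraSSD_LRS"]))  # True
print(meets_highest_sla_disk_rule("Premium_LRS", ["StandardSSD_LRS"]))  # False
```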
Create a recovery plan. Evaluate data-protection features, backup and restore operations, and failover procedures. Decide whether to use Azure Backup, Azure Site Recovery, or create your own backup solution by using incremental disk snapshots or restore points. A custom backup solution increases your costs.
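If your custom solution relies on incremental snapshots and you also want copies in another region, a small script can issue the copy. This minimal Python sketch builds the argument list for an Azure CLI command and can run it. The flags (`--incremental`, `--copy-start`) reflect our reading of `az snapshot create` and should be confirmed with `az snapshot create --help`; the function names are hypothetical.

```python
# Minimal sketch: copy an incremental snapshot to another region through the
# Azure CLI. Flag names are our reading of `az snapshot create`; confirm them
# with `az snapshot create --help` before you rely on this.
import subprocess


def build_snapshot_copy_cmd(resource_group: str, target_name: str,
                            target_region: str, source_snapshot_id: str) -> list[str]:
    """Return the argv for a cross-region incremental snapshot copy."""
    return [
        "az", "snapshot", "create",
        "--resource-group", resource_group,
        "--name", target_name,
        "--location", target_region,
        "--source", source_snapshot_id,
        "--incremental", "true",
        "--copy-start", "true",
    ]


def copy_snapshot(resource_group: str, target_name: str,
                  target_region: str, source_snapshot_id: str) -> None:
    """Run the copy; requires the Azure CLI and a signed-in session."""
    cmd = build_snapshot_copy_cmd(resource_group, target_name,
                                  target_region, source_snapshot_id)
    subprocess.run(cmd, check=True)
```

Cross-region copies complete asynchronously, so check the target snapshot's progress before you treat it as a usable backup.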
Monitor potential availability problems. Subscribe to the Azure Service Health dashboard. Use disk storage metrics in Azure Monitor to help prevent disk throttling. Manually check VMs to ensure that attached disks don't reach their storage capacity. For guidance about how to integrate these metrics into your overall workload health monitoring strategy, see Health modeling for workloads.
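For the manual capacity check, a small script that runs inside the VM can help. This minimal Python sketch uses only the standard library's `shutil.disk_usage`; the 85 percent threshold and the mount points are hypothetical values that you'd replace with your own.

```python
# Minimal sketch: run inside the VM to warn when a mounted disk is close to
# full. The 85% threshold and the paths are hypothetical; adjust them to your
# own alerting policy.
import shutil

CAPACITY_ALERT_PERCENT = 85.0  # hypothetical threshold


def used_percent(path: str) -> float:
    """Return the percentage of the file system at `path` that's in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def volumes_near_capacity(paths, threshold=CAPACITY_ALERT_PERCENT):
    """Return (path, used %) pairs for volumes at or above the threshold."""
    results = []
    for path in paths:
        pct = used_percent(path)
        if pct >= threshold:
            results.append((path, round(pct, 1)))
    return results


if __name__ == "__main__":
    # On Windows, pass drive roots such as "C:\\" or "F:\\" instead.
    for path, pct in volumes_near_capacity(["/"]):
        print(f"{path} is {pct}% full")
```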
Use failure mode analysis. Consider internal dependencies, such as the availability of virtual networks or Azure Key Vault, to help minimize points of failure.
Recommendations
Recommendation | Benefit |
---|---|
Distribute VMs and disks across multiple availability zones. Use a zone-redundant virtual machine scale set in flexible orchestration mode, or deploy VMs and disks across three availability zones. Use zone balancing to equally spread the instances across zones. | You provision VM and disk instances in physically separate locations within each Azure region. Each location is tolerant to local failures. Depending on resource availability, you might have an uneven number of instances across zones. Zone balancing supports availability by making sure that, if one zone is down, the other zones have sufficient instances. Two instances in each zone provide a buffer during upgrades. |
Use Ultra Disk Storage, Premium SSD v2, and Premium SSD disks. | Single-instance VMs that use Premium SSD OS disks and Ultra Disk Storage, Premium SSD v2, or Premium SSD data disks have the highest uptime SLA. |
For maximum availability and durability, use a zone-redundant storage (ZRS) disk, especially when you share disks between VMs. | ZRS disks minimize the effect of a failure in an availability zone and increase recoverability from such zonal failures. If a zone fails and your VM remains active, workloads on ZRS disks continue to run. But if an outage does affect your VM, and you want to recover disks before the outage resolves, you can force-detach your ZRS disks from the failed VM. Then the ZRS disks can attach to a different VM. When you share a disk among multiple VMs, use a ZRS disk to prevent the shared disk from becoming a single point of failure. |
Implement one of the backup options. For managed solutions, use Backup or Site Recovery. If you need to curate your own backup solution, use restore points or snapshots. | Identify the ideal backup option for your needs to help maximize your environment's recoverability. |
If you manage your own snapshots, copy them across regions by using scripts. | Scripts simplify transferring data from one region to another. Use this option if you can't use Site Recovery. It still lets you keep disaster recovery copies in other regions. |
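The zone-balancing recommendation can be illustrated with a minimal Python sketch, assuming simple round-robin placement across three zones named `1`, `2`, and `3`. It's an illustration of the idea, not the scale set's actual placement algorithm: it shows why an even spread limits how many instances a single zone outage can take down.

```python
# Minimal sketch of zone balancing: spread instances round-robin across zones,
# then count how many survive the worst single-zone outage. Zone names and
# instance counts are illustrative only.

def balance_across_zones(instance_count: int, zones=("1", "2", "3")) -> dict[str, int]:
    """Return how many instances land in each zone under round-robin placement."""
    placement = {zone: 0 for zone in zones}
    for i in range(instance_count):
        placement[zones[i % len(zones)]] += 1
    return placement


def survivors_after_zone_loss(placement: dict[str, int]) -> int:
    """Worst case: the most-populated zone fails."""
    return sum(placement.values()) - max(placement.values())


placement = balance_across_zones(6)
print(placement)                             # {'1': 2, '2': 2, '3': 2}
print(survivors_after_zone_loss(placement))  # 4
```

With two instances in each zone, losing any one zone still leaves four instances running, which matches the buffer described in the table.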
Security
The purpose of the Security pillar is to provide confidentiality, integrity, and availability guarantees to the workload.
The Security design principles provide a high-level design strategy for achieving those goals by applying approaches to the technical design of Azure Disk Storage.
Design checklist
Start your design strategy based on the design review checklist for Security and identify vulnerabilities and controls to improve your security posture. Extend the strategy to include more approaches as needed.
Limit the ability to export or import managed disks. Use this approach to increase the security of your data. To limit export or import capabilities, you can use one of these methods:
- Create a custom role-based access control (RBAC) role that includes the import and export permissions, and assign it only to the security principals that need them.
- Use Microsoft Entra ID authentication.
- Set up private links.
- Configure an Azure policy.
- Configure a network access policy.
For more information, see Restrict managed disks from being imported or exported.
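To audit these settings across many disks, you can inspect each disk's properties. The following minimal Python sketch assumes the property names `networkAccessPolicy`, `publicNetworkAccess`, and `dataAccessAuthMode` as they appear in managed disk resources. The function name is hypothetical, and a missing value is treated as the permissive default.

```python
# Minimal sketch: flag disks whose settings still allow unrestricted import or
# export. Property names follow the managed disk resource as we understand it;
# confirm them against `az disk show` output for your disks.

def import_export_findings(disk: dict) -> list[str]:
    """Return human-readable findings; an empty list means nothing to fix."""
    # ARM REST responses nest settings under "properties"; Azure CLI output
    # flattens them, so accept either shape.
    props = disk.get("properties", disk)
    findings = []
    # Missing or null values are treated as the permissive default.
    if (props.get("networkAccessPolicy") or "AllowAll") == "AllowAll":
        findings.append("networkAccessPolicy is AllowAll")
    if (props.get("publicNetworkAccess") or "Enabled") == "Enabled":
        findings.append("publicNetworkAccess is Enabled")
    if props.get("dataAccessAuthMode") != "AzureActiveDirectory":
        findings.append("Microsoft Entra ID isn't required for import/export")
    return findings


locked_down = {"networkAccessPolicy": "DenyAll",
               "publicNetworkAccess": "Disabled",
               "dataAccessAuthMode": "AzureActiveDirectory"}
print(import_export_findings(locked_down))  # []
```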
Take advantage of encryption options. By default, managed disks are encrypted with server-side encryption (SSE), which helps you protect your data and meet organization and compliance commitments. You might need other configurations and options. You can:
- Use SSE with encryption keys that you manage.
- Enable encryption at host.
- Enable double encryption at rest.
For more information, see Server-side encryption of Azure Disk Storage.
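The following minimal Python sketch summarizes which of these options a disk and its VM use. It assumes the disk's `encryption.type` values and the VM's `securityProfile.encryptionAtHost` flag as they appear in resource JSON; the labels and the function name are hypothetical.

```python
# Minimal sketch: summarize the encryption options in effect for a disk and
# the VM it's attached to. Enum strings follow the managed disk
# `encryption.type` values as we understand them.

DISK_ENCRYPTION_LABELS = {
    "EncryptionAtRestWithPlatformKey": "SSE with platform-managed keys (default)",
    "EncryptionAtRestWithCustomerKey": "SSE with customer-managed keys",
    "EncryptionAtRestWithPlatformAndCustomerKeys": "double encryption at rest",
}


def describe_encryption(disk: dict, vm: dict) -> list[str]:
    """Return the encryption options in effect for one disk on one VM."""
    props = disk.get("properties", disk)
    encryption = props.get("encryption") or {}
    enc_type = encryption.get("type") or "EncryptionAtRestWithPlatformKey"
    summary = [DISK_ENCRYPTION_LABELS.get(enc_type, f"unrecognized type: {enc_type}")]
    # Encryption at host is a VM setting, not a disk setting.
    vm_props = vm.get("properties", vm)
    security = vm_props.get("securityProfile") or {}
    if security.get("encryptionAtHost"):
        summary.append("encryption at host")
    return summary


print(describe_encryption({"encryption": {"type": "EncryptionAtRestWithCustomerKey"}},
                          {"securityProfile": {"encryptionAtHost": True}}))
# ['SSE with customer-managed keys', 'encryption at host']
```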
Secure your shared access signature (SAS) with Microsoft Entra ID. Microsoft Entra ID provides extra security compared to a shared key and SAS, and it's easier to use. Grant security principals only necessary permissions to perform their tasks.
Protect secrets. Protect secrets, such as customer-managed keys and SAS tokens. We generally don't recommend these forms of authorization. But if you use them, make sure to rotate your keys, set key expirations as early as practical, and securely store these secrets.
Detect threats. Enable Microsoft Defender for Cloud so that it raises security alerts when anomalous activity occurs. Defender for Cloud notifies subscription administrators by email. The email includes details about the suspicious activity and recommendations to investigate and remediate threats.
Use tags and labels. Apply tags and labels to important disks to help ensure that you apply the appropriate levels of protection to the disks.
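One way to act on tags is to group disks by a classification tag and review anything left untagged. This minimal Python sketch assumes a hypothetical `protection-level` tag key; substitute your own tagging scheme.

```python
# Minimal sketch: group disks by a hypothetical "protection-level" tag so each
# group can get the right controls, and surface untagged disks for review.
from collections import defaultdict

TAG_KEY = "protection-level"  # hypothetical tag key


def group_by_protection(disks: list[dict]) -> dict[str, list[str]]:
    """Map each tag value (or "untagged") to the names of matching disks."""
    groups = defaultdict(list)
    for disk in disks:
        tags = disk.get("tags") or {}  # tags can be null in resource JSON
        groups[tags.get(TAG_KEY, "untagged")].append(disk["name"])
    return dict(groups)


disks = [{"name": "sql-data", "tags": {"protection-level": "high"}},
         {"name": "scratch", "tags": None}]
print(group_by_protection(disks))  # {'high': ['sql-data'], 'untagged': ['scratch']}
```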
Harden all workload components. Reduce extraneous surface area and tighten configurations to help reduce the likelihood of attacks. Properly secure any related resources that you use with your managed disks, such as backup recovery vaults or Azure key vaults.
Recommendations
Recommendation | Benefit |
---|---|
Use encryption at host for your managed disks whenever possible. | Encryption at host provides end-to-end encryption for environments that have it enabled. Encryption begins on your VM and flows through to its attached disks. |
Apply an Azure Resource Manager lock on the disk. | Lock a disk to help prevent it from being deleted so that you don't lose data. |
Disable traffic to the public endpoints of your disk. Create private endpoints for clients that run in Azure. | Disable traffic to the public endpoints so that traffic travels over the Microsoft backbone network, which helps eliminate exposure to the public internet. |
If possible, use Azure RBAC to limit access to resources and functions. | Use RBAC to avoid tokens or keys that could be compromised. Microsoft Entra ID authenticates the security principal, such as a user, group, managed identity, or service principal. Microsoft Entra ID returns an OAuth 2.0 token. The token is used to authorize a request against the disk service. |
Microsoft discourages the use of SAS tokens. If you must create one, review this list of SAS best practices before you create and distribute it. Set the expiration of SAS tokens to 60 days or less. | Best practices can help you prevent a SAS token from being leaked and quickly recover from a leak if one occurs. |
Consider using your own encryption key, or a customer-managed key, to help protect the data in your managed disk. | A customer-managed key provides greater flexibility and control if you need it. For example, you can store encryption keys in Key Vault and automatically rotate them. |
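For managed disks, a SAS is typically requested with a duration, for example through `az disk grant-access --duration-in-seconds`. The following minimal Python sketch rejects durations longer than the 60-day limit in the table before you make that request; the helper name is hypothetical.

```python
# Minimal sketch: enforce a 60-day maximum SAS lifetime before you request
# disk access. The helper name is hypothetical.
from datetime import timedelta

MAX_SAS_LIFETIME = timedelta(days=60)


def validate_sas_duration(seconds: int) -> int:
    """Return `seconds` unchanged if it's within policy; otherwise raise."""
    if seconds <= 0:
        raise ValueError("SAS duration must be positive")
    if timedelta(seconds=seconds) > MAX_SAS_LIFETIME:
        raise ValueError(
            f"SAS duration {seconds}s exceeds {MAX_SAS_LIFETIME.days} days")
    return seconds


print(validate_sas_duration(3600))  # 3600
```

Shorter durations are better still; the 60-day value is an upper bound, not a target.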
Cost Optimization
Cost Optimization focuses on detecting spend patterns, prioritizing investments in critical areas, and optimizing in others to meet the organization's budget while meeting business requirements.
The Cost Optimization design principles provide a high-level design strategy for achieving those goals and making tradeoffs as necessary in the technical design related to Azure Disk Storage.
Design checklist
Start your design strategy based on the design review checklist for Cost Optimization for investments. Fine-tune the design so that the workload is aligned with the budget that's allocated for the workload. Your design should use the right Azure capabilities, monitor investments, and find opportunities to optimize over time.
Understand how Azure Disk Storage is billed. Different disk types are billed in different ways and have different features that can affect billing. To design the most cost-optimized environment, see Understand Azure Disk Storage billing. For exact billing, find the specific pricing details and apply the appropriate settings. For more information, see Managed disks pricing.
Estimate the cost of capacity and operations. Use the pricing calculator to model the costs that are associated with disk types, transactions, and capabilities. Compare costs across regions and redundancy configurations, such as locally redundant storage (LRS) and zone-redundant storage (ZRS).
Choose a billing model. Evaluate whether a commitment-based model is more cost efficient than a consumption-based model. If you don't know how much capacity you need, start with a consumption-based model, monitor capacity metrics, and evaluate your choice later.
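To compare the two billing models, you can calculate the point at which a reservation pays off. This minimal Python sketch assumes a flat monthly price for each model. Every price in it is a hypothetical placeholder, so take real values from the pricing calculator.

```python
# Minimal sketch of a commitment-versus-consumption comparison. All prices are
# hypothetical placeholders.

def breakeven_utilization(reserved_monthly: float, payg_monthly: float) -> float:
    """Fraction of the term a disk must exist for the reservation to pay off."""
    return reserved_monthly / payg_monthly


def cheaper_option(months_needed: int, term_months: int,
                   reserved_monthly: float, payg_monthly: float) -> str:
    reserved_total = reserved_monthly * term_months  # paid whether used or not
    payg_total = payg_monthly * months_needed        # paid only while the disk exists
    return "reserved" if reserved_total < payg_total else "pay-as-you-go"


# Hypothetical: $100/month pay-as-you-go vs. an $80/month effective reserved rate.
print(breakeven_utilization(80, 100))   # 0.8
print(cheaper_option(12, 12, 80, 100))  # reserved
print(cheaper_option(6, 12, 80, 100))   # pay-as-you-go
```

If you can't yet estimate `months_needed` with confidence, that's the signal to start with consumption-based billing and revisit later.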
Decide which features you need. Some features, such as snapshots or on-demand bursting, incur extra transaction costs, capacity costs, and other charges. For example, if you enable snapshots, you're billed for the amount of storage that each snapshot uses. When you decide which capabilities your disks need, review the pricing and billing details for those capabilities.
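Because snapshot charges follow the storage that each snapshot uses, you can estimate them from the used size of each snapshot. This minimal Python sketch uses a hypothetical per-GiB monthly rate and illustrative sizes; check Managed disks pricing for real rates.

```python
# Minimal sketch: estimate monthly snapshot storage charges from the used size
# of each snapshot. The per-GiB rate is a hypothetical placeholder.

def monthly_snapshot_cost(snapshot_used_gib: list[float], price_per_gib_month: float) -> float:
    """Sum the used GiB across snapshots and multiply by the monthly rate."""
    return round(sum(snapshot_used_gib) * price_per_gib_month, 2)


# Illustrative: a first snapshot that captured 200 GiB of used data, plus three
# incremental snapshots that each captured about 10 GiB of changes.
print(monthly_snapshot_cost([200, 10, 10, 10], 0.05))  # 11.5
```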
Create guardrails. Create budgets based on subscriptions and resource groups. Use governance policies to restrict resource types, configurations, and locations. You can also use RBAC to block actions that can lead to overspending.
Monitor costs. Ensure that you stay within budgets, compare costs against forecasts, and see where overspending might have occurred. Use the cost analysis feature in the Azure portal. You can also export cost data to a storage account and use Excel or Power BI to analyze that data.
Monitor disk resources. Use sample scripts to search for unattached disks.
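If you'd rather script the check yourself, the following minimal Python sketch filters JSON that you export with `az disk list --output json`. It assumes the `diskState` and `managedBy` fields in that output; the function names and the `disks.json` file name are hypothetical.

```python
# Minimal sketch: find unattached managed disks in JSON exported from the
# Azure CLI. Field names follow CLI output as we understand it; verify them
# against your own export.
import json


def unattached_disks(disks: list[dict]) -> list[str]:
    """Return the IDs of disks that no VM currently owns."""
    return [
        d["id"] for d in disks
        if d.get("diskState") == "Unattached" and not d.get("managedBy")
    ]


def unattached_disks_from_file(path: str) -> list[str]:
    """Load an `az disk list --output json` export and filter it."""
    with open(path, encoding="utf-8") as f:
        return unattached_disks(json.load(f))
```

For example, export with `az disk list --output json > disks.json`, and then call `unattached_disks_from_file("disks.json")` to list candidates to review before you delete anything.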
Recommendations
Recommendation | Benefit |
---|---|
Carefully select the appropriate disk types for your workloads. Understand the available disk types and their features before you deploy an environment. Then use the pricing calculator to estimate costs. | One of the best ways to reduce costs is to plan for your requirements, and use the pricing calculator to model the environment. |
Use reserved capacity for Premium SSD disks. | Reserved capacity for Premium SSDs reduces the total cost of your environment because you prepay for your capacity at a discount. |
Assess whether the features that existing disks offer can improve performance without switching to another disk size or type. Features like disk bursting or changing performance tiers could improve performance to levels that meet your needs. | Depending on your environment and needs, enabling features that improve disk performance can be more cost effective than switching to a different disk type. These features incur costs, but they might cost less than a different disk type. |
Directly adjust the performance of your Ultra Disk Storage and Premium SSD v2 disks to fit your performance needs. | These two disk types support a set number of adjustments to the disk's performance within a 24-hour period. This flexibility helps your workloads stay cost efficient while meeting your performance needs. You can increase performance (increase cost) to meet higher demand and then lower performance (decrease cost) when the increase is no longer needed. For example, a transaction-intensive database might need a large amount of input/output operations per second (IOPS) at a small size. Or a gaming application might need a large amount of IOPS but only during peak hours. |
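If you automate these adjustments, track how many you've made so that scheduled changes don't exceed the limit. This minimal Python sketch keeps a rolling 24-hour window for one disk. `MAX_CHANGES` is a placeholder, not an authoritative limit, so look up the current value for your disk type; the class name is hypothetical.

```python
# Minimal sketch: track performance changes for one Ultra Disk Storage or
# Premium SSD v2 disk in a rolling 24-hour window. MAX_CHANGES is a
# placeholder; look up the current limit for your disk type.
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)
MAX_CHANGES = 4  # hypothetical placeholder


class AdjustmentBudget:
    def __init__(self, max_changes: int = MAX_CHANGES):
        self.max_changes = max_changes
        self._history = deque()  # timestamps of recent changes, oldest first

    def _prune(self, now: datetime) -> None:
        # Drop changes that have fallen out of the rolling window.
        while self._history and now - self._history[0] >= WINDOW:
            self._history.popleft()

    def try_adjust(self, now: datetime) -> bool:
        """Record a change and return True if the budget allows it."""
        self._prune(now)
        if len(self._history) >= self.max_changes:
            return False
        self._history.append(now)
        return True
```

A scheduler would call `try_adjust` before it raises or lowers IOPS, and skip or defer the change when it returns `False`.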
Operational Excellence
Operational Excellence primarily focuses on procedures for development practices, observability, and release management.
The Operational Excellence design principles provide a high-level design strategy for achieving those goals towards the operational requirements of the workload.
Design checklist
Start your design strategy based on the design review checklist for Operational Excellence for defining processes for observability, testing, and deployment related to Azure Disk Storage.
Create maintenance and emergency recovery plans. Evaluate data-protection features, backup operations, and restore operations. Select backup solutions that you can use to recover from regional disasters.
Create internal documentation. Document your organization's standard practices. Incorporate existing Azure documentation to streamline your processes. Include documentation about attaching a disk to Windows or Linux VMs or expanding a disk on Windows or Linux VMs.
Detect threats. Enable Defender for Cloud so that it raises security alerts when anomalous activity occurs. Defender for Cloud notifies subscription administrators by email. The email includes details about the suspicious activity and recommendations to investigate and remediate threats.
Recommendations
Recommendation | Benefit |
---|---|
Use Azure Monitor to analyze metrics and create alerts. | Azure Monitor provides insight about how your disks and VMs perform. Use these metrics to ensure that your performance remains optimal. |
Review the available backup options for managed disks. | Understand the available options so that you can select the configuration that best suits your needs. |
Performance Efficiency
Performance Efficiency is about maintaining user experience even when there's an increase in load by managing capacity. The strategy includes scaling resources, identifying and optimizing potential bottlenecks, and optimizing for peak performance.
The Performance Efficiency design principles provide a high-level design strategy for achieving those capacity goals against the expected usage.
Design checklist
Start your design strategy based on the design review checklist for Performance Efficiency. Define a baseline that's based on key performance indicators for Azure Disk Storage.
Choose optimal disk types. Identify the disk types that you need before you deploy your resources. This approach helps you maximize performance and cost efficiency. The five disk types are Ultra Disk Storage, Premium SSD v2, Premium SSD, Azure Standard SSD, and Azure Standard HDD. For the highest performance, use Premium SSD for your VM's OS disk, and use Ultra Disk Storage or Premium SSD v2 for your data disks.
Reduce the travel distance between the client and server. Place data in regions that are closest to connecting clients, ideally in the same region. Default network configurations provide the best performance. Modify network settings only to improve security. In general, network settings don't decrease travel distance and don't improve performance.
Collect performance data. Monitor your disks and VMs to identify performance bottlenecks that occur from throttling. For more information, see Storage IO metrics.
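After you export storage IO metrics, you can flag likely throttling from the consumed-percentage metrics. This minimal Python sketch assumes that you already have sample values per metric. The metric names follow Azure Monitor's disk-level and VM-level consumed-percentage metrics, while the 95 percent threshold and 10 percent sample ratio are hypothetical tuning values. Separating the two levels matters because a VM can hit its own cap even when each disk is below its limit.

```python
# Minimal sketch: flag likely throttling from Azure Monitor "consumed
# percentage" metrics exported as lists of samples. A metric pinned near 100%
# means the disk or the VM reached its cap. Threshold and ratio are
# hypothetical tuning values.

DISK_LEVEL = ("Data Disk IOPS Consumed Percentage",
              "Data Disk Bandwidth Consumed Percentage")
VM_LEVEL = ("VM Uncached IOPS Consumed Percentage",
            "VM Uncached Bandwidth Consumed Percentage")


def throttling_report(samples: dict[str, list[float]],
                      threshold: float = 95.0,
                      min_ratio: float = 0.10) -> dict[str, str]:
    """Return metrics whose share of samples at or above threshold is high."""
    report = {}
    for metric, values in samples.items():
        if not values:
            continue
        hot = sum(1 for v in values if v >= threshold) / len(values)
        if hot >= min_ratio:
            if metric in DISK_LEVEL:
                scope = "disk cap"
            elif metric in VM_LEVEL:
                scope = "VM cap"
            else:
                scope = "other"
            report[metric] = f"{scope}: {hot:.0%} of samples >= {threshold}%"
    return report


samples = {"Data Disk IOPS Consumed Percentage": [40, 99, 100, 100],
           "VM Uncached IOPS Consumed Percentage": [20, 30, 25, 35]}
print(throttling_report(samples))
# {'Data Disk IOPS Consumed Percentage': 'disk cap: 75% of samples >= 95.0%'}
```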
Benchmark your disks. Create a test environment and determine whether it meets your needs and expectations. For more information, see Benchmark a disk.
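When you read benchmark results, remember that throughput is IOPS multiplied by I/O size, so a disk can reach its IOPS limit with small I/Os and its throughput limit with large ones. This minimal Python sketch shows the arithmetic with illustrative numbers, in MiB/s.

```python
# Minimal sketch: relate IOPS, I/O size, and throughput when you interpret
# benchmark results. Numbers are illustrative only.

def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Throughput in MiB/s for a given IOPS rate and I/O size in KiB."""
    return iops * io_size_kib / 1024


def iops_at_throughput_cap(cap_mib_s: float, io_size_kib: float) -> float:
    """IOPS at which a throughput cap is reached for a given I/O size."""
    return cap_mib_s * 1024 / io_size_kib


print(throughput_mib_s(5000, 8))        # 39.0625
print(iops_at_throughput_cap(200, 64))  # 3200.0
```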
Recommendations
Recommendation | Benefit |
---|---|
Create disks in the same region as the VM that you attach them to. If clients from a different region don't require the same data, create a separate disk in each region. | Reduce the physical distance between VMs and their disks, services, and on-premises clients to help improve performance and reduce network latency. This approach also reduces cost for applications that you host in Azure because bandwidth usage within a single region is free. |
For workloads and solutions that require the lowest latency, such as e-commerce workloads or databases, use a Premium SSD OS disk and Ultra Disk Storage or Premium SSD v2 data disks. | This configuration offers the best reliability and highest SLA and performance. |
Use Azure metrics to monitor your environment and help prevent disk throttling. | Use Azure metrics to identify disks that are being throttled and address them. Throttling leads to suboptimal performance and problems like increased latency. |
For disks that are being throttled, evaluate whether changing to a larger disk size or changing to a more performant disk is better for your needs. For Premium SSD disks that are being throttled, if you have short-term bursts of demand, enable on-demand bursting. For longer-term extended demand, change the tier of the disk or evaluate whether Premium SSD v2 or Ultra Disk Storage disks better fit your needs. | Place applications on disks that aren't being throttled to help ensure optimal performance without increased latency. |
When you upload a virtual hard disk (VHD), use the Add-AzVHD Azure PowerShell cmdlet. | The Add-AzVHD cmdlet automates most of the upload steps, which simplifies the process. |
For existing deployments that are on-premises or in another public cloud provider, use Azure Migrate and Modernize. | Azure Migrate and Modernize can evaluate your deployment and provide curated suggestions for the best sizing of disks and VMs in a prospective Azure deployment. |
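The throttling remediation guidance in the table can be expressed as a small decision function. This minimal Python sketch assumes Azure's disk SKU strings and a hypothetical 30-minute daily boundary between short-term bursts and sustained demand. Treat it as a starting point for your own runbook rather than a rule.

```python
# Minimal sketch of the remediation guidance for a throttled disk. The
# 30-minute boundary is a hypothetical placeholder; SKU strings are Azure's.

SHORT_BURST_MINUTES = 30  # hypothetical


def remediation(sku: str, throttled_minutes_per_day: float) -> str:
    """Suggest a next step for a disk that's being throttled."""
    is_premium_ssd = sku in ("Premium_LRS", "Premium_ZRS")
    if is_premium_ssd and throttled_minutes_per_day <= SHORT_BURST_MINUTES:
        return "enable on-demand bursting"
    if is_premium_ssd:
        return "raise the performance tier, or evaluate Premium SSD v2 / Ultra Disk Storage"
    if sku in ("PremiumV2_LRS", "UltraSSD_LRS"):
        # These disk types let you adjust provisioned performance directly.
        return "increase provisioned IOPS or throughput"
    return "move to a larger size or a more performant disk type"
```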
Azure policies
Azure provides an extensive set of built-in policies related to Azure Disk Storage and its dependencies. Some of the preceding recommendations can be audited through Azure Policy. For example, you can check whether:
- Public network access to your managed disks is disabled.
- Backup is enabled.
- Double encryption is enabled.
- Specific disk encryption sets are used with your disks.
- Customer-managed keys are used.
- Managed disks are zone resilient.
- Notification policies for key expiration are configured.
- Autorotate for customer-managed keys is enabled.
For comprehensive governance, review the Azure Policy built-in definitions for Azure compute and other policies that might impact the security of the storage infrastructure.
Azure Advisor recommendations
Azure Advisor is a personalized cloud consultant that helps you follow best practices to optimize your Azure deployments. Here are some recommendations that can help you improve the reliability, security, cost effectiveness, performance, and operational excellence of Azure Disk Storage.