Reliability recommendations

Article
01/27/2025

Azure Advisor helps you ensure and improve the continuity of your business-critical applications. You can get reliability recommendations on the Reliability tab on the Advisor dashboard.

Sign in to the Azure portal.
Search for and select Advisor from any page.
On the Advisor dashboard, select the Reliability tab.
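
Each entry in this article ends with a Recommendation ID, which can be used to match recommendations outside the portal. The following is a minimal sketch, assuming recommendations have been exported as JSON in which each item has a `properties` object with `category`, `impact`, and `recommendationTypeId` fields, and assuming reliability items are reported under the category value `HighAvailability`. The function name and the sample payload are hypothetical; only the two reliability IDs are taken from entries in this article.

```python
from collections import defaultdict

def reliability_by_impact(recommendations):
    """Group reliability recommendation IDs by their impact level."""
    grouped = defaultdict(list)
    for rec in recommendations:
        props = rec.get("properties", {})
        # Assumption: reliability items carry the category "HighAvailability".
        if props.get("category") != "HighAvailability":
            continue
        impact = props.get("impact", "Unknown")
        grouped[impact].append(props.get("recommendationTypeId"))
    return dict(grouped)

# Hypothetical sample payload; the Cost item uses a placeholder ID.
sample = [
    {"properties": {"category": "HighAvailability", "impact": "High",
                    "recommendationTypeId": "3dd24a8c-af06-49c3-9a04-fb5721d7a9bb"}},
    {"properties": {"category": "Cost", "impact": "Medium",
                    "recommendationTypeId": "00000000-0000-0000-0000-000000000000"}},
    {"properties": {"category": "HighAvailability", "impact": "Medium",
                    "recommendationTypeId": "b7316772-5c8f-421f-bed0-d86b0f128e25"}},
]

print(reliability_by_impact(sample))
```

Grouping by impact makes it easier to work through High items first; the IDs can then be looked up in the sections that follow.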

AgFood Platform

Upgrade to the latest ADMA DotNet SDK version

We identified calls to an ADMA DotNet SDK version that is scheduled for deprecation. To ensure uninterrupted access to ADMA, latest features, and performance improvements, switch to the latest SDK version.

Potential benefits: Ensure uninterrupted access to ADMA

Impact: Medium

For more information, see Azure Data Manager for Agriculture REST APIs Reference

ResourceType: microsoft.agfoodplatform/farmbeats
Recommendation ID: 77f976ab-59e3-474d-ba04-32a7d41c9cb1
Subcategory: ServiceUpgradeAndRetirement

Upgrade to the latest FarmBeats API version

We identified calls to an ADMA API version that is scheduled for deprecation. To ensure uninterrupted access to ADMA, latest features, and performance improvements, switch to the latest ADMA API version.

Potential benefits: Ensure uninterrupted access to FarmBeats

Impact: Medium

For more information, see Azure Data Manager for Agriculture REST APIs Reference

ResourceType: microsoft.agfoodplatform/farmbeats
Recommendation ID: 1233e513-ac1c-402d-be94-7133dc37cac6
Subcategory: ServiceUpgradeAndRetirement

Upgrade to the latest ADMA Python SDK version

We identified calls to an ADMA Python SDK version that is scheduled for deprecation. To ensure uninterrupted access to ADMA, latest features, and performance improvements, switch to the latest SDK version.

Potential benefits: Ensure uninterrupted access to ADMA

Impact: Medium

For more information, see Azure Data Manager for Agriculture REST APIs Reference

ResourceType: microsoft.agfoodplatform/farmbeats
Recommendation ID: c4ec2fa1-19f4-491f-9311-ca023ee32c38
Subcategory: ServiceUpgradeAndRetirement

Upgrade to the latest ADMA JavaScript SDK version

We identified calls to an ADMA JavaScript SDK version that is scheduled for deprecation. To ensure uninterrupted access to ADMA, latest features, and performance improvements, switch to the latest SDK version.

Potential benefits: Ensure uninterrupted access to ADMA

Impact: Medium

For more information, see Azure Data Manager for Agriculture REST APIs Reference

ResourceType: microsoft.agfoodplatform/farmbeats
Recommendation ID: 9e49a43a-dbe2-477d-9d34-a4f209617fdb
Subcategory: ServiceUpgradeAndRetirement

API Management

Migrate API Management service to stv2 platform

Support for API Management instances hosted on the stv1 platform will be retired by 31 August 2024. Migrate to the stv2-based platform before that date to avoid service disruption.

Potential benefits: Improve service stability and leverage new platform features

Impact: High

For more information, see Azure API Management - global Azure - stv1 platform retirement (August 2024)

ResourceType: microsoft.apimanagement/service
Recommendation ID: 3dd24a8c-af06-49c3-9a04-fb5721d7a9bb
Subcategory: ServiceUpgradeAndRetirement

Hostname certificate rotation failed

If the API Management service fails to refresh the hostname certificate from the Key Vault, the service can end up using a stale certificate and runtime API traffic can be blocked. Ensure that the certificate exists in the Key Vault and that the API Management service identity is granted secret read access.

Potential benefits: Ensure service availability

Impact: High

For more information, see Configure custom domain name for Azure API Management instance - Azure API Management

ResourceType: microsoft.apimanagement/service
Recommendation ID: 8962964c-a6d6-4c3d-918a-2777f7fbdca7
Subcategory: Other
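
One way to notice a stale hostname certificate early is to check how many days remain before it expires. The following is a minimal sketch, assuming you already have the certificate's `notAfter` value in the text form that Python's `ssl.SSLSocket.getpeercert()` reports. The function names and the `min_days` threshold are hypothetical, and the sketch doesn't talk to Key Vault or API Management.

```python
import ssl
from datetime import datetime, timezone

def days_until_expiry(not_after, now=None):
    """Return the whole days left before a certificate's notAfter time."""
    # ssl.cert_time_to_seconds parses strings such as "Jan  1 00:00:00 2030 GMT".
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

def is_stale(not_after, now=None, min_days=0):
    """Treat a certificate as stale when fewer than min_days remain."""
    return days_until_expiry(not_after, now) < min_days

# Example with a fixed reference time so the result is reproducible.
reference = datetime(2029, 12, 22, tzinfo=timezone.utc)
print(days_until_expiry("Jan  1 00:00:00 2030 GMT", reference))
```

Raising `min_days` (for example to 14) turns the check into an early warning rather than an expiry alarm.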

The legacy portal was deprecated 3 years ago and retired in October 2023. However, we are seeing active usage of the portal which may cause service disruption soon when we disable it.

We highly recommend that you migrate to the new developer portal as soon as possible to continue enjoying our services and take advantage of the new features and improvements.

Potential benefits: Ensure business continuity

Impact: High

For more information, see Migrate to the new developer portal from the legacy developer portal - Azure API Management

ResourceType: microsoft.apimanagement/service
Recommendation ID: 6124b23c-0d97-4098-9009-79e8c56cbf8c
Subcategory: undefined

Dependency network status check failed

An Azure API Management service dependency isn't available. Check the virtual network configuration.

Potential benefits: Improve service stability

Impact: High

For more information, see Deploy Azure API Management instance to external VNet

ResourceType: microsoft.apimanagement/service
Recommendation ID: 53fd1359-ace2-4712-911c-1fc420dd23e8
Subcategory: Other

SSL/TLS renegotiation blocked

SSL/TLS renegotiation attempt blocked; secure communication might fail. To support client certificate authentication scenarios, enable 'Negotiate client certificate' on listed hostnames. For browser-based clients, this option might result in a certificate prompt being presented to the client.

Potential benefits: Ensure service availability

Impact: Medium

For more information, see Secure APIs using client certificate authentication in API Management - Azure API Management

ResourceType: microsoft.apimanagement/service
Recommendation ID: b7316772-5c8f-421f-bed0-d86b0f128e25
Subcategory: Other

Deploy an Azure API Management instance to multiple Azure regions for increased service availability

Azure API Management supports multi-region deployment, which enables API publishers to add regional API gateways to an existing API Management instance. Multi-region deployment helps reduce request latency perceived by geographically distributed API consumers and improves service availability.

Potential benefits: Increased resilience against regional failures

Impact: High

For more information, see Deploy Azure API Management instance to multiple Azure regions - Azure API Management

ResourceType: microsoft.apimanagement/service
Recommendation ID: 2e4d65a3-1e77-4759-bcaa-13009484a97e
Subcategory: HighAvailability

Enable and configure autoscale for API Management instance on production workloads.

An API Management instance in a production service tier can be scaled by adding and removing units. The autoscaling feature can dynamically adjust the units of an API Management instance to accommodate a change in load without manual intervention.

Potential benefits: Increase scalability and optimize cost.

Impact: High

For more information, see Configure autoscale of an Azure API Management instance

ResourceType: microsoft.apimanagement/service
Recommendation ID: f4c48f42-74f2-41bf-bf99-14e2f9ea9ac9
Subcategory: Scalability
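
The idea behind unit-based autoscaling can be illustrated with a simple rule: add a unit when load is high, remove one when load is low, and stay within fixed bounds. This is a minimal sketch of that rule, not the autoscale engine itself. The function name, the capacity percentage input, and every threshold and limit shown are hypothetical values chosen for illustration.

```python
def next_unit_count(current_units, avg_capacity_pct,
                    min_units=1, max_units=4,
                    scale_out_above=70.0, scale_in_below=35.0):
    """Return the unit count a simple threshold rule would choose."""
    if avg_capacity_pct > scale_out_above:
        target = current_units + 1   # load is high: add one unit
    elif avg_capacity_pct < scale_in_below:
        target = current_units - 1   # load is low: remove one unit
    else:
        target = current_units       # inside the band: no change
    # Clamp the result to the configured minimum and maximum.
    return max(min_units, min(max_units, target))

for units, load in [(2, 85.0), (4, 90.0), (3, 20.0), (1, 10.0)]:
    print(units, load, "->", next_unit_count(units, load))
```

Keeping a gap between the scale-out and scale-in thresholds helps avoid flapping, where the instance repeatedly adds and removes the same unit.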

App Service Certificates

Domain verification required to issue your App Service Certificate

You have an App Service Certificate that's currently in a Pending Issuance status and requires domain verification. Failure to validate domain ownership will result in an unsuccessful certificate issuance. Domain verification isn't automated for App Service Certificates and will require action. If you've recently verified domain ownership and have been issued a certificate, you may disregard this message.

Potential benefits: Ensure successful issuance of App Service Certificate.

Impact: High

For more information, see Add and manage TLS/SSL certificates - Azure App Service

ResourceType: microsoft.certificateregistration/certificateorders
Recommendation ID: a2385343-200c-4eba-bbe2-9252d3f1d6ea
Subcategory: Other

App Service

Verify contact information for App Service Domain

Verify the accuracy of the contact information for your App Service Domain immediately to avoid domain suspension.

Potential benefits: Prevent domain suspension.

Impact: High

For more information, see Buy a custom domain - Azure App Service

ResourceType: microsoft.domainregistration/domains
Recommendation ID: b9b84818-1e7c-45af-8918-a0d280911ca6
Subcategory: Other

Scale out your App Service plan

Consider scaling out your App Service Plan to at least two instances to avoid cold start delays and service interruptions during routine maintenance.

Potential benefits: Optimize user experience and availability

Impact: Medium

For more information, see The Ultimate Guide to Running Healthy Apps in the Cloud - Azure App Service

ResourceType: microsoft.web/serverfarms
Recommendation ID: 45cfc38d-3ffd-4088-bb15-e4d0e1e160fe
Subcategory: Scalability

Scale out your App Service plan to avoid CPU exhaustion

High CPU utilization can lead to runtime issues with applications. Your application exceeded 90% CPU over the last couple of days. To reduce CPU usage and avoid runtime issues, scale out the application.

Potential benefits: Keep your app healthy

Impact: High

ResourceType: microsoft.web/sites
Recommendation ID: 1294987d-c97d-41d0-8fd8-cb6eab52d87b
Subcategory: Scalability

Check your app's service health issues

We have a recommendation related to your app's service health. Open the Azure portal, go to the app, and select Diagnose and Solve to see more details.

Potential benefits: Keep your app healthy

Impact: High

For more information, see Best practices for Azure App Service - Azure App Service

ResourceType: microsoft.web/sites
Recommendation ID: a85f5f1c-c01f-4926-84ec-700b7624af8c
Subcategory: Other

Fix the backup database settings of your App Service resource

When an application has an invalid database configuration, its backups fail. For details, see your application's backup history on your app management page.

Potential benefits: Ensure business continuity

Impact: High

ResourceType: microsoft.web/sites
Recommendation ID: b30897cc-2c2e-4677-a2a1-107ae982ff49
Subcategory: DisasterRecovery

Fix the backup storage settings of your App Service resource

When an application has invalid storage settings, its backups fail. For details, see your application's backup history on your app management page.

Potential benefits: Ensure business continuity

Impact: High

ResourceType: microsoft.web/sites
Recommendation ID: 80efd6cb-dcee-491b-83a4-7956e9e058d5
Subcategory: DisasterRecovery

Scale up your App Service plan SKU to avoid memory problems

The App Service Plan containing your application exceeded 85% memory allocation. High memory consumption can lead to runtime issues with your applications. Find the problem application and scale it up to a higher plan with more memory resources.

Potential benefits: Keep your app healthy

Impact: High

ResourceType: microsoft.web/sites
Recommendation ID: 66d3137a-c4da-4c8a-b6b8-e03f5dfba66e
Subcategory: Scalability

Fix application code, a worker process crashed due to an unhandled exception

A worker process in your application crashed due to an unhandled exception. To identify the root cause, collect memory dumps and call stack information at the time of the crash.

Potential benefits: Keep your app healthy and highly available

Impact: High

For more information, see Crash Monitoring in Azure App Service - Azure App Service

ResourceType: microsoft.web/sites
Recommendation ID: 3e35f804-52cb-4ebf-84d5-d15b3ab85dfc
Subcategory: Other

Upgrade your App Service to a Standard plan to avoid request rejects

When an application is part of a shared App Service plan and meets its quota multiple times, incoming requests might be rejected. Your web application can’t accept incoming requests after meeting a quota. To remove the quota, upgrade to a Standard plan.

Potential benefits: Keep your app healthy

Impact: High

ResourceType: microsoft.web/sites
Recommendation ID: 78c5ab69-858a-43ca-a5ac-4ca6f9cdc30d
Subcategory: Scalability

Move your App Service resource to Standard or higher and use deployment slots

When an application is deployed multiple times in a week, problems might occur. You deployed your application multiple times last week. To help you reduce deployment impact to your production web application, move your App Service resource to the Standard (or higher) plan, and use deployment slots.

Potential benefits: Keep your app healthy while updating

Impact: High

ResourceType: microsoft.web/sites
Recommendation ID: 59a83512-d885-4f09-8e4f-c796c71c686e
Subcategory: Other

Use deployment slots for your App Service resource

When an application is deployed multiple times in a week, problems might occur. You deployed your application multiple times over the last week. To help you manage changes and help reduce deployment impact to your production web application, use deployment slots.

Potential benefits: Keep your app healthy while updating

Impact: High

ResourceType: microsoft.web/sites
Recommendation ID: 0dc165fd-69bf-468a-aa04-a69377b6feb0
Subcategory: Other

Consider changing your application architecture to 64-bit

Your App Service is configured as 32-bit, and its memory consumption is approaching the limit of 2 GB. If your application supports it, consider recompiling your application and changing the App Service configuration to 64-bit instead.

Potential benefits: Improve your application reliability

Impact: Medium

For more information, see Application performance FAQs - Azure

ResourceType: microsoft.web/sites
Recommendation ID: 8be322ab-e38b-4391-a5f3-421f2270d825
Subcategory: Scalability

Consider upgrading the hosting plan of the Static Web App(s) in this subscription to Standard SKU.

The combined bandwidth used by all the Free SKU Static Web Apps in this subscription exceeds the monthly limit of 100 GB. Consider upgrading these applications to Standard SKU to avoid throttling.

Potential benefits: Higher availability for the apps by avoiding throttling.

Impact: High

For more information, see Pricing – Static Web Apps

ResourceType: microsoft.web/staticsites
Recommendation ID: dc3edeee-f0ab-44ae-b612-605a0a739612
Subcategory: Scalability

Application Gateway for Containers

Migrate to supported version of AGC

This Application Gateway for Containers was provisioned with a preview version, which isn't supported for production. Ensure you provision a new gateway using the latest API version.

Potential benefits: Ensure supportability and resiliency for production workloads

Impact: High

For more information, see What is Application Gateway for Containers?

ResourceType: microsoft.servicenetworking/trafficcontrollers
Recommendation ID: db83b3d4-96e5-4cfe-b736-b3280cadd163
Subcategory: ServiceUpgradeAndRetirement

Application Gateway

Upgrade your SKU or add more instances

Deploying two or more medium or large sized instances ensures business continuity (fault tolerance) during outages caused by planned or unplanned maintenance.

Potential benefits: Ensure business continuity through application gateway resilience

Impact: Medium

For more information, see Multi-region load balancing - Azure Reference Architectures

ResourceType: microsoft.network/applicationgateways
Recommendation ID: 6a2b1e70-bd4c-4163-86de-5243d7ac05ee
Subcategory: BusinessContinuity

Avoid hostname override to ensure site integrity

Avoid overriding the hostname when configuring Application Gateway. Having a domain on the frontend of Application Gateway that differs from the one used to access the backend can lead to broken cookies or redirect URLs. Make sure the backend is able to deal with the domain difference, or update the Application Gateway configuration so the hostname doesn't need to be overwritten towards the backend. When used with App Service, attach a custom domain name to the Web App and avoid use of the *.azurewebsites.net host name towards the backend. Note that a different frontend domain isn't a problem in all situations, and certain categories of backends, like REST APIs, are less sensitive in general.

Potential benefits: Ensure site integrity and avoid broken cookies or redirect urls through a resilient Application Gateway configuration.

Impact: Medium

For more information, see Troubleshoot redirection to App Service URL - Azure Application Gateway

ResourceType: microsoft.network/applicationgateways
Recommendation ID: 52a9d0a7-efe1-4512-9716-394abd4e0ab1
Subcategory: Other

Change subnet of V1 gateway as the current subnet contains a NAT gateway

Your Application Gateway may be deleted after October 2024 due to a failed internal upgrade. This is because it lacks a dedicated subnet and its subnet contains a NAT Gateway. To resolve this, change the subnet, remove the NAT Gateway, or migrate to V2. Allow a day for the message to disappear once the issue is fixed.

Potential benefits: Avoid disruption in management of Application Gateway V1 resource

Impact: High

For more information, see Frequently asked questions about Application Gateway

ResourceType: microsoft.network/applicationgateways
Recommendation ID: 511a9f7b-7b5e-4713-b18d-0b7464a84d1f
Subcategory: BusinessContinuity

Deploy your Application Gateway across Availability Zones

Achieve zone redundancy by deploying Application Gateway across Availability Zones. Zone redundancy boosts resilience by enabling Application Gateway to survive various outages. Zone redundancy ensures continuity even if one zone is affected and enhances overall reliability.

Potential benefits: Resiliency of Application Gateways is considerably increased when using Availability Zones.

Impact: High

For more information, see Scaling and Zone-redundant Application Gateway v2

ResourceType: microsoft.network/applicationgateways
Recommendation ID: 5c488377-be3e-4365-92e8-09d1e8d9038c
Subcategory: HighAvailability

Update VNet permission of Application Gateway users

To improve security and provide a more consistent experience across Azure, all users must pass a permission check to create or update an Application Gateway in a Virtual Network. The minimum permission required for users or service principals is Microsoft.Network/virtualNetworks/subnets/join/action.

Potential benefits: Avoid disruptions in management of Application Gateway resource

Impact: High

For more information, see Azure Application Gateway infrastructure configuration

ResourceType: microsoft.network/applicationgateways
Recommendation ID: 6cc8be07-8c03-4bd7-ad9b-c2985b261e01
Subcategory: Other
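
To reason about whether a role grants the required permission, it helps to remember that role definitions list allowed actions, often with wildcards, alongside excluded actions. The following is a simplified sketch that checks a single role definition against the permission named above. It assumes `*` is the only wildcard and that matching is case-insensitive. The function names and the sample action lists are hypothetical, and the sketch doesn't evaluate scopes, deny assignments, or multiple role assignments.

```python
from fnmatch import fnmatchcase

REQUIRED_ACTION = "Microsoft.Network/virtualNetworks/subnets/join/action"

def _matches(action, patterns):
    """Return True if the action matches any wildcard pattern, ignoring case."""
    action = action.lower()
    return any(fnmatchcase(action, pattern.lower()) for pattern in patterns)

def role_allows(action, actions, not_actions=()):
    """A role allows an action if it is listed in actions and not in not_actions."""
    return _matches(action, actions) and not _matches(action, not_actions)

# Hypothetical action lists for illustration.
print(role_allows(REQUIRED_ACTION, ["Microsoft.Network/*"]))
print(role_allows(REQUIRED_ACTION, ["*/read"]))
```

A read-only role fails the check because `*/read` doesn't cover the `join/action` operation, which is the situation this recommendation warns about.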

Ensure autoscaling is used for increased performance and resiliency

When configuring the Application Gateway, it's recommended to provision autoscaling to scale in and out in response to changes in demand. This helps to minimize the effects of a single failing component.

Potential benefits: Increase performance and resiliency.

Impact: Medium

For more information, see Scaling and Zone-redundant Application Gateway v2

ResourceType: microsoft.network/applicationgateways
Recommendation ID: c9c9750b-9ddb-436f-b19a-9c725539a0b5
Subcategory: Scalability

Change subnet of V1 gateway named GatewaySubnet as it's reserved for VPN/Express Route

Your Application Gateway is at risk of deletion after October 2024 due to a failed internal upgrade. This is because its subnet is named GatewaySubnet, a name that is reserved for VPN/ExpressRoute. To resolve this, change the subnet or migrate to V2. Allow a day for the message to disappear once the issue is fixed.

Potential benefits: Avoid disruption in management of Application Gateway V1 resource

Impact: High

For more information, see Frequently asked questions about Application Gateway

ResourceType: microsoft.network/applicationgateways
Recommendation ID: df989782-82d1-420d-b354-71956bd9379c
Subcategory: BusinessContinuity

Reactivate the Subscription to unblock internal upgrade for V1 gateway

Your Application Gateway is at risk of deletion after October 2024 due to a failed internal upgrade. This is because the subscription is set to a state other than Active. To fix this, please activate the subscription. Allow a day for this message to disappear once the issue is fixed.

Potential benefits: Avoid disruption in management of Application Gateway V1 resource

Impact: High

For more information, see Reactivate a disabled Azure subscription - Microsoft Cost Management

ResourceType: microsoft.network/applicationgateways
Recommendation ID: fa44bc92-1747-4cef-9f78-7861be4c0db9
Subcategory: BusinessContinuity

Implement ExpressRoute Monitor on Network Performance Monitor

When an ExpressRoute circuit isn't monitored by ExpressRoute Monitor on Network Performance, you miss notifications of loss, latency, and performance of on-premises to Azure resources, and Azure to on-premises resources. For end-to-end monitoring, implement ExpressRoute Monitor on Network Performance.

Potential benefits: Improve time-to-detect and time-to-mitigate issues in your network and provide insights on your network path via ExpressRoute

Impact: Medium

For more information, see Azure ExpressRoute: Configure NPM for circuits

ResourceType: microsoft.network/expressroutecircuits
Recommendation ID: 17454550-1543-4068-bdaf-f3ed7cdd3d86
Subcategory: MonitoringAndAlerting

Use managed TLS certificates

When Front Door manages your TLS certificates, it reduces your operational costs, and helps you to avoid costly outages caused by forgetting to renew a certificate. Front Door automatically issues and rotates the managed TLS certificates.

Potential benefits: Ensure service availability by having Front Door manage and rotate your certificates

Impact: Medium

For more information, see Azure Front Door - Best practices

ResourceType: microsoft.network/frontdoors
Recommendation ID: 5185d64e-46fd-4ed2-8633-6d81f5e3ca59
Subcategory: Other

Consider having at least two origins

Multiple origins support redundancy by distributing traffic across multiple instances of the application. If one instance is unavailable, then other backend origins can still receive traffic.

Potential benefits: Increase your workload resiliency

Impact: High

For more information, see Azure Well-Architected Framework perspective on Azure Front Door - Microsoft Azure Well-Architected Framework

ResourceType: microsoft.network/frontdoors
Recommendation ID: 589ab0b0-1362-44fd-8551-0e7847767600
Subcategory: HighAvailability

Use the same domain name on Front Door and your origin

When you rewrite the Host header, request cookies and URL redirections might break. When you use platforms like Azure App Service, features like session affinity and authentication and authorization might not work correctly. Make sure to validate whether your application is going to work correctly.

Potential benefits: Ensure application integrity by preserving original host name

Impact: Medium

For more information, see Azure Front Door - Best practices

ResourceType: microsoft.network/frontdoors
Recommendation ID: 79f543f9-60e6-4ef6-ae42-2095f6149cba
Subcategory: Other

Avoid placing Traffic Manager behind Front Door

Using Traffic Manager as one of the origins for Front Door isn't recommended, as this can lead to routing issues. If you need both services in a high availability architecture, always place Traffic Manager in front of Azure Front Door.

Potential benefits: Increase your workload resiliency

Impact: Medium

For more information, see Azure Front Door - Best practices

ResourceType: microsoft.network/frontdoors
Recommendation ID: 825ff735-ed9a-4335-b132-321df86b0e81
Subcategory: Other

Resolve issues for private endpoint not in succeeded state

A private endpoint that isn't in a succeeded state can affect application availability and reliability. A healthy state of connectivity over private endpoints is crucial to reliably and securely access resources. Troubleshoot and resolve issues that cause a failed state.

Potential benefits: Resume private connectivity and availability of application

Impact: Medium

For more information, see Troubleshoot Azure Private Link Service connectivity problems

ResourceType: microsoft.network/privateendpoints
Recommendation ID: 5db013ba-e657-4b80-93f7-8c5b5f9e780a
Subcategory: BusinessContinuity

Add at least one more endpoint to the profile, preferably in another Azure region

Profiles need more than one endpoint to ensure availability if one of the endpoints fails. We also recommend that endpoints be in different regions.

Potential benefits: Improve resiliency by allowing failover

Impact: Medium

For more information, see Traffic Manager Endpoint Types

ResourceType: microsoft.network/trafficmanagerprofiles
Recommendation ID: 6cd70072-c45c-4716-bf7b-b35c18e46e72
Subcategory: BusinessContinuity

Add an endpoint configured to "All (World)"

For geographic routing, traffic is routed to endpoints in defined regions. When a region fails, there is no pre-defined failover. Having an endpoint where the Regional Grouping is configured to "All (World)" for geographic profiles avoids traffic black holing and guarantees service availability.

Potential benefits: Improve resiliency by avoiding traffic black holes

Impact: High

For more information, see Manage endpoints in Azure Traffic Manager

ResourceType: microsoft.network/trafficmanagerprofiles
Recommendation ID: 0bbe0a49-3c63-49d3-ab4a-aa24198f03f7
Subcategory: BusinessContinuity
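
The effect of an "All (World)" endpoint can be illustrated with a simplified lookup: a query from a region that has no endpoint mapped to it gets no answer unless a catch-all mapping exists. This sketch is a toy model, not Traffic Manager's implementation. It ignores endpoint health and the hierarchy of geographic levels, and the `WORLD` sentinel, region codes, and endpoint names are hypothetical.

```python
WORLD = "WORLD"  # hypothetical sentinel standing in for "All (World)"

def pick_endpoint(query_region, region_to_endpoint):
    """Return the endpoint mapped to the region, else the catch-all, else None."""
    if query_region in region_to_endpoint:
        return region_to_endpoint[query_region]
    # No explicit mapping: fall back to the catch-all if one is configured.
    return region_to_endpoint.get(WORLD)

without_fallback = {"FR": "eu-endpoint", "US": "us-endpoint"}
with_fallback = {**without_fallback, WORLD: "fallback-endpoint"}

print(pick_endpoint("JP", without_fallback))  # no mapping and no catch-all
print(pick_endpoint("JP", with_fallback))     # served by the catch-all
```

In the first profile the unmapped query has nowhere to go, which is the black hole described above; the second profile always returns an endpoint.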

Add or move one endpoint to another Azure region

All endpoints associated with this proximity profile are in the same region. Users from other regions may experience long latency when attempting to connect. Adding or moving an endpoint to another region will improve overall performance for proximity routing and provide better availability if all endpoints in one region fail.

Potential benefits: Improve resiliency by allowing failover to another region

Impact: Medium

For more information, see Configure performance traffic routing method using Azure Traffic Manager

ResourceType: microsoft.network/trafficmanagerprofiles
Recommendation ID: 0db76759-6d22-4262-93f0-2f989ba2b58e
Subcategory: BusinessContinuity

ExpressRoute IP routes nearing specified limit

Your ExpressRoute circuit is close to reaching its IP route limits. Exceeding these limits will disrupt connectivity. Connectivity will be restored once routes are within limits. Suggestions: Regularly monitor route counts. Explore Virtual WAN RouteMap to reduce advertised IP routes.

Potential benefits: Monitoring IP route counts prevents connectivity issues and ensures stability.

Impact: High

For more information, see Azure Virtual WAN FAQ

ResourceType: microsoft.network/virtualhubs
Recommendation ID: e3489565-d891-406e-91d1-44f476563850
Subcategory: HighAvailability

Implement multiple ExpressRoute circuits in your Virtual Network for cross premises resiliency

When an ExpressRoute gateway only has one ExpressRoute circuit associated to it, resiliency issues might occur. To ensure peering location redundancy and resiliency, connect one or more additional circuits to your gateway.

Potential benefits: Improve resiliency in case of ExpressRoute peering location failure

Impact: Medium

For more information, see Azure ExpressRoute: Designing for high availability

ResourceType: microsoft.network/virtualnetworkgateways
Recommendation ID: 70f87e66-9b2d-4bfa-ae38-1d7d74837689
Subcategory: BusinessContinuity

Move to production gateway SKUs from Basic gateways

The Basic VPN SKU is for development or testing scenarios. If you're using the VPN gateway for production, move to a production SKU, which offers higher numbers of tunnels, Border Gateway Protocol (BGP), active-active configuration, custom IPsec/IKE policy, and increased stability and availability.

Potential benefits: Additional available features and higher stability and availability

Impact: Medium

For more information, see Azure VPN Gateway configuration settings

ResourceType: microsoft.network/virtualnetworkgateways
Recommendation ID: e070c4bf-afaf-413e-bc00-e476b89c5f3d
Subcategory: HighAvailability

Enable Active-Active gateways for redundancy

In an active-active configuration, both instances of the VPN gateway establish site-to-site (S2S) VPN tunnels to your on-premises VPN device. When planned maintenance or an unplanned event affects one gateway instance, traffic is automatically switched over to the other active IPsec tunnel.

Potential benefits: Ensure business continuity through connection resilience

Impact: Medium

For more information, see Design highly available gateway connectivity - Azure VPN Gateway

ResourceType: microsoft.network/virtualnetworkgateways
Recommendation ID: c249dc0e-9a17-423e-838a-d72719e8c5dd
Subcategory: BusinessContinuity

Implement Site Resiliency for ExpressRoute

To ensure maximum resiliency, Microsoft recommends that you connect to two ExpressRoute circuits in two peering locations. The goal of Maximum Resiliency is to enhance availability and ensure the highest level of resilience for critical workloads.

Potential benefits: Maximum Resiliency in ExpressRoute is designed to ensure there isn’t a single point of failure within the Microsoft network path. This is achieved by offering two circuits across two different locations for site diversity in ExpressRoute.

Impact: High

For more information, see Design and architect Azure ExpressRoute for resiliency

ResourceType: microsoft.network/virtualnetworkgateways
Recommendation ID: 8d61a7d4-5405-4f43-81e3-8c6239b844a6
Subcategory: null

Implement Zone Redundant ExpressRoute Gateways

Implement zone-redundant Virtual Network Gateway in Azure Availability Zones. This brings resiliency, scalability, and higher availability to your Virtual Network Gateways.

Potential benefits: Provides zonal resiliency and redundancy for ExpressRoute

Impact: High

For more information, see Create a zone-redundant virtual network gateway in Azure availability zones - Azure VPN Gateway

ResourceType: microsoft.network/virtualnetworkgateways
Recommendation ID: c9af1ef6-55bc-48af-bfe4-2c80490159f8
Subcategory: null

Use NAT gateway for outbound connectivity

Prevent connectivity failures due to source network address translation (SNAT) port exhaustion by using NAT gateway for outbound traffic from your virtual networks. NAT gateway scales dynamically and provides secure connections for traffic headed to the internet.

Potential benefits: Prevent outbound connection failures with NAT gateway

Impact: Medium

For more information, see Source Network Address Translation (SNAT) for outbound connections - Azure Load Balancer

ResourceType: microsoft.network/virtualnetworks
Recommendation ID: 56f0c458-521d-4b8b-a704-c0a099483d19
Subcategory: HighAvailability

Azure AI Search

Create a Standard search service (2GB)

When you exceed your storage quota, indexing operations stop working. You're close to exceeding your storage quota of 2GB. If you need more storage, create a Standard search service or add extra partitions.

Potential benefits: capability to handle more data

Impact: Medium

For more information, see Service limits for tiers and skus - Azure AI Search

ResourceType: microsoft.search/searchservices
Recommendation ID: 97b38421-f88c-4db0-b397-b2d81eff6630
Subcategory: Scalability

Create a Standard search service (50MB)

When you exceed your storage quota, indexing operations stop working. You're close to exceeding your storage quota of 50MB. To maintain operations, create a Basic or Standard search service.

Potential benefits: capability to handle more data

Impact: Medium

For more information, see Service limits for tiers and skus - Azure AI Search

ResourceType: microsoft.search/searchservices
Recommendation ID: 8d31f25f-31a9-4267-b817-20ee44f88069
Subcategory: Scalability

Avoid exceeding your available storage quota by adding more partitions

When you exceed your storage quota, you can still query, but indexing operations stop working. You're close to exceeding your available storage quota. If you need more storage, add extra partitions.

Potential benefits: Able to index additional data

Impact: Medium

For more information, see Service limits for tiers and skus - Azure AI Search

ResourceType: microsoft.search/searchservices
Recommendation ID: b3efb46f-6d30-4201-98de-6492c1f8f10d
Subcategory: Scalability

Azure Arc-enabled Kubernetes Configuration

Upgrade Microsoft Flux extension to the newest major version

The Microsoft Flux extension has a major version release. Plan for a manual upgrade to the latest major version for Microsoft Flux for all Azure Arc-enabled Kubernetes and Azure Kubernetes Service (AKS) clusters within 6 months for continued support and new functionality.

Potential benefits: Continued support and new functionality

Impact: Medium

For more information, see Available extensions for Azure Arc-enabled Kubernetes clusters - Azure Arc

ResourceType: microsoft.kubernetesconfiguration/extensions
Recommendation ID: 4bc7a00b-edbb-4963-8800-1b0f8897fecf
Subcategory: ServiceUpgradeAndRetirement

Upcoming Breaking Changes for Microsoft Flux Extension

The Microsoft Flux extension frequently receives updates for security and stability. The upcoming update, in line with the OSS Flux Project, will modify the HelmRelease and HelmChart APIs by removing deprecated fields. To avoid disruption to your workloads, take the necessary action before the update.

Potential benefits: Improved stability, security, and new functionality

Impact: High

For more information, see Available extensions for Azure Arc-enabled Kubernetes clusters - Azure Arc

ResourceType: microsoft.kubernetesconfiguration/extensions
Recommendation ID: 79cfad72-9b6d-4215-922d-7df77e1ea3bb
Subcategory: ServiceUpgradeAndRetirement

Upgrade Microsoft Flux extension to a supported version

The current version of Microsoft Flux on one or more Azure Arc-enabled clusters and Azure Kubernetes clusters is out of support. To get security patches, bug fixes, and Microsoft support, upgrade to a supported version.

Potential benefits: Get security patches, bug fixes and Microsoft support

Impact: Medium

For more information, see Available extensions for Azure Arc-enabled Kubernetes clusters - Azure Arc

ResourceType: microsoft.kubernetesconfiguration/extensions
Recommendation ID: c8e3b516-a0d5-4c64-8a7a-71cfd068d5e8
Subcategory: ServiceUpgradeAndRetirement

Azure Arc-enabled Kubernetes

Upgrade to the latest agent version of Azure Arc-enabled Kubernetes

For the best Azure Arc-enabled Kubernetes experience, improved stability, and new functionality, upgrade to the latest agent version.

Potential benefits: Arc-enabled K8s latest agent version

Impact: Medium

For more information, see Upgrade Azure Arc-enabled Kubernetes agents - Azure Arc

ResourceType: microsoft.kubernetes/connectedclusters
Recommendation ID: 6d55ea5b-6e80-4313-9b80-83d384667eaa
Subcategory: ServiceUpgradeAndRetirement

Azure Arc-enabled servers

Upgrade to the latest version of the Azure Connected Machine agent

The Azure Connected Machine agent is updated regularly with bug fixes, stability enhancements, and new functionality. For the best Azure Arc experience, upgrade your agent to the latest version.

Potential benefits: Improved stability and new functionality

Impact: Medium

For more information, see Managing the Azure Connected Machine agent - Azure Arc

ResourceType: microsoft.hybridcompute/machines
Recommendation ID: 9d5717d2-4708-4e3f-bdda-93b3e6f1715b
Subcategory: Other

Azure Cache for Redis

Increase fragmentation memory reservation

Fragmentation and memory pressure can cause availability incidents. To help reduce cache failures when running under high memory pressure, increase the memory reserved for fragmentation through the maxfragmentationmemory-reserved setting available in the Advanced Settings options.

Potential benefits: Avoid availability incidents when your cache has high memory fragmentation

Impact: Medium

For more information, see How to configure Azure Cache for Redis - Azure Cache for Redis

ResourceType: microsoft.cache/redis
Recommendation ID: 7c380315-6ad9-4fb2-8930-a8aeb1d6241b
Subcategory: Other

Configure geo-replication for Cache for Redis instances to increase durability of applications

Geo-Replication enables disaster recovery for cached data, even in the unlikely event of a widespread regional failure. This can be essential for mission-critical applications. We recommend that you configure passive geo-replication for Premium Azure Cache for Redis instances.

Potential benefits: Geo-Replication enables disaster recovery for cached data.

Impact: High

For more information, see Configure passive geo-replication for Premium Azure Cache for Redis instances - Azure Cache for Redis

ResourceType: microsoft.cache/redis
Recommendation ID: c9e4a27c-79e6-4e4c-904f-b6612b6cd892
Subcategory: DisasterRecovery

Azure Container Apps

Renew custom domain certificate

The custom domain certificate you uploaded is near expiration. To prevent possible service downtime, renew your certificate and upload the new certificate for your container apps.

Potential benefits: Your service won't fail because of an expired certificate.

Impact: Medium

For more information, see Custom domain names and certificates in Azure Container Apps

ResourceType: microsoft.app/containerapps
Recommendation ID: b9ce2d2e-554b-4391-8ebc-91c570602b04
Subcategory: Other

An issue has been detected that is preventing the renewal of your Managed Certificate.

We detected that the managed certificate used by the container app failed to renew automatically. Follow the documentation link to make sure that the DNS settings of your custom domain are correct.

Potential benefits: Avoid downtime due to an expired certificate.

Impact: High

For more information, see Custom domain names and free managed certificates in Azure Container Apps

ResourceType: microsoft.app/containerapps
Recommendation ID: fa6c0880-da2e-42fd-9cb3-e1267ec5b5c2
Subcategory: Other

Increase the minimal replica count for your containerized application

The minimal replica count set for your Azure Container App containerized application might be too low, which can cause resilience, scalability, and load balancing issues. For better availability, consider increasing the minimal replica count.

Potential benefits: Better availability for your container app.

Impact: Medium

For more information, see Scaling in Azure Container Apps

ResourceType: microsoft.app/containerapps
Recommendation ID: 9be5f344-6fa5-4abc-a1f2-61ae6192a075
Subcategory: HighAvailability

Re-create your Container Apps environment to avoid DNS issues

There's a potential networking issue with your Container Apps environments that might cause DNS issues. We recommend that you create a new Container Apps environment, re-create your Container Apps in the new environment, and delete the old Container Apps environment.

Potential benefits: Avoid DNS failures in your Container Apps Environment.

Impact: High

For more information, see Quickstart: Deploy your first container app using the Azure portal

ResourceType: microsoft.app/managedenvironments
Recommendation ID: c692e862-953b-49fe-9c51-e5d2792c1cc1
Subcategory: Other

Azure Cosmos DB

Configure Azure Cosmos DB containers with a partition key

When Azure Cosmos DB nonpartitioned collections reach their provisioned storage quota, you lose the ability to add data. Your Cosmos DB nonpartitioned collections are approaching their provisioned storage quota. Migrate these collections to new collections with a partition key definition so they can automatically be scaled out by the service.

Potential benefits: Scale your containers seamlessly with increase in storage or request rates without running into any limits

Impact: High

For more information, see Partitioning and horizontal scaling - Azure Cosmos DB

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 5e4e9f04-9201-4fd9-8af6-a9539d13d8ec
Subcategory: Scalability

Use static Cosmos DB client instances in your code and cache the names of databases and collections

A high number of metadata operations on an account can result in rate limiting. Metadata operations have a system-reserved request unit (RU) limit. Avoid rate limiting from metadata operations by using static Cosmos DB client instances in your code and caching the names of databases and collections.

Potential benefits: Optimize your RU usage and avoid rate limiting

Impact: Medium

For more information, see Azure Cosmos DB performance tips for .NET SDK v2

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: bdb595a4-e148-41f9-98e8-68ec92d1932e
Subcategory: Scalability
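
The static-client recommendation above can be illustrated with a minimal sketch. It assumes a hypothetical stand-in class, FakeCosmosClient, rather than a real Cosmos DB SDK client, and the helper names get_client and get_container_metadata are invented for this example. The point is only the pattern: create the client once per process and cache the results of metadata lookups so that repeated calls don't trigger new metadata operations.

```python
import threading


class FakeCosmosClient:
    """Hypothetical stand-in for an SDK client; not a real Cosmos DB API."""

    instances_created = 0

    def __init__(self, endpoint):
        FakeCosmosClient.instances_created += 1
        self.endpoint = endpoint
        self.metadata_calls = 0

    def read_container_metadata(self, database, container):
        # Counts as one metadata operation against the account.
        self.metadata_calls += 1
        return {"database": database, "container": container}


_client = None
_client_lock = threading.Lock()
_metadata_cache = {}


def get_client(endpoint):
    """Return the single shared client, creating it on first use."""
    global _client
    with _client_lock:
        if _client is None:
            _client = FakeCosmosClient(endpoint)
        return _client


def get_container_metadata(endpoint, database, container):
    """Look up container metadata once, then serve it from the cache."""
    key = (database, container)
    if key not in _metadata_cache:
        client = get_client(endpoint)
        _metadata_cache[key] = client.read_container_metadata(database, container)
    return _metadata_cache[key]
```

With a real SDK, the same idea would apply to whatever client and container reference objects that SDK exposes: build them once at startup and reuse them for every request.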

Check linked Azure Key Vault hosting your encryption key

When an Azure Cosmos DB account can't access its linked Azure Key Vault hosting the encryption key, data access and security issues might happen. Your Azure Key Vault's configuration is preventing your Cosmos DB account from contacting the key vault to access your managed encryption keys. If you recently performed a key rotation, ensure that the previous key, or key version, remains enabled and available until Cosmos DB completes the rotation. The previous key or key version can be disabled after 24 hours, or after the Azure Key Vault audit logs don't show any activity from Azure Cosmos DB on that key or key version.

Potential benefits: Update your configurations to continue using customer-managed keys and access your data

Impact: Medium

For more information, see Configure customer-managed keys - Azure Cosmos DB

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 44a0a07f-23a2-49df-b8dc-a1b14c7c6a9d
Subcategory: Other

Configure consistent indexing mode on Azure Cosmos DB containers

Azure Cosmos containers configured with the Lazy indexing mode update asynchronously, which improves write performance, but can impact query freshness. Your container is configured with the Lazy indexing mode. If query freshness is critical, use Consistent Indexing Mode for immediate index updates.

Potential benefits: Improve query result consistency and reliability

Impact: Medium

For more information, see Manage indexing policies in Azure Cosmos DB

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 213974c8-ed9c-459f-9398-7cdaa3c28856
Subcategory: Other

Hotfix - Upgrade to 2.6.14 version of the Async Java SDK v2 or to Java SDK v4

There's a critical bug in version 2.6.13 (and lower) of the Azure Cosmos DB Async Java SDK v2 that causes errors when a global logical sequence number (LSN) greater than the maximum integer value is reached. The service reaches this value internally, without any visible action on your part, after a large volume of transactions occurs over the lifetime of an Azure Cosmos DB container. Note: While this is a critical hotfix for the Async Java SDK v2, we still highly recommend you migrate to the Java SDK v4.

Potential benefits: If action isn’t taken, all create, read, update, and delete operations may begin to fail with NumberFormatException

Impact: High

For more information, see Azure Cosmos DB: SQL Async Java API, SDK & resources

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: bc9e5110-a220-4ab9-8bc9-53f92d3eef70
Subcategory: ServiceUpgradeAndRetirement

Critical issue - Upgrade to the current recommended version of the Java SDK v4

There's a critical bug in version 4.15 and lower of the Azure Cosmos DB Java SDK v4 that causes errors when a global logical sequence number (LSN) greater than the maximum integer value is reached. The service reaches this value internally, without any visible action on your part, after a large volume of transactions occurs over the lifetime of an Azure Cosmos DB container. Avoid this problem by upgrading to the current recommended version of the Java SDK v4.

Potential benefits: If action isn’t taken, all create, read, update, and delete operations may begin to fail with NumberFormatException

Impact: High

For more information, see Azure Cosmos DB Java SDK v4 for API for NoSQL release notes and resources

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 38942ae5-3154-4e0b-98d9-23aa061c334b
Subcategory: ServiceUpgradeAndRetirement

Use the new 3.6+ endpoint to connect to your upgraded Azure Cosmos DB's API for MongoDB account

Some of your applications are connecting to your upgraded Azure Cosmos DB's API for MongoDB account using the legacy 3.2 endpoint - [accountname].documents.azure.com. Use the new endpoint - [accountname].mongo.cosmos.azure.com (or its equivalent in sovereign, government, or restricted clouds).

Potential benefits: Take advantage of the latest features in version 3.6+ of Azure Cosmos DB's API for MongoDB

Impact: Medium

For more information, see 4.0 server version supported features and syntax in Azure Cosmos DB for MongoDB

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 123039b5-0fda-4744-9a17-d6b5d5d122b2
Subcategory: ServiceUpgradeAndRetirement
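
As a rough illustration of the endpoint change described above, the following sketch rewrites a legacy host name to the new one. The function name to_new_endpoint is hypothetical, and the sketch covers only the public-cloud suffixes named in the recommendation. Sovereign, government, or restricted clouds use different suffixes that aren't handled here.

```python
LEGACY_SUFFIX = ".documents.azure.com"
NEW_SUFFIX = ".mongo.cosmos.azure.com"


def to_new_endpoint(host: str) -> str:
    """Swap the legacy 3.2 host suffix for the 3.6+ suffix.

    Hosts that don't use the legacy suffix are returned unchanged.
    """
    if host.lower().endswith(LEGACY_SUFFIX):
        return host[: -len(LEGACY_SUFFIX)] + NEW_SUFFIX
    return host


def uses_legacy_endpoint(host: str) -> bool:
    """Report whether a host still points at the legacy endpoint."""
    return host.lower().endswith(LEGACY_SUFFIX)
```

A helper like this could be run over the host names found in application configuration to spot connections that still use the legacy endpoint.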

Upgrade your Azure Cosmos DB API for MongoDB account to v4.2 to save on query/storage costs and utilize new features

Your Azure Cosmos DB API for MongoDB account is eligible to upgrade to version 4.2. Upgrading to v4.2 can reduce your storage costs by up to 55% and your query costs by up to 45% by leveraging a new storage format. Numerous additional features such as multi-document transactions are also included in v4.2.

Potential benefits: Improved reliability, query/storage efficiency, performance, and new feature capabilities

Impact: Medium

For more information, see Upgrade the Mongo version - Azure Cosmos DB for MongoDB

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 0da795d9-26d2-4f02-a019-0ec383363c88
Subcategory: Other

Enable Server Side Retry (SSR) on your Azure Cosmos DB's API for MongoDB account

When an account is throwing a TooManyRequests error with the 16500 error code, enabling Server Side Retry (SSR) can help mitigate the issue.

Potential benefits: Prevent throttling and improve your query reliability and performance

Impact: High

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: ec6fe20c-08d6-43da-ac18-84ac83756a88
Subcategory: Other

Add a second region to your production workloads on Azure Cosmos DB

Production workloads on Azure Cosmos DB that run in a single region might have availability issues, and this appears to be the case with some of your Cosmos DB accounts. Increase their availability by configuring them to span at least two Azure regions. NOTE: Additional regions incur additional costs.

Potential benefits: Improve the availability of your production workloads

Impact: Medium

For more information, see High availability (Reliability) in Azure Cosmos DB for NoSQL

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: b57f7a29-dcc8-43de-86fa-18d3f9d3764d
Subcategory: BusinessContinuity

Upgrade old Azure Cosmos DB SDK to the latest version

An Azure Cosmos DB account using an old version of the SDK lacks the latest fixes and improvements. Your Azure Cosmos DB account is using an old version of the SDK. For the latest fixes, performance improvements, and new feature capabilities, upgrade to the latest version.

Potential benefits: Improved reliability, performance, and new feature capabilities

Impact: Medium

For more information, see Azure Cosmos DB

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 51a4e6bd-5a95-4a41-8309-40f5640fdb8b
Subcategory: Other

Upgrade outdated Azure Cosmos DB SDK to the latest version

An Azure Cosmos DB account using an old version of the SDK lacks the latest fixes and improvements. Your Azure Cosmos DB account is using an outdated version of the SDK. We recommend upgrading to the latest version for the latest fixes, performance improvements, and new feature capabilities.

Potential benefits: Improved reliability, performance, and new feature capabilities

Impact: High

For more information, see Azure Cosmos DB

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 60a55165-9ccd-4536-81f6-e8dc6246d3d2
Subcategory: ServiceUpgradeAndRetirement

Enable service managed failover for Cosmos DB account

Enable service managed failover for Cosmos DB account to ensure high availability of the account. Service managed failover automatically switches the write region to the secondary region in case of a primary region outage. This ensures that the application continues to function without any downtime.

Potential benefits: Azure's Service-Managed Failover feature enhances system availability by automating failover processes, reducing downtime, and improving resilience.

Impact: Medium

For more information, see High availability (Reliability) in Azure Cosmos DB for NoSQL

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 5de9f2e6-087e-40da-863a-34b7943beed4
Subcategory: Other

Enable HA for your Production workload

Many clusters with consistent workloads do not have high availability (HA) enabled. It's recommended to activate HA from the Scale page in the Azure Portal to prevent database downtime in case of unexpected node failures and to qualify for SLA guarantees.

Potential benefits: Activate HA to avoid database downtime in case of an unexpected node failure

Impact: High

For more information, see Scale or configure a cluster - Azure Cosmos DB for MongoDB vCore

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 64fbcac1-f652-4b6f-8170-2f97ffeb5631
Subcategory: HighAvailability

Enable zone redundancy for multi-region Cosmos DB accounts

This recommendation suggests enabling zone redundancy for multi-region Cosmos DB accounts to improve high availability and reduce the risk of data loss in case of a regional outage.

Potential benefits: Improved high availability and reduced risk of data loss

Impact: High

For more information, see High availability (Reliability) in Azure Cosmos DB for NoSQL

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 8034b205-167a-4fd5-a133-0c8cb166103c
Subcategory: HighAvailability

Add at least one data center in another Azure region

Your Azure Managed Instance for Apache Cassandra cluster is designated as a production cluster but is currently deployed in a single Azure region. For production clusters, we recommend adding at least one more data center in another Azure region to guard against regional disasters.

Potential benefits: Ensure applications have another region in case of disaster recovery

Impact: Medium

For more information, see Building resilient applications - Azure Managed Instance for Apache Cassandra

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: 92056ca3-8fab-43d1-bebf-f9c377ef20e9
Subcategory: DisasterRecovery

Avoid being rate limited for Control Plane operation

We found a high number of control plane operations on your account through the resource provider. Requests that exceed the documented limits at sustained levels over consecutive 5-minute periods might be throttled, and operations on Azure Cosmos DB resources might fail or remain incomplete.

Potential benefits: Optimize control plane operation and avoid operation failure due to rate limiting

Impact: Medium

For more information, see Service quotas and default limits - Azure Cosmos DB

ResourceType: microsoft.documentdb/databaseaccounts
Recommendation ID: a030f8ab-4dd4-4751-822b-f231a0df5f5a
Subcategory: Scalability

Azure Data Explorer

Resolve virtual network issues

Service failed to install or resume due to virtual network (VNet) issues. To resolve this issue, follow the steps in the troubleshooting guide.

Potential benefits: Improve reliability, availability, performance, and new feature capabilities

Impact: High

For more information, see Troubleshoot access, ingestion, and operation of your Azure Data Explorer cluster in your virtual network - Azure Data Explorer

ResourceType: microsoft.kusto/clusters
Recommendation ID: fa2649e9-e1a5-4d07-9b26-51c080d9a9ba
Subcategory: Other

Add subnet delegation for 'Microsoft.Kusto/clusters'

If a subnet isn’t delegated, the associated Azure service won’t be able to operate within it. Your subnet doesn’t have the required delegation. Delegate your subnet for 'Microsoft.Kusto/clusters'.

Potential benefits: Improve reliability, availability, performance, and new feature capabilities

Impact: High

For more information, see What is subnet delegation in Azure virtual network?

ResourceType: microsoft.kusto/clusters
Recommendation ID: f2bcadd1-713b-4acc-9810-4170a5d01dea
Subcategory: Other

Azure Database for MySQL

High Availability - Add primary key to the table that currently doesn't have one.

Our internal monitoring system has identified significant replication lag on the High Availability standby server. This lag is primarily caused by the standby server replaying relay logs on a table that lacks a primary key. To address this issue and adhere to best practices, it's recommended to add primary keys to all tables. Once this is done, proceed to disable and then re-enable High Availability to mitigate the problem.

Potential benefits: By implementing this approach, the standby server will be shielded from the adverse effects of high replication lag caused by the absence of a primary key on any table. This approach can contribute to reduced failover times, ultimately supporting the goal of maintaining business continuity.

Impact: High

For more information, see Troubleshoot Replication Latency - Azure Database for MySQL - Flexible Server

ResourceType: microsoft.dbformysql/flexibleservers
Recommendation ID: cf388b0c-2847-4ba9-8b07-54c6b23f60fb
Subcategory: Other

Replication - Add a primary key to the table that currently doesn't have one

Our internal monitoring observed significant replication lag on your replica server because the replica server is replaying relay logs on a table that lacks a primary key. To ensure that the replica server can effectively synchronize with the primary and keep up with changes, add primary keys to the tables in the primary server and then recreate the replica server.

Potential benefits: By implementing this approach, the replica server will achieve a state of close synchronization with the primary server.

Impact: High

For more information, see Troubleshoot Replication Latency - Azure Database for MySQL - Flexible Server

ResourceType: microsoft.dbformysql/flexibleservers
Recommendation ID: fb41cc05-7ac3-4b0e-a773-a39b5c1ca9e4
Subcategory: Other

Scale replica server's SKU to match the source server SKU

The replica server is experiencing replication lag. This is due to the replica server's SKU being smaller than the source server SKU. To ensure smooth replication, we recommend scaling up the SKU of your replica server.

Potential benefits: Keeps replication lag in check.

Impact: High

For more information, see Service Tiers - Azure Database for MySQL - Flexible Server

ResourceType: microsoft.dbformysql/flexibleservers
Recommendation ID: 91fd3a33-3b2f-48bb-81db-a2a54cfa2d76
Subcategory: Scalability

Upgrade to Transport Layer Security (TLS) 1.2

Upgrade to Transport Layer Security (TLS) 1.2 from TLS 1.0 or TLS 1.1 for the application. TLS 1.0 and TLS 1.1 were deprecated in March 2021.

Potential benefits: Improved security. Compliance with newest standards.

Impact: High

For more information, see Networking Overview - Azure Database for MySQL - Flexible Server

ResourceType: microsoft.dbformysql/flexibleservers
Recommendation ID: f259e897-9924-45db-a1ea-788f768548da
Subcategory: ServiceUpgradeAndRetirement
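
On the application side, the TLS recommendation above amounts to refusing protocol versions older than 1.2. The following is a minimal sketch using only the Python standard library. The function name make_tls12_context is hypothetical, and how the resulting context is handed to a MySQL driver depends on the driver, so that step isn't shown.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects TLS 1.0 and TLS 1.1."""
    context = ssl.create_default_context()
    # Set the floor explicitly so the intent doesn't depend on library defaults.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The default context also keeps certificate verification and host name checking enabled, which is usually what a database client wants.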

Azure Database for PostgreSQL

Remove inactive logical replication slots (important)

Inactive logical replication slots can result in degraded server performance and unavailability due to write ahead log (WAL) file retention and buildup of snapshot files. Your Azure Database for PostgreSQL flexible server might have inactive logical replication slots. THIS NEEDS IMMEDIATE ATTENTION. Either delete the inactive replication slots, or start consuming the changes from these slots, so that the slots' Log Sequence Number (LSN) advances and is close to the current LSN of the server.

Potential benefits: Improve PostgreSQL availability by removing inactive logical replication slots

Impact: High

For more information, see Logical replication and logical decoding - Azure Database for PostgreSQL - Flexible Server

ResourceType: microsoft.dbforpostgresql/flexibleservers
Recommendation ID: 33f26810-57d0-4612-85ff-a83ee9be884a
Subcategory: Other
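
To make the cleanup described above concrete, the sketch below filters rows from the PostgreSQL pg_replication_slots view for inactive logical slots and builds the matching pg_drop_replication_slot statements. The helper names and the row shape (slot_name, slot_type, active) are assumptions for this example. Running the query and the generated statements against a server is left to whichever client you use, and dropping a slot is irreversible, so review the list first.

```python
# Query that returns the rows this sketch expects.
LIST_SLOTS_SQL = "SELECT slot_name, slot_type, active FROM pg_replication_slots;"


def inactive_logical_slots(rows):
    """Return names of logical slots that currently have no consumer."""
    return [
        name
        for name, slot_type, active in rows
        if slot_type == "logical" and not active
    ]


def drop_statements(slot_names):
    """Build one pg_drop_replication_slot call per slot name."""
    statements = []
    for name in slot_names:
        escaped = name.replace("'", "''")  # escape quotes inside the literal
        statements.append(f"SELECT pg_drop_replication_slot('{escaped}');")
    return statements
```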

Configure geo redundant backup storage

Configure geo-redundant storage (GRS) for backups to ensure that your database meets its availability and durability targets even in the face of failures or disasters.

Potential benefits: Ensures recovery from regional failure or disaster.

Impact: Medium

For more information, see Backup and restore - Azure Database for PostgreSQL - Flexible Server

ResourceType: microsoft.dbforpostgresql/flexibleservers
Recommendation ID: 5295ed8a-f7a1-48d3-b4a9-e5e472cf1685
Subcategory: DisasterRecovery

Remove inactive logical replication slots

When an Orcas PostgreSQL flexible server has inactive logical replication slots, degraded server performance and unavailability due to write ahead log (WAL) file retention and buildup of snapshot files might occur. THIS NEEDS IMMEDIATE ATTENTION. Either delete the inactive replication slots, or start consuming the changes from these slots, so that the slots' Log Sequence Number (LSN) advances and is close to the current LSN of the server.

Potential benefits: Improve PostgreSQL availability by removing inactive logical replication slots

Impact: High

For more information, see Logical decoding - Azure Database for PostgreSQL - Single Server

ResourceType: microsoft.dbforpostgresql/servers
Recommendation ID: 6f33a917-418c-4608-b34f-4ff0e7be8637
Subcategory: Other

Azure IoT Hub

Upgrade IoT Edge device runtime to a supported version for IoT Hub

When Edge devices use outdated versions, performance degradation might occur. We recommend you upgrade to the latest supported version of the Azure IoT Edge runtime.

Potential benefits: Ensure business continuity with latest supported version for your Edge devices

Impact: Medium

For more information, see Update IoT Edge version on devices

ResourceType: microsoft.devices/iothubs
Recommendation ID: 51b1fad8-4838-426f-9871-107bc089677b
Subcategory: ServiceUpgradeAndRetirement

Upgrade device client SDK to a supported version for IoT Hub

When devices use an outdated SDK, performance degradation can occur. Some or all of your devices are using an outdated SDK. We recommend you upgrade to a supported SDK version.

Potential benefits: Ensure business continuity with supported SDK for your devices

Impact: Medium

For more information, see Azure IoT Hub device and service SDKs

ResourceType: microsoft.devices/iothubs
Recommendation ID: d448c687-b808-4143-bbdc-02c35478198a
Subcategory: ServiceUpgradeAndRetirement

IoT Hub Potential Device Storm Detected

A device storm occurs when two or more devices try to connect to the IoT Hub using the same device ID credentials. When the second device (B) connects, it causes the first one (A) to become disconnected. Then (A) attempts to reconnect, which causes (B) to get disconnected, and the cycle repeats.

Potential benefits: Improve connectivity of your devices

Impact: Medium

For more information, see Troubleshooting Azure IoT Hub error codes

ResourceType: microsoft.devices/iothubs
Recommendation ID: 8d7efd88-c891-46be-9287-0aec2fabd51c
Subcategory: Other
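
One way to look for the pattern described above is to group connection events by device ID and flag any ID that is used by more than one distinct client. The sketch below assumes a hypothetical event shape of (device_id, client_instance) pairs, where client_instance is whatever distinguishes physical devices in your own logs, such as a source address. It isn't based on a specific IoT Hub log schema.

```python
from collections import defaultdict


def find_shared_device_ids(events):
    """Return device IDs that were used by more than one client instance."""
    clients_by_device = defaultdict(set)
    for device_id, client_instance in events:
        clients_by_device[device_id].add(client_instance)
    return sorted(
        device_id
        for device_id, clients in clients_by_device.items()
        if len(clients) > 1
    )
```

Any ID returned by this check is a candidate for reprovisioning so that each physical device has its own credentials.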

Upgrade Device Update for IoT Hub SDK to a supported version

When a Device Update for IoT Hub instance uses an outdated version of the SDK, it doesn't get the latest upgrades. For the latest fixes, performance improvements, and new feature capabilities, upgrade to the latest Device Update for IoT Hub SDK version.

Potential benefits: Ensure business continuity with supported SDK

Impact: Medium

For more information, see Introduction to Device Update for Azure IoT Hub

ResourceType: microsoft.devices/iothubs
Recommendation ID: d1ff97b9-44cd-4acf-a9d3-3af500bd79d6
Subcategory: ServiceUpgradeAndRetirement

Add IoT Hub units or increase SKU level

When an IoT Hub exceeds its daily message quota, operation and cost problems might occur. To ensure smooth operation in the future, add units or increase the SKU level.

Potential benefits: The IoT Hub can receive messages again.

Impact: High

For more information, see Troubleshooting Azure IoT Hub error codes

ResourceType: microsoft.devices/iothubs
Recommendation ID: e4bda6ac-032c-44e0-9b40-e0522796a6d2
Subcategory: Scalability
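
The sizing decision above is simple arithmetic: divide the expected daily message volume by the per-unit daily quota of the chosen tier and round up. In the sketch below, the function name units_needed is hypothetical, and the quota is a parameter rather than a built-in value, so take the actual figure from the quota documentation for your tier.

```python
import math


def units_needed(daily_messages: int, quota_per_unit: int) -> int:
    """Return the number of units required to cover a daily message volume."""
    if quota_per_unit <= 0:
        raise ValueError("quota_per_unit must be positive")
    # A hub always has at least one unit, even with no traffic.
    return max(1, math.ceil(daily_messages / quota_per_unit))
```

If the result is large for the current tier, the same calculation with a higher tier's quota shows whether increasing the SKU level is the simpler option.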

Azure Kubernetes Service (AKS)

Set node pool subnet size to maximum auto scale setting

To allow AKS to efficiently scale out nodes, update the subnet size for node pools to match the maximum settings for the auto-scaler.

Potential benefits: Efficient scaling for demand. Reduced resource constraints.

Impact: High

For more information, see Configure Azure CNI networking for dynamic allocation of IPs and enhanced subnet support - Azure Kubernetes Service

ResourceType: microsoft.containerservice/managedclusters
Recommendation ID: 29a14bcd-36ad-41ea-9138-70049121eaea
Subcategory: HighAvailability
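
Whether a subnet can hold a node pool at its maximum autoscale setting can be estimated before changing anything. The sketch below relies on the fact that Azure reserves five addresses in every subnet. The number of IP addresses each node consumes depends on the cluster's networking mode, so it's passed in as an assumption (ips_per_node) instead of being hard-coded. The function name subnet_fits is hypothetical.

```python
import ipaddress

AZURE_RESERVED_IPS = 5  # addresses Azure reserves in every subnet


def subnet_fits(subnet_cidr: str, max_nodes: int, ips_per_node: int) -> bool:
    """Check whether a subnet has room for max_nodes at ips_per_node each."""
    network = ipaddress.ip_network(subnet_cidr)
    usable = network.num_addresses - AZURE_RESERVED_IPS
    required = max_nodes * ips_per_node
    return usable >= required
```

For example, a /24 subnet leaves 251 usable addresses, which is enough for 100 nodes at one address each but not for 100 nodes that each need 31 addresses.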

Use AKS Backup for a cluster with persistent volumes

Azure Kubernetes Service (AKS) backup is a cloud-native solution for backing up and restoring containerized apps and data in an AKS cluster. AKS Backup supports scheduled backups for cluster state and persistent volumes. AKS Backup offers granular control over a namespace or an entire cluster.

Potential benefits: Backups for cluster state and persistent volumes

Impact: Medium

For more information, see What is Azure Kubernetes Service (AKS) backup? - Azure Backup

ResourceType: microsoft.containerservice/managedclusters
Recommendation ID: 29f2eea3-b0d8-4934-a0f8-171dbd70ba13
Subcategory: DisasterRecovery

Enable Autoscaling for your system node pools

To ensure your system pods are scheduled even during times of high load, enable autoscaling on your system node pool.

Potential benefits: Enabling Autoscaler for system node pool ensures system pods are scheduled and cluster can function.

Impact: High

For more information, see Use the cluster autoscaler in Azure Kubernetes Service (AKS) - Azure Kubernetes Service

ResourceType: microsoft.containerservice/managedclusters
Recommendation ID: 70829b1a-272b-4728-b418-8f1a56432d33
Subcategory: HighAvailability

Have at least 2 nodes in your system node pool

Ensure your system node pools have at least 2 nodes for reliability of your system pods. With a single node, your cluster can fail in the event of a node or hardware failure.

Potential benefits: Having 2 nodes ensures resiliency against node failures.

Impact: High

For more information, see Use system node pools in Azure Kubernetes Service (AKS) - Azure Kubernetes Service

ResourceType: microsoft.containerservice/managedclusters
Recommendation ID: a9228ae7-4386-41be-b527-acd59fad3c79
Subcategory: HighAvailability

Create a dedicated system node pool

A cluster without a dedicated system node pool is less reliable. We recommend you dedicate system node pools to only serve critical system pods, preventing resource starvation between system and competing user pods. Enforce this behavior with the CriticalAddonsOnly=true:NoSchedule taint on the pool.

Potential benefits: Ensures cluster reliability by preventing resource scarcity for core system pods

Impact: High

For more information, see Use system node pools in Azure Kubernetes Service (AKS) - Azure Kubernetes Service

ResourceType: microsoft.containerservice/managedclusters
Recommendation ID: f31832f1-7e87-499d-a52a-120f610aba98
Subcategory: HighAvailability
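
The two system node pool recommendations above (at least 2 nodes, and the CriticalAddonsOnly=true:NoSchedule taint) can be checked together. The sketch below works on a hypothetical list of dictionaries with the keys name, mode, count, and node_taints. That shape is an assumption for illustration and would need to be mapped from whatever your tooling returns for the cluster.

```python
SYSTEM_TAINT = "CriticalAddonsOnly=true:NoSchedule"


def review_system_pools(pools):
    """Return a list of findings for the system node pools in a cluster."""
    findings = []
    system_pools = [pool for pool in pools if pool.get("mode") == "System"]
    if not system_pools:
        findings.append("no system node pool")
    for pool in system_pools:
        if pool.get("count", 0) < 2:
            findings.append(f"{pool['name']}: fewer than 2 nodes")
        if SYSTEM_TAINT not in pool.get("node_taints", []):
            findings.append(f"{pool['name']}: missing {SYSTEM_TAINT} taint")
    return findings
```

An empty result means the pools in the input meet both checks. User node pools are ignored because the recommendations apply only to system pools.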

Ensure B-series virtual machines (VMs) aren't used in production environments

When a cluster has one or more node pools using a non-recommended burstable VM SKU, full (100%) vCPU capability isn't guaranteed. Ensure B-series VMs aren't used in production environments.

Potential benefits: Best practice for consistent performance

Impact: Medium

For more information, see Bv1 size series - Azure Virtual Machines

ResourceType: microsoft.containerservice/managedclusters
Recommendation ID: fac2ad84-1421-4dd3-8477-9d6e605392b4
Subcategory: HighAvailability
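
One way to spot burstable sizes in an inventory is to match the VM size name. The sketch below assumes B-series sizes are named `Standard_B` followed by a digit (for example, `Standard_B2ms`) and that each node pool is a dictionary with hypothetical `name` and `vm_size` fields.

```python
import re

# Assumption: B-series size names start with "Standard_B" followed by a digit.
BURSTABLE_SKU = re.compile(r"^Standard_B\d", re.IGNORECASE)


def burstable_pools(node_pools):
    """Return names of node pools whose VM size looks like a B-series SKU."""
    return [
        pool["name"]
        for pool in node_pools
        if BURSTABLE_SKU.match(pool.get("vm_size", ""))
    ]
```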

Azure NetApp Files

Configure AD DS Site for Azure NetApp Files AD Connector

If Azure NetApp Files can't reach assigned AD DS site domain controllers, the domain controller discovery process queries all domain controllers. Unreachable domain controllers may be used, causing issues with volume creation, client queries, authentication, and AD connection modifications.

Potential benefits: Optimize DNS Connectivity with Azure NetApp Files

Impact: High

For more information, see Understand guidelines for Active Directory Domain Services site design and planning

ResourceType: microsoft.netapp/netappaccounts
Recommendation ID: 2e795f35-fce6-48dc-a5ac-6860cb9a0442
Subcategory: Other

Ensure Roles assigned to Microsoft.NetApp Delegated Subnet has Subnet Read Permissions

Roles that are required for the management of Azure NetApp Files resources must have "Microsoft.network/virtualNetworks/subnets/read" permissions on the subnet that is delegated to Microsoft.NetApp. If the role, whether custom or built-in, doesn't have this permission, volume creation fails.

Potential benefits: Prevent volume creation failures by ensuring subnet/read permissions

Impact: High

ResourceType: microsoft.netapp/netappaccounts/capacitypools/volumes
Recommendation ID: 4e112555-7dc0-4f33-85e7-18398ac41345
Subcategory: HighAvailability
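
To make the permission requirement concrete, here is a minimal Python sketch that tests whether a role definition grants the subnet read action. The role is modeled as a dictionary with `actions` and `notActions` lists. Wildcard handling uses Python's `fnmatch` as an approximation of how action patterns match, and comparison is case-insensitive. Both choices are assumptions of this example.

```python
from fnmatch import fnmatchcase

SUBNET_READ = "Microsoft.Network/virtualNetworks/subnets/read"


def _matches(patterns, action):
    """True if any wildcard pattern matches the action, ignoring case."""
    action = action.lower()
    return any(fnmatchcase(action, pattern.lower()) for pattern in patterns)


def role_allows(role, action=SUBNET_READ):
    """True if the role grants the action and doesn't exclude it again."""
    granted = _matches(role.get("actions", []), action)
    excluded = _matches(role.get("notActions", []), action)
    return granted and not excluded
```

For example, a role whose only action is `Microsoft.NetApp/*` returns `False` here, which corresponds to the situation where volume creation would fail.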

Review SAP configuration for timeout values used with Azure NetApp Files

High availability of SAP while used with Azure NetApp Files relies on setting proper timeout values to prevent disruption to your application. Review the 'Learn more' link to ensure your configuration meets the timeout values as noted in the documentation.

Potential benefits: Improve resiliency of SAP Application on ANF

Impact: High

For more information, see Get started with SAP on Azure VMs

ResourceType: microsoft.netapp/netappaccounts/capacitypools/volumes
Recommendation ID: 8754f0ed-c82a-497e-be31-c9d701c976e1
Subcategory: Other

Implement disaster recovery strategies for your Azure NetApp Files resources

To avoid data or functionality loss during a regional or zonal disaster, implement common disaster recovery techniques such as cross region replication or cross zone replication for your Azure NetApp Files volumes.

Potential benefits: Manage disaster recovery easily with Azure NetApp Files replication features

Impact: High

For more information, see Understand data protection and disaster recovery options in Azure NetApp Files

ResourceType: microsoft.netapp/netappaccounts/capacitypools/volumes
Recommendation ID: cda11061-35a8-4ca3-aa03-b242dcdf7319
Subcategory: DisasterRecovery

Azure NetApp Files - Enable Continuous Availability for SMB Volumes

We recommend enabling Continuous Availability for your Azure NetApp Files Server Message Block (SMB) volumes.

Potential benefits: Prevent application disruptions by enabling Continuous Availability for SMB volumes

Impact: High

For more information, see Enable Continuous Availability on existing Azure NetApp Files SMB volumes

ResourceType: microsoft.netapp/netappaccounts/capacitypools/volumes
Recommendation ID: e4bebd74-387a-4a74-b757-475d2d1b4e3e
Subcategory: HighAvailability

Azure Site Recovery

Enable soft delete for your Recovery Services vaults

Soft delete helps you retain your backup data in the Recovery Services vault for an additional duration after deletion, giving you an opportunity to retrieve it before it's permanently deleted.

Potential benefits: Helps recovery of backup data in cases of accidental deletion

Impact: Medium

For more information, see Soft delete for Azure Backup - Azure Backup

ResourceType: microsoft.recoveryservices/vaults
Recommendation ID: 3ebfaf53-4d8c-4e67-a948-017bbbf59de6
Subcategory: DisasterRecovery

Enable Cross Region Restore for your Recovery Services vault

Cross Region Restore (CRR) allows you to restore Azure VMs in a secondary region (an Azure paired region), helping with disaster recovery.

Potential benefits: As one of the restore options, Cross Region Restore (CRR) allows you to restore Azure VMs in a secondary region, which is an Azure paired region.

Impact: Medium

For more information, see Restore VMs by using the Azure portal using Azure Backup - Azure Backup

ResourceType: microsoft.recoveryservices/vaults
Recommendation ID: 9b1308f1-4c25-4347-a061-7cc5cd6a44ab
Subcategory: DisasterRecovery

Azure Spring Apps

Upgrade Application Configuration Service to Gen 2

You're still using Application Configuration Service Gen1, which reaches end of support by April 2024. Application Configuration Service Gen2 provides better performance than Gen1, and the upgrade from Gen1 to Gen2 has zero downtime, so we recommend upgrading as soon as possible.

Potential benefits: Higher stability and availability

Impact: Medium

For more information, see Use Application Configuration Service for Tanzu - Azure Spring Apps Enterprise plan

ResourceType: microsoft.appplatform/spring
Recommendation ID: 39d862c8-445c-40c6-ba59-0e86134df606
Subcategory: Other

Azure SQL Database

Enable cross region disaster recovery for SQL Database

Enable cross region disaster recovery for Azure SQL Database for business continuity in the event of regional outage.

Potential benefits: Enabling disaster recovery creates a continuously synchronized readable secondary database for a primary database.

Impact: High

For more information, see Cloud business continuity - disaster recovery - Azure SQL Database

ResourceType: microsoft.sql/servers/databases
Recommendation ID: 2ea11bcb-dfd0-48dc-96f0-beba578b989a
Subcategory: DisasterRecovery

Enable zone redundancy for Azure SQL Database to achieve high availability and resiliency

To achieve high availability and resiliency, enable zone redundancy for the SQL database or elastic pool to use availability zones and ensure the database or elastic pool is resilient to zonal failures.

Potential benefits: Enabling zone redundancy ensures Azure SQL Database is resilient to zonal hardware and software failures and the recovery is transparent to applications.

Impact: High

For more information, see Availability through local and zone redundancy - Azure SQL Database

ResourceType: microsoft.sql/servers/databases
Recommendation ID: 807e58d0-e385-41ad-987b-4a4b3e3fb563
Subcategory: HighAvailability

Azure Stack HCI

Upgrade to the latest version of AKS enabled by Arc

Upgrade to the latest version of API/SDK of AKS enabled by Azure Arc for new functionality and improved stability.

Potential benefits: The latest version of AKS enabled by Azure Arc with new functionality and improved stability.

Impact: Low

For more information, see Azure SDK Releases

ResourceType: microsoft.azurestackhci/clusters
Recommendation ID: 09e56b5a-9a00-47a7-82dd-9bd9569eb6ed
Subcategory: ServiceUpgradeAndRetirement

Upgrade to the latest version of AKS enabled by Arc

Upgrade to the latest version of API/SDK of AKS enabled by Azure Arc for new functionality and improved stability.

Potential benefits: The latest version of AKS enabled by Azure Arc with new functionality and improved stability.

Impact: Low

For more information, see Azure SDK Releases

ResourceType: microsoft.azurestackhci/clusters
Recommendation ID: 2ac72093-309f-41ec-bf9d-55e9fc490563
Subcategory: ServiceUpgradeAndRetirement

Classic deployment model storage

Action required: Migrate classic storage accounts by 8/30/2024.

Migrate your classic storage accounts to Azure Resource Manager to ensure business continuity. Azure Resource Manager will provide all of the same functionality plus a consistent management layer, resource grouping, and access to new features and updates.

Potential benefits: Ensure the ability to manage your data by migrating your classic storage account(s)

Impact: High

ResourceType: microsoft.classicstorage/storageaccounts
Recommendation ID: fd04ff97-d3b3-470a-9544-dfea3a5708db
Subcategory: HighAvailability

Classic deployment model virtual machine

Migrate off Cloud Services (classic) before 31 August 2024

Cloud Services (classic) is retiring. To avoid any loss of data or business continuity, migrate off before 31 Aug 2024.

Potential benefits: Continuity of your service

Impact: Medium

For more information, see Migrate Azure Cloud Services (classic) to Azure Cloud Services (extended support)

ResourceType: microsoft.classiccompute/domainnames
Recommendation ID: 13ff4efb-6c84-4684-8838-52c123e3e3a2
Subcategory: ServiceUpgradeAndRetirement

Cognitive Services

Upgrade your application to use the latest API version from Azure OpenAI

An Azure OpenAI resource with an older API version lacks the latest features and functionalities. We recommend that you use the latest REST API version.

Potential benefits: Our new API versions contain the latest and greatest features and capabilities.

Impact: Medium

For more information, see Azure OpenAI Service REST API reference - Azure OpenAI

ResourceType: microsoft.cognitiveservices/accounts
Recommendation ID: 13fed411-54aa-4923-b830-23b51539d79d
Subcategory: ServiceUpgradeAndRetirement

Quota exceeded for this resource, wait or upgrade to unblock

If the quota for your resource is exceeded, your resource becomes blocked. You can wait for the quota to be replenished automatically, or, to use the resource again now, upgrade it to a paid SKU.

Potential benefits: If you upgrade to a paid SKU you can use the resource again today.

Impact: Medium

For more information, see Plan and manage costs for Azure AI Foundry - Azure AI Foundry

ResourceType: microsoft.cognitiveservices/accounts
Recommendation ID: 3f83aee8-222d-445c-9a46-2af5fe5b4777
Subcategory: Scalability

Container Registry

Use Premium tier for critical production workloads

Premium registries provide the highest amount of included storage, concurrent operations and network bandwidth, enabling high-volume scenarios. The Premium tier also adds features such as geo-replication, availability zone support, content-trust, customer-managed keys and private endpoints.

Potential benefits: The Premium tier provides the highest amount of performance, scale and resiliency options

Impact: High

For more information, see Registry Service Tiers and Features - Azure Container Registry

ResourceType: microsoft.containerregistry/registries
Recommendation ID: af0cdbce-c610-499b-9bd7-b169cdb1bb2e
Subcategory: HighAvailability

Ensure Geo-replication is enabled for resilience

Geo-replication enables workloads to use a single image, tag and registry name across regions, provides network-close registry access, reduced data transfer costs and regional Registry resilience if a regional outage occurs. This feature is only available in the Premium service tier.

Potential benefits: Improved resilience and pull performance, simplified registry management and reduced data transfer costs

Impact: High

For more information, see Geo-replicate Azure Container Registry to Multiple Regions - Azure Container Registry

ResourceType: microsoft.containerregistry/registries
Recommendation ID: dcfa2602-227e-4b6c-a60d-7b1f6514e690
Subcategory: HighAvailability

Content Delivery Network

Azure CDN From Edgio, Managed Certificate Renewal Unsuccessful. Additional Validation Required.

Azure CDN from Edgio employs CNAME delegation to renew certificates with DigiCert for managed certificate renewals. It's essential that Custom Domains resolve to an azureedge.net endpoint for the automatic renewal process with DigiCert to be successful. Ensure your Custom Domain's CNAME and CAA records are configured correctly. Should you require further assistance, please submit a support case to Azure to re-attempt the renewal request.

Potential benefits: Ensure service availability.

Impact: High

ResourceType: microsoft.cdn/profiles
Recommendation ID: ceecfd41-89b3-4c64-afe6-984c9cc03126
Subcategory: Other
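
The CNAME requirement above can be checked once you know the CNAME target of a custom domain. The following Python sketch only performs the string comparison and deliberately leaves the DNS lookup itself out. The function name and the sample host names in the usage note are hypothetical.

```python
def cname_targets_suffix(cname_target, suffix="azureedge.net"):
    """True if the CNAME target is a host under the given DNS suffix.

    A trailing dot (fully qualified form) and letter case are ignored.
    """
    host = cname_target.strip().rstrip(".").lower()
    return host.endswith("." + suffix.lower())
```

Requiring a leading dot before the suffix avoids accepting look-alike names: `contoso.azureedge.net.` passes, while `contoso.notazureedge.net` doesn't.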

Renew the expired Azure Front Door customer certificate to avoid service disruption

When customer certificates for Azure Front Door Standard and Premium profiles expire, you might have service disruptions. To avoid service disruption, renew the certificate before it expires.

Potential benefits: Ensure service availability.

Impact: High

For more information, see Configure HTTPS for your custom domain - Azure Front Door

ResourceType: microsoft.cdn/profiles
Recommendation ID: 4e1c2077-7c73-4ace-b4aa-f11b36c28290
Subcategory: BusinessContinuity
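
A simple way to act before a certificate expires is to compare its expiry time against a warning window. This Python sketch assumes you already have the certificate's expiry time as a timezone-aware `datetime`. The 30-day default window is an arbitrary choice for illustration, not a value from this article.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after, now, warn_days=30):
    """True if the certificate expires within warn_days or has already expired."""
    return not_after - now <= timedelta(days=warn_days)


# Hypothetical dates used only for illustration.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
expiry = datetime(2025, 1, 15, tzinfo=timezone.utc)
print(needs_renewal(expiry, now))
```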

Re-validate domain ownership for the Azure Front Door managed certificate renewal

Azure Front Door (AFD) can't automatically renew the managed certificate because the domain isn't CNAME mapped to AFD endpoint. For the managed certificate to be automatically renewed, revalidate domain ownership.

Impact: High

For more information, see How to add a custom domain - Azure Front Door

ResourceType: microsoft.cdn/profiles
Recommendation ID: bfe85fd2-ee53-4c35-8781-7790da2107e1
Subcategory: BusinessContinuity

Switch Secret version to 'Latest' for the Azure Front Door customer certificate

Configure the Azure Front Door (AFD) customer certificate secret to 'Latest' so that AFD refers to the latest secret version in Azure Key Vault and the secret can be automatically rotated.

Potential benefits: The 'Latest' version can be automatically rotated.

Impact: Medium

For more information, see Configure HTTPS for your custom domain - Azure Front Door

ResourceType: microsoft.cdn/profiles
Recommendation ID: 2c057605-4707-4d3e-bbb0-a7fe9b6a626b
Subcategory: Other

Validate domain ownership by adding DNS TXT record to DNS provider

Validate domain ownership by adding the DNS TXT record to your DNS provider. Validating domain ownership through TXT records enhances security and ensures proper control over your domain.

Potential benefits: Ensure service availability.

Impact: High

For more information, see How to add a custom domain - Azure Front Door

ResourceType: microsoft.cdn/profiles
Recommendation ID: 9411bc9f-d181-497c-b519-4154ae04fb00
Subcategory: BusinessContinuity

Migrate away from Azure CDN from Edgio by January 15, 2025

Migrate from Azure CDN Standard/Premium by Edgio before 15 January 2025 when the Edgio platform is scheduled to shut down. It's recommended to move to Azure Front Door for compatibility. Alternatively, consider using Azure Traffic Manager or Akamai CDN available in the Azure Marketplace.

Potential benefits: Avoid downtime and ensure business continuity.

Impact: High

For more information, see Azure updates

ResourceType: microsoft.cdn/profiles
Recommendation ID: 2c9e3f2a-7373-45e1-ab8b-f361e5f0c37f
Subcategory: ServiceUpgradeAndRetirement

Data Factory

Implement BCDR strategy for cross region redundancy in Azure Data Factory

Implementing a business continuity and disaster recovery (BCDR) strategy improves high availability and reduces the risk of data loss.

Potential benefits: Improves high availability and reduces the risk of data loss

Impact: Medium

For more information, see BCDR for Azure Data Factory and Azure Synapse Analytics pipelines - Azure Architecture Center

ResourceType: microsoft.datafactory/factories
Recommendation ID: 617ee02c-be69-441e-8294-dee5a237efff
Subcategory: DisasterRecovery

Enable auto upgrade on your SHIR

Auto-upgrade of the self-hosted integration runtime (SHIR) has been disabled, so you aren't getting the latest changes and bug fixes on the SHIR. Review your settings and enable SHIR auto-upgrade.

Potential benefits: To get the latest changes and bug fixes on the Self-Hosted Integration runtime

Impact: Medium

For more information, see Self-hosted integration runtime autoupdate and expire notification - Azure Data Factory

ResourceType: microsoft.datafactory/factories
Recommendation ID: 939b97dc-fdca-4324-ba36-6ea7e1ab399b
Subcategory: null

Fluid Relay

Azure Fluid Relay client library should be upgraded

If the Azure Fluid Relay service is invoked with an old client library, it might cause application problems. To ensure your application remains operational, upgrade your Azure Fluid Relay client library to the latest version. Upgrading provides the most up-to-date functionality and enhancements in performance and stability.

Potential benefits: Improved reliability

Impact: Medium

For more information, see Version compatibility with Fluid Framework releases - Azure Fluid Relay

ResourceType: microsoft.fluidrelay/fluidrelayservers
Recommendation ID: a5e8a0f8-2c84-407a-b3d8-f371d684363b
Subcategory: ServiceUpgradeAndRetirement

HDInsight

Apply critical updates by dropping and recreating your HDInsight clusters (certificate rotation round 2)

The HDInsight service attempted to apply a critical certificate update on your running clusters. However, due to some custom configuration changes, we're unable to apply the updates on all clusters. To prevent those clusters from becoming unhealthy and unusable, drop and recreate your clusters.

Potential benefits: Ensure cluster health and stability

Impact: High

For more information, see Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more

ResourceType: microsoft.hdinsight/clusters
Recommendation ID: 69740e3e-5b96-4b0e-b9b8-4d7573e3611c
Subcategory: Other

Non-ESP ABFS clusters [Cluster Permissions for World Readable]

We plan to introduce a change in non-ESP ABFS clusters that restricts non-Hadoop group users from running Hadoop commands for storage operations. This change improves the cluster security posture. Customers need to plan for the updates before September 30, 2023.

Potential benefits: This change is to improve cluster security posture

Impact: High

For more information, see Release notes for Azure HDInsight

ResourceType: microsoft.hdinsight/clusters
Recommendation ID: 24acd95e-fc9f-490c-b32d-edc6d747d0bc
Subcategory: Other

Restart brokers on your Kafka Cluster Disks

When data disks used by Kafka brokers in HDInsight clusters are almost full, the Apache Kafka broker process can't start and fails. To mitigate, find the retention time for every topic, back up the files that are older, and restart the brokers.

Potential benefits: Avoid Kafka broker issues

Impact: High

For more information, see Broker fails to start due to a full disk in Azure HDInsight

ResourceType: microsoft.hdinsight/clusters
Recommendation ID: 35e3a19f-16e7-4bb1-a7b8-49e02a35af2e
Subcategory: Other
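
The mitigation above amounts to comparing the age of each log file with the retention time of its topic. The Python sketch below assumes you have already collected, by whatever means, a list of `(topic, path, modified_epoch_seconds)` entries and a mapping from topic name to retention time in milliseconds (the unit used by Kafka's `retention.ms` topic setting). All names and sample values are hypothetical.

```python
def segments_past_retention(segments, retention_ms, now_s):
    """Return paths of log files whose age exceeds their topic's retention.

    segments: iterable of (topic, path, modified_epoch_seconds)
    retention_ms: mapping of topic name to retention time in milliseconds
    now_s: current time in epoch seconds
    """
    expired = []
    for topic, path, modified_s in segments:
        limit_ms = retention_ms.get(topic)
        if limit_ms is None:
            continue  # unknown topic: leave its files alone
        age_ms = (now_s - modified_s) * 1000
        if age_ms > limit_ms:
            expired.append(path)
    return expired
```

The function only reports candidates. Backing up the files and restarting the brokers remain manual steps, as described in the linked article.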

Cluster Name length update

The maximum length of the cluster name will change from 59 to 45 characters to improve the security posture of clusters. This change will be implemented by September 30, 2023.

Potential benefits: Security posture improvement for HDInsight

Impact: Medium

For more information, see Release notes for Azure HDInsight

ResourceType: microsoft.hdinsight/clusters
Recommendation ID: 41a248ef-50d4-4c48-81fb-13196f957210
Subcategory: Other

Upgrade your cluster to the latest HDInsight image

Your cluster was created one year ago, so it doesn't have the latest image upgrades. As a best practice, we recommend you use the latest HDInsight images for the best open source updates, Azure updates, and security fixes. The recommended maximum duration for cluster upgrades is less than six months.

Potential benefits: Get the latest fixes and features

Impact: High

For more information, see Before you start with Azure HDInsight

ResourceType: microsoft.hdinsight/clusters
Recommendation ID: 8f163c95-0029-4139-952a-42bd0d773b93
Subcategory: ServiceUpgradeAndRetirement

Upgrade your HDInsight Cluster

A cluster not using the latest image doesn't have the latest upgrades. Your cluster isn't using the latest image. We recommend you use the latest versions of HDInsight images for the best of open source updates, Azure updates, and security fixes. HDInsight releases happen every 30 to 60 days.

Potential benefits: Get the latest fixes and features

Impact: High

For more information, see Release notes for Azure HDInsight

ResourceType: microsoft.hdinsight/clusters
Recommendation ID: 97355d8e-59ae-43ff-9214-d4acf728467a
Subcategory: ServiceUpgradeAndRetirement

Gateway or virtual machine not reachable

We detected a network probe failure, which indicates an unreachable gateway or virtual machine. Verify the availability of all cluster hosts, and restart the virtual machine to recover. If you need further assistance, contact Azure support.

Potential benefits: Improved availability

Impact: High

ResourceType: microsoft.hdinsight/clusters
Recommendation ID: b3bf9f14-c83e-4dd3-8f5c-a6be746be173
Subcategory: Other

VM agent is 9.9.9.9. Upgrade the cluster.

Our records indicate that one or more of your clusters are using images dated February 2022 or older (image versions 2202xxxxxx or older). There is a potential reliability issue on HDInsight clusters that use images dated February 2022 or older. Consider rebuilding your clusters with the latest image.

Potential benefits: Improved Reliability in Scaling and Network connectivity

Impact: High

ResourceType: microsoft.hdinsight/clusters
Recommendation ID: e4635832-0ab1-48b1-a386-c791197189e6
Subcategory: ServiceUpgradeAndRetirement
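
Based on the version pattern mentioned above (2202xxxxxx), the first four digits of an image version appear to encode the year and month. Under that assumption, a short Python sketch can flag affected images. The function name and the sample versions in the usage note are hypothetical.

```python
CUTOFF_YYMM = 2202  # February 2022, from the recommendation above


def is_affected_image(image_version):
    """True if the image version's YYMM prefix is February 2022 or older."""
    prefix = image_version[:4]
    if len(prefix) < 4 or not prefix.isdigit():
        raise ValueError(f"unexpected image version: {image_version!r}")
    return int(prefix) <= CUTOFF_YYMM
```

For example, `is_affected_image("2202141000")` returns `True`, while `is_affected_image("2309010000")` returns `False`.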

Media Services

Increase Media Services quotas or limits

When a media account hits its quota limits, disruption of service might occur. To avoid any disruption of service, review current usage of assets, content key policies, and stream policies and increase quota limits for the entities that are close to hitting the limit. You can request quota limits be increased by opening a ticket and adding relevant details. TIP: Don't create additional Azure Media accounts in an attempt to obtain higher limits.

Potential benefits: Avoid any disruption to service due to customer exceeding quota limits.

Impact: Medium

For more information, see Quotas and limits in Azure Media Services

ResourceType: microsoft.media/mediaservices
Recommendation ID: b7c9fd99-a979-40b4-ab48-b1dfab6bb41a
Subcategory: Scalability
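
To decide which entities are close to their limit, you can compare current usage with the quota for each entity type. The Python sketch below uses an 80% threshold and made-up usage and limit numbers. Neither the threshold nor the numbers come from the Media Services documentation.

```python
def near_quota(usage, limits, threshold=0.8):
    """Return entity names whose usage is at or above threshold * limit."""
    return sorted(
        name
        for name, used in usage.items()
        if limits.get(name, 0) > 0 and used / limits[name] >= threshold
    )


# Hypothetical numbers used only for illustration.
usage = {"assets": 950, "content_key_policies": 10, "streaming_policies": 80}
limits = {"assets": 1000, "content_key_policies": 30, "streaming_policies": 100}
print(near_quota(usage, limits))
```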

Service Bus

Use Service Bus premium tier for improved resilience

When running critical applications, the Service Bus premium tier offers better resource isolation at the CPU and memory level, enhancing availability. It also supports the Geo-disaster recovery feature, which enables easier recovery from regional disasters without having to change application configurations.

Potential benefits: Service Bus premium tier offers better resiliency with CPU and memory resource isolation as well as Geo-disaster recovery

Impact: Low

For more information, see Azure Service Bus premium messaging tier - Azure Service Bus

ResourceType: microsoft.servicebus/namespaces
Recommendation ID: 29765e2c-5286-4039-963f-f8231e56cc3e
Subcategory: HighAvailability

Use Service Bus autoscaling feature in the premium tier for improved resilience

When running critical applications, enabling the autoscale feature allows you to have enough capacity to handle the load on your application. Having the right amount of resources running can reduce throttling and provide a better user experience.

Potential benefits: Enabling autoscale helps prevent capacity constraints for users

Impact: High

For more information, see Azure Service Bus - Automatically update messaging units - Azure Service Bus

ResourceType: microsoft.servicebus/namespaces
Recommendation ID: 68e62f5c-4ed1-4b78-a2a0-4d9a4cebf106
Subcategory: Scalability

SQL Server on Azure Virtual Machines

Enable Azure Backup for SQL on your virtual machines

For the benefits of zero-infrastructure backup, point-in-time restore, and central management with SQL AG integration, enable backups for SQL databases on your virtual machines using Azure Backup.

Potential benefits: SQL aware backups with no-infra for backup, centralized management, AG integration and point-in-time restore

Impact: Medium

For more information, see Back up SQL Server databases to Azure - Azure Backup

ResourceType: microsoft.sqlvirtualmachine/sqlvirtualmachines
Recommendation ID: 77f01e65-e57f-40ee-a0e9-e18c007d4d4c
Subcategory: DisasterRecovery

Storage

Use Managed Disks for storage accounts reaching capacity limit

When Premium SSD unmanaged disks in storage accounts are about to reach their Premium Storage capacity limit, failures might occur. To avoid failures when this limit is reached, migrate to Managed Disks that don't have an account capacity limit. This migration can be done through the portal in less than 5 minutes.

Potential benefits: Avoid scale issues when account reaches capacity limit

Impact: High

For more information, see Scalability and performance targets for standard storage accounts - Azure Storage

ResourceType: microsoft.storage/storageaccounts
Recommendation ID: d42d751d-682d-48f0-bc24-bb15b61ac4b8
Subcategory: Scalability

Configure blob backup

Azure blob backup helps protect data from accidental or malicious deletion. We recommend that you configure blob backup.

Potential benefits: Protect data from accidental or malicious deletion

Impact: Medium

For more information, see Overview of Azure Blobs backup - Azure Backup

ResourceType: microsoft.storage/storageaccounts
Recommendation ID: 8ef907f4-f8e3-4bf1-962d-27e005a7d82d
Subcategory: DisasterRecovery

Subscriptions

Turn on Azure Backup to get simple, reliable, and cost-effective protection for your data

Keep your information and applications safe with robust, one-click backup from Azure. Activate Azure Backup to get cost-effective protection for a wide range of workloads including VMs, SQL databases, applications, and file shares.

Potential benefits: Ensure your business-critical applications stay protected

Impact: Medium

For more information, see Azure Backup Documentation - Azure Backup

ResourceType: microsoft.subscriptions/subscriptions
Recommendation ID: 9e91a63f-faaf-46f2-ac7c-ddfcedf13366
Subcategory: DisasterRecovery

Create an Azure Service Health alert

Azure Service Health alerts keep you informed about issues and advisories in four areas (Service issues, Planned maintenance, Security and Health advisories). These alerts are personalized to notify you about disruptions or potential impacts on your chosen Azure regions and services.

Potential benefits: Stay informed about issues and advisories across 4 areas (Service issues, Planned maintenance, Security advisories and Health advisories)

Impact: High

For more information, see Receive Service health alerts on Azure service notifications using Azure portal - Azure Service Health

ResourceType: microsoft.subscriptions/subscriptions
Recommendation ID: 242639fd-cd73-4be2-8f55-70478db8d1a5
Subcategory: MonitoringAndAlerting

Virtual Machines

Improve data reliability by using Managed Disks

Virtual machines in an Availability Set with disks that share either storage accounts or storage scale units aren't resilient to single storage scale unit failures during outages. Migrate to Azure Managed Disks to ensure that the disks of different VMs in the Availability Set are sufficiently isolated to avoid a single point of failure.

Potential benefits: Ensure business continuity through data resilience

Impact: High

ResourceType: microsoft.compute/availabilitysets
Recommendation ID: 02cfb5ef-a0c1-4633-9854-031fbda09946
Subcategory: HighAvailability

Use Azure Disks with Zone Redundant Storage (ZRS) for higher resiliency and availability

Azure Disks with ZRS provide synchronous replication of data across three Availability Zones in a region, making the disk tolerant to zonal failures without disruptions to applications. For higher resiliency and availability, migrate disks from LRS to ZRS.

Potential benefits: By designing your applications to use ZRS Disks, your data is replicated across 3 Availability Zones, making your disk resilient to a zonal outage

Impact: High

For more information, see Convert a disk from LRS to ZRS - Azure Virtual Machines

ResourceType: microsoft.compute/disks
Recommendation ID: d4102c0f-ebe3-4b22-8fe0-e488866a87af
Subcategory: HighAvailability
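
As a small illustration of the LRS to ZRS migration above, this Python sketch maps locally redundant disk SKUs to their zone-redundant counterparts. It assumes ZRS is offered for Premium SSD and Standard SSD disks, and that each disk is a dictionary with hypothetical `name` and `sku` fields. Check the linked article for current SKU and region support before converting.

```python
# Assumption: only these two LRS SKUs have a ZRS counterpart.
ZRS_EQUIVALENT = {
    "Premium_LRS": "Premium_ZRS",
    "StandardSSD_LRS": "StandardSSD_ZRS",
}


def zrs_candidates(disks):
    """Map disk name to the ZRS SKU it could be converted to."""
    return {
        disk["name"]: ZRS_EQUIVALENT[disk["sku"]]
        for disk in disks
        if disk.get("sku") in ZRS_EQUIVALENT
    }
```

Disks that already use a ZRS SKU, or a SKU without a ZRS counterpart in the mapping, are left out of the result.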

Enable virtual machine replication to protect your applications from regional outage

Virtual machines are resilient to regional outages when replication to another region is enabled. To reduce adverse business impact during an Azure region outage, we recommend enabling replication of all business-critical virtual machines.

Potential benefits: Ensure business continuity in case of any Azure region outage

Impact: Medium

For more information, see Set up Azure VM disaster recovery to a secondary region with Azure Site Recovery - Azure Site Recovery

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: ed651749-cd37-4fd5-9897-01b416926745
Subcategory: DisasterRecovery

Update your outbound connectivity protocol to Service Tags for Azure Site Recovery

IP address-based allowlisting is a vulnerable way to control outbound connectivity for firewalls. Service Tags are a good alternative. We highly recommend using Service Tags to allow connectivity to Azure Site Recovery services for the machines.

Potential benefits: Ensures better security, stability and resiliency than hard coded IP Addresses

Impact: High

For more information, see About networking in Azure VM disaster recovery with Azure Site Recovery - Azure Site Recovery

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: bcfeb92b-fe93-4cea-adc6-e747055518e9
Subcategory: Other

Upgrade VM from Premium Unmanaged Disks to Managed Disks at no additional cost

Azure Managed Disks provide higher resiliency, simplified service management, higher scale target and more choices among several disk types. Your VM is using premium unmanaged disks that can be migrated to managed disks at no additional cost through the portal in less than 5 minutes.

Potential benefits: Leverage higher resiliency and other benefits of Managed Disks

Impact: High

For more information, see Overview of Azure Disk Storage - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 57ecb3cd-f2b4-4cad-8b3a-232cca527a0b
Subcategory: HighAvailability

Upgrade your deprecated Virtual Machine image to a newer image

Virtual Machines (VMs) in your subscription are running on images scheduled for deprecation. Once the image is deprecated, new VMs can't be created from the deprecated image. To prevent disruption to your workloads, upgrade to a newer image. (VMRunningDeprecatedImage)

Potential benefits: Minimize any potential disruptions to your VM workloads

Impact: High

For more information, see Deprecated Azure Marketplace images - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 11f04d70-5bb3-4065-b717-1f11b2e050a8
Subcategory: ServiceUpgradeAndRetirement

Upgrade to a newer offer of Virtual Machine image

Potential benefits: Minimize any potential disruptions to your VM workloads

Impact: High

For more information, see Deprecated Azure Marketplace images - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 937d85a4-11b2-4e13-a6b5-9e15e3d74d7b
Subcategory: ServiceUpgradeAndRetirement

Upgrade to a newer SKU of Virtual Machine image

Potential benefits: Minimize any potential disruptions to your VM workloads

Impact: High

For more information, see Deprecated Azure Marketplace images - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 681acf17-11c3-4bdd-8f71-da563c79094c
Subcategory: ServiceUpgradeAndRetirement

Provide access to mandatory URLs missing for your Azure Virtual Desktop environment

For a session host to deploy and register to Azure Virtual Desktop (formerly Windows Virtual Desktop) properly, you need to add a set of URLs to the allowed list if your VM runs in a restricted environment. To find the specific URLs missing from your allowed list, search your application event log for event 3702.

Potential benefits: Ensure successful deployment and session host functionality when using the Azure Virtual Desktop service

Impact: Medium

For more information, see Required FQDNs and endpoints for Azure Virtual Desktop

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 53e0a3cb-3569-474a-8d7b-7fd06a8ec227
Subcategory: Other

Use Availability zones for better resiliency and availability

Availability Zones (AZ) in Azure help protect your applications and data from datacenter failures. Each AZ is made up of one or more datacenters equipped with independent power, cooling, and networking. By designing solutions to use zonal VMs, you can isolate your VMs from failure in any other zone.

Potential benefits: Usage of zonal VMs protects your apps from zonal outages in other zones.

Impact: High

For more information, see Tutorial - Move Azure single instance Virtual Machines from regional to zonal availability zones - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 066a047a-9ace-45f4-ac50-6325840a6b00
Subcategory: HighAvailability

Convert Standard to Premium disk for higher uptime

Use a Premium SSD managed disk in a Single Instance virtual machine for the highest uptime. Conversion is allowed from a Standard managed disk to a Premium managed disk.

Potential benefits: Enhanced performance, configurability, and uptime

Impact: Low

For more information, see Best practices for high availability with Azure VMs and managed disks - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 2b5cf6e5-2792-49b2-9ec0-0e901be6488b
Subcategory: BusinessContinuity

DNS Servers should be configured at the Virtual Network level

Set the DNS Servers for the VM at the Virtual Network level to ensure consistency throughout the environment. In the configuration of the primary network interface, the DNS Servers setting should be set to Inherit from virtual network.

Potential benefits: Ensures consistency and reliable name resolution

Impact: Low

For more information, see Name resolution for resources in Azure virtual networks

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 490262e8-313c-431f-a143-a9c2cadba41b
Subcategory: Other

Enable Backups on your Virtual Machines

Secure your data by enabling backups for your virtual machines.

Potential benefits: Protection of your Virtual Machines

Impact: Medium

For more information, see What is Azure Backup? - Azure Backup

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 651c7925-17a3-42e5-85cd-73bd095cf27f
Subcategory: DisasterRecovery

Add additional VM or use Premium disks for higher uptime

Add a second VM instance to the Availability Set, or upgrade to Premium SSD managed disks, for the highest uptime.

Potential benefits: Enhanced performance, configurability, and uptime

Impact: Medium

For more information, see Best practices for high availability with Azure VMs and managed disks - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: e5e707f2-f41f-4aa6-bccf-3fb9748e5b66
Subcategory: BusinessContinuity

Upgrade your Virtual Machine Scale Set to alternative image version

Virtual Machine Scale Sets (VMSS) in your subscription are running on images that have been scheduled for deprecation. Once the image is deprecated, your Virtual Machine Scale Set workloads would no longer scale out. To prevent disruption to your workload, upgrade to a newer version of the image.

Potential benefits: Minimize any potential disruptions to your Virtual Machine Scale Set workloads

Impact: High

For more information, see Deprecated Azure Marketplace images - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachinescalesets
Recommendation ID: 3b739bd1-c193-4bb6-a953-1362ee3b03b2
Subcategory: ServiceUpgradeAndRetirement

Upgrade your Virtual Machine Scale Set to alternative image offer

Virtual Machine Scale Sets (VMSS) in your subscription are running on images that have been scheduled for deprecation. Once the image is deprecated, your Virtual Machine Scale Set workloads would no longer scale out. To prevent disruption to your workload, upgrade to a newer offer of the image.

Potential benefits: Minimize any potential disruptions to your Virtual Machine Scale Set workloads

Impact: High

For more information, see Deprecated Azure Marketplace images - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachinescalesets
Recommendation ID: 3d18d7cd-bdec-4c68-9160-16a677d0f86a
Subcategory: ServiceUpgradeAndRetirement

Upgrade your Virtual Machine Scale Set to alternative image SKU

Virtual Machine Scale Sets (VMSS) in your subscription are running on images that have been scheduled for deprecation. Once the image is deprecated, your Virtual Machine Scale Set workloads would no longer scale out. To prevent disruption to your workload, upgrade to a newer SKU of the image.

Potential benefits: Minimize any potential disruptions to your Virtual Machine Scale Set workloads

Impact: High

For more information, see Deprecated Azure Marketplace images - Azure Virtual Machines

ResourceType: microsoft.compute/virtualmachinescalesets
Recommendation ID: 44abb62e-7789-4f2f-8001-fa9624cb3eb3
Subcategory: ServiceUpgradeAndRetirement

Enable automatic repair policy on Azure Virtual Machine Scale Sets (VMSS)

Enabling automatic instance repairs helps achieve high availability by maintaining a set of healthy instances. If an unhealthy instance is found by the Application Health extension or load balancer health probe, automatic instance repairs attempt to recover the instance by triggering repair actions.

Potential benefits: Increase resiliency by automating repair of failed instances

Impact: High

For more information, see Automatic instance repairs with Azure Virtual Machine Scale Sets - Azure Virtual Machine Scale Sets

ResourceType: microsoft.compute/virtualmachinescalesets
Recommendation ID: b4d988a9-85e6-4179-b69c-549bdd8a55bb
Subcategory: HighAvailability

Upgrade to Standard SSD OS disk

Upgrade the operating system (OS) disk from Standard HDD to Standard SSD for increased uptime of a single-instance virtual machine and improved input/output operations and throughput.

Potential benefits: Boost single-instance VM uptime from 95% to 99.5%.

Impact: Medium

For more information, see Azure Disks Standard SSD billable transaction cap blog

ResourceType: microsoft.compute/virtualmachines
Recommendation ID: 3c03549b-9c0a-4c13-bed4-def3c7e34ddd
Subcategory: HighAvailability
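
To put the uptime figures in the preceding recommendation in perspective, the following sketch converts an availability percentage into the maximum downtime it permits. The 30-day month and the function name allowed_downtime_hours are assumptions chosen for illustration; they aren't part of the Advisor recommendation or of any SLA definition.

```python
from fractions import Fraction

HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month (720 hours)


def allowed_downtime_hours(availability_percent: str) -> Fraction:
    """Return the downtime, in hours per month, that an availability level permits."""
    # Fractions keep the arithmetic exact for values such as "99.5".
    unavailable = Fraction(100) - Fraction(availability_percent)
    return unavailable / 100 * HOURS_PER_MONTH


if __name__ == "__main__":
    for level in ("95", "99.5"):
        hours = allowed_downtime_hours(level)
        print(f"{level}% availability allows up to {float(hours):.1f} hours of downtime per month")
```

Under these assumptions, 95% corresponds to 36 hours of permitted downtime in a month, and 99.5% corresponds to 3.6 hours.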

Workloads

Deploy Hyperspace Web servers as part of a Virtual Machine Scale Set Flex configured for 3 zones

We have observed that your Hyperspace Web servers in the Virtual Machine Scale Set Flex setup aren't spread across 3 zones in the selected region. For services like Hyperspace Web in Epic systems that require high availability and large scale, we recommend deploying the servers as part of Virtual Machine Scale Set Flex and spreading them across 3 zones. With Flexible orchestration, Azure provides a unified experience across the Azure VM ecosystem.

Potential benefits: High availability and on-demand large scale for Hyperspace web servers in Epic DB

Impact: Medium

For more information, see Create an Azure scale set that uses Availability Zones - Azure Virtual Machine Scale Sets

ResourceType: microsoft.workloads/epicvirtualinstances/hyperspacewebinstances
Recommendation ID: dfa50c39-104a-418b-873a-c145fe521c9b
Subcategory: HighAvailability

Configure Local host cache on Citrix VDI servers to ensure seamless connection brokering operations

We have observed that your Citrix VDI servers aren't configured with Local Host Cache. Local Host Cache (LHC) is a feature in Citrix Virtual Apps and Desktops that allows connection brokering operations to continue when an outage occurs. LHC engages when the site database is inaccessible for 90 seconds.

Potential benefits: Seamless connection brokering operations

Impact: Medium

ResourceType: microsoft.workloads/epicvirtualinstances/presentationinstances
Recommendation ID: f3d23f88-aee2-4b5a-bfd6-65b22bd70fc0
Subcategory: HighAvailability

Configure an Always On availability group for Multi-purpose SQL servers (MPSQL)

MPSQL servers with an Always On availability group have better availability. Your MPSQL servers aren't configured as part of an Always On availability group in the shared infrastructure in your Epic system. Always On availability groups improve database availability and resource use.

Potential benefits: Improved Database availability and resource use

Impact: Medium

For more information, see What is an Always On availability group? - SQL Server Always On

ResourceType: microsoft.workloads/epicvirtualinstances/sharedinstances
Recommendation ID: 3ca22452-0f8f-4701-a313-a2d83334e3cc
Subcategory: HighAvailability

Ensure high availability for production SAP app server

Verify high availability configuration for SAP application server of production SAP workloads.

Potential benefits: Minimize downtime to enhance system availability

Impact: High

For more information, see Azure VMs HA architecture and scenarios for SAP NetWeaver

ResourceType: microsoft.workloads/sapvirtualinstances/applicationinstances
Recommendation ID: 90a86c8e-efab-47a1-bb4d-63f231b15292
Subcategory: HighAvailability

Ensure high availability across zones for SAP app server

Verify high availability configuration across multiple availability zones within the same region for SAP application servers of production workloads.

Potential benefits: Minimize downtime to enhance system availability

Impact: High

For more information, see SAP workload configurations with Azure Availability Zones

ResourceType: microsoft.workloads/sapvirtualinstances/applicationinstances
Recommendation ID: b914567c-cfc4-42a5-8d16-939b77b6b4d0
Subcategory: HighAvailability

Use Premium or Ultra Disk for single app server VM

Use Premium Storage or Ultra Disks for SAP application server.

Potential benefits: Maximize the Azure single VM SLA

Impact: High

For more information, see Azure VMs HA architecture and scenarios for SAP NetWeaver

ResourceType: microsoft.workloads/sapvirtualinstances/applicationinstances
Recommendation ID: a7202ec4-8a6e-45ef-9b6e-df2486bcaa86
Subcategory: HighAvailability

Set the Idle timeout in Azure Load Balancer to 30 minutes for ASCS HA setup in SAP workloads

To prevent load balancer timeout, make sure that every Azure load balancing rule has 'Idle timeout (minutes)' set to the maximum value of 30 minutes. Open the load balancer, select 'load balancing rules' and add or edit the rule to enable the setting.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: Medium

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 45c2994f-a01d-4024-843e-a2a84dae48b4
Subcategory: HighAvailability

Enable Floating IP in the Azure Load balancer for ASCS HA setup in SAP workloads

For port reuse and better high availability, enable floating IP in the load balancing rules for the Azure Load Balancer for the HA setup of the ASCS instance in SAP workloads. Open the load balancer, select 'load balancing rules' and add or edit the rule to enable the setting.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: Medium

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: aec9b9fb-145f-4af8-94f3-7fdc69762b72
Subcategory: HighAvailability

Enable HA ports in the Azure Load Balancer for ASCS HA setup in SAP workloads

For port reuse and better high availability, enable HA ports in the load balancing rules for the HA setup of the ASCS instance in SAP workloads. Open the load balancer, select 'load balancing rules' and add or edit the rule to enable the setting.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: Medium

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: c3811f93-a1a5-4a84-8fba-dd700043cc42
Subcategory: HighAvailability
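
The three load balancer settings for the ASCS HA setup (idle timeout, floating IP, and HA ports) can be checked together. The following is a minimal sketch that inspects one load balancing rule represented as a dictionary. The property names (idleTimeoutInMinutes, enableFloatingIP, protocol, frontendPort, backendPort) are assumed to match the Azure Resource Manager JSON for a load balancing rule, and HA ports are assumed to be expressed as protocol "All" with both ports set to 0. The function name check_lb_rule and the sample rule are hypothetical.

```python
from typing import Any


def check_lb_rule(rule: dict[str, Any]) -> list[str]:
    """Return a list of findings for one load balancing rule; empty means compliant."""
    props = rule.get("properties", {})
    findings = []
    if props.get("idleTimeoutInMinutes") != 30:
        findings.append("idle timeout is not 30 minutes")
    if not props.get("enableFloatingIP", False):
        findings.append("floating IP is not enabled")
    # Assumption: HA ports appear as protocol "All" with frontend and backend port 0.
    ha_ports = (
        props.get("protocol") == "All"
        and props.get("frontendPort") == 0
        and props.get("backendPort") == 0
    )
    if not ha_ports:
        findings.append("HA ports are not enabled")
    return findings


if __name__ == "__main__":
    sample_rule = {
        "name": "ascs-rule",  # hypothetical rule name
        "properties": {
            "protocol": "Tcp",
            "frontendPort": 3600,
            "backendPort": 3600,
            "enableFloatingIP": True,
            "idleTimeoutInMinutes": 4,
        },
    }
    for finding in check_lb_rule(sample_rule):
        print(f"{sample_rule['name']}: {finding}")
```

The sample rule would be flagged for its idle timeout and for not using HA ports. To check a real load balancer, you would pass each rule from its exported configuration to the same function.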

Disable TCP timestamps on VMs placed behind Azure Load Balancer in ASCS HA setup in SAP workloads

Disable TCP timestamps on VMs placed behind Azure Load Balancer. Enabling TCP timestamps causes the health probes to fail because the VM's guest OS TCP stack drops TCP packets, which causes the load balancer to mark the endpoint as down.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: Medium

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 27899d14-ac62-41f4-a65d-e6c2a5af101b
Subcategory: Other
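
As an illustration of how the TCP timestamps setting might be verified on a Linux guest, the following sketch reads the kernel parameter and reports its state. It assumes the setting is exposed at /proc/sys/net/ipv4/tcp_timestamps and that a value of 0 means timestamps are disabled. The function name timestamps_disabled is hypothetical, and the sketch only reads the value; it doesn't change it.

```python
from pathlib import Path

# Assumption: standard Linux location of the net.ipv4.tcp_timestamps setting.
TCP_TIMESTAMPS_PATH = Path("/proc/sys/net/ipv4/tcp_timestamps")


def timestamps_disabled(raw_value: str) -> bool:
    """Interpret the kernel setting: 0 means TCP timestamps are disabled."""
    return raw_value.strip() == "0"


if __name__ == "__main__":
    if TCP_TIMESTAMPS_PATH.exists():
        disabled = timestamps_disabled(TCP_TIMESTAMPS_PATH.read_text())
        print(f"TCP timestamps are {'disabled' if disabled else 'enabled'}")
    else:
        print("Setting not found; this check applies only to Linux guests")
```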

Ensure that stonith is enabled for the Pacemaker configuration in ASCS HA setup in SAP workloads

In a Pacemaker cluster, the implementation of node level fencing is done using a STONITH (Shoot The Other Node in the Head) resource. To help manage failed nodes, ensure that 'stonith-enabled' is set to 'true' in the HA cluster configuration.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 28a00e1e-d0ad-452f-ad58-95e6c584e594
Subcategory: HighAvailability

Set the corosync token in Pacemaker cluster to 30000 for ASCS HA setup in SAP workloads (RHEL)

The corosync token setting determines the timeout that is used directly, or as a base, for real token timeout calculation in HA clusters. To allow memory-preserving maintenance, set the corosync token to 30000 for SAP on Azure.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: deede7ea-68c5-4fb9-8f08-5e706f88ac67
Subcategory: Other

Set the expected votes parameter to '2' in Pacemaker configuration in ASCS HA setup in SAP workloads (RHEL)

For a two node HA cluster, set the quorum 'expected-votes' parameter to '2' as recommended for SAP on Azure to ensure a proper quorum, resilience, and data consistency.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 35ef8bba-923e-44f3-8f06-691deb679468
Subcategory: HighAvailability

Enable the 'concurrent-fencing' parameter in Pacemaker configuration in ASCS HA setup in SAP workloads (ConcurrentFencingHAASCSRH)

Concurrent fencing enables the fencing operations to be performed in parallel, which enhances high availability (HA), prevents split-brain scenarios, and contributes to a robust SAP deployment. Set this parameter to 'true' in the Pacemaker cluster configuration for ASCS HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 0fffcdb4-87db-44f2-956f-dc9638248659
Subcategory: Other

Ensure that stonith is enabled for the cluster configuration in ASCS HA setup in SAP workloads

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 6921340e-baa1-424f-80d5-c07bbac3cf7c
Subcategory: HighAvailability

Set the stonith timeout to 144 for the cluster configuration in ASCS HA setup in SAP workloads

The 'stonith-timeout' specifies how long the cluster waits for a STONITH action to complete. Setting it to '144' seconds allows more time for fencing actions to complete. We recommend this setting for HA clusters for SAP on Azure.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 4eb10096-942e-402d-b4a6-e4e271c87a02
Subcategory: Other

Set the corosync token in Pacemaker cluster to 30000 for ASCS HA setup in SAP workloads (SUSE)

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 9f30eb2b-6a6f-4fa8-89dc-85a395c31233
Subcategory: Other

Set 'token_retransmits_before_loss_const' to 10 in Pacemaker cluster in ASCS HA setup in SAP workloads

The corosync token_retransmits_before_loss_const determines how many token retransmits are attempted before timeout in HA clusters. For stability and reliability, set the 'totem.token_retransmits_before_loss_const' to '10' for ASCS HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: f32b8f89-fb3c-4030-bd4a-0a16247db408
Subcategory: Other

Set the 'corosync join' in Pacemaker cluster to '60' for ASCS HA setup in SAP workloads

The 'corosync join' timeout specifies in milliseconds how long to wait for join messages in the membership protocol, so that when a new node joins the cluster, it has time to synchronize its state with existing nodes. Set 'join' to '60' in the Pacemaker cluster configuration for ASCS HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: fed84141-4942-49b3-8b0c-73a8b352f754
Subcategory: Other

Set the 'corosync consensus' in Pacemaker cluster to '36000' for ASCS HA setup in SAP workloads

The corosync 'consensus' parameter specifies in milliseconds how long to wait for consensus before starting a round of membership in the cluster configuration. Set 'consensus' in the Pacemaker cluster configuration for ASCS HA setup to 1.2 times the corosync token for reliable failover behavior.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 73227428-640d-4410-aec4-bac229a2b7bd
Subcategory: Other
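
The relationship between the consensus value in the title and the corosync token can be checked with a short calculation. This is a sketch only: the function name consensus_from_token is hypothetical, and the 30000 millisecond token is the value recommended elsewhere in this article for SAP on Azure.

```python
def consensus_from_token(token_ms: int) -> int:
    """Return 1.2 times the token timeout, using integer math to avoid float rounding."""
    return token_ms * 6 // 5


if __name__ == "__main__":
    print(consensus_from_token(30000))  # prints 36000
```

With a token of 30000 milliseconds, 1.2 times the token gives the 36000 milliseconds named in the recommendation title.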

Set the 'corosync max_messages' in Pacemaker cluster to '20' for ASCS HA setup in SAP workloads

The corosync 'max_messages' constant specifies the maximum number of messages that one processor can send on receipt of the token. Set it to '20' in the Pacemaker cluster configuration to allow efficient communication without overwhelming the network.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 14a889a6-374f-4bd4-8add-f644e3fe277d
Subcategory: Other

Set 'expected votes' to '2' in the cluster configuration in ASCS HA setup in SAP workloads (SUSE)

For a two node HA cluster, set the quorum 'expected_votes' parameter to 2 as recommended for SAP on Azure to ensure a proper quorum, resilience, and data consistency.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 89a9ddd9-f9bf-47e4-b5f7-a0a4edfa0cdb
Subcategory: HighAvailability

Set the two_node parameter to 1 in the cluster configuration in ASCS HA setup in SAP workloads

For a two node HA cluster, set the quorum parameter 'two_node' to 1 as recommended for SAP on Azure.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 2030a15b-ff0b-47c3-b934-60072ccda75e
Subcategory: HighAvailability
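
The corosync values recommended in the preceding entries (token, token_retransmits_before_loss_const, join, consensus, max_messages, expected_votes, and two_node) can be compared against a cluster's settings in one pass. The following is a minimal sketch that assumes these values live in the totem and quorum sections of a corosync.conf-style file. The parser is deliberately simplified, the function names are hypothetical, and the sample text is made up for illustration.

```python
from __future__ import annotations

# Recommended values taken from the recommendations in this article.
RECOMMENDED = {
    "totem.token": "30000",
    "totem.token_retransmits_before_loss_const": "10",
    "totem.join": "60",
    "totem.consensus": "36000",
    "totem.max_messages": "20",
    "quorum.expected_votes": "2",
    "quorum.two_node": "1",
}

# Hypothetical sample; max_messages deliberately deviates.
SAMPLE = """
totem {
    version: 2
    token: 30000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 36000
    max_messages: 10
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
"""


def parse_corosync(text: str) -> dict[str, str]:
    """Flatten a corosync.conf-style text into dotted 'section.key' entries."""
    settings: dict[str, str] = {}
    stack: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())  # entering a section
        elif line == "}":
            if stack:
                stack.pop()  # leaving a section
        elif ":" in line:
            key, value = line.split(":", 1)
            settings[".".join(stack + [key.strip()])] = value.strip()
    return settings


def find_deviations(text: str) -> dict[str, tuple[str | None, str]]:
    """Map each non-matching key to (actual value or None, recommended value)."""
    actual = parse_corosync(text)
    return {
        key: (actual.get(key), wanted)
        for key, wanted in RECOMMENDED.items()
        if actual.get(key) != wanted
    }


if __name__ == "__main__":
    for key, (found, wanted) in find_deviations(SAMPLE).items():
        print(f"{key}: found {found}, recommended {wanted}")
```

For the sample text, only totem.max_messages is reported, because it's set to 10 instead of 20. A missing key is reported with a found value of None.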

Enable 'concurrent-fencing' in Pacemaker ASCS HA setup in SAP workloads (ConcurrentFencingHAASCSSLE)

Concurrent fencing enables the fencing operations to be performed in parallel, which enhances HA, prevents split-brain scenarios, and contributes to a robust SAP deployment. Set this parameter to 'true' in the Pacemaker cluster configuration for ASCS HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: dc19b2c9-0770-4929-8f63-81c07fe7b6f3
Subcategory: Other

Ensure the number of 'fence_azure_arm' instances is one in Pacemaker in HA enabled SAP workloads

If you're using Azure fence agent for fencing with either managed identity or service principal, ensure that there's one instance of fence_azure_arm (an I/O fencing agent for Azure Resource Manager) in the Pacemaker configuration for ASCS HA setup for high availability.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: cb56170a-0ecb-420a-b2c9-5c4878a0132a
Subcategory: HighAvailability

Set stonith-timeout to 900 in Pacemaker configuration with Azure fence agent for ASCS HA setup

For reliable functioning of Pacemaker in the ASCS HA setup, set 'stonith-timeout' to 900. This setting is applicable if you're using the Azure fence agent for fencing with either managed identity or service principal.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 05747c68-715f-4c8f-b027-f57a931cc07a
Subcategory: HighAvailability
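
The Pacemaker fencing recommendations in this section (STONITH enabled, concurrent fencing enabled, the stonith-timeout value, and a single fence_azure_arm instance) can be expressed as one check. The following sketch makes several assumptions that you should verify for your cluster: the property names are stonith-enabled, concurrent-fencing, and stonith-timeout; values are compared as plain strings; and the 144 second timeout applies when the Azure fence agent isn't used. The function names and sample inputs are hypothetical.

```python
def expected_properties(uses_azure_fence_agent: bool) -> dict[str, str]:
    """Return the cluster property values recommended in this article."""
    return {
        "stonith-enabled": "true",
        "concurrent-fencing": "true",
        # Assumption: 900 with the Azure fence agent, 144 otherwise.
        "stonith-timeout": "900" if uses_azure_fence_agent else "144",
    }


def check_cluster(properties: dict[str, str], fence_agents: list[str]) -> list[str]:
    """Return findings for cluster properties and fence agent instances."""
    uses_azure = "fence_azure_arm" in fence_agents
    findings = [
        f"{name} should be {wanted}, found {properties.get(name)}"
        for name, wanted in expected_properties(uses_azure).items()
        if properties.get(name) != wanted
    ]
    count = fence_agents.count("fence_azure_arm")
    if count > 1:
        findings.append(f"expected one fence_azure_arm instance, found {count}")
    return findings


if __name__ == "__main__":
    sample_properties = {
        "stonith-enabled": "true",
        "concurrent-fencing": "false",
        "stonith-timeout": "144",
    }
    for finding in check_cluster(sample_properties, ["fence_azure_arm"]):
        print(finding)
```

In the sample, the cluster uses the Azure fence agent, so the sketch reports both the disabled concurrent fencing and the 144 second timeout.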

Create the softdog config file in Pacemaker configuration for ASCS HA setup in SAP workloads

The softdog timer is loaded as a kernel module in the Linux OS. This timer triggers a system reset if it detects that the system has hung. Ensure that the softdog configuration file is created in the Pacemaker cluster for ASCS HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 88261a1a-6a32-4fb6-8bbd-fcd60fdfcab6
Subcategory: HighAvailability

Ensure the softdog module is loaded for Pacemaker in ASCS HA setup in SAP workloads

The softdog timer is loaded as a kernel module in the Linux OS. This timer triggers a system reset if it detects that the system has hung. First ensure that you created the softdog configuration file, then load the softdog module in the Pacemaker configuration for ASCS HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 3730bc11-c81c-43eb-896a-8fce0bac139d
Subcategory: HighAvailability
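
The two softdog checks (the configuration file exists, and the module is loaded) can be sketched as follows. The file locations are assumptions: /etc/modules-load.d/softdog.conf as the configuration file and /proc/modules as the list of loaded kernel modules. The function names are hypothetical, and the sketch only reads state; it doesn't create the file or load the module.

```python
from pathlib import Path

# Assumptions: typical locations on a Linux cluster node.
SOFTDOG_CONFIG = Path("/etc/modules-load.d/softdog.conf")
PROC_MODULES = Path("/proc/modules")


def softdog_configured(config_text: str) -> bool:
    """Return True if the configuration text lists softdog on its own line."""
    return any(line.strip() == "softdog" for line in config_text.splitlines())


def softdog_loaded(proc_modules_text: str) -> bool:
    """Return True if a line of the loaded-modules list names the softdog module."""
    return any(
        line.split()[0] == "softdog"
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


if __name__ == "__main__":
    configured = SOFTDOG_CONFIG.exists() and softdog_configured(SOFTDOG_CONFIG.read_text())
    loaded = PROC_MODULES.exists() and softdog_loaded(PROC_MODULES.read_text())
    print(f"softdog config file present: {configured}")
    print(f"softdog module loaded: {loaded}")
```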

Ensure high availability for production SAP central service

Verify high availability configuration for SAP central services instance of production SAP workloads.

Potential benefits: Minimize downtime to enhance system availability

Impact: High

For more information, see Azure VMs HA architecture and scenarios for SAP NetWeaver

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: d2c08f71-906b-4915-a08e-c56215913fb2
Subcategory: HighAvailability

Ensure high availability across zones of SAP central service

Verify high availability configuration across multiple availability zones within the same region for SAP central services of production workloads.

Potential benefits: Minimize downtime to enhance availability

Impact: High

For more information, see SAP workload configurations with Azure Availability Zones

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: 9db6dd7f-af0e-45aa-89df-d35062baaefb
Subcategory: HighAvailability

Use Premium or Ultra Disk for SAP central service VM

Use Premium Storage or Ultra Disks for the SAP central services instance.

Potential benefits: Maximize the Azure single VM SLA

Impact: High

For more information, see Azure VMs HA architecture and scenarios for SAP NetWeaver

ResourceType: microsoft.workloads/sapvirtualinstances/centralinstances
Recommendation ID: bbdfaf94-719f-4cb2-897a-9e237007328a
Subcategory: HighAvailability

Set the Idle timeout in Azure Load Balancer to 30 minutes for HANA DB HA setup in SAP workloads

To prevent load balancer timeout, ensure that the 'Idle timeout (minutes)' parameter of every Azure load balancing rule is set to the maximum value of 30 minutes. Open the load balancer, select 'load balancing rules' and add or edit the rule to enable the recommended settings.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: Medium

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 1c1deb1c-ae1b-49a7-88d3-201285ad63b6
Subcategory: HighAvailability

Enable Floating IP in the Azure Load balancer for HANA DB HA setup in SAP workloads

For more flexible routing, enable floating IP in the load balancing rules for the Azure Load Balancer for HA set up of HANA DB instance in SAP workloads. Open the load balancer, select 'load balancing rules' and add or edit the rule to enable the recommended settings.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: Medium

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: cca36756-d938-4f3a-aebf-75358c7c0622
Subcategory: HighAvailability

Enable HA ports in the Azure Load Balancer for HANA DB HA setup in SAP workloads

For enhanced scalability, enable HA ports in the Load balancing rules for HA set up of HANA DB instance in SAP workloads. Open the load balancer, select 'load balancing rules' and add or edit the rule to enable the recommended settings.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: Medium

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: a5ac35c2-a299-4864-bfeb-09d2348bda68
Subcategory: HighAvailability

Disable TCP timestamps on VMs placed behind Azure Load Balancer in HANA DB HA setup in SAP workloads

Disable TCP timestamps on VMs placed behind Azure Load Balancer. Enabling TCP timestamps causes the health probes to fail because the VM's guest OS TCP stack drops TCP packets, which causes the load balancer to mark the endpoint as down.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: Medium

For more information, see Azure Load Balancer health probes

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 760ba688-69ea-431b-afeb-13683a03f0c2
Subcategory: Other

Ensure high availability for production SAP database

Ensure high availability configuration for the SAP database instance of production SAP workloads.

Potential benefits: Minimize downtime to enhance system availability

Impact: High

For more information, see Azure VMs HA architecture and scenarios for SAP NetWeaver

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: c16626fe-2b55-4e01-9ddf-7d25f694f2ef
Subcategory: HighAvailability

Ensure high availability across zones of SAP database

Verify high availability configuration across multiple availability zones within the same region for the SAP database instance of production workloads.

Potential benefits: Minimize downtime to enhance system availability

Impact: High

For more information, see SAP workload configurations with Azure Availability Zones

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: ab3fc753-4f6e-481f-a42a-7d9a85c56b43
Subcategory: HighAvailability

Use Premium or Ultra Disk for prod system of database VM

Use Premium Storage or Ultra Disks for the SAP database instance.

Potential benefits: Maximize the Azure single VM SLA

Impact: High

For more information, see Azure VMs HA architecture and scenarios for SAP NetWeaver

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 4a047f75-39f1-4ec7-a5e7-2261d1741b0c
Subcategory: HighAvailability

Set PREFER_SITE_TAKEOVER parameter to 'true' in the Pacemaker configuration for HANA DB HA setup

The PREFER_SITE_TAKEOVER parameter in SAP HANA defines whether the HANA system replication (SR) resource agent prefers to take over to the secondary instance instead of restarting the failed primary locally. For reliable functioning of the HANA DB high availability (HA) setup, set PREFER_SITE_TAKEOVER to 'true'.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 255e9f7b-db3a-4a67-b87e-6fdc36ea070d
Subcategory: HighAvailability

Enable stonith in the cluster configuration in HA enabled SAP workloads for VMs with Red Hat OS

In a Pacemaker cluster, the implementation of node level fencing is done using a STONITH (Shoot The Other Node in the Head) resource. To help manage failed nodes, ensure that 'stonith-enabled' is set to 'true' in the HA cluster configuration of your SAP workload.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 4594198b-b114-4865-8ed8-be06db945408
Subcategory: HighAvailability

Set the corosync token in Pacemaker cluster to 30000 for HA enabled HANA DB for VM with RHEL OS

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 604f3822-6a28-47db-b31c-4b0dbe317625
Subcategory: Other

Set the expected votes parameter to '2' in HA enabled SAP workloads (RHEL)

For a two node HA cluster, set the quorum votes to '2' as recommended for SAP on Azure to ensure a proper quorum, resilience, and data consistency.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 937a1997-fc2d-4a3a-a9f6-e858a80921fd
Subcategory: HighAvailability

Enable the 'concurrent-fencing' parameter in the Pacemaker configuration for HANA DB HA setup

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability of SAP HANA on Azure VMs on RHEL

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 6cc63594-c89f-4535-b878-cdd13659cfc5
Subcategory: Other

Set parameter PREFER_SITE_TAKEOVER to 'true' in the cluster configuration in HA enabled SAP workloads

The PREFER_SITE_TAKEOVER parameter in SAP HANA topology defines whether the HANA SR resource agent prefers to take over to the secondary instance instead of restarting the failed primary locally. For reliable functioning of the HANA DB HA setup, set it to 'true'.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 230fddab-0864-4c5e-bb27-037bec7c46c6
Subcategory: HighAvailability

Enable stonith in the cluster configuration in HA enabled SAP workloads for VMs with SUSE OS

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 210d0895-074c-4cc7-88de-b0a9e00820c6
Subcategory: HighAvailability

Set the stonith timeout to 144 for the cluster configuration in HA enabled SAP workloads

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 64e5e17e-640e-430f-987a-721f133dbd5c
Subcategory: HighAvailability

Set the corosync token in Pacemaker cluster to 30000 for HA enabled HANA DB for VM with SUSE OS

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: a563e3ad-b6b5-4ec2-a444-c4e30800b8cf
Subcategory: Other

Set 'token_retransmits_before_loss_const' to 10 in Pacemaker cluster in HA enabled SAP workloads

The corosync token_retransmits_before_loss_const determines how many token retransmits are attempted before timeout in HA clusters. Set the totem.token_retransmits_before_loss_const to 10 as recommended for HANA DB HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 99681175-0124-44de-93ae-edc08f9dc0a8
Subcategory: Other

Set the 'corosync join' in Pacemaker cluster to 60 for HA enabled HANA DB in SAP workloads

The 'corosync join' timeout specifies, in milliseconds, how long to wait for join messages in the membership protocol, so that a new node joining the cluster has time to synchronize its state with existing nodes. Set it to '60' in the Pacemaker cluster configuration for HANA DB HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: b8ac170f-433e-4d9c-8b75-f7070a2a5c92
Subcategory: Other

Set the 'corosync consensus' in Pacemaker cluster to 36000 for HA enabled HANA DB in SAP workloads

The corosync 'consensus' parameter specifies in milliseconds how long to wait for consensus before starting a new round of membership in the cluster. For reliable failover behavior, set 'consensus' in the Pacemaker cluster configuration for HANA DB HA setup to 1.2 times the corosync token.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 63e27ad9-1804-405a-97eb-d784686ffbe3
Subcategory: Other
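
The relationship between the two values can be checked with simple arithmetic. The following is a minimal sketch, not part of the Advisor guidance; the function name consensus_for is hypothetical, and it assumes the corosync token is 30000 ms as recommended earlier in this section.

```python
def consensus_for(token_ms):
    """Return 1.2 times the corosync token, in milliseconds.

    Integer arithmetic (x * 12 // 10) avoids floating-point rounding.
    """
    return token_ms * 12 // 10


# With the recommended token of 30000 ms, this gives the value
# named in the recommendation title.
print(consensus_for(30000))  # 36000
```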

Set the 'corosync max_messages' in Pacemaker cluster to 20 for HA enabled HANA DB in SAP workloads

The corosync 'max_messages' constant specifies the maximum number of messages that one processor can send on receipt of the token. To allow efficient communication without overwhelming the network, set it to 20 in the Pacemaker cluster configuration.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 7ce9ff70-f684-47a2-b26f-781f80b1bccc
Subcategory: Other

Set the expected votes parameter to 2 in HA enabled SAP workloads (SUSE)

Set the expected votes parameter to '2' in the cluster configuration in HA enabled SAP workloads to ensure a proper quorum, resilience, and data consistency.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 37240e75-9493-433a-8671-2e2582584875
Subcategory: HighAvailability

Set the two_node parameter to 1 in the cluster configuration in HA enabled SAP workloads

For a two-node HA cluster, set the quorum parameter 'two_node' to 1 as recommended for SAP on Azure.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 41cd63e2-69a4-4a4f-bb69-1d3f832001f9
Subcategory: HighAvailability
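
The corosync values named in the SUSE recommendations above can be compared against a cluster's corosync.conf in one pass. The following is a rough sketch under stated assumptions: it assumes these settings live in the top-level 'totem' and 'quorum' sections of corosync.conf, it uses a deliberately simple parser that ignores nested sections, and the names RECOMMENDED, parse_corosync, and find_deviations are hypothetical helpers rather than part of any Azure or corosync tooling.

```python
# Values taken from the recommendation titles in this section.
RECOMMENDED = {
    ("totem", "token"): "30000",
    ("totem", "token_retransmits_before_loss_const"): "10",
    ("totem", "join"): "60",
    ("totem", "consensus"): "36000",
    ("totem", "max_messages"): "20",
    ("quorum", "expected_votes"): "2",
    ("quorum", "two_node"): "1",
}


def parse_corosync(text):
    """Return {(section, key): value} for keys directly inside top-level sections."""
    settings = {}
    stack = []  # names of the sections that are currently open
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif ":" in line and len(stack) == 1:
            key, value = line.split(":", 1)
            settings[(stack[0], key.strip())] = value.strip()
    return settings


def find_deviations(text):
    """Return {"section.key": (found, recommended)} for every mismatch."""
    found = parse_corosync(text)
    return {
        f"{section}.{key}": (found.get((section, key)), want)
        for (section, key), want in RECOMMENDED.items()
        if found.get((section, key)) != want
    }
```

Passing the contents of corosync.conf to find_deviations returns an empty dictionary when every listed value matches. A missing key is reported with None as the found value.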

Enable the 'concurrent-fencing' parameter in the cluster configuration in HA enabled SAP workloads

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: d763b894-7641-4c5d-9bc3-6f2515a6eb67
Subcategory: Other

Ensure there is one instance of fence_azure_arm in the Pacemaker configuration for HANA DB HA setup

If you're using Azure fence agent for fencing with either managed identity or service principal, ensure that one instance of fence_azure_arm (an I/O fencing agent for Azure Resource Manager) is in the Pacemaker configuration for HANA DB HA setup for high availability.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 1f4b5e87-69e9-470a-8245-f337fd0d5528
Subcategory: HighAvailability

Set stonith-timeout to 900 in Pacemaker configuration with Azure fence agent for HANA DB HA setup

If you're using the Azure fence agent for fencing with either managed identity or service principal, set 'stonith-timeout' to 900 to ensure reliable function of Pacemaker for the HANA DB HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 943f7572-1884-4120-808d-ac2a3e70e33a
Subcategory: HighAvailability
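
The fencing-related cluster properties in the SUSE recommendations above can be summarized as a small check. This is an illustrative sketch only: the helper names are hypothetical, the property values are compared as plain strings (a real cluster might report a timeout with a unit suffix such as "900s"), and applying 144 whenever the Azure fence agent is not in use is an assumption, because the source doesn't state which fencing mechanism that value applies to.

```python
def expected_properties(uses_azure_fence_agent):
    """Return the property values named in the recommendations above."""
    return {
        "stonith-enabled": "true",
        "concurrent-fencing": "true",
        # Assumption: 144 applies when the Azure fence agent is not used.
        "stonith-timeout": "900" if uses_azure_fence_agent else "144",
    }


def property_deviations(current, uses_azure_fence_agent):
    """Return {name: (found, recommended)} for each property that differs."""
    expected = expected_properties(uses_azure_fence_agent)
    return {
        name: (current.get(name), want)
        for name, want in expected.items()
        if current.get(name) != want
    }
```

For example, a cluster that uses the Azure fence agent but still has 'stonith-timeout' set to 144 is reported as a single deviation for that property.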

Ensure that the softdog config file is in the Pacemaker configuration for HANA DB in SAP workloads

The softdog timer is loaded as a kernel module in Linux OS. This timer triggers a system reset if it detects that the system is hung. Ensure that the softdog configuration file is created in the Pacemaker cluster for HANA DB HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: 63233341-73a2-4180-b57f-6f83395161b9
Subcategory: HighAvailability

Ensure the softdog module is loaded in Pacemaker in ASCS HA setup in SAP workloads

The softdog timer is loaded as a kernel module in Linux OS. This timer triggers a system reset if it detects that the system is hung. First ensure that you created the softdog configuration file, then load the softdog module in the Pacemaker configuration for HANA DB HA setup.

Potential benefits: Reliability of HA setup in SAP workloads

Impact: High

For more information, see High availability for SAP HANA on Azure VMs on SLES

ResourceType: microsoft.workloads/sapvirtualinstances/databaseinstances
Recommendation ID: b27248cd-67dc-4824-b162-4563adaa6d70
Subcategory: HighAvailability
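
The two softdog checks above (configuration file created, module loaded) can be sketched as follows. This is a minimal illustration, not an official procedure: the function name softdog_status is hypothetical, and it assumes the configuration file is /etc/modules-load.d/softdog.conf and that loaded modules are listed in /proc/modules with the module name in the first column.

```python
def softdog_status(modules_load_conf, proc_modules):
    """Report whether softdog is configured to load at boot and loaded now.

    modules_load_conf: text of the softdog configuration file, or None if
                       the file doesn't exist.
    proc_modules:      text of /proc/modules.
    """
    configured = any(
        line.strip() == "softdog"
        for line in (modules_load_conf or "").splitlines()
    )
    loaded = any(
        line.split()[:1] == ["softdog"]
        for line in proc_modules.splitlines()
    )
    return {"configured": configured, "loaded": loaded}
```

On a cluster node, the two arguments would come from reading the files named above. The function itself takes plain text so that it can be exercised without root access.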

Next steps

Learn more about Reliability - Microsoft Azure Well Architected Framework
