Troubleshoot solution updates for Azure Local, version 23H2
Applies to: Azure Local, version 23H2
This article describes how to troubleshoot solution updates that are applied to your Azure Local to keep it up-to-date.
About troubleshooting updates
If your system was created via a new deployment of Azure Local, version 23H2, then an orchestrator was installed during the deployment. The orchestrator manages all of the updates for the platform - OS, drivers and firmware, agents and services.
The new update solution includes a retry and remediation logic. This logic attempts to fix update issues in a non-disruptive way, such as retrying a Cluster-Aware Update (CAU) run. If an update run can't be remediated automatically, it fails. When an update fails, Microsoft recommends inspecting the details for the failure message to determine the appropriate next action. You can attempt to resume the update, if appropriate, to determine if a retry will resolve the issue.
Troubleshoot readiness checks
Readiness checks are essential to ensure that you apply updates smoothly, keep your systems up-to-date, and maintain correct system functionality. Readiness checks are performed and reported separately in two scenarios:
System health checks that run once every 24 hours.
Update readiness checks that run after downloading the update content and before beginning the installation.
It is common for the results of system health checks and update readiness checks to differ. This happens because update readiness checks use the latest validation logic from the solution update to be installed, while system health checks always use validation logic from the installed version.
Both system and pre-update readiness checks perform similar validations and categorize three types of readiness checks: Critical, Warning, and Informational.
- Critical: Readiness checks that prevent you from applying the update. This status indicates issues that you must resolve before proceeding with the update.
- Warning: Readiness checks that also prevent you from applying the update, but you can bypass these using PowerShell. This status indicates potential issues that might not be severe enough to stop the update but should be addressed to ensure a smooth update process.
- Informational: Readiness checks that don't block the update. This status provides information about the system's state and any potential issues that shouldn't affect the update process directly. These checks are for your awareness and might not require immediate action.
The troubleshooting steps differ depending on which scenario the readiness checks are from.
Using Azure portal
Scenario 1: System health checks
This scenario occurs when preparing to install system updates in Azure Update Manager:
In the system list, view the Critical state of Update readiness.
Select one or more systems from the list, then select One-time Update.
On the Check readiness page, review the list of readiness checks and their results.
Select the View details links under Affected systems.
When the details box opens, you can view more details, individual system results, and the Remediation for failing health checks.
Follow the remediation instructions to resolve the failures.
Note
The system health checks run every 24 hours, so it may take up to 24 hours for the new results to sync to the Azure portal after remediating the failures. To initiate a new system health check immediately or further troubleshoot, see the PowerShell section.
Scenario 2: Update readiness checks
This scenario occurs when installing and tracking system updates in Azure Update Manager:
In History, select the failed update run from the list.
On the Check readiness page, review the list of readiness checks and their results.
Select the View details links under Affected systems.
When the details box opens, you can view more details, individual system results, and the Remediation for failing health checks.
Follow the remediation instructions to resolve the failures and then select the Try again button to retry the pre-update readiness checks and Resume the update.
To further troubleshoot, see the PowerShell section.
Using PowerShell
Scenario 1: System health checks
To troubleshoot system health checks via PowerShell:
To validate that the system health checks failed, run the following command on one of the machines in your system:
Get-SolutionUpdateEnvironment
Here's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment ResourceId : redmond SbeFamily : VirtualForTesting HardwareModel : Virtual Machine LastChecked : 9/12/2023 10:34:42 PM PackageVersions : {Solution: 10.2309.0.20, Services: 10.2309.0.20, Platform: 1.0.0.0, SBE: 4.0.0.0} CurrentVersion : 10.2309.0.20 CurrentSbeVersion : 4.0.0.0 LastUpdated : State : AppliedSuccessfully HealthState : Failure HealthCheckResult : {Storage Pool Summary, Storage Services Physical Disks Summary, Storage Services Physical Disks Summary, Storage Services Physical Disks Summary...} HealthCheckDate : 9/12/2023 7:03:32 AM AdditionalData : {[SBEAdditionalData, Solution Builder extension is partially installed. Please install the latest Solution Builder Extension provided by your hardware vendor. For more information, see https://aka.ms/SBE.]} HealthState : Success HealthCheckResult : {} HealthCheckDate : 8/4/2022 9:10:36 PM PS C:\Users\lcmuser>
Review the
HealthState
on your system and view theFailure
orWarning
value.To filter the
HealthCheckResult
property to identify failing tests, run the following command:$result = Get-SolutionUpdateEnvironment $result.HealthCheckResult | Where-Object {$_.Status -ne "SUCCESS"} | FL Title,Status,Severity,Description,Remediation
Here's a sample output:
Title : The machine proxy on each failover cluster node should be set to a local proxy server Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : `https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness` Title : The CAU clustered role should be installed on the failover cluster to enable self-updating mode Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : `https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness`
Review the
Remediation
field for the failed tests and take action as appropriate to resolve the failures.After resolving the failures, invoke the system health checks again by running the following command:
Invoke-SolutionUpdatePrecheck -SystemHealth
Use
Get-SolutionUpdateEnvironment
to confirm the failing health check failures have been resolved. It may take a few minutes for the system health checks to run.Here's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment | FL HealthState, HealthCheckResult, HealthCheckDate HealthState : InProgress HealthCheckResult : HealthCheckDate : 1/1/0001 12:00:00 AM PS C:\Users\lcmuser> Get-SolutionUpdateEnvironment | FL HealthState, HealthCheckResult, HealthCheckDate HealthState : Success HealthCheckResult : {Storage Pool Summary, Storage Subsystem Summary, Storage Services Summary, Storage Services Summary...} HealthCheckDate : 10/18/2024 11:56:49 PM
Scenario 2: Update readiness checks
When update readiness checks fail, this causes the update to fail on the system. To troubleshoot update readiness checks via PowerShell:
To validate that the update readiness checks failed, run the following command on one of the machines in your system:
Get-SolutionUpdate | FT Version,State,HealthCheckResult
Here's a sample output:
PS C:\Users\lcmuser> Get-SolutionUpdate | FT Version,State,HealthCheckResult Version State HealthCheckResult ------- ----- ----------------- 10.2405.2.7 HealthCheckFailed {Storage Subsystem Summary, Storage Pool Summary, Storage Services Physical Disks Summary, Stora... PS C:\Users\lcmuser>
Review the
State
for the update and view theHealthCheckFailed
value.To filter the
HealthCheckResult
property to identify failing tests, run the following command:$result = Get-SolutionUpdate $result.HealthCheckResult | Where-Object {$_.Status -ne "SUCCESS"} | FL Title,Status,Severity,Description,Remediation
Here's a sample output:
Title : The machine proxy on each failover cluster node should be set to a local proxy server Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness Title : The CAU clustered role should be installed on the failover cluster to enable self-updating mode Status : FAILURE Severity : INFORMATIONAL Description : Validating cluster setup for update. Remediation : https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements# tests-for-cluster-updating-readiness
Review the
Remediation
field for the failed tests and take action as appropriate to resolve the failures.After resolving the failures, invoke the update readiness checks again by running the following command:
Get-SolutionUpdate -Id <some ID> | Start-SolutionUpdate -PrepareOnly
Troubleshoot update failures
If there is an issue that causes an update to fail, reviewing the detailed step progress to identify where it failed is often the best way to determine if the issue is something that can be remediated through a simple repair (and resume) or a support engagement is required to resolve the issue. Key items to note for the failing step include:
Failing step name and description.
Which machine or server the step failed on (in case of a machine-specific issue).
Failure message string (may pinpoint the issue to a specific known issue with documented remediation).
Microsoft recommends using the Azure portal to identify the failing step information as shown at Resume an update. Alternatively, see the next section for how view similar details in PowerShell using Start-MonitoringActionplanInstanceToComplete
.
See the table below for update failure scenarios and remediation guidelines.
Step names | Type of issue | Remediation |
---|---|---|
Any | Power loss or other similar interruption to system during the update. | 1. Restore power. 2. Run a system health check. 3. Resume the update. |
CAU updates | Cluster Aware Update (CAU) update run fails with a max retries exceeded failure. |
If there is an indication that multiple CAU attempts have been made and that they have all failed, it is often best to investigate the first failure. Use the start and end time of the first failure to match up with the correct Get-CauReport output to further investigate the failure. |
Any | Memory, power supply, boot driver, or similar critical failure on one or more nodes. | See Repair a node on Azure Local, version 23H2 for how to repair the failing node. Once the node has been repaired the update can be resumed. |
Collect update logs
You can also collect diagnostic logs to help Microsoft identify and fix the issues.
To collect logs for updates using the Azure portal, see Resume an update.
To collect logs for the update failures see Collect diagnostic logs for Azure Local, version 23H2.
View update summary report
To view a detailed update summary report using PowerShell, follow these steps on the client that you're using to access your system:
Establish a remote PowerShell session with the machine. Run PowerShell as administrator and run the following command:
Enter-PSSession -ComputerName <machine_IP_address> -Credential <username\password for the machine>
Get all the solutions updates and then filter the solution updates corresponding to a specific version. The version used corresponds to the version of solution update that failed to install.
$Update = Get-SolutionUpdate | ? Version -eq "<Version string>" -verbose
Identify the action plan for the failed solution update run.
$Failure = $update | Get-SolutionUpdateRun
Identify the
ResourceID
for the update.$Failure
Here's a sample output:
PS C:\Users\lcmuser> $Update = Get-SolutionUpdate| ? Version -eq "10.2303.1.7" -verbose PS C:\Users\lcmuser> $Failure = $Update|Get-SolutionUpdateRun PS C:\Users\lcmuser> $Failure ResourceId : redmond/Solution10.2303.1.7/6bcc63af-b1df-4926-b2bc-26e06f460ab0 Progress : Microsoft.AzureStack.Services.Update.ResourceProvider.UpdateService.Models.Step TimeStarted : 4/21/2023 10:02:54 PM LastUpdatedTime : 4/21/2023 3:19:05 PM Duration : 00:16:37.9688878 State : Failed
Note the
ResourceID
GUID. This GUID corresponds to theActionPlanInstanceID
.View the summary for the
ActionPlanInstanceID
that you noted earlier.Start-MonitoringActionplanInstanceToComplete -actionPlanInstanceID <Action Plan Instance ID>
Here's sample output:
PS C:\Users\lcmuser> Start-MonitoringActionplanInstanceToComplete -actionPlanInstanceID 6bcc63af-b1df-4926-b2bc-26e06f460ab0
Resume an update
To resume a previously failed update run, you can retry the update run via the Azure portal or PowerShell.
The Azure portal
We highly recommend using the Azure portal, to browse to your failed update and select the Try again button. This functionality is available at the Download updates, Check readiness, and Install stages of an update run.
If you're unable to successfully rerun a failed update or need to troubleshoot an error further, follow these steps:
Select the View details of an error.
When the details box opens, you can review the error details. For more information on collecting diagnostic logs, you can click on the How to collect logs link near the Open a support ticket button.
For more information on retrieving logs, see Collect diagnostic logs for Azure Local.
Additionally, you can select the Open a support ticket button, fill in the appropriate information, and attach your logs so that they're available to Microsoft Support.
For more information on creating a support ticket, see Create a support request.
PowerShell
If you're using PowerShell and need to resume a previously failed update run, use the following command:
Get-SolutionUpdate | ? Version -eq "10.2302.0.31" | Start-SolutionUpdate
To resume a previously failed update due to update health checks in a Warning state, use the following command:
Get-SolutionUpdate | ? Version -eq "10.2302.0.31" | Start-SolutionUpdate -IgnoreWarnings
Next steps
Learn more about how to Run updates via PowerShell.
Learn more about how to Run updates via the Azure portal.