

You might miss "Health Service Heartbeat Failure" alerts when you reset monitors automatically.

Background:
If you close an alert in SCOM that was raised by a monitor, the underlying monitor remains in an unhealthy state. It will not generate a new alert until it returns to a healthy state and then detects an unhealthy condition again.

The official recommendation is: "Do not close any alerts raised by a monitor."
In reality, we all know this is nearly impossible to enforce, because sooner or later an administrator will close an alert without paying attention to whether it was raised by a monitor or by a rule.

More and more customers implement some kind of monitor reset script or tool. In general, there are two ways to close this gap:

1.) Run a script that "reactivates" a closed alert if the monitor that raised it is still unhealthy.
2.) Run a PowerShell script that resets the underlying monitor.

There are several community solutions (PowerShell scripts or Orchestrator/SMA Runbooks) available.

Potential Issue:
However, there is one important issue you should be aware of if you reset monitors.

The monitor called "Health Service Heartbeat Failure" does not recalculate its state after it has been reset.
This monitor is responsible for two different alerts: "Health Service Heartbeat Failure" and "Failed to Connect to Computer".

If you reset the "Health Service Heartbeat Failure" monitor, you also need to recalculate its state. Otherwise the monitor stays in a healthy state and will not create a new alert until the agent comes back online and then fails to heartbeat again.

You need to recalculate the "Health Service Heartbeat Failure" monitor in the same script that resets it. This recalculation also sets the state of the second monitor, "Failed to Connect to Computer", to critical and raises a new alert for that monitor as well.

Customers typically use the following method to reset the monitor:
PartialMonitoringObject.ResetMonitoringState($monitor)
https://msdn.microsoft.com/en-us/library/bb424488.aspx

To recalculate the same monitor, you can use the following method:
PartialMonitoringObject.RecalculateMonitoringState($monitor)
https://msdn.microsoft.com/en-us/library/cc136340.aspx

With this, you should receive a new heartbeat alert after the monitor has been reset, as long as the agent is still not sending heartbeats.
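The reset-then-recalculate rule can be wrapped in a small function so it is applied the same way wherever you reset monitors. This is a minimal sketch, not a supported tool: the function name `Invoke-MonitorResetAndRecalculate` is hypothetical, it only calls the two SDK methods named above, and it assumes (as the sample script in this post does) that the returned result exposes a `Status` property that equals "Succeeded" on success. Because PowerShell binds method calls at run time, the function can also be exercised with stand-in objects outside a management group.

```powershell
# ID of the "Health Service Heartbeat Failure" monitor, as used in the sample script.
$HeartbeatMonitorId = [guid]'b59f78ce-c42a-8995-f099-e705dbb34fd4'

# Hypothetical helper: resets a monitor on a monitoring object and, for the
# heartbeat monitor only, explicitly requests a recalculation afterwards.
function Invoke-MonitorResetAndRecalculate {
    param(
        [Parameter(Mandatory)] $MonitoringObject,  # e.g. a PartialMonitoringObject
        [Parameter(Mandatory)] $Monitor            # e.g. a ManagementPackMonitor
    )

    # Step 1: reset the monitor state.
    $resetResult = $MonitoringObject.ResetMonitoringState($Monitor)
    $recalculateStatus = $null

    # Step 2: the heartbeat monitor does not recalculate on its own after a reset,
    # so trigger the recalculation, but only if the reset succeeded.
    # (The -eq converts the right-hand side to [guid], so string IDs also match.)
    if ($resetResult.Status -eq 'Succeeded' -and $HeartbeatMonitorId -eq $Monitor.Id) {
        $recalculateStatus = $MonitoringObject.RecalculateMonitoringState($Monitor).Status
    }

    # Report both outcomes to the caller.
    [pscustomobject]@{
        ResetStatus       = $resetResult.Status
        RecalculateStatus = $recalculateStatus
    }
}
```

In a live management group you would pass the objects returned by `Get-SCOMMonitoringObject` and `Get-SCOMMonitor`; for all monitors other than the heartbeat monitor, `RecalculateStatus` stays empty.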

As an example, here is such a reset/recalculate script.

#  '*************************************************************************************************************
#  ' Disclaimer
#  '
#  ' This sample is not supported under any Microsoft standard support program or service. This sample
#  ' is provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties
#  ' including, without limitation, any implied warranties of merchantability or of fitness for a particular
#  ' purpose. The entire risk arising out of the use or performance of this sample and documentation
#  ' remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation,
#  ' production, or delivery of this sample be liable for any damages whatsoever (including, without limitation,
#  ' damages for loss of business profits, business interruption, loss of business information, or other
#  ' pecuniary loss) arising out of the use of or inability to use this sample or documentation, even
#  ' if Microsoft has been advised of the possibility of such damages.
#  '
#  '*************************************************************************************************************

# Import the Operations Manager module and create a connection to the management group
Import-Module OperationsManager
New-SCOMManagementGroupConnection -ComputerName localhost

# ID of the "Health Service Heartbeat Failure" monitor, which must be recalculated after a reset
$heartbeatMonitorId = [guid]"b59f78ce-c42a-8995-f099-e705dbb34fd4"

# Get all closed alerts (ResolutionState 255) that were raised by monitors, closed by any account
# other than "System", and resolved within the last hour. TimeResolved is compared in UTC.
$cutoff = (Get-Date).ToUniversalTime().AddHours(-1)
$alerts = Get-SCOMAlert -Criteria "ResolutionState = 255 AND IsMonitorAlert = 1 AND ResolvedBy <> 'System'" |
    Where-Object { $_.TimeResolved -gt $cutoff }

foreach ($alert in $alerts)
{
    if ($null -ne $alert)
    {
        Write-Host -ForegroundColor Yellow "AlertName:" $alert.Name

        # Get the IDs of the monitor and of the monitored object
        $monitorRuleId = $alert.MonitoringRuleId
        $monitorObjectId = $alert.MonitoringObjectId

        # Get the corresponding objects
        $monitor = Get-SCOMMonitor -Id $monitorRuleId
        $monitoringObject = Get-SCOMMonitoringObject -Id $monitorObjectId

        # Reset the monitor only if it is still unhealthy
        if ($null -ne $monitoringObject -and $null -ne $monitor)
        {
            $monitors = New-Object "System.Collections.Generic.List[Microsoft.EnterpriseManagement.Configuration.ManagementPackMonitor]"
            $monitors.Add($monitor)

            $healthState = $monitoringObject.GetMonitoringStates($monitors)[0]

            if ($healthState.HealthState -eq "Error" -or $healthState.HealthState -eq "Warning")
            {
                $result = $monitoringObject.ResetMonitoringState($monitor)
                Write-Host "Reset:" $result.Status

                # The heartbeat monitor does not recalculate on its own after a reset, so trigger it explicitly
                if (($result.Status -eq "Succeeded") -and ($monitorRuleId -eq $heartbeatMonitorId))
                {
                    $result = $monitoringObject.RecalculateMonitoringState($monitor)
                    Write-Host "Recalculate:" $result.Status
                }
            }
        }
    }
}

# ---------------------------------------------------------------------------------------------------------
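The sample script only picks up alerts resolved within the last hour, so it implicitly assumes it is scheduled to run at least once an hour; alerts closed before that window are never processed. A minimal sketch of that client-side filter as a reusable function, with the window as a parameter so it can be matched to your actual schedule. The function name `Select-RecentlyResolvedAlert` and its `-Hours` and `-NowUtc` parameters are hypothetical; like the script, it compares `TimeResolved` against the current UTC time.

```powershell
# Hypothetical helper mirroring the Where-Object filter of the sample script:
# passes through only alerts whose TimeResolved lies within the last $Hours hours.
function Select-RecentlyResolvedAlert {
    param(
        [Parameter(ValueFromPipeline)] $Alert,
        [int] $Hours = 1,
        # Injectable "now" (UTC); defaults to the current time, mainly useful for testing.
        [datetime] $NowUtc = (Get-Date).ToUniversalTime()
    )
    begin {
        # Compute the cut-off once instead of once per alert.
        $cutoff = $NowUtc.AddHours(-$Hours)
    }
    process {
        # Skip empty pipeline input and alerts without a resolution time.
        if ($null -ne $Alert -and $null -ne $Alert.TimeResolved -and $Alert.TimeResolved -gt $cutoff) {
            $Alert
        }
    }
}
```

Usage in the sample script would look like `Get-SCOMAlert -Criteria "..." | Select-RecentlyResolvedAlert -Hours 1`, keeping the `-Criteria` string unchanged.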

As an alternative, you can trigger the built-in "Recalculate Health" task, which you know from Health Explorer, via PowerShell.


That's not as difficult as it sounds.
As you know, Health Explorer has a built-in task that triggers a recalculation of the selected monitor.
When you click the "Recalculate Health" button, SCOM runs an internal task that is sent to the agent and triggers on-demand detection.

This SCOM task is called "System.Health.TriggerState".

To use this task in a PowerShell script, add the following snippet to your reset script inside the alert loop, where $alert and $monitoringObject are defined.

    $RecalculateTask = Get-SCOMTask -Name "System.Health.TriggerState"
    if ($alert.MonitoringRuleId -eq "b59f78ce-c42a-8995-f099-e705dbb34fd4")
    {
        Start-SCOMTask -Task $RecalculateTask -Instance $monitoringObject
    }
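If several closed heartbeat alerts refer to the same computer, running the snippet inside the loop may start the task more than once for the same object. A minimal sketch of a hypothetical helper, `Get-HeartbeatAlertObjectId`, that collects each affected monitoring object only once. It reuses the heartbeat monitor ID from the sample script and expects alert objects with `MonitoringRuleId` and `MonitoringObjectId` properties, as returned by `Get-SCOMAlert`.

```powershell
# Hypothetical helper: returns each MonitoringObjectId that has at least one alert
# raised by the "Health Service Heartbeat Failure" monitor, without duplicates,
# in the order first seen.
function Get-HeartbeatAlertObjectId {
    param([object[]] $Alerts)

    $heartbeatMonitorId = [guid]'b59f78ce-c42a-8995-f099-e705dbb34fd4'
    $seen = New-Object 'System.Collections.Generic.HashSet[guid]'

    foreach ($alert in $Alerts) {
        if ($null -eq $alert) { continue }
        # -eq converts the right-hand side to [guid], so string IDs also match.
        if (-not ($heartbeatMonitorId -eq $alert.MonitoringRuleId)) { continue }

        $objectId = [guid]$alert.MonitoringObjectId
        # HashSet.Add returns $true only the first time an ID is added.
        if ($seen.Add($objectId)) { $objectId }
    }
}
```

The returned IDs can be passed to `Get-SCOMClassInstance -Id` to obtain the instances, and each instance can then be handed to `Start-SCOMTask -Task $RecalculateTask -Instance` as in the snippet above.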