

The Mystery of the Stuck Alerts

Occasionally I run across an issue where the problem has to be stopped immediately and root cause analysis becomes an afterthought.  Last week I handled such an issue.

My customer has a connector that kept sending the same 20 or so alerts from OpsMgr to an event correlation product.  Many different monitoring solutions send data to this product, and the repeated alerts were flooding it and clogging the customer's alert ticketing system.  We worked with the third-party connector vendor for a few hours, but progress was slow, so I looked for a way to stop the forwarding from the OpsMgr side.

I consulted a co-worker, Tim "god of everything connector" Helton, and he came up with an idea: install a bogus connector, change the ConnectorId property on the "stuck" alerts to point at the bogus connector, and then uninstall the bogus connector.  The detour is needed because you can't set the ConnectorId to NULL through the SDK, but you can point it at another installed connector.  Once the bogus connector is uninstalled, the ConnectorId on those alerts changes to NULL.

I started by getting a bogus connector.  I used a simple connector from a class that Tim taught but you can easily create your own.  Ambrose Wong wrote a good connector guide which contains some sample code for creating a connector.

I then installed the connector in my lab and obtained the GUID of the connector.  Next I created a PowerShell script to change the ConnectorId on all those pesky alerts.  Here is my script:

##The alert IDs below are the IDs of the stuck alerts
##The connector ID is the GUID of Tim's demo connector

##This function points a single alert at the bogus connector.
##It relies on $connector being set before the function is called.
function SetAlertDestination([string]$s)
{
    $alert = get-alert | where {$_.ID -eq $s}
    Set-AlertDestination -Alert $alert -Connector $connector
}

##Get the bogus connector using its GUID
$connector = get-connector | where {$_.ID -eq 'aaaaaaaa-1111-2222-3333-0305e82c3301'}

##Set it to the bogus connector
SetAlertDestination "5ce5fd8f-790e-4e87-aabc-c0aae3f935a6"
SetAlertDestination "20087c39-6669-4791-910d-14e9d89d5650"
SetAlertDestination "1e6362a8-cb06-4f2a-a4ec-90e2a1b6fa24"
SetAlertDestination "c5a748e8-d299-4743-b77d-75279716c6ea"
SetAlertDestination "a33c0582-f1b9-42de-9639-60057641a261"
SetAlertDestination "6ee1f643-5589-4481-a38e-b8d36b57b5cc"
SetAlertDestination "e93e0bf5-0563-4edc-b8df-fa0bc4f96e0a"
SetAlertDestination "339510ea-7dee-482c-9388-398f7982c72d"
SetAlertDestination "2703c193-3712-4eb4-9d16-5754a4b4d646"
SetAlertDestination "f1474a98-1383-4173-89b4-3f98dfa7a58b"
SetAlertDestination "6c70a0bf-0ca3-4039-9195-a4640d2edc01"
SetAlertDestination "97331f49-b535-4edb-ba83-722ed0c36f4d"
SetAlertDestination "203ff402-a3fd-4fe4-b672-2ff30449011a"
SetAlertDestination "4855f316-9bcf-447f-8888-5bd19106abd5"
SetAlertDestination "a0716409-3d8e-42a4-b70f-ecbf9dbfec92"
SetAlertDestination "3edc53e6-2d43-4a4d-98ce-b50261fe63e5"
SetAlertDestination "7b6dfd88-1d59-46d6-bbc9-351bd04e1349"
SetAlertDestination "56fea496-f4fe-4563-9b6e-2e614d75b601"
SetAlertDestination "3a71c969-7eb8-49fb-904f-6b9673d52396"
SetAlertDestination "fb51278a-8b01-4484-9470-d2bd1a1c9316"
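
The script above repeats one call per alert and pulls the full alert list each time.  As a rough sketch of the same idea, the IDs could be kept in a list and the alerts fetched once.  This variant uses only the cmdlets already shown (get-alert, get-connector, Set-AlertDestination) and assumes it runs in the Operations Manager Command Shell; the variable names $bogusConnectorId and $stuckAlertIds are mine, and only the first three alert IDs are listed here for brevity.  I have not run this variant against the customer's environment, so treat it as a starting point:

```powershell
## GUID of the bogus connector (same value as in the script above)
$bogusConnectorId = 'aaaaaaaa-1111-2222-3333-0305e82c3301'

## IDs of the stuck alerts (first three from the script above; add the rest)
$stuckAlertIds = @(
    '5ce5fd8f-790e-4e87-aabc-c0aae3f935a6',
    '20087c39-6669-4791-910d-14e9d89d5650',
    '1e6362a8-cb06-4f2a-a4ec-90e2a1b6fa24'
)

## Look up the bogus connector and stop if it is not installed
$connector = Get-Connector | Where-Object { $_.Id -eq $bogusConnectorId }
if ($connector -eq $null)
{
    throw "Connector $bogusConnectorId was not found"
}

## Fetch the alerts once, keep only the stuck ones, and repoint each one
Get-Alert |
    Where-Object { $stuckAlertIds -contains $_.Id.ToString() } |
    ForEach-Object { Set-AlertDestination -Alert $_ -Connector $connector }
```

Failing early when the connector lookup returns nothing avoids calling Set-AlertDestination with an empty connector.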

After doing some testing in my lab, I had my customer install the demo connector and run the script.  The script changed the ConnectorId of the stuck alerts, and the original connector stopped trying to forward them.  The demo connector was then uninstalled, and the ConnectorId property went to NULL on all the bad alerts.  We don't know the root cause here, but at least all is now well in the ticketing system.
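
One way to confirm the result at each step is to list the ConnectorId of the affected alerts.  This is a minimal sketch that assumes the Operations Manager Command Shell and that the alert objects returned by get-alert expose the ConnectorId property discussed above; the variable name $stuckAlertIds is mine and only two of the IDs are shown:

```powershell
## IDs of the alerts to check (two of the stuck alert IDs as examples)
$stuckAlertIds = @(
    '5ce5fd8f-790e-4e87-aabc-c0aae3f935a6',
    '20087c39-6669-4791-910d-14e9d89d5650'
)

## Show which connector, if any, each alert is currently tied to
Get-Alert |
    Where-Object { $stuckAlertIds -contains $_.Id.ToString() } |
    Select-Object Id, Name, ConnectorId
```

After the script runs, the ConnectorId column should show the bogus connector's GUID; after the bogus connector is uninstalled, it should be empty.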