How to generate an alert in Microsoft OMS when a computer is down or unreachable
Hi everyone, we have been seeing a lot of votes for "Alert if the Machine is down/Unreachable" in the User Voice for Microsoft Operations Management Suite (OMS). While we’re working on this feature, I thought I would share a workaround that you can use in the meantime. As you’ll see, it’s fairly simple to make OMS send an email alert when a machine is down.
So the main question here is, “How do we check if the machine is down?” We can’t just check whether we’re receiving a heartbeat from the server, because the agent service that sends the heartbeat (Microsoft Monitoring Agent) might be down while the machine itself is still running. To deal with that scenario, we must also ping the server itself.
With that in mind, let’s break it down into two parts:
- Checking the heartbeat
- Pinging the server
Check the heartbeat
This is fairly simple. All we need to do is go to Log Search in the OMS portal and search for Heartbeat. You can either write the query for this, or you can use the portal to do it. Heartbeat is a type of log, and you can select it under the types on the left side. Here is an example:
- Go to the OMS portal.
- Go to Log Search.
- Type * in the search box and run the search so that it returns all logs.
- In the list of types on the left side, select Heartbeat.
- Now you can see the heartbeat for all of the computers that are monitored by OMS.
- You will see the computer names on the left side, so go ahead and select the computer that you want to monitor.
- Now the query in the search box will look something like this:
* (Type=Heartbeat) (Computer=OMSTESTRC)
This will search for all logs with that computer name and type as Heartbeat.
- To optimize our query, remove the *.
- Now we can generate an alert for this query by clicking Alert.
- This will take you to the Add Alert Rule page with the query already in place.
- Give it a Name, Description and Severity.
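- Set Time window to a value greater than 2 minutes, and set Generate alert based on so that the alert fires when the number of results is less than the number of heartbeats you expect in that window. Microsoft Monitoring Agent sends a heartbeat to OMS every minute (every 60 seconds), so a healthy computer returns roughly one result per minute. The alert has to fire when heartbeats are missing, not when they are present: a "greater than" condition would fire for every computer that is up. Usually we recommend setting the time window to 5 minutes, for example with a threshold of less than 2 results, so that one or two skipped heartbeats don't trigger a false alert.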
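To make the threshold logic concrete, here is a minimal PowerShell sketch of the kind of check the alert rule performs: count the heartbeats inside the time window and alert when the count falls below a threshold. The function name Test-HeartbeatAlert and its parameters are hypothetical and exist only for illustration. OMS evaluates the rule itself, so you do not need to run anything like this.

```powershell
# Hypothetical helper that mimics the alert rule. It is not part of OMS.
function Test-HeartbeatAlert {
    param(
        [datetime[]]$HeartbeatTimes,   # timestamps of heartbeats received from one computer
        [datetime]$Now,                # the moment the rule is evaluated
        [int]$WindowMinutes = 5,       # the Time window setting
        [int]$MinimumCount = 2         # alert when fewer results than this are found
    )
    $windowStart = $Now.AddMinutes(-$WindowMinutes)
    # Keep only the heartbeats that fall inside the time window.
    $recent = @($HeartbeatTimes | Where-Object { $_ -gt $windowStart -and $_ -le $Now })
    # Alert when heartbeats are missing, not when they are present.
    return ($recent.Count -lt $MinimumCount)
}

$now = Get-Date
# A healthy computer: one heartbeat per minute for the last five minutes.
$healthy = 0..4 | ForEach-Object { $now.AddMinutes(-$_) }
# A computer that stopped reporting ten minutes ago.
$silent = 10..14 | ForEach-Object { $now.AddMinutes(-$_) }

Test-HeartbeatAlert -HeartbeatTimes $healthy -Now $now   # False: no alert
Test-HeartbeatAlert -HeartbeatTimes $silent -Now $now    # True: alert
```

With the comparison flipped to "greater than", the healthy computer would be the one raising alerts, which is the behavior several readers report in the comments.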
Now we have alerts generated whenever we don't receive a heartbeat for a certain amount of time, so next we need to look at how we can ping the computer using OMS.
Pinging the server
To ping the computer, we need to have access to the local (on-premises) environment so that we can reach the computer. To automatically ping the computer after our alert is generated, we have the option of attaching a Runbook to the alert. Here is where you see it on the Alert page:
NOTE: To attach a runbook, you will need an Automation account first. For more information, see How to Setup and Configure Microsoft Azure Automation Runbooks.
A runbook that is attached to an alert can run in one of two places: in Azure, or on a Hybrid Runbook Worker.
So this should answer our first question, which is “How do I get access to the local (on-premises) environment?” The answer is a Hybrid Runbook Worker!
What is a Hybrid Runbook Worker? Runbooks in Azure Automation cannot access resources in your local datacenter because they run in the Azure cloud. The Hybrid Runbook Worker feature of Azure Automation allows you to run Runbooks on computers that are located in your datacenter to manage local resources. These Runbooks are stored and managed in Azure Automation and then delivered to one or more on-premises computers.
To read more about the Hybrid Runbook Worker, please visit https://azure.microsoft.com/en-us/documentation/articles/automation-hybrid-runbook-worker/.
That documentation also shows how to configure it in your environment so I will not go into depth on that here.
Now, after you have the Hybrid Runbook Worker feature, the question becomes “How do I ping the computer?” Because there aren't many cmdlets to ping a server using Azure Automation, I went ahead and wrote a script to do that. This script not only pings the computer, but it also sends an email with the results. This is a PowerShell workflow script, and you can modify it for your needs and requirements.
The script
workflow Ping {
    InlineScript {
        # Sends the "reachable" email. Change the account, password, SMTP server,
        # and recipient to match your environment.
        function send-email([string]$R) {
            Write-Output "Ping was successful"
            Write-Output "Sending email"
            $Output = $Server + " is Reachable"
            $Emailoutput = $Output
            $CredUser = "ping@alert.com"
            $CredPassword = "Password"
            $SMTPServer = "smtp-mail.outlook.com"
            $EmailTo = "abc@xyz.com"
            $Subject = "VM Status"
            $emailCreds = New-Object System.Management.Automation.PSCredential($CredUser, (ConvertTo-SecureString -String $CredPassword -AsPlainText -Force))
            Send-MailMessage -From $CredUser -To $EmailTo -Subject $Subject -Body $Emailoutput -Credential $emailCreds -SmtpServer $SMTPServer -UseSsl -Port 587
        }

        # Sends the "unreachable" email. Uses the same settings as send-email.
        function send-emailUR([string]$UR) {
            Write-Output "Ping unsuccessful"
            Write-Output "Sending email"
            $Output = $Server + " is Unreachable"
            $Emailoutput = $Output
            $CredUser = "ping@alert.com"
            $CredPassword = "Password"
            $SMTPServer = "smtp-mail.outlook.com"
            $EmailTo = "abc@xyz.com"
            $Subject = "VM Status"
            $emailCreds = New-Object System.Management.Automation.PSCredential($CredUser, (ConvertTo-SecureString -String $CredPassword -AsPlainText -Force))
            Send-MailMessage -From $CredUser -To $EmailTo -Subject $Subject -Body $Emailoutput -Credential $emailCreds -SmtpServer $SMTPServer -UseSsl -Port 587
        }

        # This function performs a simple, small-size, single-packet ping of a machine.
        # It returns $false if the server name is empty or the ping call itself fails.
        function ping-host([string]$server) {
            if ([string]::IsNullOrEmpty($server)) { return $false }

            # Ping first for reachability check
            $po = New-Object Net.NetworkInformation.PingOptions
            $po.set_Ttl(64)
            $po.set_DontFragment($true)
            [Byte[]] $pingbytes = (65, 72, 79, 89)
            $ping = New-Object Net.NetworkInformation.Ping
            $savedEA = $ErrorActionPreference
            $ErrorActionPreference = "SilentlyContinue"
            $pingres = $ping.Send($server, 1000, $pingbytes, $po)
            $pingSucceeded = $?
            # Restore the error preference before any early return
            $ErrorActionPreference = $savedEA
            if (-not $pingSucceeded) { return $false }

            # Calling email functions
            if ($pingres.Status -eq "Success") {
                send-email abc
            }
            else {
                send-emailUR bcd
            }
        }

        # Calling the Ping function
        ping-host ServerName
    }
}
When you use this script, just be sure to modify the appropriate sections for your environment.
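If you only need a yes/no reachability check, the built-in Test-Connection cmdlet can stand in for the hand-built Ping object. The following is a minimal sketch under that assumption, not the script used in this post. The function name Test-ServerReachable is hypothetical, and ServerName is a placeholder.

```powershell
# Hypothetical helper: returns $true when the computer answers a single ping.
function Test-ServerReachable {
    param([string]$Server)
    # Nothing to ping, so treat it as unreachable.
    if ([string]::IsNullOrEmpty($Server)) { return $false }
    # -Count 1 sends a single echo request; -Quiet returns a Boolean
    # instead of ping result objects.
    return [bool](Test-Connection -ComputerName $Server -Count 1 -Quiet)
}

# Replace ServerName with the computer you want to check.
if (Test-ServerReachable -Server "ServerName") {
    Write-Output "ServerName is Reachable"
}
else {
    Write-Output "ServerName is Unreachable"
}
```

Keeping the check separate from the email code like this makes it easier to reuse the same Boolean result for both the reachable and the unreachable message.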
Once you have this Runbook in the Automation account, go ahead and attach it to the alert, choose to run it on a Hybrid Runbook Worker, and you should be good to go. Now you will receive an email when the heartbeat is not found, and a Server Unreachable message as well when the ping fails!
I hope you found this helpful, and any feedback would be appreciated.
I’d also like to extend special thanks to my colleague Jeff Fanjoy for helping me and mentoring me through this process.
Cheers!
Rupanter Chhabra
Support Engineer
Azure | Manageability & Security Division
Customer Service & Support (CSS)
Comments
- Anonymous
September 08, 2016
Excellent Guys. It will be very helpful me and totally new to me. Let me try to implement in environment.
- Anonymous
September 08, 2016
The comment has been removed
- Anonymous
September 09, 2016
Hello Kasun, You will need Hybrid worker in any one machine of your network, then you can use Hybrid worker feature to run this script in Automation. Thanks
- Anonymous
September 09, 2016
I'm confused by your alert configuration. The way it is configured it is going to throw an alert every 15 minutes because it is providing a heartbeat, not because it isn't... Your screenshot even shows 14 results for your current query. If you have a healthy server should your results not be 0 for that 15 minute period? My apologies if I'm misunderstanding.
- Anonymous
September 09, 2016
Hello Dave, Screenshots are just to show you where and how things can be done; everything can be changed as per your configuration and needs. Let me explain: The query shows results because my server was sending heartbeat; it shows results from past 7 days by default. Also, I mentioned below the image how you can configure the Time window and Generate alert based on to 2 minutes. You can make changes to that as per your needs. Every minute there should be a heartbeat, we might skip one or two so keeping it at 5 is ideal: Time window = 5, Count < 5, then it will generate an alert. Thanks
- Anonymous
September 12, 2016
Thanks Rupanter, I think my confusion was with 'This search returned x results for the time window selected'. I think I'm just figuring out now that those results represent the query based on the time window and do not take into consideration the schedule settings.
- Anonymous
September 12, 2016
FYI prior to this I have been using the following query:
Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | where LastCall < NOW-5MINUTE
Works quite nicely as well. I will try it in conjunction with your ping routine.
- Anonymous
April 19, 2017
Hi Dave, I have been using the same but with a slight change in the time difference; however, no alert is triggered even though servers were down in many instances. Please help me if there is a different way you have used. I am using this:
Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | Where LastCall<NOW-8MINUTES
- Anonymous
September 13, 2016
Maybe there is a difference in heartbeat between my environment, which uses SCOM, and the MMA agent (I don't think there is), but I set up my alert to mirror the blog and got a bunch of SPAM alerts. This is because the query of a computer's heartbeat events will return results as long as it is up in the time frame. So instead of setting your alert based on a results value greater than 2, you need to set it to a value of less than your down time sensitivity. So if you want an alert if it has been down for 15 minutes, you may want to check every 10 minutes for results of less than 3. That means that you will get an alert if the server misses 7 heartbeats in 10 minutes. A second alert will confirm the server was down for 15 minutes. If you do less than 0 for 15, your server could be down for up to 29 minutes before you receive an alert.
- Anonymous
September 13, 2016
Also, if you are using SCOM make sure your query reflects where the heartbeat came from like so Computer="SVR01.contoso.com" Type=Heartbeat Direct. MMA/SCOM appears to double up the heartbeat so every 15 mins I was getting around 30 heartbeats for one server
- Anonymous
November 28, 2016
Hi, I would like to turn on an alert when my internal page in IIS on my own server goes down. How can I configure OMS? I configured the alert on my server like this:
Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | where LastCall < NOW-5MINUTE
- Anonymous
August 20, 2017
Hi Rupanter, Thanks for the article and I found this very useful. But I just wanted to highlight that there is a mistake in the step "Set Time window and Generate alert based on to a value greater than 2. This is because Microsoft Monitoring Agent sends a heartbeat to OMS every minute (every 60 seconds). If we are missing more than two heartbeats, it will generate this alert. Usually we recommend setting this to 5 minutes." wherein we have to set the value to "less than 2" as we have to get the alert when a system misses its heartbeat. I have tested this and I get alert when the VM goes down and not while it is UP and running. So please make the changes in the article.