Reporting on SCOM Data: Piecing the data together (Part 2)
In my last blog post, Reporting on SCOM Data: Piecing the data together (Part 1), I discussed collecting host details from the SCOM DW db. This provided a great foundation for correlating Health State and Alert data into the existing hierarchy that our customer wanted to see. In this post, I'll focus on querying the Health State and Alert aggregates by host and updating our existing tables to enable reporting on that data. This data is updated every 10 minutes in our reporting database. Again, a huge shout out to Jimmy Harper for his help with the queries and imparting his extensive SCOM knowledge on me. :)
Here are the steps that we followed to populate the Health State and Alert counts for each host:
- Populate the Host_HealthLoad table with current Health State (from the SCOM OperationsManager db) for each host.
- Populate the Host_AlertCountLoad table with the current count of Warning and Critical Alerts (from the SCOM DW db) for each host.
- Update the Host_HealthLoad table with on Alert counts from the Host_AlertCountLoad table.
- Update the Host table with current Health State and Alert Counts from the Host_HealthLoad table.
First, we need our Host_HealthLoad table. This is our loading/staging table for Health State & Alert Counts for each host.
CREATE TABLE Host_HealthLoad (
[HealthLoadId] [int] IDENTITY(1,1) NOT NULL,
[DnsHostName] [nvarchar](255) NULL,
[EntityHealth] [tinyint] NULL,
[AvailabilityHealth] [tinyint] NULL,
[ConfigurationHealth] [tinyint] NULL,
[PerformanceHealth] [tinyint] NULL,
[SecurityHealth] [tinyint] NULL,
[WarningOpenAlertCount] [int] NULL,
[CriticalOpenAlertCount] [int] NULL
)
To populate the Host_HealthLoad table with Health State data, we used the following query against the SCOM OperationsManager db in an SSIS package scheduled every 10 minutes (all other queries in this blog are tasks in the same SSIS package). This query leverages the State table and MonitorView view to retrieve the current Health State of specific Monitor Categories. In our case, we're looking for following monitor categories for each host: "Entity Health" (which is used to display Overall Health of the host), "Availability", "Configuration", "Performance", and "Security". We're also setting WarningOpenAlertCount and CriticalOpenAlertCount equal to zero (0) in this query since we're populating them from the SCOM DW db (more on that in a bit). It's also important to note that we're truncating the Host_HealthLoad table immediately before inserting this results of this query into it.
SELECT
DnsHostName = bme.DisplayName,
EntityHealth =
(SELECT TOP 1
s.HealthState
FROM
State s WITH (NOLOCK)
INNER JOIN
MonitorView mv WITH (NOLOCK)
ON mv.Id = s.MonitorId
WHERE
s.BaseManagedEntityId = bme.BaseManagedEntityId
AND mv.DisplayName = 'Entity Health'),
AvailabilityHealth =
(SELECT TOP 1
s.HealthState
FROM
State s WITH (NOLOCK)
INNER JOIN
MonitorView mv WITH (NOLOCK)
ON mv.Id = s.MonitorId
WHERE
s.BaseManagedEntityId = bme.BaseManagedEntityId
AND mv.DisplayName = 'Availability'),
ConfigurationHealth =
(SELECT TOP 1
s.HealthState
FROM
State s WITH (NOLOCK)
INNER JOIN
MonitorView mv WITH (NOLOCK)
ON mv.Id = s.MonitorId
WHERE
s.BaseManagedEntityId = bme.BaseManagedEntityId
AND mv.DisplayName = 'Configuration'),
PerformanceHealth =
(SELECT TOP 1
s.HealthState
FROM
State s WITH (NOLOCK)
INNER JOIN
MonitorView mv WITH (NOLOCK)
ON mv.Id = s.MonitorId
WHERE
s.BaseManagedEntityId = bme.BaseManagedEntityId
AND mv.DisplayName = 'Performance'),
SecurityHealth =
(SELECT TOP 1
s.HealthState
FROM
State s WITH (NOLOCK)
INNER JOIN
MonitorView mv WITH (NOLOCK)
ON mv.Id = s.MonitorId
WHERE
s.BaseManagedEntityId = bme.BaseManagedEntityId
AND mv.DisplayName = 'Security'),
WarningOpenAlertCount = 0,
CriticalOpenAlertCount = 0
FROM
BaseManagedEntity bme WITH (NOLOCK)
INNER JOIN
ManagedType mt WITH (NOLOCK)
ON mt.ManagedTypeId = bme.BaseManagedTypeId
WHERE
bme.IsDeleted = 0
AND mt.TypeName = 'Microsoft.Windows.Computer'
Next, we'll need to populate our Host_AlertCountLoad table. Here's the schema for that table:
CREATE TABLE Host_AlertCountLoad (
[AlertCountLoadId] [int] IDENTITY(1,1) NOT NULL,
[DnsHostName] [nvarchar](255) NULL,
[Severity] [tinyint] NULL,
[AlertCount] [int] NULL
)
To improve efficiency of the query, we decided to UNION the results for Warning and Critical Alert counts that we're retrieving from the SCOM DW db. This is also the reason that we update the Host_HealthLoad table with these counts after we've inserted them into the Host_AlertCountLoad table. In addition, by breaking out the Health State data load from the Alert data load, we were able to support pulling these data sets from different data sources (e.g. - OperationsManager db & DW db).
This query leverages the vAlert, vAlertResolutionState, and vManagedEntity views. The results are inserted into the Host_AlertCountLoad table. Like the Host_HealthLoad table, it's important to note that we're truncating the Host_AlertCountLoad table immediately before inserting this results of this query into it.
DECLARE @ParamDuration INT = 7 -- we only go back 7 days
SELECT
DnsHostName = a.Path,
Severity = 1,
AlertCount = COUNT(a.AlertGuid)
FROM
(SELECT
me2.Path,
a.AlertGuid,
a.AlertName,
a.Severity,
a.DBCreatedDateTime
FROM
Alert.vAlert a WITH (NOLOCK)
INNER JOIN
(SELECT
ars.AlertGuid,
CurrentResolutionState = MAX(ars.ResolutionState)
FROM
Alert.vAlertResolutionState ars WITH (NOLOCK)
GROUP BY
ars.AlertGuid) crs
ON crs.AlertGuid = a.AlertGuid
INNER JOIN
vManagedEntity me2 WITH (NOLOCK)
ON me2.ManagedEntityRowId = a.ManagedEntityRowId
WHERE
a.Severity = 1
AND a.DBCreatedDateTime > (GETUTCDATE() - @ParamDuration)
AND crs.CurrentResolutionState = 0) a
GROUP BY
a.Path
UNION
SELECT
DnsHostName = a.Path,
Severity = 2,
AlertCount = COUNT(a.AlertGuid)
FROM
(SELECT
me2.Path,
a.AlertGuid,
a.AlertName,
a.Severity,
a.DBCreatedDateTime
FROM
Alert.vAlert a WITH (NOLOCK)
INNER JOIN
(SELECT
ars.AlertGuid,
CurrentResolutionState = MAX(ars.ResolutionState)
FROM
Alert.vAlertResolutionState ars WITH (NOLOCK)
GROUP BY
ars.AlertGuid) crs
ON crs.AlertGuid = a.AlertGuid
INNER JOIN
vManagedEntity me2 WITH (NOLOCK)
ON me2.ManagedEntityRowId = a.ManagedEntityRowId
WHERE
a.Severity = 2
AND a.DBCreatedDateTime > (GETUTCDATE() - @ParamDuration)
AND crs.CurrentResolutionState = 0) a
GROUP BY
a.Path
Now that we have the Alert counts in the Host_AlertCountLoad table, we simply update the Host_HealthLoad table with those results.
BEGIN TRAN
UPDATE
hl
SET
hl.WarningOpenAlertCount =
ISNULL((SELECT
SUM(acl.AlertCount)
FROM
Host_AlertCountLoad acl
WHERE
acl.ServerName = hl.DnsHostName
AND acl.Severity = 1), 0)
FROM
Host_HealthLoad hl
UPDATE
hl
SET
hl.CriticalOpenAlertCount =
ISNULL((SELECT
SUM(acl.AlertCount)
FROM
Host_AlertCountLoad acl
WHERE
acl.ServerName = hl.DnsHostName
AND acl.Severity = 2), 0)
FROM
Host_HealthLoad hl
COMMIT TRAN
With a loaded and updated Host_HealthLoad table, we simply update the Host table with our current values.
UPDATE
h
SET
h.EntityHealth = l.EntityHealth,
h.AvailabilityHealth = l.AvailabilityHealth,
h.ConfigurationHealth = l.ConfigurationHealth,
h.PerformanceHealth = l.PerformanceHealth,
h.SecurityHealth = l.SecurityHealth,
h.WarningOpenAlertCount = l.WarningOpenAlertCount,
h.CriticalOpenAlertCount = l.CriticalOpenAlertCount
FROM
Host h WITH (NOLOCK)
INNER JOIN
Host_HealthLoad l WITH (NOLOCK)
ON l.DnsHostName = h.DnsHostName
We've now got the data to support a picture of SCOM Health State and Alert counts across the entire SCOM enterprise being updated every 10 minutes. You can review my first blog post (Reporting on SCOM Data: A Single Pane of Glass) in this series for more details, but here are the types of visualizations that we created using this data.
Health State Map
Drill down for Health State by Monitor Category
Host-level details for Health State and Alert count, linking off to the SCOM Web Console for more in-depth Alert and Performance Monitoring details.
In summary, we've been able to provide a "Single Pane of Glass" for Health State and Alert data across our customer's enterprise. These reports have been incredibly well received by all levels of the customer and has enabled us to increase our new work efforts with them.
I hope you've found these posts helpful and hope to hear from some of you that may have implemented some or all of the solution, along with any improvements that you've made.
Viel Spass!
Joseph