Repeated Event Monitors with a Missing Event Reset
One of the most common requests I get is to write a monitor that raises an alert when a given event happens repeatedly within a window of time. Raising the alert is quite simple, but I've always been dissatisfied with the options for resolving the alert. Your choices are:
- Event Reset, which works well if your service has a reliable "positive" event to clear the negative event. But this is rarely the case.
- Manual Reset, which doesn't allow for the computer to confirm its own health.
- Timer Reset, which waits a pre-determined window of time before resetting the monitor.
What I really wanted was a monitor where the unhealthy trigger was a repeated event and the healthy trigger was a missing event. The solution is really just that, and does not require any custom coding. All I did was look at the module workflow in a repeated event monitor and a missing event monitor, and put the modules together in a single monitor.
Here is an example of a simple two-state monitor. Again, there is no custom code in this management pack. I'm merely putting together pre-existing modules.
The only trouble I had with the Authoring Console was validation of the System.ExpressionFilter module. I had to edit that in Notepad. For the rest of the modules the Authoring Console will complain that it can't validate your input, but you can "Ignore" your way past it.
Here are the modules:
And the module flow:
This is the only part I had to edit in Notepad. (Don't worry, I'll attach the full management pack in both XML and MP formats.)
<MonitorImplementation>
  <MemberModules>
    <DataSource ID="DataSource" TypeID="Windows!Microsoft.Windows.BaseEventProvider">
      <ComputerName>$Config/ComputerName$</ComputerName>
      <LogName>$Config/LogName$</LogName>
    </DataSource>
    <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
      <Expression>$Config/Expression$</Expression>
    </ConditionDetection>
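The fragment above stops at the filter. For orientation, here is a rough sketch of what a consolidator module following that filter could look like. It is an assumption built from the System.ConsolidatorCondition configuration schema, not a copy of the attached MP: the module ID `RepeatConsolidator` is hypothetical, as is the idea that `RepeatCount` and `IntervalSeconds` are passed straight through from the monitor configuration, and the count mode shown is simply the sliding-window mode that repeated event detection usually relies on.

```xml
<!-- Hypothetical sketch: counts filtered events inside a sliding time window. -->
<!-- The ID "RepeatConsolidator" and the $Config$ mappings are assumptions. -->
<ConditionDetection ID="RepeatConsolidator" TypeID="System!System.ConsolidatorCondition">
  <Consolidator>
    <!-- No grouping properties: every event that passes the filter is counted together. -->
    <ConsolidationProperties />
    <TimeControl>
      <WithinTimeSchedule>
        <!-- Window length in seconds. -->
        <Interval>$Config/IntervalSeconds$</Interval>
      </WithinTimeSchedule>
    </TimeControl>
    <CountingCondition>
      <!-- Produce output once this many events arrive inside the window. -->
      <Count>$Config/RepeatCount$</Count>
      <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>
    </CountingCondition>
  </Consolidator>
</ConditionDetection>
```

The missing event side of the monitor is deliberately not shown here; check the attached MP for how the healthy detection is actually wired.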
Using the monitor type is a bit of a challenge since the Authoring Console won't give you a nice UI for it, but you can copy the expression from a rule. In the sample MP I'm including, I have a rule from which I copied the expression that the monitors use. Just add the <RepeatCount> and <IntervalSeconds> elements.
<ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<LogName>Application</LogName>
<RepeatCount>5</RepeatCount>
<IntervalSeconds>300</IntervalSeconds>
<Expression>
  <And>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="UnsignedInteger">4</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="String">PublisherName</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">WSH</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <Expression>
      <RegExExpression>
        <ValueExpression>
          <XPathQuery Type="String">EventDescription</XPathQuery>
        </ValueExpression>
        <Operator>ContainsSubstring</Operator>
        <Pattern>SCOM Demo Event</Pattern>
      </RegExExpression>
    </Expression>
  </And>
</Expression>
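To show where that configuration sits, here is a minimal sketch of a unit monitor built on such a monitor type. Treat the names as placeholders: the monitor ID, the `TypeID`, the two `MonitorTypeStateID` values, and the `Health!` reference alias are all hypothetical and need to match whatever your own MP defines. The expression is cut down to a single condition to keep the example short.

```xml
<!-- Hypothetical sketch: the monitor ID, TypeID, state IDs and the Health! alias are invented for illustration. -->
<UnitMonitor ID="Demo.RepeatedEvent.MissingReset.Monitor" Accessibility="Public" Enabled="true"
             Target="Windows!Microsoft.Windows.Computer"
             ParentMonitorID="Health!System.Health.AvailabilityState"
             Remotable="true" Priority="Normal"
             TypeID="Demo.RepeatedEvent.MissingReset.MonitorType" ConfirmDelivery="false">
  <Category>Custom</Category>
  <OperationalStates>
    <!-- Healthy when the event has gone missing, unhealthy when it repeats. -->
    <OperationalState ID="Healthy" MonitorTypeStateID="EventMissing" HealthState="Success" />
    <OperationalState ID="Unhealthy" MonitorTypeStateID="RepeatedEventRaised" HealthState="Warning" />
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <LogName>Application</LogName>
    <!-- The two elements added on top of the expression copied from the rule. -->
    <RepeatCount>5</RepeatCount>
    <IntervalSeconds>300</IntervalSeconds>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="UnsignedInteger">4</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </Configuration>
</UnitMonitor>
```

With these values the monitor would turn unhealthy after five matching events within 300 seconds. The element order inside `<Configuration>` has to follow the order declared by the monitor type's configuration schema.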
Please note that there is one weakness in using the System.ConsolidatorCondition module. The Count value must be at least two. This will not work on a single event. If you need a monitor that triggers on a single event and resolves on a missing event, I'll leave that as an exercise for the reader.
I'm attaching a sample management pack that uses the library as well as a VBScript that generates the events I used in the demonstration. This works in SCOM 2007 R2 and SCOM 2012. It should also work in SCOM 2007 RTM if you down-rev the MP references, but I don't have a SCOM 2007 RTM system to test on, so I can't verify that. Use it there at your own risk.