SCOM 2007/2012: Optimizing rules
Most SCOM management packs are vendor-specific and sealed, so you have only a few options for changing monitoring parameters through overrides.
Overrides can change thresholds, parameters, and frequency, but not workflow behavior.
This article discusses how to optimize rules that ship in a sealed MP.
Workflow in SCOM:
Modules are the building blocks of every monitor, rule, and discovery in SCOM. Refer to the links below for a basic idea of modules.
Workflow Basics: https://technet.microsoft.com/en-in/library/ff381314.aspx
SCOM module: https://technet.microsoft.com/en-us/library/ff381327.aspx
Rule:
A rule is a stateless workflow in SCOM with the following properties:
- Can generate alerts, just like monitors.
- Does not affect the health state of the target object. Rules do not appear in Health Explorer.
- Always targets a class.
- Must define at least one write action.
- Has no option to configure recovery or diagnostic tasks.
- Can be used to schedule scripts and commands.
- Can collect data from sources such as event logs, text or log files, and Perfmon, and store it in the operational database (DB) and the data warehouse (DW).
Issues with rules:
- Rules can be noisy and generate duplicate alerts.
- By design, alerts generated from rules are not closed automatically.
- By design, a recovery task cannot be configured to automate recovery.
A rule workflow is made up of:
- One or more data source modules.
- Zero or one condition detection modules.
- One or more write action modules.
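The structure above can be sketched as a management pack fragment. This is only a skeleton under stated assumptions: the rule ID, the target class, and the reference aliases (Windows, System, Health) are placeholders chosen for illustration, and the module configuration is left out.

```xml
<!-- Hypothetical rule skeleton: all IDs and aliases are placeholders. -->
<Rule ID="Example.Custom.EventAlert.Rule" Enabled="true"
      Target="Windows!Microsoft.Windows.Computer">
  <Category>Alert</Category>
  <!-- One or more data source modules -->
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <!-- data source configuration goes here -->
    </DataSource>
  </DataSources>
  <!-- Zero or one condition detection module -->
  <ConditionDetection ID="CD" TypeID="System!System.ExpressionFilter">
    <!-- filter configuration goes here -->
  </ConditionDetection>
  <!-- One or more write action modules -->
  <WriteActions>
    <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
      <!-- alert configuration goes here -->
    </WriteAction>
  </WriteActions>
</Rule>
```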
Available options for optimizing alerts generated from rules:
We can optimize alerts and performance data collection by tuning the data source module and the write action module, and by adding a condition detection module.
1) Data source module:
For rules, most of the time we use predefined module types from the System Center library management packs, such as event ID collection, average threshold, etc.
Best practice:
- Keep the data source module configuration simple, and use expression filters as much as possible so that only the required data is collected.
- For performance collection, set the polling interval appropriately; the recommended value is 120 seconds or more. If the polling interval is very short, it floods the DB and DW with data, which causes database file growth and in turn degrades performance.
- Use the optimized collection configuration for counters where it fits. With optimization selected for a collection rule, a value is collected only if it differs from the previous sample by a specified tolerance, either an absolute value or a percentage. This helps reduce network traffic and the volume of data stored in the Operations Manager database. Use optimization for performance counters that are expected to change only gradually. For counters that are expected to vary significantly from one value to the next, disable optimized collection.
- Do not select the all-instances check box unless you need to collect and alert on every instance; this reduces the number of workflows running on the agent and the amount of data collected.
- For scheduled scripts, select an appropriate frequency and timeout; running scripts at short intervals is not recommended.
- Put appropriate exception handling in script-based rules, and check for script module failure errors at least once a week.
- Use cookdown for multi-instance script-based data collection.
Refer to the links below for more information on cookdown:
- https://technet.microsoft.com/en-in/library/ff381335.aspx
- http://blogs.technet.com/b/jonathanalmquist/archive/2011/11/17/cookdown-example-between-two-or-more-workflows.aspx
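The performance collection guidance above can be illustrated with an optimized data source. This is a sketch, not a definitive configuration: it assumes a rule targeted at Microsoft.Windows.Computer, an alias Perf for the System.Performance.Library MP, and example counter and tolerance values. Check the element names against the module reference before using it.

```xml
<!-- Hypothetical data source: alias, counter, and values are assumptions. -->
<DataSource ID="DS" TypeID="Perf!System.Performance.OptimizedDataProvider">
  <ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
  <CounterName>% Processor Time</CounterName>
  <ObjectName>Processor</ObjectName>
  <!-- Collect a single named instance rather than all instances -->
  <InstanceName>_Total</InstanceName>
  <AllInstances>false</AllInstances>
  <!-- Polling interval in seconds, kept well above 120 -->
  <Frequency>300</Frequency>
  <!-- Store a sample only if it differs from the last stored one by 10 percent -->
  <Tolerance>10</Tolerance>
  <ToleranceType>Percentage</ToleranceType>
  <!-- Assumed meaning: force a sample after this many skipped polls -->
  <MaximumSampleSeparation>12</MaximumSampleSeparation>
</DataSource>
```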
2) Condition Detection Module:
This module can filter and optimize the data collected by the data source module.
Filter: most of the predefined modules for rule workflows are composite modules that combine scheduler, probe action, and condition detection in a single DLL-based module, for example the Microsoft.Windows.EventProvider module below.
https://msdn.microsoft.com/en-us/library/ee809339.aspx
For a custom module, you can use System.ExpressionFilter to apply a matching condition for alerting and for collection of event IDs. Refer to the link below for the expression filter.
https://msdn.microsoft.com/en-us/library/ee692962.aspx
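As an illustration of such a filter, the sketch below passes only data items whose event ID is 1001 and whose first parameter contains a given substring. The event ID, the substring, and the System alias are made-up example values, not taken from any particular management pack.

```xml
<!-- Hypothetical filter: event ID and pattern are example values. -->
<ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
  <Expression>
    <And>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="UnsignedInteger">1001</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
      <Expression>
        <RegExExpression>
          <ValueExpression>
            <XPathQuery Type="String">Params/Param[1]</XPathQuery>
          </ValueExpression>
          <Operator>ContainsSubstring</Operator>
          <Pattern>timeout</Pattern>
        </RegExExpression>
      </Expression>
    </And>
  </Expression>
</ConditionDetection>
```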
Optimization of rules:
We can add a System.ConsolidatorCondition module to the workflow to check how many times an issue occurs within a given time window. There are three types of consolidation modules; refer to the link below for more information on the consolidator.
https://msdn.microsoft.com/en-us/library/ee809324.aspx
Refer to the links below for more information:
- http://social.technet.microsoft.com/wiki/contents/articles/20301.how-to-add-consolidation-for-url-monitoring-in-scom-20072012.aspx
- http://blogs.technet.com/b/kevinholman/archive/2014/12/18/creating-a-repeated-event-detection-rule.aspx
- http://sc.scomurr.com/scom-2012-system-expressionfilter-and-consolidation/
- http://wmug.co.uk/wmug/b/aquilaweb/archive/2011/03/10/creating-a-repeat-event-detection-rule
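A repeated-event check of the kind described in these articles might look like the following. Treat it as a sketch: the count, the interval, and the CountMode value are written from memory of public examples and should be verified against the consolidator module reference linked above.

```xml
<!-- Hypothetical consolidator: pass data on only after 5 matching items within 300 seconds. -->
<ConditionDetection ID="Consolidate" TypeID="System!System.ConsolidatorCondition">
  <Consolidator>
    <!-- Empty: consolidate all items together rather than per property value -->
    <ConsolidationProperties />
    <TimeControl>
      <WithinTimeSchedule>
        <Interval>300</Interval>
      </WithinTimeSchedule>
    </TimeControl>
    <CountingCondition>
      <Count>5</Count>
      <!-- Assumed value; confirm the allowed CountMode names in the module reference -->
      <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>
    </CountingCondition>
  </Consolidator>
</ConditionDetection>
```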
3) Write Action Module:
An alert-generating rule uses the System.Health.GenerateAlert module as its write action. For performance and event ID collection there are two modules, Microsoft.SystemCenter.CollectPerformanceData and Microsoft.SystemCenter.DataWarehouse.PublishPerformanceData.
There are not many options for optimizing performance and event collection rules in the write action module.
For alerting rules, one option for reducing alerts is known as suppression.
The Suppression element controls how alerts are suppressed, based on repeated trigger conditions. There are three possible options:
- Do not suppress alerts, and create a new alert for every occurrence of the trigger event.
- Suppress to one alert per workflow instance; this creates a maximum of one alert for each monitored object of the class that the rule is targeted at. A new alert is not created until the open alert is resolved.
- Suppress based on values in the data that trigger the alert; for example, an event parameter.
To configure the module with no suppression, use an empty element. For example:
<Suppression/>
To suppress on workflow, the suppression element contains an empty SuppressionValue element as follows:
<Suppression>
<SuppressionValue/>
</Suppression>
To suppress on some value in the data item that is passed to the module, you can specify one or more $Data references as suppression values. In the following example, a Windows event alert generating rule is configured to suppress alerts only if the first event parameter matches the previous data item that generated an alert:
<Suppression>
<SuppressionValue>$Data/Params/Param[1]$</SuppressionValue>
</Suppression>
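To show where the Suppression element sits, here is a sketch of a complete alert write action that suppresses on two values at once. The string resource name, the Health alias, and the priority and severity values are placeholders chosen for illustration.

```xml
<!-- Hypothetical write action: IDs and values are placeholders. -->
<WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
  <Priority>1</Priority>
  <Severity>2</Severity>
  <AlertMessageId>$MPElement[Name="Example.Custom.EventAlert.Rule.AlertMessage"]$</AlertMessageId>
  <AlertParameters>
    <AlertParameter1>$Data/EventDescription$</AlertParameter1>
  </AlertParameters>
  <!-- A repeat is suppressed only when both values match the open alert -->
  <Suppression>
    <SuppressionValue>$Data/EventDisplayNumber$</SuppressionValue>
    <SuppressionValue>$Data/Params/Param[1]$</SuppressionValue>
  </Suppression>
</WriteAction>
```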
Process for implementing new rules for monitoring:
To optimize rules for any new management pack implementation, perform the following tasks:
- Collect the list of rules that will be implemented.
- Understand the current configuration and fine-tune it.
- Optimize alerts.
Detail process:
1) Collect the list of rules that will be implemented
You can pull the list of rules from a management pack with the methods below:
MPViewer tool: this is a good tool for listing monitoring configuration details from an MP without importing it into SCOM.
Refer below link for more information:
Get-SCOMRule: you can use SCOM PowerShell to get the list of rules from an MP once the MP is imported into SCOM.
Get-SCOMManagementPack -DisplayName "<MP display name>" | Get-SCOMRule | Export-Csv -Path "<CSV file path>"
2) Understanding current configurations and fine-tune:
One of the biggest challenges with a vendor-provided MP is fine-tuning its monitors and rules, because most vendors ship a large number of rules that are tedious to understand and tune.
There are two ways to handle this:
- 1) Disable all rules in the MP, then review each rule and enable only the ones you need. For bulk override creation, see: http://social.technet.microsoft.com/wiki/contents/articles/30714.scom-2012-bulk-override-creation-for-monitors-and-rules-using-powershell.aspx
- 2) Identify noisy alerts and the performance data being collected from SCOM reports. You can use the reports below to fine-tune the top alert, event, and performance collection rules.
- Most common alerts
- Most common Events
- Performance data collected for management pack
3) Optimize alerts:
Alerts can be fine-tuned by creating overrides, but the module workflow in a sealed management pack cannot be changed. To change the module workflow, follow the process below.
- Disable the rule: disable the rule by creating an override and saving it in a new unsealed MP. The new override MP then gets a reference to the sealed MP.
- Add a reference for the target class: find the MP in which the rule's target class is defined and add a reference to it.
- Re-create the rule: re-create the rule with the same configuration plus the optimization options discussed above, and make sure the new rule does not have the same ID.
Note: If the rule uses a custom module, the module's access level must be public. If the module's access level is private, you need to create the same module in the new unsealed MP.
Note: Review all event ID rules and disable collection with an override where it is not required. In most environments event ID collection does not play a big part in monitoring, and event collection is noisy.
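The disable and reference steps above can be sketched as two fragments of the new unsealed MP. Everything named here (the Vendor alias, the MP ID, version, public key token, class, and rule IDs) is a placeholder; take the real values from the sealed MP you are overriding.

```xml
<!-- Fragment 1, under Manifest: reference to the sealed vendor MP (placeholder values). -->
<References>
  <Reference Alias="Vendor">
    <ID>Vendor.Example.ManagementPack</ID>
    <Version>1.0.0.0</Version>
    <PublicKeyToken>0000000000000000</PublicKeyToken>
  </Reference>
</References>

<!-- Fragment 2, under Monitoring: override that disables the original rule. -->
<Overrides>
  <RulePropertyOverride ID="Example.Override.Disable.Vendor.NoisyRule"
                        Context="Vendor!Vendor.Example.Class"
                        Enforced="false"
                        Rule="Vendor!Vendor.Example.NoisyRule"
                        Property="Enabled">
    <Value>false</Value>
  </RulePropertyOverride>
</Overrides>
```

The re-created rule then lives in the same unsealed MP under a new ID and targets the class through the same alias.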