How I do it: SCOM Override handling
Summary for the impatient
This post describes why and how to implement a certain naming convention for the Override description field and provides an automated solution to enforce this syntax, handle temporary Overrides and report on Overrides.
Use case
Working with Overrides for SCOM workflows is a common task. There is a lot of documentation available describing how to set Overrides for specific targets and how to store them correctly (preferably in separate custom Override MPs).
But I have found that many customers still struggle with handling their Overrides. Common questions I often hear are:
- Who has set a specific Override?
- Who was the client/customer who ordered an Override (e.g. a specific user or department)?
- What is this Override doing? Why was it set?
- How can I handle temporary Overrides which are only necessary for a certain time period?
- In ITIL driven environments: what is the change ticket id for this specific override?
Out of the box SCOM provides the “Overrides” Report which lists all Overrides for specific MPs created in a specific time frame. This is really useful as a general overview but does not answer all questions asked above.
Well, each Override has a description field which can be edited with the required information. Simple, easy, end of story.
Hmm, yes but there are still some caveats:
- What if I want to report on the entered data in an automated and structured way?
- The description field is not a mandatory field. How can I force my Operator to enter the required information in a structured way?
How I do it: Solution Overview
My solution for handling Overrides consists of two parts:
- A structured naming convention for each Override description, in the form of a written policy, work instruction or whatever you want to call it
- A Management Pack which reports and controls the compliance of the defined naming convention
1. Structured naming convention
Which convention and which delimiter you use is completely up to you. I usually use this syntax:
<Creation date>#<Author>#<Client>#<Expiration date>|<NULL>#<Ticket ID>|<NULL>#<Description>
Where:
- Creation date: YYYYMMDD
- Author: Username or any other common abbreviation of the user who sets the Override
- Client: Username, department name or SELF, describing who has ordered the Override. Use SELF if Author and Client are the same person.
- [Optional] Expiration date for temporary Overrides: YYYYMMDD or NULL/empty string.
- [Optional] Ticket ID of ITSM tool e.g. for the corresponding change. Can also be NULL/empty
- Description: Textual description why the override was set
Examples:
- 20151001#Dirk#SELF###Set because …
  Override was set and ordered by Dirk on October 1st 2015 without a ticket and it is never expiring.
- 20151015#Dirk#John##SR123456#Set because …
  Override was set by Dirk on October 15th 2015 on behalf of John with ticket id SR123456 and it is never expiring.
- 20151018#Dirk#John#20151130##Set because …
  Override was set by Dirk on October 18th 2015 on behalf of John. This Override will expire on November 30th 2015.
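To make the convention concrete, here is a minimal sketch of how such a description could be split into its six fields. It is written in Python purely for illustration and is not the PowerShell script shipped in the Management Pack; the names `OverrideDescription` and `parse_description` are hypothetical. It assumes `#` as the delimiter and treats both an empty string and the literal `NULL` as "not set".

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class OverrideDescription:
    created: date
    author: str
    client: str
    expires: Optional[date]    # None: the Override never expires
    ticket_id: Optional[str]   # None: no ticket was recorded
    text: str


def _parse_date(value: str) -> date:
    # Enforce exactly eight digits first, because strptime alone
    # would also accept shorter inputs such as "2015101".
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"not a YYYYMMDD date: {value!r}")
    return datetime.strptime(value, "%Y%m%d").date()


def _optional(value: str) -> Optional[str]:
    # Optional fields may be empty or contain the literal NULL.
    value = value.strip()
    return None if value in ("", "NULL") else value


def parse_description(description: str) -> OverrideDescription:
    # Split on the first five delimiters only, so the free-text
    # description at the end may itself contain '#'.
    parts = description.split("#", 5)
    if len(parts) != 6:
        raise ValueError("expected six '#'-separated fields")
    created, author, client, expires, ticket, text = (p.strip() for p in parts)
    if not author or not client or not text:
        raise ValueError("author, client and description are mandatory")
    expires_value = _optional(expires)
    return OverrideDescription(
        created=_parse_date(created),
        author=author,
        client=client,
        expires=_parse_date(expires_value) if expires_value else None,
        ticket_id=_optional(ticket),
        text=text,
    )
```

Calling `parse_description` on the third example above yields an expiration date of November 30th 2015 and no ticket ID, while an empty description raises a `ValueError`.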
2. Management Pack
The attached simple demo Management Pack consists of:
- One custom data source which executes a custom PowerShell script on a daily basis (SyncTime 9:00, interval 86400 seconds). The script
  - Creates an export of all existing Overrides as CSV for later analysis and reporting
  - Checks if there are Overrides in Management Packs using a specific prefix (in this demo XXX) not in compliance with the above defined naming convention and creates a specific property bag with the result
  - Checks if there are temporary Overrides in Management Packs using a specific prefix (in this demo XXX) which have already expired and creates a specific property bag with the result
- A custom unit monitor type which consumes the property bags
- One unit monitor which creates an alarm if there are Overrides not compliant with the naming convention
- One unit monitor which creates an alarm if there are expired temporary Overrides
Explanations:
- MP Filter in the script
  I will not check the syntax in ALL Override MPs, because there might be default Overrides in certain MPs which do not comply with the naming convention for a good reason. As it is best practice to store all custom Overrides in your own custom Override MPs, these MPs should use a common prefix. In this demo I have chosen XXX as the prefix, so only Overrides contained in MPs whose internal name begins with XXX will be checked.
- Workflow target
  I have chosen the RMS Emulator role as the target for all workflows. This might not be the best choice, but it is a good trade-off between simplicity, functionality and scalability.
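The prefix filter and the two checks can be sketched roughly as follows. This is an illustrative Python stand-in, not the PowerShell script from the MP: the dictionary keys `mp_name`, `id` and `description` and the function names are hypothetical, and the sketch assumes that an Override counts as expired once its expiration date lies before today.

```python
from datetime import date, datetime

PREFIX = "XXX"  # demo prefix of the custom Override MPs


def _to_date(value):
    # Return a date for a strict YYYYMMDD string, otherwise None.
    if len(value) != 8 or not value.isdigit():
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        return None


def check_override(description, today):
    """Classify one description as 'noncompliant', 'expired' or 'ok'."""
    parts = (description or "").split("#", 5)
    if len(parts) != 6:
        return "noncompliant"
    created, author, client, expires, _ticket, text = (p.strip() for p in parts)
    if _to_date(created) is None or not (author and client and text):
        return "noncompliant"
    if expires in ("", "NULL"):
        return "ok"  # no expiration date: permanent Override
    expiry = _to_date(expires)
    if expiry is None:
        return "noncompliant"
    # Assumption: expired means the expiration date is in the past.
    return "expired" if expiry < today else "ok"


def find_problems(overrides, prefix=PREFIX, today=None):
    """Return (noncompliant_ids, expired_ids) for MPs matching the prefix."""
    today = today or date.today()
    noncompliant, expired = [], []
    for override in overrides:
        if not override["mp_name"].startswith(prefix):
            continue  # Overrides in other MPs are deliberately not checked
        status = check_override(override["description"], today)
        if status == "noncompliant":
            noncompliant.append(override["id"])
        elif status == "expired":
            expired.append(override["id"])
    return noncompliant, expired
```

Each of the two returned lists corresponds to one of the property bags: a non-empty list would put the matching unit monitor into an unhealthy state.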
The solution in action
A correct Override description according to the naming convention would look like this:
But this is just a convention which cannot be enforced...
Now the Override Handling Management Pack enters the stage :)
After importing the MP you can see three Monitors attached to the Configuration Health roll up Monitor on the RMS Emulator:
If you create a new Override in an MP starting with XXX that does not comply with the naming convention for the description field, e.g. by leaving it empty…
… or by creating a temporary Override which has already expired …
… these monitors will change state and generate an alert with the problematic Override GUIDs included in the alert description:
Outcome 1: Detailed alarming
This gives you detailed monitoring and alerting that enforces the naming convention and makes sure you never forget an expired Override again!
The data source will also create an export of ALL available Overrides as a CSV file with the name YYYY-MM-DD_<Management Group>_OverrideExport.csv:
This CSV file contains a lot of information about the Overrides. Once you import it into Excel it could look like this (I have filtered the table to match the three previously created Overrides and filtered out all unnecessary columns):
Outcome 2: Flexible reporting
You can use the daily CSV files to create flexible reports in Excel and to track changes!
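Besides Excel, two daily exports can also be compared with a few lines of script to see which Overrides were added, removed or changed. The sketch below is in Python and makes two assumptions that are not taken from the MP: the CSV files are comma-separated, and they contain a column that uniquely identifies each Override, here given the hypothetical name `OverrideId`. Adjust both to match the actual export.

```python
import csv


def load_export(handle, key="OverrideId"):
    # Read one export into a dict keyed by the (assumed) unique ID column.
    return {row[key]: row for row in csv.DictReader(handle)}


def diff_exports(old, new):
    """Compare two loaded exports and return (added, removed, changed) IDs."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # An Override counts as changed if any exported column differs.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

To use it, open yesterday's and today's export files (named according to the YYYY-MM-DD_<Management Group>_OverrideExport.csv pattern) with `open(path, newline="")`, pass each handle to `load_export`, and hand the two results to `diff_exports`.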
The PowerShell script in the data source will also generate detailed event log entries (9001-9099) in the Operations Manager event log on the RMS Emulator, which simplifies troubleshooting and debugging:
Implementing the solution in your own SCOM environment
The attached unsealed Management Pack is for demonstration purposes in TEST environments only and is not suited for production environments.
The requirements for testing this MP:
The Management Server Action account must be a member of the SCOM Admin role. The PowerShell data source will be executed under this account and will try to connect to the SCOM SDK. If you don’t like this, simply modify the MP by creating a suitable RunAs account to execute the data source under a different security context.
Before you import the MP, please modify the overridable parameters PrefixCustomOverrideMP and ExportDirectory for both Unit Monitors to suit your needs, or create Overrides for them.
Please be aware that you should set the same values in both workflows to make use of Workflow Cookdown!
Conclusion
That is my way of handling SCOM Overrides. I use the naming convention and the automation behind it successfully in several customer environments. It is simple to implement and can easily be adapted to your specific needs.
Let me know what you think of it.