Console sluggish or frozen during agent approval process
I was working with a large customer last week and came across some very interesting performance behavior that affects console sessions. This particular customer pushes the limits of Operations Manager 2007 in every way. They have 6000+ agents in a single Management Group, and average 50 concurrent console sessions at any given point in time. Reaching this scale was a work in progress over several months, but eventually we got there.
However, there was one very curious point of failure, where connected console sessions would become extremely sluggish or even freeze for a period of ~10 minutes. It was something that eluded us for quite some time, until last week.
If you are familiar with KB956240, you know that this hotfix updates the Operations Manager 2007 Data Abstraction Layer component (Microsoft.mom.dataaccesslayer.dll). Although KB956240 helps reduce the performance impact produced by the types of configuration changes mentioned in the article, there has been no change to the underlying configuration update process in Operations Manager 2007.
The good news
This configuration update process has been changed in Operations Manager 2007 R2. This change will also be included in the Post SP1 Rollup. In R2 and Post SP1 Rollup, Management Group configuration updates that occur during certain types of configuration changes will not produce such a performance impact.
Am I affected by this issue?
It depends on the number of agents in your Management Group, how heavily your company employs the console (concurrent connections), and how long it takes for configuration updates to reach agents in your environment.
If you’ve got 6000 agents in your MG, you likely have a dynamic environment in which new servers (agents) are deployed regularly, and expired or failed servers (agents) are decommissioned on a regular basis. You are likely affected by this issue.
If your administrators are always in the console, you’ll likely have some complaints that the console became extremely slow or maybe even froze for a duration of ~10 minutes. You are likely affected by this issue.
If your Root Management Server and/or OperationsManager database server cannot handle the “bursty” type of traffic and transactions that configuration updates produce, or if there is significant network latency affecting the time it takes configuration updates to reach your agent population, you are likely affected by this issue.
Although each of the conditions above makes the problem more noticeable, every Management Group is affected by this issue. It’s really a matter of how frequently you are affected, and to what degree your environment is impacted.
What can I do now?
From what I’ve observed in the field, the biggest performance impact comes from approving agents. The only guidance I can give to help reduce the impact, at least during peak times when poor console performance isn’t an option, is to schedule agent approval for times when it will affect your console users the least.
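As a rough illustration of scheduling approval for quiet hours, here is a minimal Python sketch of an off-peak check that an approval script could run before doing anything. The function name and the 22:00 to 05:00 window are my own assumptions for the example, not anything Operations Manager provides; pick a window that matches your own console usage.

```python
from datetime import datetime, time


def in_quiet_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside the off-peak window [start, end).

    Hypothetical helper: the window boundaries are whatever hours your
    console users are least active.
    """
    current = now.time()
    if start <= end:
        # Window sits inside a single day, e.g. 01:00 to 04:00.
        return start <= current < end
    # Window wraps past midnight, e.g. 22:00 to 05:00.
    return current >= start or current < end


if __name__ == "__main__":
    # Assumed quiet hours for the example: 22:00 to 05:00.
    quiet_start, quiet_end = time(22, 0), time(5, 0)
    if in_quiet_window(datetime.now(), quiet_start, quiet_end):
        print("Off-peak: OK to approve pending agents now.")
    else:
        print("Peak hours: leave pending agents for later.")
```

A scheduled task could run a check like this and only proceed to the approval step when it reports off-peak.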
One thing to note
Whether a single agent is approved or a batch of 200 agents is approved, the same configuration update process is initiated. If there are multiple agents waiting to be approved, select them all and approve them at the same time. Or, use an agent approval script that batches agent approval.
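To make the batching idea concrete, here is a small Python sketch of the logic such a script might follow: gather every pending agent and hand the whole list to a single approval call, rather than approving agents one at a time. The `approve_batch` callable is a hypothetical placeholder for whatever actually performs the approval in your environment; it is not a real Operations Manager API.

```python
from typing import Callable, Iterable, List


def approve_pending(
    pending: Iterable[str],
    approve_batch: Callable[[List[str]], None],
) -> int:
    """Approve all pending agents with ONE call to `approve_batch`.

    `approve_batch` is a hypothetical stand-in for the real approval
    mechanism. Calling it once for the whole list is the point: one
    approval action instead of one per agent.
    """
    # De-duplicate and sort so the batch is predictable.
    batch = sorted(set(pending))
    if not batch:
        # Nothing pending: do not trigger an approval at all.
        return 0
    approve_batch(batch)
    return len(batch)


if __name__ == "__main__":
    calls: List[List[str]] = []  # records each approval call for the demo
    count = approve_pending(["srv01", "srv02", "srv01"], calls.append)
    print(f"Approved {count} agents in {len(calls)} approval call(s).")
```

The design choice here is simply to keep the number of approval calls, and therefore the number of configuration updates they kick off, as low as possible.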
If you have auto-approval turned on (indicated below in yellow), unfortunately there is no way to control the approval process.