OpsMgr2007 SP1 RC rolling upgrade and cluster-aware application monitoring.
A rolling upgrade of OpsMgr2007 proceeds in stages: the server bits are upgraded first, new management packs are then requested and delivered, and those are later pushed down to the agents. Operations Manager 2007 SP1 Release Candidate has an issue that appears when cluster-aware applications (such as SQL or Exchange) are monitored within the topology.
The issue manifests itself as the unloading of several rules responsible for discovering instances of a Virtual Server and for monitoring failover of the cluster resource group that matches that Virtual Server instance. It is caused by changes to the configuration of native modules defined in the Microsoft.Windows.Cluster.Library management pack.
The rules stay unloaded until the new binaries are approved, delivered, and have replaced the binaries shipped with Microsoft Operations Manager 2007 RTM. The impact is that, during the rolling upgrade to OpsMgr2007 SP1 RC, newly created cluster-aware applications are not discovered, and existing cluster-aware applications are not “actively” monitored if they fail over to another cluster node or if the cluster service stops, pauses, or crashes.
The rules are reloaded and can process the new configuration once the binaries that ship with OpsMgr2007 SP1 RC have replaced the old ones and have been loaded by the runtime on the agent (health service) running on the cluster node.
Sample event reporting the unload of a discovery rule:
Event Type: Error
Event Source: HealthService
Event Category: Health Service
Event ID: 4511
Date: 11/09/2007
Time: 9:26:27 AM
User: N/A
Computer: testBox
Description:
Initialization of a module of type "ClusterDiscoveryDS" (CLSID "{97B1EF21-757C-4004-86BB-57939E2C98D8}") failed with error code “Element not found” causing the rule "Microsoft.Windows.Cluster.Classes.Discovery" running for instance "Cluster Service" with id:"{0753905A-5ACE-5C70-1B0A-7980743053FA}" in management group “marius”
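If you have exported event descriptions from the Operations Manager log on your cluster nodes, a small script can help pick out the unload events for cluster rules. The following is only a rough sketch in Python: the function names are made up for illustration, and the pattern is derived solely from the wording of the sample event above, so it may need adjusting for other event texts.

```python
import re

# Pattern built from the wording of the sample event 4511 description above.
# It is an assumption that other unload events follow the same wording.
PATTERN = re.compile(
    r'module of type "(?P<module>[^"]+)" '
    r'\(CLSID "(?P<clsid>[^"]+)"\) failed with error code '
    r'"(?P<error>[^"]+)" causing the rule "(?P<rule>[^"]+)" '
    r'running for instance "(?P<instance>[^"]+)" '
    r'with id:"(?P<instance_id>[^"]+)" '
    r'in management group "(?P<group>[^"]+)"'
)


def normalize(text):
    """Replace curly quotes with straight ones and collapse whitespace."""
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return " ".join(text.split())


def parse_unload_event(description):
    """Return the fields of an unload event as a dict, or None if no match."""
    match = PATTERN.search(normalize(description))
    return match.groupdict() if match else None


def is_cluster_rule_unload(description):
    """True if the description reports an unloaded cluster rule (hypothetical helper)."""
    fields = parse_unload_event(description)
    return fields is not None and fields["rule"].startswith("Microsoft.Windows.Cluster.")


# Description text copied from the sample event above.
SAMPLE = (
    'Initialization of a module of type "ClusterDiscoveryDS" '
    '(CLSID "{97B1EF21-757C-4004-86BB-57939E2C98D8}") failed with error code '
    '\u201cElement not found\u201d causing the rule '
    '"Microsoft.Windows.Cluster.Classes.Discovery" running for instance '
    '"Cluster Service" with id:"{0753905A-5ACE-5C70-1B0A-7980743053FA}" '
    'in management group \u201cmarius\u201d'
)

if __name__ == "__main__":
    fields = parse_unload_event(SAMPLE)
    print(fields["rule"], "unloaded on instance", fields["instance"])
```

Any node for which `is_cluster_rule_unload` returns true is still waiting for the SP1 binaries, so those are the agents whose pending updates should be approved first.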
Comments
Anonymous
November 13, 2007
I need to clarify that you will be able to monitor your cluster-aware applications after upgrading to OpsMgr2007 SP1 RC. What I meant was that during the rolling upgrade, rules are temporarily unloaded until the SP1 binaries are replaced on the cluster node. After that, everything returns to normal. I posted this because some customers were not approving the binaries when they saw the unload events, which meant that monitoring was not happening. They should do the opposite: approve the binaries ASAP.
Anonymous
July 21, 2008
Hi, I have SCOM 2007 with SP1 (RTM) and I've received the error message on Windows 2000 based cluster machines. I have the following MPs: Microsoft.Windows.Cluster.Library 6.0.6278.0, Microsoft.Windows.Cluster.Management.Library 6.0.6277.1, Microsoft.Windows.Cluster.Management.Monitoring 6.0.6277.1, and two Windows 2003 Cluster related MPs, also version 6.0.6277.1. What could be wrong? What can I do to resolve this issue? On another system I can monitor Windows 2003 based clusters without any problem. Many thanks, Zoltan
Anonymous
July 22, 2008
We are not able to monitor Win2k clusters with said management packs. In fact, right now we are not able to monitor Win2k clusters at all. I was playing around with a custom MP that at least discovers instances and provides alert rules, but I had to give up because I no longer have any Win2k clusters available. Could you please post your question to the Microsoft.Public.OpsMgr.General newsgroup, including the exact event? I will try to answer and help there ...