Backwards Compatibility Mapper fails to start event on OpsMgr 2007 SP1
Many of you (especially those monitoring Exchange with a converted management pack) have noticed that with Operations Manager 2007 SP1, a noisy event complaining about a transient state of instance discovery appears in the agent’s event log so frequently that it ends up flooding the log.
Sample event:
Event Type: Warning
Event Source: Health Service Modules
Event Category:None
Event ID: 10720
Date: 3/11/2008
Time: 9:06:45 PM
User: N/A
Computer: SMCNODE10
Description:
The Backwards Compatibility mapper module was not able to start because discovery has not fully completed for this Health Service. This is a transient issue, the affected rule(s) will be unloaded and the Health Service will restart them when initialization is complete.
One or more workflows were affected by this.
Workflow name: System.Mom.BackwardCompatibility.ServiceStateMonitoring
Instance name: SMCLUSTER05.smx.net
Instance ID: {7E669FE2-B4F9-00CB-1E1D-99B07E791F91}
Management group: VerifyMP_SP1
The root cause is that the backward compatibility feature requires certain properties of a Microsoft.Windows.Computer instance to always be populated once discovery completes. As you probably guessed, when a cluster is involved there is an instance that does not fulfill this requirement: the instance of Microsoft.Windows.Cluster.VirtualServer. Following is the class hierarchy for that managed entity type:
VirtualServer really is a special managed entity, originally designed to help the discovery of cluster-aware applications with an optimized (not converted) management pack (I will try to post in the near future about how it should be used with such discovery). It was never really meant to serve as a target for a real monitoring workflow, and as such it populates only some properties on top of the keys (keys are always populated; that is how an instance is recognized by OpsMgr). VirtualServer as defined by OpsMgr is a cluster resource group with a cluster resource IPAddress and a cluster resource NetworkName. Because it is also a Computer, it will also appear in the Computer state view and, as said before, without any unit monitors targeting this instance it will be marked as not monitored in all versions of OpsMgr released so far. It will also appear under “Agentless Managed”, because such an instance never hosts a HealthService.
Following is a T-SQL query that you can use to investigate some of the computer properties (just replace <partial name of virtual server> below with a real substring identifying your VirtualServer):
use OperationsManager
select BME.DisplayName, C.PrincipalName, C.NetworkName, C.NetbiosComputerName, C.DNSName
from MTV_Computer C
join BaseManagedEntity BME with(nolock) on BME.BaseManagedEntityId = C.BaseManagedEntityId
join ManagedType MT with(nolock) on MT.ManagedTypeId = BME.BaseManagedTypeId
where BME.IsDeleted = 0
and MT.TypeName = 'Microsoft.Windows.Computer'
and BME.DisplayName like '%<partial name of virtual server>%'
Following is the query result for smcluster05 prior to applying the workaround:
We can see that the problem is that there is no value for the DNSName property. This issue has already been addressed in a future release of OpsMgr 2007 (SP2), but there are workarounds that can be applied before that release becomes publicly available. The one I introduce in this post consists of a discovery rule that populates the DNSName property. The discovery rule is targeted at the instance of VirtualServer. OpsMgr guarantees that the VirtualServer instance is monitored by the health service running on the active cluster node (the node where the cluster resource group is currently online). On failover, monitoring of the instance stops and then starts on the new active node once the failover finishes. This ensures that the discovery rule keeps the DNSName property populated and available to the backward compatibility feature.
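To see every affected instance at once rather than checking one virtual server at a time, the query above can be adapted along the following lines. This is only a sketch: it reuses the same tables and columns as the earlier query, and it assumes that an unpopulated DNSName shows up as either NULL or an empty string, which you should verify in your own environment.

```sql
use OperationsManager
-- List every Microsoft.Windows.Computer instance whose DNSName is missing.
-- Assumption: an unpopulated property is stored as NULL or ''.
select BME.DisplayName, C.PrincipalName, C.NetworkName, C.NetbiosComputerName, C.DNSName
from MTV_Computer C
join BaseManagedEntity BME with(nolock) on BME.BaseManagedEntityId = C.BaseManagedEntityId
join ManagedType MT with(nolock) on MT.ManagedTypeId = BME.BaseManagedTypeId
where BME.IsDeleted = 0
and MT.TypeName = 'Microsoft.Windows.Computer'
and (C.DNSName is null or C.DNSName = '')
order by BME.DisplayName
```

Any rows returned are candidates for the 10720 event; once the workaround has taken effect, the virtual servers should drop out of this list.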
The following screenshot shows that after the MP is delivered to the health service running on the cluster node (event 1201), additional 10720 events may still appear before the discovery rule is loaded and executed and its data is delivered to the operational DB. The events no longer appear after the final configuration reload event (1210).
Executing the query again shows that the property is now populated:
Please import the attached management pack into your test environment to evaluate whether this workaround works for you. It is not sealed and can be further customized if you wish. As expected, this workaround is provided AS IS, with no warranties, and confers no rights. Use of included samples is subject to the terms specified at Microsoft.
Comments
Anonymous
March 14, 2008
Hi Marius, Thanks for this, it fixed issues we were having with Cluster MP discovery. http://derekhar.blogspot.com/2008/03/cluster-mp-discovery-failing-on-sp1.html
Anonymous
July 21, 2008
hotfix is available at http://support.microsoft.com/kb/951380