Dependency Monitor Hotfix to increase health state calculation reliability

I am writing in response to the release of this hotfix, which I would like to bring to your attention for your consideration. The fix should increase the reliability of health state monitoring in the many cases where a dependency monitor is used.

An issue was discovered in which a dependency monitor may indicate the wrong state due to a race condition during monitor registration. This can surface when the contributing instances are unavailable or in maintenance mode during registration, when the target instance is leaving maintenance mode, and sometimes during distributed application creation.

The main symptoms include unexpectedly generated alerts and a state that is incorrect given the rollup algorithm and the state of the contributing monitors. (In many cases the state is not reflected at all and shows “Not Monitored”, especially for distributed applications.)
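To make the symptom concrete, here is a minimal sketch of a state rollup over contributing monitors. It is an illustration only, not the Operations Manager implementation: the `Health` enum, the `rollup` function, and the "worst"/"best" policy names are all hypothetical, and a member with no reported state is represented here as `None`.

```python
from enum import IntEnum


class Health(IntEnum):
    # Hypothetical ordering: a higher value means a worse state.
    NOT_MONITORED = 0
    HEALTHY = 1
    WARNING = 2
    ERROR = 3


def rollup(member_states, policy="worst"):
    """Roll up the states of contributing monitors into one state.

    Members given as None (no state reported yet) or NOT_MONITORED
    are ignored. If no member contributes a state, the result is
    NOT_MONITORED.
    """
    if policy not in ("worst", "best"):
        raise ValueError(f"unknown rollup policy: {policy!r}")

    known = [
        s for s in member_states
        if s is not None and s != Health.NOT_MONITORED
    ]
    if not known:
        return Health.NOT_MONITORED

    # "worst" takes the most severe state, "best" the least severe.
    return max(known) if policy == "worst" else min(known)


if __name__ == "__main__":
    print(rollup([Health.HEALTHY, Health.ERROR, None]).name)
    print(rollup([None, None]).name)
```

In this sketch, a rollup that never receives any contributing state stays at `NOT_MONITORED`, which matches the visible symptom described above when contributing instances are unavailable at registration time.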

The fix for the distributed application (DA) issue was tried and evaluated by a customer, and it addressed their problem. (This should not be taken as advice to deploy into production immediately; you are encouraged to evaluate the hotfix in your own pre-production environment first.) If this hotfix does not help in your case, please report it through the Connect site so I have a chance to investigate your scenario.

The hotfix should be deployed to every computer experiencing issues with dependency monitors. In the majority of cases, the monitor resides on the RMS only.

IMPORTANT NOTE: Applying this hotfix will reset the Health Service configuration state on each computer where it is installed. It is therefore important to review any unhealthy state in the Operations Manager console and, where possible, resolve the symptoms causing it before installing the hotfix. Failure to do so may cause event-based monitors to be reset to the Healthy state and their related alerts to be resolved automatically, which may lead to loss of visibility into issues affecting the monitored environment.

Comments

  • Anonymous
    March 12, 2009
    Daniele, I'm sorry to hear that. We definitely did not see the numbers you show in your blog post. We are trying to re-evaluate what could have caused that. Please contact us offline with a description of your env ...