Exchange 2013 Managed Availability HealthSet Troubleshooting
Dealing with annoying HealthSets was daunting when I first started working with Exchange 2013.
The newly introduced feature, Managed Availability, is a built-in monitoring system that can take recovery actions, and those actions can cause serious problems while trying to fix a small one.
There are three types of components related to HealthSets:
- Probe: used to determine whether Exchange components are active.
- Monitor: when probes signal a state different from the one stored in the patterns of the monitoring engine, the monitoring engine determines whether a component or feature is unhealthy.
- Responder: takes action when a monitor alerts it to an unhealthy state. Responders take different actions depending on the type of component or feature. Actions can start with simply recycling the application pool and go as far as restarting the server or, even worse, putting the server offline so it won't accept any connections.
In this blog post I will talk about troubleshooting different HealthSets in Exchange 2013.
Troubleshooting Exchange HealthSet MailboxSpace
We start by getting a health report for the Exchange 2013 server with the Get-HealthReport cmdlet:
Get-HealthReport -Identity EXCH2K13
Image 1
If you want to list only the HealthSets that are Unhealthy, Degraded, or Disabled, you can use this command:
Get-HealthReport -Server EXCH2K13 | where { $_.AlertValue -ne "Healthy" }
Let's list the components of the MailboxSpace HealthSet:
Get-MonitoringItemIdentity -Identity MailboxSpace -Server EXCH2K13 | ft Identity,ItemType,TargetResource -autosize
Image 2
As you can see, the HealthSet has multiple Probes, Monitors, and Responders.
What if you have a HealthSet with the status Unhealthy or Repairing, like MailboxSpace for a test DB?
We need to investigate further to find out which monitors are causing the HealthSet to go into the Unhealthy state.
Get-ServerHealth -Identity EXCH2K13 -HealthSet "MailboxSpace"
Image 3
As you can see above, a lot of Monitors are Unhealthy.
Assume that in your production environment you have a test DB located on the C: drive that you don't want to move or delete, but because of the limited space available on C: you are getting these Unhealthy monitors.
We can use Add-ServerMonitoringOverride to disable these monitors.
Add-ServerMonitoringOverride
https://technet.microsoft.com/en-us/library/jj218628(v=exchg.150).aspx
The limitation is that a server override can last at most 60 days:
Add-ServerMonitoringOverride -Server ServerName -Identity ProbeMonitorResponderName -ItemType Monitor -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00
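The -Duration value is written as days.hours:minutes:seconds, the same layout as a .NET TimeSpan string, so it can be sanity-checked in any PowerShell session before it is passed to the cmdlet. This is a minimal sketch, assuming the value parses the way a plain [TimeSpan] does; the variable names are illustrative only.

```powershell
# Parse the duration string used in the override (60 days, 0 hours, 0 minutes, 0 seconds).
$duration = [TimeSpan]::Parse("60.00:00:00")

# Confirm the length in days and estimate when the override would lapse.
$days   = $duration.TotalDays
$expiry = (Get-Date).Add($duration)

"Override lasts $days days and would lapse around $($expiry.ToString('yyyy-MM-dd'))"
```

Noting the lapse date is useful because the monitors become active again once the override expires.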
Combining the result from Image 2 (Get-MonitoringItemIdentity) with the output of Get-ServerHealth, we can identify the Monitors that need to be overridden.
We have the following Monitors in an Unhealthy or Repairing state:
MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationNotification\DB01
MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationNotification\DB02
MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationProcessingMonitor
MailboxSpace\DatabaseSizeMonitor\DB01
MailboxSpace\DatabaseSizeMonitor\DB02
MailboxSpace\StorageLogicalDriveSpaceMonitor\C:
Add-ServerMonitoringOverride -ItemType Monitor -Identity "MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationNotification\DB01" -PropertyValue 0 -PropertyName Enabled -Duration "60.00:00:00" -Server EXCH2K13
Add a server override for each of the Monitors above, and make sure the ItemType matches the item (Probe, Monitor, or Responder).
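Adding the overrides one by one is repetitive, so the list can be looped over. The sketch below builds one parameter set per monitor and only calls Add-ServerMonitoringOverride when the cmdlet is available (that is, in the Exchange Management Shell); elsewhere it just prints what it would do. The variable names are illustrative, and the last identity assumes the monitor is named StorageLogicalDriveSpaceMonitor; copy the identities exactly as Get-ServerHealth reports them on your server.

```powershell
$server   = 'EXCH2K13'
$monitors = @(
    'MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationNotification\DB01',
    'MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationNotification\DB02',
    'MailboxSpace\DatabaseLogicalPhysicalSizeRatioEscalationProcessingMonitor',
    'MailboxSpace\DatabaseSizeMonitor\DB01',
    'MailboxSpace\DatabaseSizeMonitor\DB02',
    'MailboxSpace\StorageLogicalDriveSpaceMonitor\C:'   # assumed name, verify on your server
)

# Build one hashtable of parameters per monitor (used for splatting below).
$overrides = @(foreach ($monitor in $monitors) {
    @{
        Identity      = $monitor
        ItemType      = 'Monitor'
        PropertyName  = 'Enabled'
        PropertyValue = '0'
        Duration      = '60.00:00:00'
        Server        = $server
    }
})

# Apply the overrides only where the Exchange cmdlet exists; otherwise preview them.
$haveCmdlet = [bool](Get-Command Add-ServerMonitoringOverride -ErrorAction SilentlyContinue)
foreach ($params in $overrides) {
    if ($haveCmdlet) {
        Add-ServerMonitoringOverride @params
    } else {
        "Would add override for $($params.Identity)"
    }
}
```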
Finally, you can verify your server overrides with Get-ServerMonitoringOverride:
Image 4
Now we should check Get-ServerHealth to see whether the Monitors have been disabled:
Get-ServerHealth -Identity EXCH2K13 -HealthSet "MailboxSpace" | ft -Autosize
Image 5
The MailboxSpace HealthSet is now Healthy.
Image 6
Troubleshooting FEP HealthSet
Some of you don't have Forefront installed, so you will want to disable this HealthSet on the server.
We can achieve this simply by changing the XML file that corresponds to the FEP HealthSet.
Browse to C:\Program Files\Microsoft\Exchange Server\V15\Bin\Monitoring\Config\
Search for FEPActiveMonitoringContext and open the file with Notepad.
Find line 12: Enabled = "True"
Replace "True" with "false" to disable FEP monitoring.
The file should look something like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<Definition xsi:noNamespaceSchemaLocation="..\..\WorkItemDefinition.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!--FEPService Maintenance definition section-->
<MaintenanceDefinition
AssemblyPath="Microsoft.Exchange.Monitoring.ActiveMonitoring.Local.Components.dll"
TypeName="Microsoft.Exchange.Monitoring.ActiveMonitoring.FEP.FEPDiscovery"
Name="FEP.Maintenance.Workitem"
ServiceName="FEP"
RecurrenceIntervalSeconds="0"
TimeoutSeconds="30"
MaxRetryAttempts="0"
Enabled = "false">
After you modify the line above, restart the Microsoft Exchange Health Manager service (service name MSExchangeHM) on the server where you modified the XML file.
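If you prefer not to edit the file by hand, the same change can be scripted. The sketch below is only an illustration under a few assumptions: Set-MaintenanceEnabled is a hypothetical helper (not an Exchange cmdlet), the file is assumed to have the Definition/MaintenanceDefinition layout shown in the excerpt above, and MSExchangeHM is assumed to be the service name of the health service on your server. It keeps a .bak copy of the file before saving.

```powershell
function Set-MaintenanceEnabled {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [bool]   $Enabled
    )
    # Resolve to a full path, because XmlDocument.Load does not follow the PowerShell location.
    $fullPath = (Resolve-Path -Path $Path).Path

    # Keep a copy of the original file next to it before changing anything.
    Copy-Item -Path $fullPath -Destination "$fullPath.bak" -Force

    $doc = New-Object System.Xml.XmlDocument
    $doc.PreserveWhitespace = $true
    $doc.Load($fullPath)

    $node = $doc.SelectSingleNode('/Definition/MaintenanceDefinition')
    if ($null -eq $node) { throw "No MaintenanceDefinition element found in $fullPath" }

    # Write "true" or "false" into the Enabled attribute and save the file.
    $node.SetAttribute('Enabled', $Enabled.ToString().ToLower())
    $doc.Save($fullPath)
}

# Usage (set $fepFile to the full path of the FEPActiveMonitoringContext file on your server):
# Set-MaintenanceEnabled -Path $fepFile -Enabled $false
# Restart-Service MSExchangeHM
```

Calling the helper again with -Enabled $true re-enables the maintenance work item.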
Troubleshooting CAS Proxy HealthSets
Suppose you have TMG in your organization and you need to set OWA/ECP to use Basic Authentication.
You will probably disable Forms Authentication on OWA and ECP.
Soon after you disable Forms Authentication, some server components such as OWA.Proxy, ECP.Proxy, and RWS.Proxy will go into an Inactive state.
You can check with: Get-ServerComponentState -Identity EXCH2K13
Image 7
We can set the component back to Active manually by running this cmdlet:
Set-ServerComponentState -Identity EXCH2K13 -Component EcpProxy -State Active -Requester HealthAPI
After 1 hour, however, the components will return to an Inactive state.
If we continue troubleshooting and check the Crimson logs on the server, we will find events related to the ECP.Proxy probe.
More information about Crimson channel event logging can be found here
https://technet.microsoft.com/en-us/library/dd351258(v=exchg.150).aspx#Crimson
Event Viewer > Application and Services Logs > Microsoft > Exchange > ActiveMonitoring > ProbeResult
Find the event for the probe result (Name=ECPProxyTestProbe/MSExchangeECPAppPool), select Details, and at StateAttribute3 you will see:
"FailurePoint=FrontEnd,HttpStatusCode=401,Error=Unauthorized,Details=,HttpProxySubErrorCode=,WebExceptionStatus=,LiveIdAuthResult="
The ECP.Proxy probe is failing with a 401 Unauthorized error; the credential used can be seen in StateAttribute2.
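The StateAttribute3 value is a comma-separated list of name=value pairs, which is easier to read once it is split up. The helper below is a hypothetical convenience function (ConvertFrom-StateAttribute is not an Exchange cmdlet), and it is a naive sketch: it assumes no value contains a comma.

```powershell
function ConvertFrom-StateAttribute {
    param([Parameter(Mandatory = $true)] [string] $Text)

    $result = [ordered]@{}
    foreach ($pair in $Text.Split(',')) {
        # Split on the first '=' only, so a value that itself contains '=' stays intact.
        $name, $value = $pair -split '=', 2
        $result[$name.Trim()] = $value
    }
    $result
}

$attr = ConvertFrom-StateAttribute "FailurePoint=FrontEnd,HttpStatusCode=401,Error=Unauthorized,Details=,HttpProxySubErrorCode=,WebExceptionStatus=,LiveIdAuthResult="
$attr.HttpStatusCode   # 401 for the sample string above
$attr.FailurePoint     # FrontEnd
```

Empty fields such as Details come back as empty strings, which makes it easy to see which parts of the probe result carry information.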
Verify the HealthSets for ECP and OWA:
Get-HealthReport -Server EXCH2K13
You will see that the ECP, OWA, ECP.Proxy, OWA.Proxy, and RWS.Proxy HealthSets are Unhealthy.
To stop this behavior we can disable the Monitoring Probes for OWA, ECP, and RWS.
Open Windows Explorer and browse to:
C:\Program Files\Microsoft\Exchange Server\V15\Bin\Monitoring\Config\
Open ClientAccessProxyTest.xml with Notepad.
Change the value of the following Monitoring Probes from "true" to "false":
ECPProbeEnabled = "false"
OWAProbeEnabled = "false"
ReportingProbeEnabled = "false"
Save ClientAccessProxyTest.xml and close it.
Restart the Microsoft Exchange Health Manager service on the server where you modified the XML file.
Disabling the Monitoring Probes has no impact on the Exchange server's proxy functionality.
If you want to modify any other settings in the XML files located in Bin\Monitoring\Config\, please consult a Microsoft Exchange Support Engineer before making any changes to those files.
To conclude, the problem lies with the authentication method used on the ECP and OWA sites in IIS.
Monitoring Probes can only use Forms-Based Authentication and Windows Authentication to test ECP, OWA, and RWS functionality.
I hope the information provided was helpful for you.
If you have any questions please feel free to send an email to a-crtimo@microsoft.com
Comments
- Anonymous
January 01, 2003
Good Article...For me.. Add-ServerMonitoringOverride to disable the monitors and making them healthy not look as good solution...May be Ms should have designed in different way :)
- Anonymous
January 01, 2003
thank you! was struggling where to find the cas proxy healthsets to disable with tmg
- Anonymous
December 01, 2014
Is it me or does this just seem like a band-aid. I would like to find out why it is marked Unhealthy, not how to override it and "brush it under the rug".
- Anonymous
December 01, 2014
Forgive me for asking a stupid question, but what's the difference between "Microsoft Exchange Health Manager" service (which I cannot find) and "Microsoft Exchange Health Management" service (which I can actually find on my Exchange 2013 server)?
- Anonymous
April 07, 2015
How is disabling monitors troubleshooting?
- Anonymous
March 10, 2016
Big thnx!!!
- Anonymous
May 23, 2017
Hi Cristi, Do we have a an option to permanently disable a monitor?
- Anonymous
November 19, 2017
The title is misleading.Disabling monitors is not a way to troubleshoot unhealthy exchange healthsets.
- Anonymous
April 10, 2018
Really helpful Article.
- Anonymous
July 05, 2018
All this does is hide the alerts for a certain amount of time. Does nothing to actually resolve the issue. Not troubleshooting at all