Why Exchange 2013 CU6+ uses out-of-site DCs/GCs
Note: this no longer applies to on-premises environments after Exchange 2013 CU11, and it does not apply to Exchange 2016.
We have had a few escalations from customers who noticed heavy traffic between Exchange 2013 CU6+ servers and out-of-site DCs/GCs.
When we run Get-ExchangeServer -Status, we can see that Exchange uses out-of-site DCs, while at the same time event 2080 shows that other in-site DCs are available.
Here is how it looks in Exchange 2010 and in Exchange 2013 RTM through CU5.
I used a topology with 4 in-site DCs and 1 out-of-site DC.
From event 2080:
Process Microsoft.Exchange.Directory.TopologyService.exe (PID=2276). Exchange Active Directory Provider has discovered the following servers with the following characteristics:
(Server name | Roles | Enabled | Reachability | Synchronized | GC capable | PDC | SACL right | Critical Data | Netlogon | OS Version)
In-site:
DC001.CU1.com CDG 1 7 7 1 0 1 1 7 1
dc2.CU1.com CDG 1 7 7 1 0 1 1 7 1
DC3.CU1.com CDG 1 7 7 1 0 1 1 7 1
dc4.CU1.com CDG 1 7 7 1 0 1 1 7 1
Out-of-site:
dc5.CU1.com CDG 1 7 7 1 0 1 1 7 1
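For readers who want to check these rows in bulk, here is a minimal Python sketch that maps one row of the event 2080 table onto the column names from the header. The names parse_2080_row and looks_reachable are hypothetical helpers, not part of Exchange, and the "reachability is non-zero" check is only a heuristic based on what this lab shows (DCs that are shut down report 0 in every column after Enabled).

```python
from dataclasses import dataclass

# Column order as printed in the event 2080 header, after name and roles.
COLUMNS = (
    "enabled", "reachability", "synchronized", "gc_capable", "pdc",
    "sacl_right", "critical_data", "netlogon", "os_version",
)


@dataclass
class DcRow:
    name: str
    roles: str
    flags: dict


def parse_2080_row(line: str) -> DcRow:
    """Split one whitespace-separated row into name, roles and numeric flags."""
    parts = line.split()
    name, roles, values = parts[0], parts[1], parts[2:]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} numeric columns, got {len(values)}")
    return DcRow(name, roles, dict(zip(COLUMNS, (int(v) for v in values))))


def looks_reachable(row: DcRow) -> bool:
    """Heuristic (assumption): enabled and a non-zero reachability value."""
    return row.flags["enabled"] == 1 and row.flags["reachability"] != 0
```

Feeding it the row for DC001 above gives roles CDG and reachability 7; a row such as "dc2.CU1.com CDG 1 0 0 0 0 0 0 0 0" is reported as not reachable.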
Get-ExchangeServer exch5-cu1 -Status
CurrentDomainControllers : {dc2.CU1.com, DC001.CU1.com, dc4.CU1.com, DC3.CU1.com}
CurrentGlobalCatalogs : {dc2.CU1.com, DC001.CU1.com, dc4.CU1.com, DC3.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
>netstat -n | findstr 3268
We can see established connections with all 4 GCs
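The same check can be scripted. Below is a minimal Python sketch that reads the text printed by netstat -n and lists the remote addresses with an ESTABLISHED connection to the global catalog port 3268. The function name established_gc_peers is hypothetical, and the sketch assumes the usual four-column TCP layout of Windows netstat output (protocol, local address, foreign address, state).

```python
def established_gc_peers(netstat_output: str, port: int = 3268) -> list:
    """Return sorted remote hosts with an ESTABLISHED TCP connection to `port`."""
    peers = set()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Expected shape: TCP <local addr:port> <remote addr:port> <state>
        if len(parts) != 4 or parts[0].upper() != "TCP":
            continue
        remote, state = parts[2], parts[3]
        host, _, remote_port = remote.rpartition(":")
        if state == "ESTABLISHED" and remote_port == str(port):
            peers.add(host)
    return sorted(peers)
```

Unlike findstr 3268, which matches the digits anywhere on the line, this only counts lines where the remote port is exactly 3268.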
Turn off DC4
Information MSExchange ADAccess 2070 Topology:
Process MSExchangeHMWorker.exe (ExHMWorker) (PID=3116). Exchange Active Directory Provider lost contact with domain controller dc4.CU1.com. Error was 0x34 (Unavailable) (Active directory response: The server is unavailable.). Exchange Active Directory Provider will attempt to reconnect with this domain controller when it is reachable.
Get-ExchangeServer exch5-cu1 -Status
CurrentDomainControllers : {DC001.CU1.com, DC3.CU1.com, dc2.CU1.com }
CurrentGlobalCatalogs : {DC001.CU1.com, DC3.CU1.com, dc2.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
>netstat -n | findstr 3268
We can see established connections with 3 GCs
Turn off DC3
CurrentDomainControllers : {DC001.CU1.com, dc2.CU1.com}
CurrentGlobalCatalogs : {DC001.CU1.com, dc2.CU1.com}
>netstat -n | findstr 3268
We can see established connections with 2 In-Site GCs
Turn off DC2
CurrentDomainControllers : {DC001.CU1.com}
CurrentGlobalCatalogs : {DC001.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
>netstat -n | findstr 3268
We can see connections only to DC001
From event 2080:
Process Microsoft.Exchange.Directory.TopologyService.exe (PID=2276). Exchange Active Directory Provider has discovered the following servers with the following characteristics:
(Server name | Roles | Enabled | Reachability | Synchronized | GC capable | PDC | SACL right | Critical Data | Netlogon | OS Version)
In-site:
DC001.CU1.com CDG 1 7 7 1 0 1 1 7 1
dc2.CU1.com CDG 1 0 0 0 0 0 0 0 0
DC3.CU1.com CDG 1 0 0 0 0 0 0 0 0
dc4.CU1.com CDG 1 0 0 0 0 0 0 0 0
Out-of-site:
dc5.CU1.com CDG 1 7 7 1 0 1 1 7 1
In other words: Exchange does not try to establish a connection to an out-of-site DC as long as at least one in-site DC is available.
Here is what happens as soon as you update your servers to CU6+:
Get-ExchangeServer exch5-cu1 -Status
CurrentDomainControllers : {dc2.CU1.com, DC001.CU1.com, dc4.CU1.com, DC3.CU1.com}
CurrentGlobalCatalogs : {dc2.CU1.com, DC001.CU1.com, dc4.CU1.com, DC3.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
>netstat -n | findstr 3268
We can see established connections with all 4 GCs
Same as in RTM
Turn off DC4
Get-ExchangeServer exch5-cu1 -Status
CurrentDomainControllers : {DC001.CU1.com, DC3.CU1.com, dc2.CU1.com }
CurrentGlobalCatalogs : {DC001.CU1.com, DC3.CU1.com, dc2.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
>netstat -n | findstr 3268
We can see established connections with 3 GCs
Same as RTM
Turn off DC3
CurrentDomainControllers : {DC001.CU1.com, dc2.CU1.com, dc5.CU1.com}
CurrentGlobalCatalogs : {DC001.CU1.com, dc2.CU1.com, dc5.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
NEW!
Exchange established a connection to the out-of-site DC dc5.CU1.com.
This is by design. If the number of suitable in-site DCs is less than MinSuitableServer, which is 3 by default, out-of-site DCs are used. Once the number of suitable in-site DCs is at least MinSuitableServer again, out-of-site DCs are no longer used.
Previously, when an Exchange process asked for domain controllers, the topology service returned servers from either the in-site list or the out-of-site list, never both. That is, as long as there was a single suitable DC in the in-site list, the topology service returned it and did not search the out-of-site list, no matter how many servers the client requested.
This could cause load imbalance, especially during a site failover: the healthy domain controllers remaining in the site being failed out took much more load than the out-of-site DCs.
To fix this, a new configurable setting, MinSuitableServer, was introduced. The topology service first checks whether there are enough suitable servers in the in-site list. If not, it adds servers from the out-of-site list. A similar change was made in topology discovery.
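The rule can be modelled in a few lines. This is only a Python sketch of the behavior described here and observed in the lab, not the topology service's actual code: select_servers is a hypothetical name, the inputs are lists of servers already judged suitable, and adding every out-of-site server (rather than just enough to reach the minimum) is an assumption, since the lab has only one out-of-site DC.

```python
def select_servers(in_site, out_of_site, min_suitable_server=3):
    """Model of the CU6+ selection rule.

    in_site / out_of_site: names of servers that are currently suitable.
    Out-of-site servers are added only when fewer than
    `min_suitable_server` in-site servers are available.
    """
    selected = list(in_site)
    if len(selected) < min_suitable_server:
        # Assumption: all suitable out-of-site servers are appended.
        selected.extend(out_of_site)
    return selected
```

With the default of 3, three in-site DCs keep dc5 out of the list, while two in-site DCs bring it in, which matches the outputs above. Setting min_suitable_server to 1 reproduces the older behavior: out-of-site servers appear only when no in-site server is left.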
How can we revert or configure this behavior?
If we really want to use in-site DCs only, even when just 1 is available (as it was in 2010 or 2013 RTM-CU5), we can add the entry
MinSuitableServer = "1"
to the Topology element in Microsoft.Exchange.Directory.TopologyService.exe.config:
<Topology MinimumPrefixMatch = "2"
EnableWholeForestDiscovery = "true"
MinSuitableServer = "1" <---------- ADD THIS VALUE
ForestWideAffinityRequested = "true"/>
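If the change has to be made on several servers, the edit can be scripted. The following Python sketch sets the attribute on the Topology element of an XML string using the standard library. The function name set_min_suitable_server is hypothetical, and the sketch assumes only that the file is well-formed XML containing one Topology element somewhere in the tree; the surrounding layout of the real config file is not shown in this post. ElementTree drops XML comments and may change whitespace when it re-serializes, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_min_suitable_server(config_xml: str, value: int) -> str:
    """Return config_xml with MinSuitableServer set on the Topology element."""
    root = ET.fromstring(config_xml)
    # find() searches descendants only, so handle the root being Topology itself.
    topology = root if root.tag == "Topology" else root.find(".//Topology")
    if topology is None:
        raise ValueError("no Topology element found")
    topology.set("MinSuitableServer", str(value))
    return ET.tostring(root, encoding="unicode")
```

The new value takes effect only after the Microsoft Exchange Active Directory Topology service (MSExchangeADTopology) is restarted.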
I turned DC4 off, as we do not need it for this test.
I also set MinSuitableServer = "2" and restarted the Microsoft Exchange Active Directory Topology service (MSExchangeADTopology). Restarting the whole server works as well.
CurrentDomainControllers : {DC3.CU1.com, dc2.CU1.com, DC001.CU1.com}
CurrentGlobalCatalogs : {DC3.CU1.com, dc2.CU1.com, DC001.CU1.com}
CurrentConfigDomainController : dc2.CU1.com
Turn DC3 off
From event 2080:
Process Microsoft.Exchange.Directory.TopologyService.exe (PID=2504). Exchange Active Directory Provider has discovered the following servers with the following characteristics:
(Server name | Roles | Enabled | Reachability | Synchronized | GC capable | PDC | SACL right | Critical Data | Netlogon | OS Version)
In-site:
DC001.CU1.com CDG 1 7 7 1 0 1 1 7 1
dc2.CU1.com CDG 1 7 7 1 0 1 1 7 1
DC3.CU1.com CDG 1 0 0 0 0 0 0 0 0
dc4.CU1.com CDG 1 0 0 0 0 0 0 0 0
Out-of-site:
dc5.CU1.com CDG 1 7 7 1 0 1 1 7 1
[PS] C:\Windows\system32>Get-ExchangeServer Exch5-cu1 -Status | fl Current*
CurrentDomainControllers : {DC001.CU1.com, dc2.CU1.com}
CurrentGlobalCatalogs : {DC001.CU1.com, dc2.CU1.com}
CurrentConfigDomainController : dc2.CU1.com
Turn off DC2
CurrentDomainControllers : {DC001.CU1.com, dc5.CU1.com}
CurrentGlobalCatalogs : {DC001.CU1.com, dc5.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
Start DC3
CurrentDomainControllers : {DC001.CU1.com, DC3.CU1.com}
CurrentGlobalCatalogs : {DC001.CU1.com, DC3.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
So Exchange returned to an in-site DC as soon as one became available.
Now set MinSuitableServer = "1"
CurrentDomainControllers : {dc2.CU1.com, DC3.CU1.com, DC001.CU1.com}
CurrentGlobalCatalogs : {dc2.CU1.com, DC3.CU1.com, DC001.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
Turn off DC2
CurrentDomainControllers : {DC3.CU1.com, DC001.CU1.com}
CurrentGlobalCatalogs : {DC3.CU1.com, DC001.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
Turn off DC3
CurrentDomainControllers : {DC001.CU1.com}
CurrentGlobalCatalogs : {DC001.CU1.com}
CurrentConfigDomainController : DC001.CU1.com
In other words: the same as it was in 2010 and 2013 RTM-CU5.
Comments
- Anonymous
August 12, 2015
Thanks a lot for making this available, very useful for a current issue my customer has.
- Anonymous
August 12, 2015
Thanks a lot for sharing that info
very valuable information
- Anonymous
August 12, 2015
Thanks a lot, very useful information!
- Anonymous
August 28, 2015
Now we have a KB for it: https://support.microsoft.com/en-us/kb/3088777
- Anonymous
February 26, 2016
Hi, we had an issue that was "fixed" by this setting but ours is a single AD site environment.
We started off with 2 W2k3 DC with Exchange 2013 CU9. Everything was working fine. To upgrade the AD, we introduced 2 new W2k12R2 DC. Exchange was still working at this point.
We shutdown the 2 old DC and only the 2 new DC remained. Now Exchange services cannot start and we are getting eventid 4027, 2142,2193. Seems like the Exchange servers are still trying to contact the old DCs. Exchange services started when the old DCs were booted up.
We changed these 2 settings in the Microsoft.Exchange.Directory.TopologyService.exe.config after doing some Google search that found this link
https://social.technet.microsoft.com/Forums/exchange/en-US/34b1c301-ad12-4655-aeea-772e70c654bc/event-id-2142-2077-2069-msexchangeadtopology-exchange-2013?forum=exchangesvradmin
We changed the value below
MinPercentageOfHealthyDC = "50" to "10"
And we added MinSuitableServer = "1"
After this, with only the 2 new DC started and Exchange servers rebooted, Exchange services starts fine.
My questions
1) Does the above settings with the default value really means we need at least 50% of available DC before Exchange services will start?
2) Why does Microsoft choose such a default value? It seems pretty silly not to let Exchange use the surviving DC even if it is less than 50%.
3) Should we "fine tune" these value based on the number of DC we have in a site? If so how and where is the official guidance as I cannot find it.
4) Is this a bug or a feature?
Thank you.
- Anonymous
March 04, 2016
Very useful post. Thanks!