For Exchange 2010, 2013, and 2016 do this before calling Microsoft

Hi everybody! I wanted to post some steps that might save you some time and money. Although all of the settings below are published by Microsoft somewhere, they aren't all in one place, and they aren't always specifically called out as a way to build a more efficient and stable Exchange environment. Where applicable, I have noted when an item is an official recommendation from the Exchange Product Group (PG). As a Microsoft Exchange Support Escalation Engineer, I have enjoyed quite a bit of success with these settings against the symptoms listed below. If you are experiencing any of them, do this before opening a case with Microsoft. Many cases can be resolved right here. If you still need a case, you will have shortened the troubleshooting and trimmed the action plan, which further shortens the time to resolution. Even if you are not having problems, this is a good tune-up.

1. Intermittent Outlook connectivity issues.
2. Intermittent ActiveSync connectivity issues.
3. Slow mail delivery to Outlook or ActiveSync devices.
4. High LDAP search times.
5. High CPU utilization.

My teammate David Paulson maintains a script that tests several of the things I mention below and gives you remediation advice. Download a fresh copy every time you want to use it, because it is constantly updated for new versions of Exchange and any .NET hotfixes. It can be downloaded from here:

Exchange Server Performance Health Checker Script: https://github.com/dpaulson45/HealthChecker/releases

Run the above script, follow its guidance, then come back here and see if I covered anything the script did not.

For all Exchange 2010, 2013, and 2016 servers:

1. Set the following in the registry. *Please note: these entries do not exist by default and will need to be created:

Minimum Connection Timeout. Configure the RPC timeout on Exchange servers to make sure that components which use RPC will trigger a keep alive signal within the time frame you specify here. This will help keep network devices in front of Exchange from closing sessions prematurely:
HKLM\Software\Policies\Microsoft\Windows NT\RPC\MinimumConnectionTimeout
DWORD  0x00000078 (120 decimal)

Keep-alive timeout. Determines how often TCP sends keep-alive transmissions, which verify that an idle connection is still active. Many network devices, such as load balancers and firewalls, use an aggressive 5-minute idle session timeout. This setting will help keep those devices from closing a session prematurely:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime
DWORD  0x00124F80 (1200000 decimal, which is 20 minutes in milliseconds)

There is debate about the above value. I have it set for 20 minutes, but anywhere between 15 and 30 minutes would be viable.
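
Because KeepAliveTime is expressed in milliseconds, it is easy to get the number of zeros wrong. The following is a minimal sketch of the arithmetic for the 15 to 30 minute range discussed above. The function names `keepalive_ms` and `as_dword` are made up for illustration and are not part of any Microsoft tool.

```python
MS_PER_MINUTE = 60 * 1000


def keepalive_ms(minutes: int) -> int:
    """Convert an idle interval in minutes to the millisecond value KeepAliveTime expects."""
    return minutes * MS_PER_MINUTE


def as_dword(value: int) -> str:
    """Format a value as an 8-digit hex string, the way a 32-bit DWORD is usually shown."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a 32-bit DWORD")
    return f"0x{value:08x}"


if __name__ == "__main__":
    # Print the candidate KeepAliveTime values in decimal and hex.
    for minutes in (15, 20, 30):
        ms = keepalive_ms(minutes)
        print(f"{minutes} min -> {ms} decimal ({as_dword(ms)})")
```

The same `as_dword` helper can be used to double-check the MinimumConnectionTimeout value: 120 decimal is 0x00000078.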

My teammate Josh Jerdon has written an excellent script that will report the current KeepAliveTime for all servers in your Exchange org and set it for you if you like. It can be found here: https://gallery.technet.microsoft.com/office/TCP-Keep-Alive-Time-Report-c9a240d0

*Note: Item 1 is expected to be done in conjunction with item 9. Please be sure to follow both.

2. Install the latest tcpip.sys for your server OS. This is a recommendation from several different networking engineers that I have worked with on Exchange cases involving a suspected network issue. There is no official recommendation from PG to do this; it is just another one of those items I have enjoyed some success with. If you are keeping up with Windows Updates, you can ignore this step, because you are already getting the needed updates.

3. If your servers are hosted on VMware, follow this: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2039495. Exchange generates very bursty traffic, and there is a potential issue in ESXi 4.x and 5.x where packet loss can occur during periods of very high traffic bursts. Increasing the Rx buffer as described in this article can prevent that.

4. Disable Hyper-threading on physical servers and at the VM level for virtualized servers. This is an official recommendation from PG and is discussed in https://technet.microsoft.com/en-us/library/dn879075(v=exchg.150).aspx under the processing section.

5. DO NOT use Dynamic Memory Allocation. It is not supported, as discussed here: https://technet.microsoft.com/en-us/library/jj619301(v=exchg.160).aspx#BKMK_ExchangeMemory . Please use fixed or reserved memory.

6. Set your power management. This is an official recommendation from PG and is discussed in the power management section of https://technet.microsoft.com/en-us/library/dn879075(v=exchg.150).aspx :
Set BIOS to allow the operating system (OS) to manage power.
In the OS, turn on the High Performance power plan.

7. Use the default SNP offload settings where available, and make sure that RSS is enabled (the default setting in Windows Server 2012 and later). RSS will help scale CPU utilization, especially on 10GbE. RSS and TCP Chimney Offload have to be set both in the OS and in the NIC device configuration to work. This is an official recommendation from PG and is discussed in https://technet.microsoft.com/en-us/library/dn879075(v=exchg.150).aspx . To set them in the OS, run the following in an elevated command prompt:
netsh int tcp set global rss=enabled
netsh int tcp set global chimney=Automatic

8. For Exchange 2010, the page file size minimum and maximum must be set to physical RAM plus 10MB, as discussed in Exchange 2010 System Requirements (see below for Exchange 2013 and 2016): https://technet.microsoft.com/library/aa996719%28EXCHG.140%29.aspx

9. Make sure the TCP idle session timeout on the load balancer is set to at least 30 minutes. As you traverse network devices outward (firewalls, routers, etc.), the idle session timeout should get successively higher. For example, the load balancer is set to 30 minutes, the router to 35, and the firewall to 40, all the way out to either the client or the network border. Higher times are acceptable (within reason) as long as you keep that progression. DON'T SKIP THIS STEP!! IT IS VERY IMPORTANT! This is discussed in https://msdn.microsoft.com/en-us/library/dn643702.aspx
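
The timeout progression in step 9, together with the keep-alive from step 1, can be sanity-checked on paper before touching any device. The following is a rough sketch under stated assumptions: the device names and values are illustrative, `check_idle_timeouts` is a hypothetical helper, and the rule that the keep-alive interval must be shorter than the load balancer timeout is my reading of why steps 1 and 9 go together.

```python
from typing import List, Tuple

MIN_LOAD_BALANCER_MINUTES = 30  # minimum load balancer timeout from step 9


def check_idle_timeouts(keepalive_minutes: float,
                        hops: List[Tuple[str, float]]) -> List[str]:
    """Return a list of problems; an empty list means the chain follows the rule.

    `hops` holds (device name, idle timeout in minutes), ordered from the
    device closest to Exchange outward to the client or network border.
    """
    if not hops:
        return ["no network devices listed"]
    problems = []
    first_name, first_timeout = hops[0]
    if first_timeout < MIN_LOAD_BALANCER_MINUTES:
        problems.append(
            f"{first_name}: {first_timeout} min is below {MIN_LOAD_BALANCER_MINUTES} min"
        )
    # Assumption: the keep-alive must fire before the first device gives up.
    if keepalive_minutes >= first_timeout:
        problems.append(
            f"keep-alive of {keepalive_minutes} min is not shorter than {first_name} timeout"
        )
    # Each device further out should wait longer than the one before it.
    for (inner_name, inner), (outer_name, outer) in zip(hops, hops[1:]):
        if outer <= inner:
            problems.append(
                f"{outer_name} ({outer} min) is not higher than {inner_name} ({inner} min)"
            )
    return problems


if __name__ == "__main__":
    chain = [("load balancer", 30), ("router", 35), ("firewall", 40)]
    print(check_idle_timeouts(20, chain) or "timeouts follow the progression")
```

An empty result only means the numbers are consistent with each other; it does not confirm what is actually configured on the devices.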

Specific to Exchange 2013/2016:

1. Install at least the N-1 Cumulative Update. The current list with release dates can be found here: Exchange Server Updates: build numbers and release dates https://technet.microsoft.com/en-us/library/hh135098(v=exchg.150).aspx. If you are not on at least N-1, you are dealing with issues you don't need to deal with. PG's official recommendation is to stay no further behind than N-1, where N is the current CU. Also, the .NET recommendations I am going to make need CU7 or higher to be most effective.
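
The N-1 rule is simple, but it helps to write it down when auditing several servers. This is a small sketch, assuming you already know the installed and current CU numbers for one Exchange version. `cu_findings` is a hypothetical helper, and the CU numbers in the example are placeholders, not the real current release.

```python
from typing import List

# CU7 is the floor the post gives for the .NET recommendations.
MIN_CU_FOR_DOTNET_ADVICE = 7


def cu_findings(installed_cu: int, current_cu: int) -> List[str]:
    """Return advice strings for an installed CU; an empty list means nothing to flag."""
    findings = []
    if installed_cu < current_cu - 1:
        findings.append(
            f"CU{installed_cu} is older than N-1 (CU{current_cu - 1}); plan an update"
        )
    if installed_cu < MIN_CU_FOR_DOTNET_ADVICE:
        findings.append(
            f"CU{installed_cu} is below CU{MIN_CU_FOR_DOTNET_ADVICE}, "
            "so the .NET recommendations will be less effective"
        )
    return findings


if __name__ == "__main__":
    # Placeholder numbers for illustration only: CU4 installed, CU9 current.
    for finding in cu_findings(installed_cu=4, current_cu=9):
        print(finding)
```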

2. Install the latest .NET for your CU of Exchange. The .NET Support Matrix can be found here: https://technet.microsoft.com/en-us/library/ff728623(v=exchg.150).aspx

3. Set the page file to installed memory + 10MB or set the page file to 32GB + 10MB (32,778MB) if more than 32GB of memory is installed. Make sure your page file is not set to be on your mailbox database or log file drives. This is an official recommendation from PG and is discussed in Exchange 2013 Sizing and Configuration Recommendations https://technet.microsoft.com/en-us/library/dn879075(v=exchg.150).aspx .
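
The page file rules can be written as a short calculation. This is a sketch of the arithmetic only, with a hypothetical `page_file_mb` helper: for Exchange 2013/2016 the RAM portion is capped at 32GB, while for Exchange 2010 (step 8 in the previous section) it is not.

```python
MB_PER_GB = 1024
PAGE_FILE_PADDING_MB = 10   # the "+ 10MB" in the guidance
PAGE_FILE_RAM_CAP_GB = 32   # cap that applies to Exchange 2013/2016 only


def page_file_mb(installed_ram_gb: int, exchange_2010: bool = False) -> int:
    """Return the page file size in MB to use for both the minimum and the maximum."""
    ram_mb = installed_ram_gb * MB_PER_GB
    if not exchange_2010:
        # Exchange 2013/2016: never size the page file from more than 32GB of RAM.
        ram_mb = min(ram_mb, PAGE_FILE_RAM_CAP_GB * MB_PER_GB)
    return ram_mb + PAGE_FILE_PADDING_MB


if __name__ == "__main__":
    for ram_gb in (16, 32, 64, 96):
        print(f"{ram_gb}GB RAM -> {page_file_mb(ram_gb)}MB page file (2013/2016)")
```

For example, a 2013/2016 server with 64GB of memory gets the capped value of 32,778MB, while the same amount of memory under the Exchange 2010 rule gives 65,546MB.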

4. Do not - Do not - Do not scale up your hardware past 24 cores and 192GB of memory. The guidance in Exchange 2013 Sizing and Configuration Recommendations https://technet.microsoft.com/en-us/library/dn879075(v=exchg.150).aspx needs to be followed. Exchange 2013 was designed with O365 in mind: a large number of commodity servers in a data warehouse rather than a few monster servers. If you ignore this guidance, you may have problems. The further above 24 cores you go, the more potential for trouble you will have.

Revised May 30, 2018

Comments

  • Anonymous
    January 01, 2003
    Thanks for putting all the information together! Very useful!

  • Anonymous
    January 01, 2003
    I would definitely not recommend staying on CU4. If nothing else, there are security updates that need to be considered, such as KB 3045301 ("SMTP is not transported over TLS 1.1 or TLS 1.2 protocol in an Exchange Server 2013 environment"), as well as DST updates in CU8. Also, there are MANY fixes and enhancements that are not listed in those CU update KBs. Only the "big" ones get listed.

  • Anonymous
    January 01, 2003
    Great post, very useful information.

  • Anonymous
    January 01, 2003
    "1. Install at least N-1 Cumulative Update."

    If I don't have any of the problems listed in the "Issues that the cumulative update resolves" sections, can I stay on SP1 (CU4)?

  • Anonymous
    January 01, 2003
    Hi Selahattin - The guidance from the product group did not distinguish between regular disks and SSD's.

  • Anonymous
    March 30, 2015
    Great post
    Starting implementation at a very large customer.
    Thanks for putting it all together.

  • Anonymous
    April 06, 2015
    very nice job! Even if I use SSD on servers do I have to set pagefile? thanks

  • Anonymous
    June 04, 2015
    Great info put together.. Loved it. Referring to all my customers.

  • Anonymous
    June 17, 2015
    Excellent post. Resolved the same issue without a PS call.

  • Anonymous
    July 18, 2015
    I think this info is great. This is a very nice consolidation of information. I will be sharing this with my teams.

  • Anonymous
    October 07, 2015
    Excellent post Dave. I believe this will be a major time saver and reduce the customer and engineer stress.

  • Anonymous
    October 19, 2015
    Thank you for the info this is extremely helpful.

  • Anonymous
    November 19, 2015
    Fantastic roll up / consolidation of recommended settings! Thanks!!

  • Anonymous
    January 15, 2016
    David, I have a question on point number 2. I cannot find KB2775511 to download, and KB2728738 never finishes installing. I tried installing KB2728738 on a couple of Windows Server 2008 R2 Enterprise SP1 servers; it stays on the installing page and never completes, even after 3-4 hours, so I killed the task and rebooted the servers, which took 40 minutes to come back up. Can you help here please? Also, what is the correct method to update tcpip.sys?

    • Anonymous
      April 22, 2016
      Hey Rajeev, can you provide me your server OS build? Is it 2008 R2?
  • Anonymous
    March 30, 2016
    There have been reports of issues with the TCP updates described in step 2. If you are performing your normal Windows updates at a regular interval, your tcpip.sys should be on a current version and you can ignore this step. If you do install the updates, it is not uncommon for an update to take an hour to install.

  • Anonymous
    May 10, 2016
    This is the treasure trove. Really appreciate your efforts.

  • Anonymous
    September 11, 2016
    Amazing article. We have the same issues in our environment and will follow this. However, one concern is related to the Chimney and RSS settings. We have them disabled for Exch 2010 on win 2008 R2 SP1. The article below says they should be disabled: https://blogs.technet.microsoft.com/samdrey/2013/12/02/exchange-2007-2010-2013-on-windows-2008-2008-r2-check-tcp-chimney-windows-settings-and-status/

  • Anonymous
    September 21, 2016
    I suppose there is a typo: DWORD value = 900000. We need it less than 30 minutes so it should be 90000.

    • Anonymous
      October 20, 2016
      As an FYI to Dmitry Opaleychuk, the KeepAliveTime is in milliseconds, not seconds, so 900,000 is only 15 minutes. You can see it documented here: https://technet.microsoft.com/en-us/library/cc957549.aspx
  • Anonymous
    May 02, 2017
    Windows Server 2016 does not have this registry setting - HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime

    • Anonymous
      May 24, 2017
      The key will need to be created. It does not exist in any supported version of Windows Server. The default by the OS is 2 hours if the key does not exist.
  • Anonymous
    December 06, 2017
    The sizing and configuration recommendations have changed for Exchange 2016 recently. I believe a 192GB server is supported.

    • Anonymous
      May 16, 2018
      Hi Steven, you are correct. I have updated the blog accordingly. Thanks for the heads up.
    • Anonymous
      May 29, 2018
      You are correct, I have updated the blog to reflect the new guidance. Thanks for letting me know.
  • Anonymous
    April 20, 2018
    Thanks Dave, great article
    Sharon

  • Anonymous
    May 22, 2018
    I just assume 120 decimal in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime DWORD value = 1200000 (120 decimal) is a mistake?
