How To Troubleshoot Microsoft Exchange Server Latency or Connection Issues
Written by Samuel Drey, Premier Field Engineer. This article is a guide to help Microsoft Exchange Server IT operations teams understand, troubleshoot, and remedy situations where users have trouble connecting to the Exchange messaging service via Outlook or OWA. It covers Exchange Server 2003, 2007, and 2010. The process below helps rule out server latencies and determine whether a poor messaging experience stems from a client-side configuration, a client-side performance issue, or a server-side issue.
Step 1: Check the Application Log and System Log
The first thing to check is the Application Log, then the System Log, for possible errors. Poor messaging experiences caused by server issues usually surface as warnings or errors about memory or disk, and they show up as obvious recurring events. For example: Event ID 9582, stating that “the virtual memory necessary to run your Exchange server is fragmented in such a way that performance may be affected”, or Event ID 51 for the disk component, stating that “an error was detected on device \Device\Harddisk3\DR3”.
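To make "obvious recurring events" concrete, here is a minimal Python sketch that counts repeated warnings and errors. It assumes the Application and System logs have already been exported and parsed into dictionaries. The record shape (`log`, `level`, `source`, `event_id`), the `recurring_events` helper, and the sample records are illustrative assumptions, not part of any Exchange or Windows API.

```python
from collections import Counter


def recurring_events(events, min_count=3):
    """Return ((log, source, event_id), count) pairs for Warning and Error
    events seen at least min_count times, most frequent first."""
    counts = Counter(
        (e["log"], e["source"], e["event_id"])
        for e in events
        if e["level"] in ("Warning", "Error")
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Made-up sample records, shaped like a hypothetical parsed log export.
sample = (
    [{"log": "Application", "level": "Error", "source": "MSExchangeIS", "event_id": 9582}] * 4
    + [{"log": "System", "level": "Warning", "source": "Disk", "event_id": 51}] * 3
    + [{"log": "Application", "level": "Information", "source": "Example", "event_id": 1}] * 10
)

for key, n in recurring_events(sample):
    print(key, n)
```

Informational events are ignored on purpose, so a chatty but healthy source does not hide the memory or disk events that matter here.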
Step 2: Check for Issues Using Key Performance Counters
The second, less obvious thing to look at is the performance counters, checking for any latencies. The first counters to indicate performance issues are the RPC latency counters, since every action a user performs corresponds to RPC requests being sent to the Exchange server.
Here are the steps to follow:
- Check for RPC latencies
- Check for CPU performance issues
- Check for Memory load issues
- Check for Disk bound issues
- Check for Network issues
- Check for Active Directory related issues
- Check for Virus scanning issues
If an issue is not visible in the Application or System Log, performance log analysis will point out the cause(s) of the issue most of the time, provided you follow the methodology introduced above.
Conduct Performance Analysis to Fine-Tune Exchange Components and Help Identify Issues
- If users can still connect to the Exchange server but encounter high latencies, performance analysis will tell you where the issue is.
- Because an Exchange server exposes hundreds of counters, it is essential to begin the performance analysis with a subset of them.
Once the component causing the Exchange issue (e.g. Disk, Memory, Network, etc.) has been identified, we can dig further by analyzing more of that component’s counters.
- For example, with the Memory component, we must check the “Available MB” and “Pages/Sec” counters; if either shows an issue, we then add more Memory counters (the Memory component has 35 counters in total). That’s why we start with only two counters, Available MB and Pages/Sec. The principle is the same for all other components: take two to four significant counters, then dig further.
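The "start small, then dig further" principle can be sketched in Python as follows. The first-pass counters and limits come from the Exchange 2007/2010 table in this article. The second-pass counter lists, the `COMPONENTS` structure, and the `counters_to_add` helper are assumptions chosen for illustration, not the author's list.

```python
import operator
from statistics import mean

# first_pass entries: (counter, how to summarize samples, comparison that
# must hold, limit). second_pass lists are illustrative picks only.
COMPONENTS = {
    "Memory": {
        "first_pass": [
            (r"Memory\Available MBytes", min, operator.ge, 100),
            (r"Memory\Pages/sec", max, operator.lt, 1000),
        ],
        "second_pass": [r"Memory\Committed Bytes", r"Memory\Pool Nonpaged Bytes"],
    },
    "CPU": {
        "first_pass": [
            (r"Processor(_Total)\% Processor Time", mean, operator.le, 75),
        ],
        "second_pass": [r"Process(*)\% Processor Time", r"System\Processor Queue Length"],
    },
}


def counters_to_add(samples):
    """Return {component: second-pass counters} for each component whose
    first-pass counters breach a limit. samples maps counter -> values."""
    extra = {}
    for component, spec in COMPONENTS.items():
        for counter, summarize, within_limit, limit in spec["first_pass"]:
            values = samples.get(counter)
            if values and not within_limit(summarize(values), limit):
                extra[component] = spec["second_pass"]
                break
    return extra


# Made-up samples: available memory dips to 85 MB, CPU stays moderate.
samples = {
    r"Memory\Available MBytes": [420, 85, 310],
    r"Memory\Pages/sec": [12, 40, 25],
    r"Processor(_Total)\% Processor Time": [35, 50, 41],
}
print(counters_to_add(samples))
```

With these samples only the Memory component is flagged, so only its additional counters would be added to the next collection run.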
Key Exchange Performance Counters for Monitoring and Troubleshooting
Below are two tables that I originally created for early versions of Microsoft’s Premier Exchange Risk Assessment Programs and have updated since then to follow the evolution of best practices:
- Exchange 2007/2010 counters table
- Exchange 2003 counters table
We strongly encourage administrators to focus on these specific counters to monitor their Exchange infrastructure effectively and to identify potential performance issues proactively. These counters are usually a subset of the System Center Operations Manager (SCOM) Exchange Management Pack rules, so use the tables below to tune SCOM alerts to focus on the most important ones. If you use another monitoring application, integrate these counters into your monitoring solution.
Exchange Server 2007/2010 Key Performance Counters
Here is the selection of key Exchange 2007/2010 counters that will help pinpoint where the issue is (you can copy and paste the relevant counter names). For additional information, check out Monitoring Without System Center Operations Manager.
| SERVER ROLE | COUNTER | Check | Expected |
| Database and Database ==> Instances | | | |
| MAILBOX AND HUB | MSExchange Database(Information Store)\Database Page Fault Stalls/sec | Avg | <10 |
| | | Max | <100 |
| MAILBOX | MSExchange Database ==> Instances(*)\Log Generation Checkpoint Depth | Max | <=500 |
| MAILBOX | MSExchange Database(Information Store)\Version buckets allocated | Max | <=12000 |
| HUB | MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Log Generation Checkpoint Depth | Max | <=1000 |
| HUB | MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Version buckets allocated | Max | <=200 |
| LogicalDisk (or substitute PhysicalDisk if Logical is unavailable) | | | |
| MAILBOX | LogicalDisk – Temp/Page File Disks | | |
| | LogicalDisk\Average Disk sec/Read | Avg | <10ms |
| | | Max | <=50ms |
| | LogicalDisk\Average Disk sec/Write | Avg | <10ms |
| | | Max | <=50ms |
| HUB | LogicalDisk – SMTP | | |
| | LogicalDisk\Average Disk sec/Read | Avg | <20ms |
| | | Max | <=50ms |
| | LogicalDisk\Average Disk sec/Write | Avg | <20ms |
| | | Max | <=50ms |
| MAILBOX | LogicalDisk – Databases | | |
| | LogicalDisk\Average Disk sec/Read | Avg | <=20ms |
| | LogicalDisk\Average Disk sec/Write | Avg | <=100ms |
| | LogicalDisk – Transaction Logs | | |
| | LogicalDisk\Average Disk sec/Read | Avg | <=20ms |
| | LogicalDisk\Average Disk sec/Write | Avg | <=10ms |
| CAS | LogicalDisk – All Disks | | |
| | LogicalDisk(_Total)\Disk Reads/sec | Max | <=50 |
| | LogicalDisk(_Total)\Disk Writes/sec | Max | <=50 |
| Memory | | | |
| COMMON | Memory\Available MBytes (MB) | Min | >=100MB |
| | Memory\Pages/sec | Max | <1,000 |
| MSExchangeDSAccess | | | |
| COMMON | MSExchange ADAccess Domain Controllers(*)\LDAP Read Time | Avg | <=50ms |
| | | Max | <=100ms |
| | MSExchange ADAccess Domain Controllers(*)\LDAP Search Time | Avg | <=50ms |
| | | Max | <=100ms |
| | MSExchange ADAccess Domain Controllers(*)\LDAP Searches timed out per minute | Max | <=10 |
| | MSExchange ADAccess Domain Controllers(*)\Long running LDAP operations/Min | Max | <=50 |
| MSExchangeIS | | | |
| MAILBOX | MSExchangeIS Public(_Total)\Replication Receive Queue Size | Max | <=100 |
| | MSExchangeIS\RPC Averaged Latency | Avg | <=25ms |
| | MSExchangeIS\RPC Num. of Slow Packets | Avg | <=1 |
| | | Max | <=3 |
| | MSExchangeIS\RPC Operations/sec | Avg | info only |
| | | Min | |
| | | Max | |
| | MSExchangeIS\RPC Packets/sec | Avg | info only |
| | | Min | |
| | | Max | |
| | MSExchangeIS\RPC Requests | Max | <70 |
| | MSExchangeIS\Virus Scan Queue Length | Max | <=10 |
| | MSExchangeIS\VM Largest Block Size | Min | info |
| | MSExchangeIS\VM Total 16MB Free Blocks | Min | info |
| | MSExchangeIS\VM Total Free Blocks | Min | info |
| | MSExchangeIS\VM Total Large Free Block Bytes | Min | info |
| Network Interface | | | |
| COMMON | Network Interface\Bytes Total/sec | Max | <=7MBps or <=70MBps |
| | Network Interface\Current Bandwidth | special | |
| | Network Interface\Packets Outbound Errors | Max | =0 |
| Process, Processor, and System | | | |
| COMMON | Processor(_Total)\% Processor Time | Avg | <=75% |
| | Processor(_Total)\% User Time | Avg | <=75% |
| | Processor(_Total)\% Privileged Time | Avg | <=75% |
| | Process(*)\% Processor Time | special | |
| | System\Processor Queue Length (all instances) | Avg | <=5 per proc |
| SMTP Server | | | |
| HUB | \MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | Avg | <=3000 |
| | | Max | <=5000 |
| | \MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | Max | <=250 |
| | \MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | Max | <=250 |
| | \MSExchangeTransport Queues(_total)\Submission Queue Length | Max | <=100 |
| | \MSExchangeTransport Queues(_total)\Active Non-Smtp Delivery Queue Length | Max | <=250 |
| | \MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | Max | <=100 |
| | \MSExchangeTransport Queues(_total)\Retry Non-Smtp Delivery Queue Length | Max | <=100 |
| | \MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length | Max | <=100 |
| | \MSExchangeTransport Queues(_total)\Unreachable Queue Length | Max | <=100 |
| CAS Server | | | |
| CAS | Outlook Web Access Counters | | |
| | MSExchange OWA\Average Response Time | Max | <=100ms |
| | MSExchange OWA\Average Search Time | Max | <=31000ms |
| | CAS to MBX connection | | |
| | RPC/HTTP Proxy\Number of Failed Back-End Connection attempts per Second | Max | =0 |
| | Client Access Server OAB Download Counters | | |
| | MSExchangeFDS:OAB(*)\Download Task Queued | Max | =0 |
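As a rough illustration of how the Check and Expected columns can be applied, the Python sketch below computes Avg, Min, and Max per counter from a Performance Monitor log and compares them with a few MAILBOX thresholds from the table above. It assumes the log was saved or converted to Performance Monitor's CSV layout, with the timestamp in the first column and one `\\HOST\Object\Counter` path per remaining column. The host name `EXCH01`, the sample values, and the helper names are hypothetical.

```python
import csv
import io
import operator

# (counter, check, comparison that must hold, limit), values from the table.
LIMITS = [
    (r"MSExchangeIS\RPC Averaged Latency", "Avg", operator.le, 25),
    (r"MSExchangeIS\RPC Requests", "Max", operator.lt, 70),
    (r"MSExchangeIS\Virus Scan Queue Length", "Max", operator.le, 10),
]


def strip_host(path):
    # "\\EXCH01\MSExchangeIS\RPC Requests" -> "MSExchangeIS\RPC Requests"
    return path.lstrip("\\").split("\\", 1)[1]


def load_counter_log(text):
    """Parse CSV text into {counter: [samples]}, skipping blank samples."""
    rows = csv.reader(io.StringIO(text))
    names = [strip_host(path) for path in next(rows)[1:]]
    series = {name: [] for name in names}
    for row in rows:
        for name, cell in zip(names, row[1:]):
            try:
                series[name].append(float(cell))
            except ValueError:
                pass  # blank or non-numeric sample
    return series


def summarize(values):
    return {"Avg": sum(values) / len(values), "Min": min(values), "Max": max(values)}


def breaches(series):
    """Return (counter, check, observed, limit) for every limit not met."""
    found = []
    for counter, check, within_limit, limit in LIMITS:
        values = series.get(counter)
        if not values:
            continue
        observed = summarize(values)[check]
        if not within_limit(observed, limit):
            found.append((counter, check, observed, limit))
    return found


# Made-up log excerpt; the first latency sample is blank.
SAMPLE = r'''"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\EXCH01\MSExchangeIS\RPC Averaged Latency","\\EXCH01\MSExchangeIS\RPC Requests"
"01/26/2015 10:00:00.000"," ","12"
"01/26/2015 10:00:15.000","20","85"
"01/26/2015 10:00:30.000","40","30"
'''

for item in breaches(load_counter_log(SAMPLE)):
    print(item)
```

Counters that are absent from the log are skipped rather than reported, so the same limit list can be reused on servers that hold only some of the roles.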
Exchange Server 2003 Key Counters
Here is the selection of the key Exchange 2003 counters that will help point out where the issue is (you can copy/paste the relevant counter names):
Exchange Server 2003 counters
| COUNTER | Check | Expected | Links for more information |
| Database and Database ==> Instances | | | |
| Database ==> Instances(*)\Log Record Stalls/sec | Avg | <10 | |
| | Max | <100 | |
| LogicalDisk (or substitute PhysicalDisk if Logical is unavailable) | | | |
| LogicalDisk – Temp/Page File Disks | | | |
| LogicalDisk\Average Disk sec/Read | Avg | <10ms | |
| | Max | <=50ms | |
| LogicalDisk\Average Disk sec/Write | Avg | <10ms | |
| | Max | <=50ms | |
| Paging File\% Usage | Avg | <50% | |
| LogicalDisk – SMTP | | | |
| LogicalDisk\Average Disk sec/Read | Avg | <10ms | |
| | Max | <=50ms | |
| LogicalDisk\Average Disk sec/Write | Avg | <10ms | |
| | Max | <=50ms | |
| LogicalDisk – Database | | | |
| LogicalDisk\Average Disk sec/Read | Avg | <20ms | |
| | Max | <=50ms | |
| LogicalDisk\Average Disk sec/Write | Avg | <20ms | |
| | Max | <=50ms | |
| LogicalDisk – Database (additional disk 1) | | | |
| LogicalDisk\Average Disk sec/Read | Avg | <20ms | |
| | Max | <=50ms | |
| LogicalDisk\Average Disk sec/Write | Avg | <20ms | |
| | Max | <=50ms | |
| LogicalDisk – Transaction Logs | | | |
| LogicalDisk\Average Disk sec/Read | Avg | <5ms | |
| | Max | <=50ms | |
| LogicalDisk\Average Disk sec/Write | Avg | <10ms | |
| | Max | <=50ms | |
| LogicalDisk – Transaction Logs (additional disk 1) | | | |
| LogicalDisk\Average Disk sec/Read | Avg | <5ms | |
| | Max | <=50ms | |
| LogicalDisk\Average Disk sec/Write | Avg | <10ms | |
| | Max | <=50ms | |
| Memory | | | |
| Memory\Available MBytes (MB) | Min | >=50 | |
| Memory\Free System Page Table Entries | Min | >=5000 | |
| Memory\Pages/sec | Max | <1,000 | |
| MSExchangeDSAccess | | | |
| MSExchangeDSAccess Process\LDAP Read Time (for all processes) | Avg | <50ms | |
| | Max | <=100ms | |
| MSExchangeDSAccess Process\LDAP Search Time (for all processes) | Avg | <50ms | |
| | Max | <=100ms | |
| MSExchangeIS | | | |
| MSExchangeIS Public\Replication Receive Queue Size | Max | <=1000 | |
| MSExchangeIS\RPC Averaged Latency | Max | <=50ms or 100ms | |
| | Avg | | |
| MSExchangeIS\RPC Operations/sec | Avg | info only | |
| | Min | | |
| | Max | | |
| MSExchangeIS\RPC Packets/sec | Avg | info only | |
| | Min | | |
| | Max | | |
| MSExchangeIS\RPC Requests | Max | <30 | |
| MSExchangeIS\Virus Scan Queue Length | Max | <=10 | |
| MSExchangeIS\VM Largest Block Size | Min | >32MB | |
| MSExchangeIS\VM Total 16MB Free Blocks | Min | >=1 | |
| MSExchangeIS\VM Total Free Blocks | Min | >=1 | |
| MSExchangeIS\VM Total Large Free Block Bytes | Min | >50MB | |
| Network Interface | | | |
| Network Interface\Bytes Total/sec | Max | <7MBps or <70MBps | |
| Network Interface\Current Bandwidth | special | | |
| Network Interface\Packets Outbound Errors | Max | =0 | |
| Process, Processor, and System | | | |
| Processor\% Processor Time (_Total) | Avg | <80% | |
| Processor\% Privileged Time (_Total) | Avg | special | |
| Process(*)\% Processor Time (_Total) | special | | |
| Process(*)\% Privileged Time (_Total) | special | | |
| Process(*)\Virtual Bytes (store) | Max | <2.8GB | |
| System\Processor Queue Length | Avg | <2 | |
| SMTP Server | | | |
| SMTP Server\Categorizer Queue Length | Max | <10 | |
| | Avg | | |
| SMTP Server\Local Queue Length | Max | <1000 | |
| SMTP Server\Remote Queue Length | Max | <1000 | |
| | Avg | info | |
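The two Bytes Total/sec limits can be read together with the Current Bandwidth counter. The Python sketch below is one possible interpretation, resting on several assumptions that the tables do not state: that Current Bandwidth reports the link speed in bits per second, that 7 MBps applies to a 100 Mbps link and 70 MBps to a 1 Gbps link, and that 1 MB is 1,000,000 bytes. Under those assumptions both limits equal 7% of the Current Bandwidth value. Applying the same ratio to other link speeds is an extrapolation, and the function names are hypothetical.

```python
def bytes_total_limit(current_bandwidth_bps):
    """Ceiling for the Bytes Total/sec counter, in bytes per second,
    derived from the Current Bandwidth counter (bits per second)."""
    return current_bandwidth_bps * 7 // 100


def network_saturated(peak_bytes_per_sec, current_bandwidth_bps):
    """True when the peak throughput reaches or exceeds the ceiling."""
    return peak_bytes_per_sec >= bytes_total_limit(current_bandwidth_bps)


print(bytes_total_limit(100_000_000))                 # 7000000 (7 MBps)
print(bytes_total_limit(1_000_000_000))               # 70000000 (70 MBps)
print(network_saturated(12_000_000, 100_000_000))     # True
print(network_saturated(12_000_000, 1_000_000_000))   # False
```

The same 12 MBps peak is a problem on a 100 Mbps adapter and unremarkable on a 1 Gbps adapter, which is why the two counters are best read as a pair.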
Hope you found this helpful. At a later date, I will provide the equivalent procedure to help you troubleshoot client-side latencies.
Comments
Anonymous
January 01, 2003
Just what I needed to review baseline performance (in SCOM) for our Exchange 2010 servers. Great article. Elsa Braun, Milwaukee, WI.
Anonymous
January 01, 2003
Hi Sam
Shall we have performance counters for Exchange 2013?
- Anonymous
March 18, 2019
Sure, you can find these (and the original docs.microsoft.com links) in the following TechNet blog article: https://blogs.technet.microsoft.com/samdrey/2015/01/26/exchange-2013-performance-counters-and-their-thresholds/
Cheers, Sam
- Anonymous
Anonymous
January 01, 2003
Great article, can we use the same counters for 2013 or receive a set of new counters and values?
Thanks in advance!
- Anonymous
March 18, 2019
Hi Christophe, sure, you can find these (and the original docs.microsoft.com links) in the following TechNet blog article: https://blogs.technet.microsoft.com/samdrey/2015/01/26/exchange-2013-performance-counters-and-their-thresholds/
Cheers, Sam
- Anonymous
Anonymous
July 24, 2013
Is there somewhere we can download an importable file for these data collection points?
Anonymous
November 20, 2013
Great article Samuel! Hi Derek, you can get ExPerfWiz (a PowerShell script that autodetects the Exchange server role and imports the corresponding Exchange-specific Perfmon counters) at http://experfwiz.codeplex.com/. For clues on which specific counters to put under the microscope, use the Exchange version-specific threshold file included in PAL. Generate the PAL report for an overall view, then dig deeper to find correlations between RPC latency spikes and the four usual suspects (Disk, Memory, CPU, Network). To create your own PAL threshold file, refer to technet.microsoft.com/.../dd335215.aspx and blogs.technet.com/.../how-to-create-a-threshold-file-for-the-pal-tool.aspx
Anonymous
July 22, 2014
Performance Counters for exchange server 2013
Anonymous
January 28, 2015
Great article, can we use the same counters for 2013 or receive a set of new counters and values?
Thanks in advance!