Monitoring Common Counters

Microsoft Exchange Server 2007 will reach end of support on April 11, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.

 

Applies to: Exchange Server 2007 SP1, Exchange Server 2007 SP2, Exchange Server 2007 SP3

This topic provides guidance about the most useful performance counters to monitor that are common among all Microsoft Exchange Server 2007 server roles. When monitoring Exchange 2007 servers, you should know which performance aspects are most important. The common counters and threshold values detailed in this topic can be used to proactively identify potential issues and help identify the root cause of issues when troubleshooting.

Processor and Process Counters

Generally, identifying that a server is processor-bound is straightforward. Use the counters listed in the following table to determine whether there is any processor contention.

Counter Expected values

Processor(_Total)\% Processor Time

Shows the percentage of time that the processor is executing application or operating system processes. This is when the processor is not idle.

Should be less than 75% on average.

Processor(_Total)\% User Time

Shows the percentage of processor time that is spent in user mode.

User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems.

Should remain below 75%.

Processor(_Total)\% Privileged Time

Shows the percentage of processor time that is spent in privileged mode. Privileged mode is a processing mode designed for operating system components and hardware-manipulating drivers. It allows direct access to hardware and all memory.

Should remain below 75%.

Process(*)\% Processor Time

Shows the percentage of elapsed processor time that all process threads used to execute instructions. An instruction is the basic unit of execution in a computer; a thread is the object that executes instructions; and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count.

If total processor time is high, use this counter to determine which process is causing high CPU.

Not applicable.

System\Processor Queue Length (all instances)

Indicates the number of threads each processor is servicing.

Processor Queue Length can be used to identify if processor contention or high CPU utilization is caused by the processor capacity being insufficient to handle the workloads assigned to it. Processor Queue Length shows the number of threads that are delayed in the Processor Ready Queue and are waiting to be scheduled for execution. The value listed is the last observed value at the time the measurement was taken.

On a computer with a single processor, observations where the queue length is greater than 5 are a warning that there is frequently more work available than the processor can handle readily. When this number is greater than 10, it is a strong indicator that the processor is at capacity, particularly when coupled with high CPU utilization.

On multiprocessor systems, divide the queue length by the number of physical processors. On a multiprocessor system configured with hard processor affinity (processes are assigned to specific CPU cores), large values for the queue length can indicate that the configuration is unbalanced.

Although Processor Queue Length typically is not used for capacity planning, it can be used to identify if systems within the environment are capable of running the loads or if additional processors or faster processors should be purchased for future servers.

Should not be greater than 5 per processor.
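The per-processor rule above can be sketched as a small helper. This is a minimal illustration, not part of any Exchange tooling: the function name and the "ok", "warning", and "critical" labels are hypothetical, and applying the single-processor value of 10 on a per-processor basis is an assumption extended from the guidance above.

```python
def classify_processor_queue(queue_length: float, processor_count: int) -> str:
    """Classify one Processor Queue Length sample on a per-processor basis."""
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    # Divide the system-wide queue length by the number of physical processors.
    per_processor = queue_length / processor_count
    if per_processor > 10:
        # Assumption: the single-processor "at capacity" value applied per processor.
        return "critical"
    if per_processor > 5:
        # More work is frequently available than the processors can handle readily.
        return "warning"
    return "ok"


if __name__ == "__main__":
    # A queue of 24 threads on 4 processors is 6 per processor.
    print(classify_processor_queue(24, 4))
```

Because the counter reports only the last observed value, a check like this is more meaningful when applied to many samples and read together with % Processor Time.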

Memory Counters

Exchange 2007 uses more memory than Exchange Server 2003 due to the many changes in how Exchange 2007 operates. Exchange 2003 can only address 4 gigabytes (GB) of physical memory. Also, there are other limitations in the operating system such as paged pool and nonpaged pool memory, which affect the operating system kernel in many ways. There are also limitations in the overall database cache size of the information store, which helps limit the amount of memory that the store process can consume.

With Exchange 2007 running on the 64-bit platform, most of these limitations are removed. Exchange 2007 can address up to the amount of RAM installed in any particular server. The more memory in a server, the more the information store process consumes or caches overall. The main benefit is that more operations are performed in memory, reusing data in the store cache instead of going to the disk to read or write specific data. For more information about Extensible Storage Engine (ESE) database caching and the differences between Exchange 2007 and Exchange 2003, see ESE Database Cache Size in Exchange 2007. For detailed guidance about planning memory configurations for Exchange 2007, see Planning Memory Configurations.

Use the counters listed in the following table to determine whether there are any memory-related issues.

Counter Expected values

Memory\Available Mbytes

Shows the amount of physical memory, in megabytes (MB), immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free, and zero page lists. For a full explanation of the memory manager, refer to Microsoft Developer Network (MSDN) or "System Performance and Troubleshooting Guide" in the Windows Server 2003 Resource Kit.

Should remain above 100 MB at all times.

Memory\Pool Nonpaged Bytes

Consists of system virtual addresses that are guaranteed to be resident in physical memory at all times and can thus be accessed from any address space without incurring paging input/output (I/O). Like paged pool, nonpaged pool is created during system initialization and is used by kernel-mode components to allocate system memory.

Normally not looked at, unless connection counts are very high because each TCP connection consumes nonpaged pool memory.

Not applicable.

Memory\Pool Paged Bytes

Shows the portion of shared system memory that can be paged to the disk paging file. Paged pool is created during system initialization and is used by kernel-mode components to allocate system memory.

Monitor for increases in pool paged bytes indicating a possible memory leak.

Not applicable.

Memory\Cache Bytes

Shows the current size, in bytes, of the file system cache. By default, the cache uses up to 50 percent of available physical memory. The counter value is the sum of Memory\System Cache Resident Bytes, Memory\System Driver Resident Bytes, Memory\System Code Resident Bytes, and Memory\Pool Paged Resident Bytes.

Should remain steady after applications cache their memory usage. Check for large dips in this counter, which could be attributed to working set trimming and excessive paging.

Used by the content index catalog and continuous replication log copying.

Not applicable.

Memory\Committed Bytes

Shows the amount of committed virtual memory, in bytes. Committed memory is the physical memory that has space reserved on the disk paging files. There can be one or more paging files on each physical drive. This counter displays the last observed value only; it is not an average.

Determines the amount of committed bytes in use.

Not applicable.

Memory\% Committed Bytes In Use

Shows the ratio of Memory\Committed Bytes to the Memory\Commit Limit. Committed memory is the physical memory in use for which space has been reserved in the paging file should it need to be written to disk. The commit limit is determined by the size of the paging file. If the paging file is enlarged, the commit limit increases, and the ratio is reduced. This counter displays the current percentage value only; it is not an average.

If this value is very high (more than 90 percent), you may begin to see commit failures. This is a clear indication that the system is under memory pressure.

Not applicable.
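The ratio described for the committed bytes percentage counter can be reproduced from the two underlying values. The following is a minimal sketch; the function names are hypothetical, and the 90 percent default simply mirrors the "very high" value mentioned above.

```python
def committed_bytes_in_use_percent(committed_bytes: int, commit_limit: int) -> float:
    """Return Memory\\Committed Bytes as a percentage of Memory\\Commit Limit."""
    if commit_limit <= 0:
        raise ValueError("commit_limit must be positive")
    return 100.0 * committed_bytes / commit_limit


def is_under_commit_pressure(committed_bytes: int, commit_limit: int,
                             threshold_percent: float = 90.0) -> bool:
    """True when the ratio exceeds the threshold (90 percent by default)."""
    return committed_bytes_in_use_percent(committed_bytes, commit_limit) > threshold_percent


if __name__ == "__main__":
    gib = 1024 ** 3
    # Enlarging the paging file raises the commit limit and lowers the ratio.
    print(committed_bytes_in_use_percent(15 * gib, 16 * gib))
    print(committed_bytes_in_use_percent(15 * gib, 24 * gib))
```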

Memory Paging Counters

When a server is under heavy load or has memory constraints, the operating system can start paging out memory blocks or pages to the paging file so that the requesting process has enough memory to finish its task. If a process requests a page in memory and the page is not at the requested location, a page fault occurs. If the page is found elsewhere in memory, this is called a soft page fault. If the page must be retrieved from disk, this is called a hard page fault. Most processors can handle soft page faults without a performance issue. If a previously paged-out memory block is then requested by an application or process, it must be read back from the paging file into physical memory. This process can adversely affect the overall performance of a server because more data is being read from or written to the paging file, which consumes processor time.

Use the counters listed in the following table to determine whether there are any paging file issues.

Counter Expected values

Memory\Transition Pages RePurposed/sec

Indicates system cache pressure.

Should be less than 100 on average.

Spikes should be less than 1,000.

Memory\Page Reads/sec

Indicates data must be read from the disk instead of memory. Indicates there is not enough memory and paging is beginning. A value of more than 30 per second means the server is no longer keeping up with the load.

Should be less than 100 on average.

Memory\Pages/sec

Shows the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files.

The values that are returned by the Pages/sec counter may be more than you expect. These values may not be related to either paging file activity or cache activity. Instead, these values may be caused by an application that is sequentially reading a memory-mapped file.

Use Memory\Pages Input/sec and Memory\Pages Output/sec to determine page file I/O.

Should be below 1,000 on average.

Memory\Pages Input/sec

Shows the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory\Pages Input/sec to the value of Memory\Page Reads/sec to determine the average number of pages read into memory during each read operation.

Should be below 1,000 on average.

Memory\Pages Output/sec

Shows the rate at which pages are written to disk to free space in physical memory. Pages are written back to disk only if they are changed in physical memory, so they are likely to hold data, and not code. A high rate of pages output might indicate a memory shortage. Microsoft Windows writes more pages back to disk to free up space when physical memory is in short supply. This counter shows the number of pages, and can be compared to other counts of pages, without conversion.

Should be below 1,000 on average.
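Two relationships from the table above lend themselves to a quick calculation: Pages/sec is the sum of Pages Input/sec and Pages Output/sec, and dividing Pages Input/sec by Page Reads/sec gives the average number of pages read per read operation. The sketch below uses hypothetical function names and made-up sample values.

```python
def pages_per_sec(pages_input_per_sec: float, pages_output_per_sec: float) -> float:
    """Memory\\Pages/sec expressed as the sum of its input and output components."""
    return pages_input_per_sec + pages_output_per_sec


def average_pages_per_read(pages_input_per_sec: float, page_reads_per_sec: float) -> float:
    """Average number of pages brought into memory by each read operation."""
    if page_reads_per_sec == 0:
        # No read operations in the interval, so there is nothing to average.
        return 0.0
    return pages_input_per_sec / page_reads_per_sec


if __name__ == "__main__":
    # Made-up sample: 400 pages input/sec across 100 read operations/sec.
    print(pages_per_sec(400, 150))
    print(average_pages_per_read(400, 100))
```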

Process Memory Consumption Counters

All applications and services on a system run as processes, and it is critical to monitor these processes for unusual memory or processor consumption. Use the counters listed in the following table (and those in the following two sections) to help isolate processes which may be monopolizing system resources.

Counter Expected values

Process(*)\Private Bytes

Shows the current number of bytes this process has allocated that cannot be shared with other processes.

This counter can be used to detect memory leaks in processes.

For the information store process, compare this counter value with database cache size to determine if there is a memory leak in the information store process. An increase in information store private bytes that is matched by the same increase in database cache indicates correct behavior (no memory leak).

Not applicable.

Process(*)\Virtual Bytes

Represents (in bytes) how much virtual address space the process is currently consuming.

Used to determine if processes are consuming a large amount of virtual memory.

Not applicable.
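The comparison described for the information store process (growth in Private Bytes versus growth in database cache size) can be sketched as follows. The function names and the 256 MB tolerance are assumptions chosen for illustration only; the source does not define a tolerance.

```python
def unexplained_private_growth(private_bytes: list[int], cache_bytes: list[int]) -> int:
    """Return the growth in Private Bytes that is not matched by database cache growth.

    Both lists hold samples in time order, taken over the same period.
    """
    if len(private_bytes) < 2 or len(cache_bytes) < 2:
        raise ValueError("need at least two samples of each counter")
    private_growth = private_bytes[-1] - private_bytes[0]
    cache_growth = cache_bytes[-1] - cache_bytes[0]
    return private_growth - cache_growth


def suggests_store_leak(private_bytes: list[int], cache_bytes: list[int],
                        tolerance_bytes: int = 256 * 1024 * 1024) -> bool:
    """True when Private Bytes grew noticeably more than the database cache.

    tolerance_bytes is a hypothetical allowance, not a documented threshold.
    """
    return unexplained_private_growth(private_bytes, cache_bytes) > tolerance_bytes


if __name__ == "__main__":
    gib = 1024 ** 3
    # Private bytes and cache both grew by 2 GiB: expected behavior.
    print(suggests_store_leak([4 * gib, 6 * gib], [3 * gib, 5 * gib]))
```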

Process Working Set Counter

This counter is useful in determining if a process has an increase in working sets, which consumes additional memory and causes slower than normal performance.

Counter Expected values

Process(_Total)\Working Set

Shows the current size, in bytes, of the working set of this process. The working set is the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a threshold, pages are left in the working set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from working sets. If they are needed, they will be soft-faulted back to the working set before leaving main memory.

Large increases or decreases in working sets cause paging.

Ensure that the paging file is set to the recommended value of RAM plus 10 MB. If working sets are being trimmed, add Process(*)\Working Set to see which processes are affected. This counter could indicate either system-wide or process-wide issues. Cross-reference this counter with Memory\System Cache Resident Bytes to see if system-wide working set trimming is occurring.

Not applicable.

Process Handle Counter

This counter is useful in determining if a process has an increase in handle counts, which consumes additional memory and causes slower than normal performance.

Counter Expected values

Process(*)\Handle Count

Shows the total number of handles currently open by this process. This number is the sum of the handles currently open by each thread in this process.

An increase in handle counts for a particular process may be the symptom of a faulty process with handle leaks, which is causing performance issues on the server. This is not necessarily a problem, but is something to monitor over time to determine if a handle leak is occurring.

Not applicable.
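Because a handle leak shows up as a sustained upward trend rather than a single high value, one way to watch Handle Count over time is to fit a straight line to periodic samples. The sketch below uses an ordinary least-squares slope; the function name and the sample data are hypothetical, and what slope counts as a leak is left to the reader.

```python
def handle_growth_per_hour(samples: list[tuple[float, int]]) -> float:
    """Least-squares slope of handle count over time.

    samples holds (hours_since_first_sample, handle_count) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    # Sum of squared deviations of the time values.
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        raise ValueError("samples must span more than one point in time")
    covariance = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    return covariance / spread


if __name__ == "__main__":
    # Made-up hourly samples for one process.
    hourly = [(0, 1000), (1, 1100), (2, 1200), (3, 1300)]
    print(handle_growth_per_hour(hourly))
```

A slope near zero over several days suggests normal behavior, while a steady positive slope that never levels off is worth investigating.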

.NET Framework Counters

The Microsoft .NET Framework is an integral part of Exchange 2007. Most Exchange 2007 components have been completely rewritten in managed code based on this framework. Managed code offers some significant advantages over unmanaged code, such as the ability to compile applications in real time, so that you do not have to worry about running your applications on different architectures or platforms. Managed code also provides the ability to manage memory efficiently. The .NET Framework uses the common language runtime (CLR), which allows easier coding in different languages because they share the same runtime. Any code that you develop with a language compiler that targets the runtime is called managed code, as mentioned in Common Language Runtime Overview.

Building or compiling applications in real time also has a cost: performance is reduced while the compilation is occurring. To avoid this cost, Exchange components are precompiled during the Exchange installation process, which significantly increases installation time. However, the result is increased performance on the server because the initial load time will now be much less. These binaries, after being compiled, are stored in the global assembly cache (GAC) on the local computer. This precompilation process is called NGEN. For more information about NGEN, see Native Image Generator (Ngen.exe).

Exchange 2007 has other dependencies such as Windows kernel and .NET CLR memory management. This is one of the more crucial aspects of Exchange that you must monitor, because if memory is not being managed correctly or is getting severely fragmented, excessive paging could occur, causing an undesired increase in processor utilization, which significantly increases overall client latencies.

Monitoring the counters in the following table can help determine if managed applications are causing excessive garbage collection. Garbage collection is essentially a way within CLR to free memory for objects that are no longer being used. If you need to free large amounts of memory over long periods of time, memory constraints are likely, either because not enough memory is available on the server or an application consumes more than its share of memory because of a memory leak. For more information about automatic memory management for CLR, see Automatic Memory Management.

Use the counters listed in the following table to help identify underlying .NET Framework issues.

Counter Expected values

.NET CLR Memory(*)\% Time in GC

Shows when garbage collection has occurred. When the counter exceeds the threshold, it indicates that CPU is cleaning up and is not being used efficiently for load. Adding memory to the server would improve this situation.

If this counter increases to a high value, there might be some objects that are surviving Gen 1 garbage collections and being promoted to Gen 2. Gen 2 collections require a full garbage collection for cleanup. Add other .NET memory counters to determine if this is the case.

Should be below 10% on average.

.NET CLR Exceptions(*)\# of Exceps Thrown / sec

Displays the number of exceptions thrown per second. These include both .NET exceptions and unmanaged exceptions that get converted into .NET exceptions. For example, a null pointer reference exception in unmanaged code would get thrown again in managed code as a .NET System.NullReferenceException. This counter includes both handled and unhandled exceptions. Exceptions should only occur in rare situations and not in the normal control flow of the program. This counter was designed as an indicator of potential performance problems due to a large rate (more than 100 per second) of exceptions thrown. This counter is not an average over time; it displays the difference between the values observed in the last two samples divided by the duration of the sample interval.

Should be less than 5% of total requests per second (RPS), calculated as Web Service(_Total)\Connection Attempts/sec * .05.

.NET CLR Memory(*)\# Bytes in all Heaps

Shows the sum of four other counters: Gen 0 Heap Size, Gen 1 Heap Size, Gen 2 Heap Size, and the Large Object Heap Size. This counter indicates the current memory allocated in bytes on the GC Heaps.

These regions of memory are of type MEM_COMMIT. (For details, see Platform SDK documentation for VirtualAlloc.) The value of this counter is always less than the value of Process\Private Bytes, which counts all MEM_COMMIT regions for the process. Private Bytes minus # Bytes in all Heaps is the number of bytes committed by unmanaged objects.

Used to monitor possible memory leaks or excessive memory usage of managed or unmanaged objects.

Not applicable.
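Two of the rules in the table above are simple arithmetic: the exception rate limit is 5 percent of the connection attempt rate, and Private Bytes minus # Bytes in all Heaps gives the bytes committed by unmanaged objects. The following sketch uses hypothetical function names and illustrative values.

```python
def exception_rate_limit(connection_attempts_per_sec: float, fraction: float = 0.05) -> float:
    """Highest acceptable exceptions/sec: 5% of connection attempts/sec by default."""
    return connection_attempts_per_sec * fraction


def exception_rate_ok(exceptions_per_sec: float, connection_attempts_per_sec: float) -> bool:
    """True when the exception rate is below the 5% limit."""
    return exceptions_per_sec < exception_rate_limit(connection_attempts_per_sec)


def unmanaged_committed_bytes(private_bytes: int, bytes_in_all_heaps: int) -> int:
    """Bytes committed by unmanaged objects in a process hosting managed code."""
    return private_bytes - bytes_in_all_heaps


if __name__ == "__main__":
    # Illustrative values: 200 connection attempts/sec, 4 exceptions/sec.
    print(exception_rate_ok(4, 200))
    mib = 1024 ** 2
    print(unmanaged_committed_bytes(900 * mib, 600 * mib) // mib)
```

Tracking the managed and unmanaged portions separately over time helps show which side of a process is growing.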

Network Counters

The network, and how it is deployed, is essential to the proper performance of an Exchange server. It is uncommon for servers to be network-bound, because 100 megabits per second (Mbps) networks generally offer enough bandwidth for most organizations. However, with increasing message sizes and users per server, it is important to ensure that the network does not become a bottleneck.

Use the counters listed in the following table to determine whether there is any network performance degradation.

Counter Expected values

Network Interface(*)\Bytes Total/sec

Indicates the rate at which the network adapter is processing data bytes.

This counter includes all application and file data, in addition to protocol information such as packet headers.

For a 100-Mbps network adapter, should be below 6–7 megabytes per second (MB/sec).

For a 1000-Mbps network adapter, should be below 60–70 MB/sec.

Network Interface(*)\Packets Outbound Errors

Indicates the number of outbound packets that could not be transmitted because of errors.

Should be 0 at all times.

IPv4\Datagrams/sec

IPv6\Datagrams/sec

Shows the rate, in incidents per second, at which IP datagrams were received from or sent to the interfaces, including those in error. Forwarded datagrams are not included in this rate.

Determines current user load.

Not applicable.

TCPv4\Connections Established

TCPv6\Connections Established

Shows the number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT.

The number of TCP connections that can be established is constrained by the size of the nonpaged pool. When the nonpaged pool is depleted, no new connections can be established.

Determines current user load.

Not applicable.

TCPv4\Segments Received/sec

TCPv6\Segments Received/sec

Shows the rate at which segments are received, including those received in error. This count includes segments received on currently established connections.

Determines current user load.

Not applicable.

TCPv4\Connection Failures

TCPv6\Connection Failures

Shows the number of times TCP connections have made a direct transition to the CLOSED state from the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.

An increasing number of failures, or a consistently increasing rate of failures, can indicate a bandwidth shortage.

TCPv4\Connections Reset

TCPv6\Connections Reset

Shows the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.

Some browsers send TCP reset (RST) packets, so be cautious when using this counter to determine reset rate.

An increasing number of resets or a consistently increasing rate of resets can indicate a bandwidth shortage.
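When reading the counters in the table above, two conversions are often useful. Bytes Total/sec is reported in bytes while adapter speeds are quoted in bits per second, and the connection failure and reset counters are cumulative, so a rate has to be derived from two samples. The sketch below assumes hypothetical function names and made-up sample values.

```python
def link_utilization_percent(bytes_total_per_sec: float, link_speed_bits_per_sec: float) -> float:
    """Convert a Bytes Total/sec sample into a percentage of the adapter's rated speed."""
    if link_speed_bits_per_sec <= 0:
        raise ValueError("link_speed_bits_per_sec must be positive")
    # Multiply by 8 to convert bytes to bits before comparing with the link speed.
    return 100.0 * bytes_total_per_sec * 8 / link_speed_bits_per_sec


def cumulative_counter_rate(previous: int, current: int, interval_seconds: float) -> float:
    """Per-second rate between two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current - previous) / interval_seconds


if __name__ == "__main__":
    # 6,250,000 bytes/sec on a 100-Mbps adapter.
    print(link_utilization_percent(6_250_000, 100_000_000))
    # Connections Reset went from 1,200 to 1,260 over a 60-second interval.
    print(cumulative_counter_rate(1200, 1260, 60))
```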

Exchange Domain Controller Connectivity Counters

Exchange 2007 depends on the performance of the global catalog domain controllers. For each of the Exchange servers in the topology, use the counters listed in the following table to determine whether there is a slowdown in communicating with global catalogs.

Counter Expected values

MSExchange ADAccess Caches(*)\LDAP Searches/Sec

Shows the number of Lightweight Directory Access Protocol (LDAP) search requests issued per second.

Used to determine current LDAP search rate.

Not applicable.

MSExchange ADAccess Domain Controllers(*)\LDAP Read Time

Shows the time in milliseconds (ms) to send an LDAP read request to the specified domain controller and receive a response.

Should be below 50 ms on average.

Spikes (maximum values) should not be higher than 100 ms.

MSExchange ADAccess Domain Controllers(*)\LDAP Search Time

Shows the time (in ms) to send an LDAP search request and receive a response.

Should be below 50 ms on average.

Spikes (maximum values) should not be higher than 100 ms.

MSExchange ADAccess Processes(*)\LDAP Read Time

Shows the time (in ms) to send an LDAP read request to the specified domain controller and receive a response.

Should be below 50 ms on average.

Spikes (maximum values) should not be higher than 100 ms.

MSExchange ADAccess Processes(*)\LDAP Search Time

Shows the time (in ms) to send an LDAP search request and receive a response.

Should be below 50 ms on average.

Spikes (maximum values) should not be higher than 100 ms.

MSExchange ADAccess Domain Controllers(*)\LDAP Searches timed out per minute

Shows the number of LDAP searches that returned LDAP_Timeout during the last minute.

Should be below 10 at all times for all roles.

Higher values may indicate issues with Active Directory resources.

MSExchange ADAccess Domain Controllers(*)\Long running LDAP operations/Min

Shows the number of LDAP operations on this domain controller that took longer than the specified threshold per minute. (Default threshold is 15 seconds.)

Should be less than 50 at all times.

Higher values may indicate issues with Active Directory resources.
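The LDAP Read Time and LDAP Search Time guidance above combines an average limit (50 ms) with a spike limit (100 ms). A minimal sketch of checking a set of samples against both limits follows; the function name, the result keys, and the sample values are hypothetical.

```python
from statistics import mean


def check_ldap_latency(samples_ms: list[float],
                       average_limit_ms: float = 50.0,
                       spike_limit_ms: float = 100.0) -> dict:
    """Evaluate LDAP latency samples against the average and spike limits."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    average = mean(samples_ms)
    peak = max(samples_ms)
    return {
        "average_ms": average,
        "peak_ms": peak,
        # The average should be below the limit.
        "average_ok": average < average_limit_ms,
        # Spikes should not be higher than the limit.
        "spikes_ok": peak <= spike_limit_ms,
    }


if __name__ == "__main__":
    # Made-up samples for one domain controller instance.
    print(check_ldap_latency([20, 30, 40, 120]))
```

Running the check per domain controller instance helps show whether a slowdown is isolated to one global catalog or affects all of them.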

Active Directory Domain Controller Performance

The following sections provide information about Active Directory domain controller performance, including specific information about domain controllers running on the Windows Server 2008 or Windows Server 2003 operating systems.

Note

The following sections are not applicable to Exchange 2007 servers with the Edge Transport server role installed.

General Information

Active Directory response times can directly affect the performance of any Exchange server because all LDAP and authentication requests are handled by a domain controller or a series of domain controllers. In troubleshooting efforts, if LDAP latencies are determined to be a cause of a performance issue on an Exchange server, directing your focus to the domain controllers is the next logical step.

Windows Server 2008 Domain Controller Performance Guidance

To help troubleshoot performance-related issues on Windows Server 2008 domain controllers, data collector sets can be used to monitor Active Directory performance using Reliability and Performance Monitor.

On a Windows Server 2008 domain controller, the collector template can be found in Reliability and Performance Monitor under Reliability and Performance\Data Collector Sets\System\Active Directory Diagnostics. This tool collects data for five minutes and generates a report under Reliability and Performance\Reports\System\Active Directory Diagnostics. This report helps you to determine if there are any potential bottlenecks occurring on the server. For example, you could use Reliability and Performance Monitor to help determine if there are any long running LDAP searches that are occurring, which decrease overall performance and LDAP response times. You could also determine if a potential CPU or disk bottleneck exists.

For more information about Reliability and Performance Monitor for Windows Server 2008, see Performance and Reliability.

Windows Server 2003 Domain Controller Performance Guidance

To help troubleshoot performance problems on Windows Server 2003 domain controllers, Server Performance Advisor can be used to automate data collection.

For more information about downloading Server Performance Advisor, see Microsoft Windows Server 2003 Performance Advisor.