Health Monitoring Tools (AppFabric 1.1 Caching)
This section describes the various tools and commands available for monitoring the health of a Microsoft AppFabric 1.1 for Windows Server cache cluster. These tools include the following.
Performance Monitor
Event Tracing for Windows (ETW)
System Center Operations Manager
Windows PowerShell
Performance Monitor
The AppFabric Caching features install several Performance Monitor counters. For more information about the available counters, see Performance Counters for AppFabric Caching. You can observe or log counter values to determine a baseline of typical cache cluster behavior. For example, in the AppFabric Caching:Cache category, you might observe that the Total Client Requests / sec value stays within general ranges that vary with the time of day. You can use this baseline to identify a trend of increasing client requests to the cache cluster that might necessitate adding cache hosts.
For more general information about using Performance Monitor, see Using Performance Monitor.
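As a rough illustration of the baseline idea, the following Python sketch groups logged counter samples by hour of day and flags a new reading that is well above what is typical for that hour. It assumes you have already exported counter values (for example, Total Client Requests / sec) as (hour, value) pairs. The sample numbers, the helper names build_baseline and exceeds_baseline, and the 1.5 margin are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def build_baseline(samples):
    """Return {hour_of_day: mean value} from (hour, value) pairs."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: mean(values) for hour, values in by_hour.items()}

def exceeds_baseline(baseline, hour, value, margin=1.5):
    """True when value is more than `margin` times the baseline for that hour."""
    typical = baseline.get(hour)
    if typical is None:
        return False  # no history for this hour, so nothing to compare against
    return value > typical * margin

# Hypothetical logged samples: (hour of day, client requests per second)
samples = [(9, 400.0), (9, 440.0), (14, 900.0), (14, 940.0)]
baseline = build_baseline(samples)
print(baseline[9])                           # 420.0
print(exceeds_baseline(baseline, 9, 700.0))  # True
```

A fixed multiplier is only one way to define "outside the usual range"; a percentile or standard-deviation band may suit a noisier workload better.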
Event Tracing for Windows (ETW)
The AppFabric Caching features use Event Tracing for Windows (ETW) to provide status and error information related to the cache cluster. You can use the Event Viewer to examine the ETW logs for AppFabric Caching features.
Open the Event Viewer on a cache host. For instructions on how to launch the Event Viewer, see Start Event Viewer.
In the left navigation pane, expand the Applications and Services Logs folder.
Then expand Microsoft, Windows, and Application Server-System Services.
Select the Admin log.
The Admin log contains informational updates, such as when the AppFabric Caching Service starts or stops. It also contains warnings and errors. Note that these logs can contain events from other AppFabric features, such as hosting and monitoring. You can choose to filter the log to just the Microsoft-Windows Server AppFabric Caching source to focus on events related to the AppFabric Caching features.
The Application Server-System Services folder also contains an Operational log. By default, this log is disabled. To enable it, right-click the Operational log in the navigation pane, and then click Enable Log. The Operational log contains other events, such as low memory conditions.
When evaluating the health of the cache cluster, it is important to examine the event logs on each of the cache hosts that belong to the cluster. A problem with one cache host can have a negative effect on the entire cache cluster.
Event Viewer is useful for regularly monitoring the health of the cache cluster. However, when troubleshooting an error, you can get an even more detailed log of cache cluster activity with the tracelog.exe tool, which creates a detailed ETL trace log from the command line. You can download the tracelog utility as a part of the Windows Software Development Kit. The following command begins logging to the cachedebugtrace.etl file:
tracelog -start debugtrace -f cachedebugtrace.etl -guid "C:\Program Files\Windows Server AppFabric\Manifests\ProviderGUID.txt" -level 5 -cir 512
The following command stops the logging:
tracelog -stop debugtrace
The following command converts the cachedebugtrace.etl log file into a text file named cachedebugtrace.csv:
tracerpt .\cachedebugtrace.etl -o cachedebugtrace.csv -of CSV
Note
Although the tracerpt tool enables you to view the contents of the log file generated by tracelog, you may need to work with Microsoft support to fully interpret the information.
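Before opening a large converted trace, it can help to get a quick summary of it. The following Python sketch counts rows per value of one column in CSV text. It is only an illustration: the three-row sample and its column names are hypothetical, and the helper name count_by_column is not part of any AppFabric tool. Check the header row of your own cachedebugtrace.csv to see which columns are actually present.

```python
import csv
import io
from collections import Counter

def count_by_column(csv_text, column):
    """Count rows per value of the named column in CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    # Header names may be padded with spaces, so strip them before matching.
    header = [name.strip() for name in next(reader)]
    index = header.index(column)
    counts = Counter()
    for row in reader:
        if len(row) > index:  # ignore short or blank rows
            counts[row[index].strip()] += 1
    return counts

# Hypothetical excerpt; a real converted trace has many more columns and rows.
sample = (
    "Event Name, Level, Clock-Time\n"
    "EventA, 4, 100\n"
    "EventB, 2, 101\n"
    "EventC, 4, 102\n"
)
counts = count_by_column(sample, "Level")
print(counts["4"], counts["2"])  # 2 1
```

To summarize a real file, read its text (for example with open("cachedebugtrace.csv", newline="")) and pass it to the same function.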
System Center Operations Manager
You can use System Center Operations Manager to monitor the health of the AppFabric cache cluster. For more information, see Windows Server AppFabric Management Pack for Operations Manager 2007.
Windows PowerShell
There are several Windows PowerShell commands that indicate the current status and health of a cache cluster. This section demonstrates how to use the following commands.
Get-CacheHost
Get-CacheClusterHealth
Get-CacheStatistics
Note that these commands provide dynamic information based on the current state of the cache cluster. It is often useful to also look at the configuration details with the following commands: Get-CacheConfig, Get-CacheHostConfig, and Export-CacheClusterConfig. These commands are covered in the section Common Cache Cluster Management Tasks (AppFabric 1.1 Caching).
Note
For more information about how to get started with Windows PowerShell, see Common Cache Cluster Management Tasks (Windows Server AppFabric Caching). For a complete list of commands, see Using Windows PowerShell with AppFabric Caching.
Get-CacheHost
Use the Get-CacheHost command without any parameters to quickly view the status of the cache hosts in the cache cluster. Some problems occur when one or more cache hosts in a cluster are not running. For example, consider the following output from Get-CacheHost.
PS C:\> Get-CacheHost
HostName : CachePort Service Name Service Status Version Info
-------------------- ------------ -------------- ------------
CacheServer1:22233 AppFabricCachingService UP 1 [1,1][1,1]
CacheServer2:22233 AppFabricCachingService DOWN 1 [1,1][1,1]
CacheServer3:22233 AppFabricCachingService UP 1 [1,1][1,1]
This output shows that there are three cache hosts in the cluster: CacheServer1, CacheServer2, and CacheServer3. The Service Status column indicates that the cache cluster is running, because at least one cache host has a status of UP. However, CacheServer2 is currently stopped with a status of DOWN. This could indicate a problem with CacheServer2, or you might simply need to start the cache host with the Start-CacheHost command. The Get-CacheHost command is often the first command you should run to get a high-level overview of the state of the cache cluster.
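If you capture the Get-CacheHost output as text (for example, in a scheduled health check), a short script can pick out the hosts that are not running. The following Python sketch assumes the row layout shown in the sample output above; the helper name hosts_not_up is hypothetical.

```python
import re

# Matches rows such as "CacheServer2:22233 AppFabricCachingService DOWN 1 [1,1][1,1]".
# The header and separator rows do not contain "name:port", so they are skipped.
ROW = re.compile(
    r"^(?P<host>[^\s:]+):(?P<port>\d+)\s+(?P<service>\S+)\s+(?P<status>[A-Z]+)\b"
)

def hosts_not_up(output):
    """Return (host, status) pairs for every cache host whose status is not UP."""
    result = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match and match.group("status") != "UP":
            result.append((match.group("host"), match.group("status")))
    return result

sample = """\
HostName : CachePort Service Name Service Status Version Info
-------------------- ------------ -------------- ------------
CacheServer1:22233 AppFabricCachingService UP 1 [1,1][1,1]
CacheServer2:22233 AppFabricCachingService DOWN 1 [1,1][1,1]
CacheServer3:22233 AppFabricCachingService UP 1 [1,1][1,1]
"""
print(hosts_not_up(sample))  # [('CacheServer2', 'DOWN')]
```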
Get-CacheClusterHealth
Use the Get-CacheClusterHealth command to get detailed information about the health of the cache hosts and the caches residing on those cache hosts. For example, consider the following sample output from the Get-CacheClusterHealth command.
Cluster health statistics
=========================
HostName = CacheServer1
-------------------------
NamedCache = default
Healthy = 0.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
NoWriteQuorum = 0.00
Throttled = 25.00
NamedCache = Cache1
Healthy = 0.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
NoWriteQuorum = 0.00
Throttled = 25.00
HostName = CacheServer2
-------------------------
NamedCache = Cache1
Healthy = 25.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
NoWriteQuorum = 0.00
Throttled = 0.00
NamedCache = default
Healthy = 25.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
NoWriteQuorum = 0.00
Throttled = 0.00
Unallocated named cache fractions
---------------------------------
Internally, the cache cluster uses a concept of partitions to organize and manage memory. The numbers displayed in the output of the Get-CacheClusterHealth command are the percentages of the total number of cache cluster partitions. For example, on CacheServer2 the named cache Cache1 is using 25.00 percent of the total partitions and all of those partitions are healthy. However, the specific percentages are not as important as the categories in which those percentages reside. Adding more caches or cache hosts may reduce Cache1 from 25.00 percent to 10.00 percent, but as long as that 10.00 percent is still in the Healthy category, the cache is still healthy. In the previous example, note that CacheServer1 is showing both caches as Throttled. This is a low-memory condition on that server. For more information about how to troubleshoot this low-memory condition, see Throttling Troubleshooting (Windows Server AppFabric Caching).
The following table describes each category in the Get-CacheClusterHealth output.
Health Category | Description |
---|---|
Healthy | The cache is operating normally. This is the target state for all caches. |
UnderReconfiguration | The cache is under reconfiguration. This is an internal state that may have several causes, but it should be temporary and resolve to healthy. |
NotPrimary | The cache is not currently available. This can happen when secondary copies are promoted to primary. During this transition, the cache may temporarily be in this state. |
NoWriteQuorum | The cache is read-only, because the cache is unable to create the required number of replicas on secondary cache hosts. This occurs when the cache has the high availability option enabled. |
Throttled | The cache is read-only, because the cache host is in a throttled memory state. This is a low-memory condition. |
The Unallocated named cache fractions section shows the percentage of cache partitions that have not yet been allocated to a specific cache host. This state normally appears when the cache cluster is started or when a cache host is started or stopped on the running cluster. This state should typically resolve to healthy.
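Because the category matters more than the exact percentage, a simple check is to look for any non-zero value outside the Healthy category. The following Python sketch does this for captured Get-CacheClusterHealth text. It assumes the "Name = value" layout of the sample output shown earlier in this section; the helper name unhealthy_entries is hypothetical, and the sample is an abbreviated copy of that output.

```python
def unhealthy_entries(output):
    """Return (host, cache, category, percent) for non-zero fractions outside Healthy."""
    findings = []
    host = cache = None
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            continue  # titles and separator rows carry no "Name = value" pair
        if key == "HostName":
            host = value
        elif key == "NamedCache":
            cache = value
        elif key != "Healthy":
            try:
                percent = float(value)
            except ValueError:
                continue  # not a numeric category line
            if percent > 0:
                findings.append((host, cache, key, percent))
    return findings

sample = """\
Cluster health statistics
=========================
HostName = CacheServer1
-------------------------
NamedCache = default
Healthy = 0.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
NoWriteQuorum = 0.00
Throttled = 25.00
HostName = CacheServer2
-------------------------
NamedCache = Cache1
Healthy = 25.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
NoWriteQuorum = 0.00
Throttled = 0.00
"""
print(unhealthy_entries(sample))  # [('CacheServer1', 'default', 'Throttled', 25.0)]
```

Transitional states such as UnderReconfiguration are expected to clear on their own, so a check like this is most useful when a non-healthy category persists across several runs.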
Get-CacheStatistics
The Get-CacheStatistics Windows PowerShell command provides basic information about the contents of a cache. The following example demonstrates how to display the cache statistics for a cache named Cache1.
Get-CacheStatistics Cache1
This is sample output from the previous command.
Size : 12408186
ItemCount : 1200
RegionCount : 714
RequestCount : 1200
MissCount : 1200
The previous output shows that there are 1200 items in Cache1 for a total size of 12408186 bytes. There are 714 regions, which could be user-created or system-created. There have been 1200 requests and the same number of misses. However, it is important not to treat the MissCount as a problem indicator in isolation. When the cache cluster is restarted, applications must repopulate the cache. This involves checking to see whether the cached item exists, which increments the MissCount. A high MissCount could indicate that the items in the cache have been unexpectedly evicted or that the expiration time on cached items is too short, but these conditions cannot be confirmed with the cache statistics alone. For example, if you use the Put method to add an item that is not in the cache, it increments the MissCount, but this is not an error condition.
This command can be used together with the Get-CacheConfig command. For example, if the Get-CacheStatistics command showed that Cache1 had an unexpectedly large size of 1 GB, you could examine the cache configuration with Get-CacheConfig to see the eviction and expiration settings.
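To track misses over time rather than from a single reading, you can turn the statistics into a ratio. The following Python sketch parses captured Get-CacheStatistics text and computes MissCount divided by RequestCount. It assumes the "Name : value" layout of the sample output above; the helper names parse_statistics and miss_ratio are hypothetical.

```python
def parse_statistics(output):
    """Parse "Name : value" lines into a dict of integers."""
    stats = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats

def miss_ratio(stats):
    """Return MissCount / RequestCount, or 0.0 when there have been no requests."""
    requests = stats.get("RequestCount", 0)
    return stats.get("MissCount", 0) / requests if requests else 0.0

sample = """\
Size : 12408186
ItemCount : 1200
RegionCount : 714
RequestCount : 1200
MissCount : 1200
"""
stats = parse_statistics(sample)
print(miss_ratio(stats))  # 1.0
```

A ratio of 1.0 here matches the freshly populated cache described above, where every request was a miss. As with the raw MissCount, the ratio is a prompt for further investigation and not an error indicator by itself.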
See Also
Concepts
Managing Cache Cluster Health (Windows Server AppFabric Caching)
2012-10-26