SharePoint 2013 Troubleshooting: DistributedCacheService.exe - The process was terminated due to an unhandled exception
Problem
This post consolidates notes, experience, and references from resolving an AppFabric and Distributed Cache problem on a small legacy SharePoint 2013 development farm (1 WFE/CA, 1 SQL). The problem did not noticeably affect performance or operation, but it did fill up the server event logs and the SharePoint ULS logs.
The following are some (abbreviated) example error messages raised in the server event logs:
Event ID 7034
Log Name: | System |
Source: | Service Control Manager |
Event ID: | 7034 |
Level: | Error |
Description: | The AppFabric Caching Service service terminated unexpectedly. It has done this 461 time(s) |
Event ID 111
Log Name: | Microsoft-Windows-Application Server-System Services/Admin |
Source: | Microsoft-Windows Server AppFabric Caching |
Event ID: | 111 |
Level: | Error |
Description: | AppFabric Caching service crashed with exception {Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:No such host is known ---> System.Net.Sockets.SocketException: No such host is known |
Event ID 1000
Log Name: | Application |
Source: | Application Error |
Event ID: | 1000 |
Level: | Error |
Description: | Faulting application name: DistributedCacheService.exe, version: 1.0.4632.0... Faulting application path: C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe |
Event ID 1026
Log Name: | Application |
Source: | .NET Runtime |
Event ID: | 1026 |
Level: | Error |
Description: | Application: DistributedCacheService.exe Framework Version: v4.0.30319 Description: The process was terminated due to an unhandled exception. Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException at... |
ULS Logs
And these are some of the (abbreviated) entries seen in the SharePoint ULS logs:
Process | OWSTIMER.EXE (0x3308) |
Product | SharePoint Foundation |
Category | Health |
Level | Unexpected |
Message | One of the cache hosts in the cluster is down. This indicates that the SharePoint cache client is trying to connect to a wrong cache host. This will lead to unknown failures.. Automatic repair is being attempted. |
and
Process | OWSTIMER.EXE (0x3308) |
Product | SharePoint Foundation |
Category | Health |
Level | Unexpected |
Message | HealthRule SPDistributedCacheHostOutOfSync failed for... |
and
Process | OWSTIMER.EXE (0x3308) |
Product | SharePoint Foundation |
Category | Health |
Level | Information |
Message | The SharePoint Health Analyzer found and fixed the following problem: One of the cache hosts in the cluster is down.. |
Other related errors and messages were also found. Began troubleshooting.
Troubleshooting Steps
1. Check Windows Server 2012 logs
Examples are shown above
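The same events can also be pulled from PowerShell instead of Event Viewer. The following is a minimal sketch, assuming an elevated session on the affected server. The log names and event IDs are taken from the examples above; the helper name Get-CacheCrashFilter and the 24-hour default window are illustrative choices, not part of any product.

```powershell
# Build Get-WinEvent filters for the event IDs shown in the examples above.
# Get-CacheCrashFilter is a hypothetical helper name, not a built-in cmdlet.
function Get-CacheCrashFilter {
    param([datetime]$Since = (Get-Date).AddHours(-24))
    @(
        @{ LogName = 'System';      Id = 7034;       StartTime = $Since }
        @{ LogName = 'Application'; Id = 1000, 1026; StartTime = $Since }
        @{ LogName = 'Microsoft-Windows-Application Server-System Services/Admin'; Id = 111; StartTime = $Since }
    )
}

# Usage on the server (Windows only):
# Get-CacheCrashFilter | ForEach-Object {
#     Get-WinEvent -FilterHashtable $_ -ErrorAction SilentlyContinue
# } | Sort-Object TimeCreated | Format-Table TimeCreated, Id, ProviderName -AutoSize
```

Note that event 7034 is raised for any service that terminates unexpectedly, so the System log results may need to be narrowed by message text.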
2. Check Central Administration > Services on Server > Distributed Cache
Status: Started
3. Check Windows Services > AppFabric Caching Service
Status: (not started)
Startup Type: Automatic
Log On As: (farm service account)
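The three values above can also be read in one call from PowerShell. This is a sketch only: it assumes the Windows service name is AppFabricCachingService (the name that Get-CacheHost reports), and the function name Get-CacheServiceState is made up for illustration.

```powershell
# Report state, startup type, and logon account for the caching service.
# Win32_Service exposes these as State, StartMode, and StartName.
# Get-CacheServiceState is a hypothetical helper name.
function Get-CacheServiceState {
    param([string]$Name = 'AppFabricCachingService')
    Get-CimInstance -ClassName Win32_Service -Filter "Name='$Name'" |
        Select-Object Name, State, StartMode, StartName
}

# Usage on the server (Windows only):
# Get-CacheServiceState
```

Get-CimInstance is used here because Get-Service does not expose the startup type or logon account on older PowerShell versions.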
4. Check ULS logs
Examples are shown above
5. Check AppFabric Caching Server Status
Executed these cmdlets:
Use-CacheCluster
Get-CacheHost
Obtained this output:
HostName : CachePort | Service Name | Service Status | Version Info |
-------------------- | ------------ | -------------- | ------------ |
SPAPP:22233 | AppFabricCachingService | UNKNOWN | 0 [0,0][0,0] |
SPCA:22233 | AppFabricCachingService | UNKNOWN | 0 [0,0][0,0] |
SPWFE:22233 | AppFabricCachingService | DOWN | 3 [3,3][1,3] |
Note: This farm previously had separate CA and APP servers, but these were removed from the farm, and the farm was consolidated to a single WFE server.
6. Check SharePoint Distributed Cache Server Status
Executed this cmdlet:
Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | ft Server, Status -auto
Obtained this result:
Server | Status |
------ | ------ |
SPServer Name=SPWFE | Online |
Analysis:
These results were consistent with what is presented in Central Administration > Services on Server.
7. Remove erroneous AppFabric Service instances
Executed these cmdlets to remove one instance:
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName -and ($_.Server.Name) -eq "SPAPP"}
$serviceInstance.Unprovision()
$serviceInstance.Delete()
Obtained this result:
You cannot call a method on a null-valued expression. |
At line:1 char:1 |
+ $serviceInstance.Unprovision() |
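Also tried matching on the FQDN of SPAPP, but this failed as well. Both $instanceName and $serviceInstance contained NULL. The empty $serviceInstance is consistent with step 6: SharePoint lists a Distributed Cache service instance only on SPWFE, so there is nothing on SPAPP for the filter to match.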
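The null-valued expression error can be avoided by checking the lookup result before calling any methods on it. The following is a sketch under stated assumptions: Remove-CacheServiceInstance and its -Instances parameter are hypothetical names introduced here (the parameter exists so the logic can be exercised without a farm), while Unprovision() and Delete() are the same calls used in this step.

```powershell
# Guarded version of the removal attempt: only call Unprovision()/Delete()
# when a matching service instance actually exists.
# Remove-CacheServiceInstance is a hypothetical helper name.
function Remove-CacheServiceInstance {
    param(
        [Parameter(Mandatory)] [string]$ServerName,
        # Defaults to the live farm; pass objects explicitly to test offline.
        [object[]]$Instances = (Get-SPServiceInstance)
    )
    $wanted = 'SPDistributedCacheService Name=AppFabricCachingService'
    $match = @($Instances | Where-Object {
        $_.Service.ToString() -eq $wanted -and $_.Server.Name -eq $ServerName
    })
    if ($match.Count -eq 0) {
        Write-Warning "No Distributed Cache service instance found on '$ServerName'; nothing to remove."
        return $false
    }
    foreach ($instance in $match) {
        $null = $instance.Unprovision()
        $null = $instance.Delete()
    }
    return $true
}

# Usage in the SharePoint Management Shell:
# Remove-CacheServiceInstance -ServerName 'SPAPP'
```

On this farm the call would have returned $false with a warning, which points at AppFabric rather than SharePoint as the place holding the stale entries.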
8. Get service instance connection string:
Navigated to: C:\Program Files\AppFabric 1.1 for Windows Server\
Located and opened this file: DistributedCacheService.exe.config.
Searched for connectionString and noted down its value.
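The value can also be read without opening the file by hand. This is a minimal sketch: the function name Get-CacheConnectionString is illustrative, and the XPath deliberately matches any element carrying a connectionString attribute, so it does not assume a particular section layout inside the config file.

```powershell
# Return every connectionString attribute value found in a config file.
# Get-CacheConnectionString is a hypothetical helper name.
function Get-CacheConnectionString {
    param([Parameter(Mandatory)] [string]$ConfigPath)
    Select-Xml -Path $ConfigPath -XPath '//*[@connectionString]' |
        ForEach-Object { $_.Node.GetAttribute('connectionString') }
}

# Usage on the server:
# Get-CacheConnectionString -ConfigPath 'C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config'
```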
9. Remove erroneous AppFabric Service instances II
Executed this cmdlet:
Unregister-CacheHost -HostName "SPAPP" -provider "System.Data.SqlClient" -ConnectionString "Data Source=SQLALIAS;Initial Catalog=Config;Integrated Security=True;Enlist=False"
Obtained these results:
Unregister-CacheHost : ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:No such host is known... |
Also tried using the FQDN for SPAPP, but this failed as well.
10. Remove erroneous AppFabric Service instances III
Executed this cmdlet:
Unregister-CacheHost -HostName "[FQDN]" -ProviderType SPDistributedCacheClusterProvider -ConnectionString "\\[FQDN]"
Obtained this result:
11. Remove erroneous hosts from CacheCluster Configuration file
Performed these steps:
- Exported the AppFabric CacheCluster configuration to a file using: Export-CacheClusterConfig -Path "d:\config.xml"
- Opened the file in Notepad and, in the "<hosts>" section, identified the two host entries corresponding to SPAPP and SPCA.
- Removed these entries, and then saved and closed the file.
- Imported the modified configuration file using: Import-CacheClusterConfig -Path "D:\Temp\config_REVISED.xml".
- Started the cluster using: Start-CacheCluster.
Obtained this result:
The cache cluster started without error.
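The manual edit of the exported file can be scripted, which leaves the original export untouched and makes the change repeatable. The sketch below assumes that each cache host appears in the exported file as a host element with a name attribute inside the hosts section, and that the file uses no XML namespace; check both against the actual export first. Remove-StaleCacheHost is a hypothetical helper name.

```powershell
# Copy an exported cluster config, dropping <host> entries for named servers.
# Assumption: hosts are stored as <hosts><host name="..." .../></hosts>.
# Remove-StaleCacheHost is a hypothetical helper name.
function Remove-StaleCacheHost {
    param(
        [Parameter(Mandatory)] [string]$InPath,    # full path to the export
        [Parameter(Mandatory)] [string]$OutPath,   # full path for the revised copy
        [Parameter(Mandatory)] [string[]]$HostName
    )
    $xml = New-Object System.Xml.XmlDocument
    $xml.PreserveWhitespace = $true
    $xml.Load($InPath)
    # Collect matches first so nodes are not removed while enumerating.
    $stale = @($xml.SelectNodes('//hosts/host') |
        Where-Object { $HostName -contains $_.GetAttribute('name') })
    foreach ($node in $stale) {
        [void]$node.ParentNode.RemoveChild($node)
    }
    $xml.Save($OutPath)
    $stale.Count   # number of host entries removed
}

# Usage, with the file names from the steps above:
# Remove-StaleCacheHost -InPath 'd:\config.xml' -OutPath 'D:\Temp\config_REVISED.xml' -HostName 'SPAPP', 'SPCA'
```

The function returns the number of entries removed. Comparing it with the number of stale hosts expected (two on this farm) before importing the revised file is a cheap sanity check.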
12. Check AppFabric Caching Server Status
Executed these cmdlets:
Use-CacheCluster
Get-CacheHost
Obtained these results:
HostName : CachePort | Service Name | Service Status | Version Info |
-------------------- | ------------ | -------------- | ------------ |
SPWFE:22233 | AppFabricCachingService | UP | 3 [3,3][1,3] |
13. Check SharePoint Distributed Cache Server Status
Executed this cmdlet:
Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | ft Server, Status -auto
Obtained this result:
Server | Status |
------ | ------ |
SPServer Name=SPWFE | Online |
14. Check server event logs
AppFabric and Distributed cache events no longer appear.
Solution
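The stale host entries existed only in the AppFabric cache cluster configuration, not in the SharePoint Distributed Cache service instances, so the fix was made on the AppFabric side:
- Export the AppFabric cache cluster configuration, remove the erroneous host entries, and import the revised file, as presented in step 11.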
References
- SharePoint 2013 - Remove faulty Cache Host which is in “Unknown” state from a cache cluster
- How to remove Cache Host from cluster when it is down
- SharePoint 2013 Distributed Cache recommendations
- Get-SPDistributedCacheClientSetting
- Manage the Distributed Cache service in SharePoint Server
- Issue trying to remove a cache host from a dead farm server
- SharePoint 2013 + Distributed Cache (AppFabric) Troubleshooting
- Appfabric caching - No such host is know
- AppFabric Event ID 1000 and Event ID 1026 with SharePoint 2013
- Troubleshooting Distributed Cache for SharePoint 2013 On Premise
- Distributed cache ConnectionString is empty DistributedCacheService.exe.config
- SharePoint AppFabric Error – Failed to connect to hosts in the cluster
- Export-CacheClusterConfig - no reference article found
- Import-CacheClusterConfig - no reference article found