

SharePoint 2013 Troubleshooting: DistributedCacheService.exe - The process was terminated due to an unhandled exception

Problem

This posting consolidates notes, experience, and references associated with resolving an AppFabric and Distributed Cache problem on a small legacy SharePoint 2013 development farm (1 WFE/CA, 1 SQL). The problem did not appear to significantly affect performance or normal operation, but it did fill up the server event logs and the SharePoint ULS logs.

The following are some (abbreviated) example error messages raised in the server event logs:

Event ID 7034

Log Name: System
Source: Service Control Manager
Event ID: 7034
Level: Error
Description:
The AppFabric Caching Service service terminated unexpectedly. It has done this 461 time(s)

Event ID 111

Log Name: Microsoft-Windows-Application Server-System Services/Admin
Source: Microsoft-Windows Server AppFabric Caching
Event ID: 111
Level: Error
Description:
AppFabric Caching service crashed with exception {Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:No such host is known ---> System.Net.Sockets.SocketException: No such host is known

Event ID 1000

Log Name: Application
Source: Application Error
Event ID: 1000
Level: Error
Description:
Faulting application name: DistributedCacheService.exe, version: 1.0.4632.0...
Faulting application path: C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe

Event ID 1026

Log Name: Application
Source: .NET Runtime
Event ID: 1026
Level: Error
Description:
Application: DistributedCacheService.exe Framework Version: v4.0.30319 Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException at...

ULS Logs

And these are some of the (abbreviated) entries seen in the SharePoint ULS logs:

Process OWSTIMER.EXE (0x3308)
Product SharePoint Foundation
Category Health
Level Unexpected
Message
One of the cache hosts in the cluster is down. This indicates that the SharePoint cache client is trying to connect to a wrong cache host. This will lead to unknown failures.. Automatic repair is being attempted.

and

Process OWSTIMER.EXE (0x3308)
Product SharePoint Foundation
Category Health
Level Unexpected
Message
HealthRule SPDistributedCacheHostOutOfSync failed for...

and

Process OWSTIMER.EXE (0x3308)
Product SharePoint Foundation
Category Health
Level Information
Message
The SharePoint Health Analyzer found and fixed the following problem: One of the cache hosts in the cluster is down..

Other related errors and messages were also found. Began troubleshooting.

Troubleshooting Steps

1. Check Windows Server 2012 logs

Examples are shown above
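The event-log checks in this step can be repeated from PowerShell. The sketch below is a hypothetical helper (`Get-CacheCrashEventFilter` is not part of any module). It builds one `Get-WinEvent -FilterHashtable` filter per log for the event IDs quoted above. The log names are copied from those examples, and the one-day default look-back is an arbitrary assumption.

```powershell
# Hypothetical helper: builds one Get-WinEvent filter hashtable per log,
# using the log names and event IDs quoted in the examples above.
function Get-CacheCrashEventFilter {
    [CmdletBinding()]
    param(
        # Only return events newer than this point in time (assumed default: 1 day).
        [datetime] $Since = (Get-Date).AddDays(-1)
    )
    $logs = [ordered]@{
        'System'                                                     = @(7034)
        'Microsoft-Windows-Application Server-System Services/Admin' = @(111)
        'Application'                                                = @(1000, 1026)
    }
    foreach ($logName in $logs.Keys) {
        # Each hashtable is emitted as a single object (hashtables are not unrolled).
        @{
            LogName   = $logName
            Id        = $logs[$logName]
            StartTime = $Since
        }
    }
}

# Usage on the SharePoint server (Windows only):
# Get-CacheCrashEventFilter -Since (Get-Date).AddHours(-4) |
#     ForEach-Object { Get-WinEvent -FilterHashtable $_ -ErrorAction SilentlyContinue } |
#     Select-Object TimeCreated, Id, ProviderName, Message
```

Rerunning the usage lines with a `-Since` value just after a configuration change is one way to confirm that the crash events have stopped.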

2. Check the Distributed Cache service in Central Administration (Services on Server)

Status: Started

3. Check the AppFabric Caching Service in Windows Services

Status: (not started)

Startup Type: Automatic

Log On As: (farm service account)
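The same check can be scripted. This is a sketch, not a tested procedure: `Test-CacheServiceState` is a hypothetical helper, and the usage line assumes the Windows service name matches the `AppFabricCachingService` name reported by `Get-CacheHost` in step 5. It flags the combination seen here: a service set to start automatically that is not running. Together with event 7034, that points to repeated crashes rather than a manual stop.

```powershell
# Hypothetical helper: inspects Win32_Service-style objects (Name, State,
# StartMode, StartName) and reports an Automatic service that is not running.
function Test-CacheServiceState {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $Service
    )
    process {
        $problems = @()
        # Win32_Service reports Automatic start as 'Auto'.
        if ($Service.StartMode -eq 'Auto' -and $Service.State -ne 'Running') {
            $problems += "Start type is Automatic but state is '$($Service.State)'"
        }
        [pscustomobject]@{
            Name      = $Service.Name
            State     = $Service.State
            StartMode = $Service.StartMode
            LogOnAs   = $Service.StartName
            Healthy   = ($problems.Count -eq 0)
            Problems  = $problems
        }
    }
}

# Usage on the SharePoint server (Windows only; service name is an assumption):
# Get-CimInstance -ClassName Win32_Service -Filter "Name='AppFabricCachingService'" |
#     Test-CacheServiceState
```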

4. Check ULS logs

Examples are shown above
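To pull the relevant ULS entries without scrolling through whole log files, a filter along these lines could be applied to `Get-SPLogEvent` output. `Select-CacheHealthEntry` is hypothetical. It assumes the entries expose `Category` and `Message` properties, and its match patterns are simply fragments of the ULS messages quoted above.

```powershell
# Hypothetical helper: keeps Health-category ULS entries whose message matches
# one of the cache-host messages quoted above. Expects objects with Category
# and Message properties, such as those returned by Get-SPLogEvent.
function Select-CacheHealthEntry {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $Entry
    )
    begin {
        # Message fragments taken from the ULS examples above (-like is case-insensitive).
        $patterns = @(
            '*cache hosts in the cluster is down*',
            '*SPDistributedCacheHostOutOfSync*'
        )
    }
    process {
        if ($Entry.Category -ne 'Health') { return }
        foreach ($pattern in $patterns) {
            if ($Entry.Message -like $pattern) {
                $Entry
                return
            }
        }
    }
}

# Usage in the SharePoint Management Shell:
# Get-SPLogEvent -StartTime (Get-Date).AddHours(-1) |
#     Select-CacheHealthEntry |
#     Select-Object Timestamp, Level, Message
```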

5. Check AppFabric Caching Server Status

Executed these cmdlets:

Use-CacheCluster
Get-CacheHost

Obtained this output:

HostName : CachePort      Service Name            Service Status Version Info
--------------------      ------------            -------------- ------------
SPAPP:22233               AppFabricCachingService UNKNOWN        0 [0,0][0,0]
SPCA:22233                AppFabricCachingService UNKNOWN        0 [0,0][0,0]
SPWFE:22233               AppFabricCachingService DOWN           3 [3,3][1,3]

Note: This farm previously had separate CA and APP servers, but these were removed from the farm, and the farm was consolidated to a single WFE server.
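With several hosts listed, a small filter makes the problem hosts stand out. `Get-UnhealthyCacheHost` below is hypothetical. The property names `HostName`, `PortNo` and `Status` are assumptions based on the columns shown above; check them with `Get-CacheHost | Get-Member` before relying on it.

```powershell
# Hypothetical helper: returns cache hosts whose status is anything other
# than UP. Assumes HostName, PortNo and Status properties (not verified).
function Get-UnhealthyCacheHost {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $CacheHost
    )
    process {
        # Status may be an enum value, so compare its string form.
        if ("$($CacheHost.Status)" -ne 'UP') {
            [pscustomobject]@{
                HostName = $CacheHost.HostName
                Port     = $CacheHost.PortNo
                Status   = "$($CacheHost.Status)"
            }
        }
    }
}

# Usage in the SharePoint Management Shell:
# Use-CacheCluster
# Get-CacheHost | Get-UnhealthyCacheHost
```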

6. Check SharePoint Distributed Cache Server Status

Executed this command:

 Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | ft Server, Status -auto 

Obtained results like these:

Server              Status
------              ------
SPServer Name=SPWFE Online

Analysis:

These results were consistent with what Central Administration shows on the Services on Server page.
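Comparing the two views is the key diagnostic here: AppFabric still lists SPAPP and SPCA, while SharePoint only has a Distributed Cache service instance on SPWFE. `Find-StaleCacheHost` is a hypothetical sketch of that comparison. It matches on the short host name so that `HOST:port` entries and FQDNs line up, which is an assumption about how the names appear in a given farm; `contoso.local` in the test data is likewise a made-up domain.

```powershell
# Hypothetical helper: lists hosts that the AppFabric cache cluster still
# knows about but for which SharePoint has no Distributed Cache service
# instance. Names are compared on the short host name, so "SPAPP:22233",
# "spapp.contoso.local" and "SPAPP" all count as the same server.
function Find-StaleCacheHost {
    [CmdletBinding()]
    param(
        # Host names as listed by Get-CacheHost (with or without ":port").
        [Parameter(Mandatory)] [string[]] $CacheClusterHost,
        # Server names that own an SPDistributedCacheService instance.
        [Parameter(Mandatory)] [AllowEmptyCollection()] [string[]] $SharePointServer
    )
    # Reduce "name.domain:port" to "NAME".
    $shortName = { param($n) (($n -split ':')[0] -split '\.')[0].ToUpperInvariant() }
    $known = @($SharePointServer | ForEach-Object { & $shortName $_ })
    foreach ($h in $CacheClusterHost) {
        if ($known -notcontains (& $shortName $h)) { $h }
    }
}

# Usage in the SharePoint Management Shell:
# Use-CacheCluster
# $clusterHosts = Get-CacheHost | ForEach-Object { $_.HostName }
# $spServers = Get-SPServiceInstance |
#     Where-Object { $_.Service.ToString() -eq 'SPDistributedCacheService Name=AppFabricCachingService' } |
#     ForEach-Object { $_.Server.Name }
# Find-StaleCacheHost -CacheClusterHost $clusterHosts -SharePointServer $spServers
```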

7. Remove erroneous AppFabric Service instances

Executed these commands to remove one instance (SPAPP):

    $instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
    $serviceInstance = Get-SPServiceInstance | ? {($_.service.tostring()) -eq $instanceName -and ($_.server.name) -eq "SPAPP"}
    $serviceInstance.Unprovision()
    $serviceInstance.Delete()

Obtained this result:

You cannot call a method on a null-valued expression.
At line:1 char:1
+ $serviceInstance.Unprovision()

Also tried using the FQDN of SPAPP in the server-name comparison, but this also failed. Both $instanceName and $serviceInstance contained NULL.
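The null-valued expression error is consistent with the step 6 output: once SPAPP had been removed from the farm, SharePoint most likely no longer held a service instance for it, so the filter matched nothing and `$serviceInstance` stayed `$null`. The hypothetical `Remove-CacheServiceInstance` below is a guarded sketch of the same commands that reports this case instead of failing. It reuses the `Service`/`Server.Name` comparison from the original commands.

```powershell
# Hypothetical helper: guarded version of the step 7 commands. It only calls
# Unprovision() and Delete() when a matching service instance exists.
# Returns $true if at least one instance was removed, otherwise $false.
function Remove-CacheServiceInstance {
    [CmdletBinding()]
    param(
        # Usually the output of Get-SPServiceInstance.
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Instance,
        [Parameter(Mandatory)] [string] $ServerName
    )
    $typeName = 'SPDistributedCacheService Name=AppFabricCachingService'
    $match = @($Instance | Where-Object {
        $null -ne $_.Service -and $null -ne $_.Server -and
        $_.Service.ToString() -eq $typeName -and
        $_.Server.Name -eq $ServerName
    })
    if ($match.Count -eq 0) {
        Write-Warning "No Distributed Cache service instance found for '$ServerName'."
        return $false
    }
    foreach ($serviceInstance in $match) {
        $serviceInstance.Unprovision()
        $serviceInstance.Delete()
    }
    return $true
}

# Usage in the SharePoint Management Shell:
# Remove-CacheServiceInstance -Instance (Get-SPServiceInstance) -ServerName 'SPAPP'
```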

8. Get the cache cluster connection string

Navigated to: C:\Program Files\AppFabric 1.1 for Windows Server\

Located and opened this file: DistributedCacheService.exe.config.

Searched for connectionString and noted down its value.
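Reading the value by hand works, but it can also be extracted with PowerShell's XML support. The hypothetical `Get-CacheConfigConnectionString` below deliberately does not assume a particular element name. It returns the first element that carries a `connectionString` attribute, along with that element's `provider` attribute if present. The default path is the one given above.

```powershell
# Hypothetical helper: reads the cluster connection string from
# DistributedCacheService.exe.config without assuming the element name;
# it returns the first element that has a connectionString attribute.
function Get-CacheConfigConnectionString {
    [CmdletBinding()]
    param(
        [string] $Path = 'C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config'
    )
    $xml = New-Object System.Xml.XmlDocument
    $xml.Load((Resolve-Path -LiteralPath $Path).ProviderPath)
    $node = $xml.SelectSingleNode('//*[@connectionString]')
    if ($null -eq $node) {
        throw "No connectionString attribute found in '$Path'."
    }
    [pscustomobject]@{
        Element          = $node.Name
        Provider         = $node.GetAttribute('provider')   # empty string if absent
        ConnectionString = $node.GetAttribute('connectionString')
    }
}

# Usage (run on the cache host):
# Get-CacheConfigConnectionString | Format-List
```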

9. Remove erroneous AppFabric Service instances II

Executed this cmdlet:

 Unregister-CacheHost -HostName "SPAPP" -provider "System.Data.SqlClient" -ConnectionString "Data Source=SQLALIAS;Initial Catalog=Config;Integrated Security=True;Enlist=False" 

Obtained these results:

Unregister-CacheHost : ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:No such host is known...

Also tried using the FQDN of SPAPP, but this also failed.

10. Remove erroneous AppFabric Service instances III

Executed this cmdlet:

Unregister-CacheHost -HostName "[FQDN]" -ProviderType SPDistributedCacheClusterProvider -ConnectionString "\\[FQDN]"

Obtained this result:

11. Remove erroneous hosts from CacheCluster Configuration file

Performed these steps:

  1. Exported the AppFabric CacheCluster configuration to a file using: Export-CacheClusterConfig -Path "d:\config.xml"
  2. Opened the file in Notepad, and in the "<hosts>" section, identified the two host entries corresponding to SPAPP and SPCA.
  3. Removed these entries, and then saved and closed the file.
  4. Imported the modified configuration file using: Import-CacheClusterConfig -Path "D:\Temp\config_REVISED.xml".
  5. Started the cluster using: Start-CacheCluster.

Obtained this result:

cache cluster started without error
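The edit in step 11 can also be done without Notepad. `Remove-CacheClusterConfigHost` is a hypothetical sketch that drops matching `<host>` entries from the exported file and saves a revised copy. It assumes each entry is a `<host>` element inside `<hosts>` whose `name` attribute holds the server name, with no XML namespace on those elements. The `Stop-CacheCluster` line in the usage comments is also an assumption: the steps above end with `Start-CacheCluster`, which implies the cluster was stopped before the import.

```powershell
# Hypothetical helper for step 11: removes <host> entries from an exported
# cache cluster configuration and writes the result to a new file.
# Assumes <hosts>/<host> elements with a "name" attribute (short name or FQDN)
# and no XML namespace. Returns the names of the removed entries.
function Remove-CacheClusterConfigHost {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $OutputPath,
        [Parameter(Mandatory)] [string[]] $HostName
    )
    $xml = New-Object System.Xml.XmlDocument
    $xml.Load((Resolve-Path -LiteralPath $Path).ProviderPath)
    # Compare on the upper-cased short host name.
    $targets = @($HostName | ForEach-Object { ($_ -split '\.')[0].ToUpperInvariant() })
    $removed = @()
    # Copy the node list first so removing nodes does not disturb the loop.
    foreach ($node in @($xml.SelectNodes('//hosts/host'))) {
        $name = $node.GetAttribute('name')
        if ($targets -contains ($name -split '\.')[0].ToUpperInvariant()) {
            [void]$node.ParentNode.RemoveChild($node)
            $removed += $name
        }
    }
    # Resolve relative paths against the PowerShell location, not the process directory.
    $xml.Save($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath))
    $removed
}

# Usage in the SharePoint Management Shell:
# Use-CacheCluster
# Stop-CacheCluster
# Export-CacheClusterConfig -Path 'D:\Temp\config.xml'
# Remove-CacheClusterConfigHost -Path 'D:\Temp\config.xml' -OutputPath 'D:\Temp\config_REVISED.xml' -HostName 'SPAPP', 'SPCA'
# Import-CacheClusterConfig -Path 'D:\Temp\config_REVISED.xml'
# Start-CacheCluster
```

Writing to a separate output file keeps the original export intact as a fallback.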

12. Check AppFabric Caching Server Status

Executed these cmdlets:

  Use-CacheCluster
  Get-CacheHost

Obtained these results:

HostName : CachePort      Service Name            Service Status Version Info
--------------------      ------------            -------------- ------------
SPWFE:22233               AppFabricCachingService UP             3 [3,3][1,3]

13. Check SharePoint Distributed Cache Server Status

Executed this command:

 Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | ft Server, Status -auto 

Obtained this result:

Server              Status
------              ------
SPServer Name=SPWFE Online

14. Check server event logs

AppFabric and Distributed Cache error events no longer appear.

Solution

Given that the stale host entries existed only in the AppFabric cache cluster configuration, and not among SharePoint's Distributed Cache service instances, the solution was to:

  •  Modify the AppFabric cache cluster configuration by removing the erroneous host entries, as described in step 11.

References

  1. SharePoint 2013 - Remove faulty Cache Host which is in “Unknown” state from a cache cluster
  2. How to remove Cache Host from cluster when it is down
  3. SharePoint 2013 Distributed Cache recommendations
  4. Get-SPDistributedCacheClientSetting
  5. Manage the Distributed Cache service in SharePoint Server
  6. Issue trying to remove a cache host from a dead farm server
  7. SharePoint 2013 + Distributed Cache (AppFabric) Troubleshooting
  8. Appfabric caching - No such host is know
  9. AppFabric Event ID 1000 and Event ID 1026 with SharePoint 2013
  10. Troubleshooting Distributed Cache for SharePoint 2013 On Premise
  11. Distributed cache ConnectionString is empty DistributedCacheService.exe.config
  12. SharePoint AppFabric Error – Failed to connect to hosts in the cluster
  13. Export-CacheClusterConfig - no reference article found
  14. Import-CacheClusterConfig - no reference article found
