Cache retry fails: what next?

When using In-Role Cache or the Cache Service, applications may get retryable errors such as the following:

ErrorCode<ERRCA0017>:SubStatus<ES0002>:There is a temporary failure. Please retry later. (The request did not find the primary.). Additional Information : The client was trying to communicate with the server: net.tcp://<IP>:20004/. ---> Microsoft.ApplicationServer.Caching.DataCacheException:  ……………

.

.

ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown.. Additional Information : The client was trying to communicate with the server: net.tcp://<IP>:20003 ………….

In general, these errors have one of two causes. With High Availability enabled, the underlying cache service may be load balancing partitions: a secondary node is transitioning to primary while the client is still sending requests to the old primary node. Alternatively, the cache service may have been moved to a different VM as part of the service healing process, while the cache client still holds the old IP address of the cache service VM.

Although it is good practice to have a retry policy in place, in extreme cases where retrying does not help you can use the approach below to mitigate the errors by refreshing the cache client when an exception is thrown.
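One way to combine a retry policy with a refresh is sketched below. RetryThenRefresh.Execute is a hypothetical helper, not part of any Azure SDK. It retries an operation a fixed number of times with a pause between attempts. Only when those retries are exhausted does it call a refresh callback and make one final attempt. The caller supplies the predicate that decides which exceptions count as retryable. Any other exception propagates immediately.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not an Azure SDK API): ordinary retries first,
// then one client refresh followed by a single last attempt.
public static class RetryThenRefresh
{
    public static T Execute<T>(
        Func<T> operation,                  // the cache call to perform
        Func<Exception, bool> isRetryable,  // which exceptions to retry
        Action refresh,                     // rebuilds the cache client
        int maxRetries,                     // attempts before refreshing
        TimeSpan delay)                     // pause between attempts
    {
        for (int attempt = 1; attempt <= maxRetries; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (isRetryable(ex))
            {
                // Normal retry policy: same client, wait and try again.
                if (attempt < maxRetries)
                {
                    Thread.Sleep(delay);
                }
            }
        }

        // Retries did not help: the client may hold stale routing or IP
        // information, so discard it and try once more. An exception from
        // this last attempt propagates to the caller.
        refresh();
        return operation();
    }
}
```

With the DataCacheHelper class in this article, a call might look like RetryThenRefresh.Execute(() => DataCacheHelper.DataCache.Get("key"), ex => ex is DataCacheException, DataCacheHelper.Refresh, 3, TimeSpan.FromMilliseconds(500)). The retry count and delay are illustrative values, not recommendations. The operation lambda reads DataCacheHelper.DataCache on every attempt instead of capturing a DataCache instance. As a result, the final attempt after Refresh uses the newly created client.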

Note: The Microsoft Azure Cache product group has incorporated the refresh logic into Azure SDK 2.7. Use the code below for applications built with Azure SDK 2.6 or earlier. The modified code for Azure SDK 2.7 is at the bottom of this article.

Sample code (Azure SDK 2.6 or earlier)

Application Code

 try {
     DataCacheHelper.DataCache.Get("key");
 }
 catch (DataCacheException) {
     // Discard the cached factory and cache client so the next access reconnects.
     DataCacheHelper.Refresh();
 }

DataCacheHelper.cs

 using Microsoft.ApplicationServer.Caching;
 using System;
 using System.Reflection;

 namespace DataCacheHelpers {
     public static class DataCacheHelper {
         private static DataCacheFactory _factory;
         private static DataCache _cache;

         // Lazily creates the factory on first use (no synchronization).
         public static DataCacheFactory DataCacheFactory {
             get {
                 if (_factory == null) {
                     _factory = new DataCacheFactory();
                 }
                 return _factory;
             }
         }

         // Lazily obtains the default cache from the factory.
         public static DataCache DataCache {
             get {
                 if (_cache == null) {
                     _cache = DataCacheFactory.GetDefaultCache();
                 }
                 return _cache;
             }
         }

         // Disposes the current factory and resets static state so that the
         // next access to DataCache creates a fresh client.
         public static void Refresh() {
             var factory = _factory;
             if (factory != null) {
                 factory.Dispose();
                 _factory = null;
             }
             _cache = null;

             // Clear DataCacheFactory._connectionPool
             var coreAssembly = typeof(DataCacheItem).Assembly;
             var simpleSendReceiveModulePoolType = coreAssembly.GetType(
                 "Microsoft.ApplicationServer.Caching.SimpleSendReceiveModulePool", throwOnError: true);
             var connectionPoolField = typeof(DataCacheFactory).GetField("_connectionPool", BindingFlags.Static | BindingFlags.NonPublic);
             connectionPoolField.SetValue(null, Activator.CreateInstance(simpleSendReceiveModulePoolType));

             // Clear DistributedCacheSessionStateStoreProvider._staticInternalProvider
             // (requires a reference to the Microsoft.Web.DistributedCache assembly)
             var providerType = typeof(Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider);
             var providerField = providerType.GetField("_staticInternalProvider", BindingFlags.Static | BindingFlags.NonPublic);
             providerField.SetValue(null, null);
         }
     }
 }
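The helper above reads and resets its static fields without any synchronization. Under concurrent requests, two threads can each create a factory, or one thread can pick up a cache reference while another is in the middle of Refresh. The sketch below shows one way to guard the same lazy-create-then-refresh pattern with a lock. RefreshableClient is a hypothetical name, not an SDK type. In this scenario T would be DataCacheFactory, which the helper disposes and is therefore IDisposable. The sketch does not replace the reflection-based connection pool reset, which the article still requires for Azure SDK 2.6 or earlier.

```csharp
using System;

// Hypothetical holder (not an SDK type): lazily creates an IDisposable client
// and lets Refresh() swap it out under a lock, so that creating and clearing
// the shared reference never interleave across threads.
public sealed class RefreshableClient<T> where T : class, IDisposable
{
    private readonly Func<T> _create;
    private readonly object _gate = new object();
    private T _current;

    public RefreshableClient(Func<T> create)
    {
        _create = create ?? throw new ArgumentNullException(nameof(create));
    }

    // Returns the current client, creating it on first use.
    public T Current
    {
        get
        {
            lock (_gate)
            {
                if (_current == null)
                {
                    _current = _create();
                }
                return _current;
            }
        }
    }

    // Detaches the current client so the next Current access builds a new one,
    // then disposes the old instance outside the lock.
    public void Refresh()
    {
        T old;
        lock (_gate)
        {
            old = _current;
            _current = null;
        }
        old?.Dispose();
    }
}
```

For example, new RefreshableClient<DataCacheFactory>(() => new DataCacheFactory()) could stand in for the static _factory field. A thread that obtained the old instance before Refresh ran can still fail against the disposed object. Reading Current again on the retry path picks up the new one.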
 

Sample code (Azure SDK 2.7 or later)

Starting with Azure SDK 2.7, the core refresh logic remains the same. Only the re-initialization part changes, and it now uses new APIs instead of reflection. The Refresh method changes as follows, and the rest of the code remains the same.

DataCacheHelper.cs

 <snip>
 public static void Refresh()
 {
     var factory = _factory;
     if (factory != null)
     {
         factory.Dispose();
         _factory = null;
     }
     _cache = null;
     DataCacheFactory.Reinitialize();
     // Fully qualified because the using directives above do not import this namespace.
     Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider.Reinitialize();
 }
 <snip>

Comments

  • Anonymous
    June 03, 2014
    I'm using the azure cache for handling session state.  Does this work for session state as well?  I'm not using a DataCacheFactory at all.  Thanks! I've been getting this errors as well and the only thing that fixes it is restarting the app. Would it be easier just to restart the role?  RoleEnvironment.RequestRecycle()
  • Anonymous
    June 25, 2014
    Tom Wilson's unanswered question is almost exactly mine.  I am also using the cache only for session state, and no factory is involved.  I wonder whether I should be using the code only after the comment "Clear DistributedCacheSessionStateStoreProvider._staticInternalProvider".  This is difficult to test though, and some further clarification would be welcome.
  • Anonymous
    June 27, 2014
    I would use a global.asax application_error handler for the session state refresh.  That is where you'll catch the datacacheexception.  My customers have been using that successfully.  I would also check if it is explicitly the errca0017 error.  This also applies to substatus 6.  Microsoft knows about this issue and there is an internal bug on this.  Hopefully, we'll see this in future NuGet packages. Thanks! Mike
  • Anonymous
    July 16, 2014
    This is what I put in my Application_Error in global.asax.  I couldn't catch the error when updating the session.  Does this look right?
    protected void Application_Error(object sender, EventArgs e)
    {
        Exception ex = Server.GetLastError();
        // Log the exception and notify system operators
        if (ex.GetType() == typeof(DataCacheException))
        {
            DataCacheHelper.Refresh();
        }
        else
        {
            LoggingUtility.TraceError(ex);
        }
    }
  • Anonymous
    August 25, 2014
    This code assumes only a single DataCacheFactory will exist, however we have a scenario where many caches may be accessed from within a single process and therefore use multiple instances of DataCacheFactory which are created in code and have separate lifetime management.  I notice that the Refresh logic here targets a static field for the connection pool.  I presume therefore that prior to this code being used, ALL active factories should be disposed and then re-initialised following the connection pool reset? Also, I assume that the code after the session state comment can safely be ignored when not using cache for session state?