Udostępnij za pośrednictwem


Users are being re-authenticates every around 5 minutes in SP 2013 web application where OKTA as SAML provider.

Recently I have gone through an issue where the end users are being required to re-authenticates to the web application in every 5 minutes or less in their SharePoint 2013 environment where they used OKTA as SAML provider. The re-authentication doesn't force to enter the credentials again instead we see there is a call to the _trust and to OKTA server for getting a valid cookie again.  

By looking at the Fiddler I understand that the SAML token expiration value was 8:05 hrs. 

Created xmlns:wsu="https://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">2015-04-28T20:44:00.307Z</wsu:Created>

<wsu:Expires xmlns:wsu="https://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">2015-04-29T04:49:00.307Z</wsu:Expires>

Had a look at the STSConfiguration (Get-SPTrustedIdentityToekIssuer) and found the STS token expiration (LogonTokenCacheExpirationWindow) set to 2 minutes. This means the sliding session should be every 8:03 hrs, which means the client will try to re-authenticate with the SAML provider only every 8:03 hrs.  

Basically the value of the LogonTokenCacheExpirationWindow property is always less than the SAML token lifetime; otherwise, you'll see a loop whenever a user tries to access your SharePoint web application and keeps being redirected back to the token issuer. Here are the details on sliding session and its logic: https://msdn.microsoft.com/en-us/library/hh446526.aspx

So here the questions is why the users being authenticates very frequently? Customer already informed me that they played a lot with the values of DistributedLogonTokenCache parameters. I collected the SharePoint ULS logs in VerboseEx for an hour from their servers using Merge-SPLogFile command and found following,

- Here the request started where the user tried to access a resource in SharePoint.

w3wp.exe (ServerName:0x057C) 0x03C4 SharePoint Foundation Logging Correlation Data xmnv Medium Name=Request (GET:https://SiteURL/_api/sp.userprofiles.PeopleManager/GetMyProperties) 91d7039d-55e9-7036-63d4-13bd0943287a

- Here the exception thrown on ‘Distributed logon token cache’ complaining about the value of MaxBufferSize

w3wp.exe (ServerName:0x057C) 0x03C4 SharePoint Foundation DistributedCache agyfv Unexpected Unexpected error occurred in method 'GetObject' , usage 'Distributed Logon Token Cache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown. ---> System.TimeoutException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.IO.IOException: The read operation failed, see inner exception. ---> System.TimeoutException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.Net.Sockets.SocketException: The I/O operation has been aborted because of either a thread exit or an application request at System.ServiceModel.Channels.SocketConnection.HandleReceiveAsyncCompleted() at System.ServiceModel.Channels.SocketConnection.OnReceiveAsync(Object sender, SocketAsyncEventArgs eventArgs) --- End of inner exception stack trace --- at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result) at System.ServiceModel.Channels.ConnectionStream.ReadAsyncResult.End(IAsyncResult result) at System.Net.FixedSizeReader.ReadCallback(IAsyncResult transportResult) --- End of inner exception stack trace --- at System.Net.Security.NegotiateStream.EndRead(IAsyncResult asyncResult) at System.ServiceModel.Channels.StreamConnection.EndRead() --- End of inner exception stack trace --- at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result) at System.ServiceModel.Channels.TransportDuplexSessionChannel.EndReceive(IAsyncResult result) at Microsoft.ApplicationServer.Caching.WcfClientChannel.CompleteProcessing(IAsyncResult result) --- End of inner exception stack trace --- at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody) at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener) at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.<Get>b__48() at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)'. 91d7039d-55e9-7036-63d4-13bd0943287a

 

- The client failed to get the token for user in question from Distributed cache.

w3wp.exe (ServerName:0x057C) 0x03C4 SharePoint Foundation DistributedCache air4c Medium Token Cache: Failed to get token from distributed cache for '05.t|okta|user@domain.com'.(This is expected during the process warm up or if data cache Initialization is getting done by some other thread). Exception: 'Microsoft.SharePoint.DistributedCaching.SPDistributedCacheConnectionTerminatedException: Connection to the server terminated,check if the cache host(s) is running ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown. ---> System.TimeoutException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.IO.IOException: The read operation failed, see inner exception. ---> System.TimeoutException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.Net.Sockets.SocketException: The I/O operation has been aborted because of either a thread exit or an application request at System.ServiceModel.Channels.SocketConnection.HandleReceiveAsyncCompleted() at System.ServiceModel.Channels.SocketConnection.OnReceiveAsync(Object sender, SocketAsyncEventArgs eventArgs) --- End of inner exception stack trace --- at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result) at System.ServiceModel.Channels.ConnectionStream.ReadAsyncResult.End(IAsyncResult result) at System.Net.FixedSizeReader.ReadCallback(IAsyncResult transportResult) --- End of inner exception stack trace --- at System.Net.Security.NegotiateStream.EndRead(IAsyncResult asyncResult) at System.ServiceModel.Channels.StreamConnection.EndRead() --- End of inner exception stack trace --- at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result) at System.ServiceModel.Channels.TransportDuplexSessionChannel.EndReceive(IAsyncResult result) at Microsoft.ApplicationServer.Caching.WcfClientChannel.CompleteProcessing(IAsyncResult result) --- End of inner exception stack trace --- at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody) at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener) at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.<Get>b__48() at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key) --- End of inner exception stack trace --- at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key) at Microsoft.SharePoint.IdentityModel.SPDistributedSecurityTokenCache.GetObject(String key) at Microsoft.SharePoint.IdentityModel.SPTokenCache.TryGetCachedToken(String cacheKey)'. 91d7039d-55e9-7036-63d4-13bd0943287a

w3wp.exe (ServerName:0x057C) 0x03C4 SharePoint Foundation DistributedCache air4d Medium Token Cache: Reverting to local cache to get the token for '05.t|okta|user@domain.com'. 91d7039d-55e9-7036-63d4-13bd0943287a

w3wp.exe (ServerName:0x057C) 0x03C4 SharePoint Foundation Claims Authentication fr16 Verbose Token Cache: Entry missing for user '05.t|okta|user@domain.com'. 91d7039d-55e9-7036-63d4-13bd0943287a

 

- The user forcefully signed out to get a new token.

w3wp.exe (ServerName:0x057C) 0x03C4 SharePoint Foundation Claims Authentication af30i Verbose Token Cache: Failed to find token for user '05.t|okta|user@domain.com' for cookie so signing out the user. 91d7039d-55e9-7036-63d4-13bd0943287a

 

Basically in this scenario, SharePoint stores the user’s token in the user's browser session and in the DistributedCacheLogonTokenCache container. When SharePoint tried to retrieve the token from distributed cache, the connection would time out or a connection would be unavailable or the Distributed cache not able to serve the cookie detail due to some reason. Since it couldn't validate the presented token SharePoint had no choice but to log the user out and redirect them to the sign in page.

I shared one of my Blog to the customer to make the DistributedLogonTokenCache values to standardize and to increase the distributed cache timeout values to 10 seconds. 

After changing the Distributed cache value we noticed a change in authentication behavior that it re-authenticates very frequently than before like in every few minutes.

I gone through the current DistributedLogonTokenCache value sand found "MaxConnectionsToServer" was set to "100".

The number of shared connections is determined by the maxConnectionsToServer configuration setting, which defaults to the number of processor cores on the client machine. Ref: https://msdn.microsoft.com/en-us/library/azure/dn386109.aspx. I shared following script with customer which to set “MaxConnectionsToServer” back to the recommended value of “1” for all containers--note that if “MaxConnectionsToServer” is set too high, we will see high memory usage from the DistributedCache process, and possible intermittent hangs.

Add-PSSnapin Microsoft.Sharepoint.Powershell

#DistributedLogonTokenCache
$DLTC = Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache
$DLTC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache -DistributedCacheClientSettings $DLTC

#DistributedViewStateCache
$DVSC = Get-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache
$DVSC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache $DVSC

#DistributedAccessCache
$DAC = Get-SPDistributedCacheClientSetting -ContainerType DistributedAccessCache
$DAC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedAccessCache $DAC

#DistributedAccessCache
$DAF = Get-SPDistributedCacheClientSetting -ContainerType DistributedAccessCache
$DAF.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedCache $DAF

#DistributedActivityFeedLMTCache
$DAFC = Get-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedLMTCache
$DAFC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedLMTCache $DAFC

#DistributedBouncerCache
$DBC = Get-SPDistributedCacheClientSetting -ContainerType DistributedBouncerCache
$DBC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedBouncerCache $DBC

#DistributedDefaultCache
$DDC = Get-SPDistributedCacheClientSetting -ContainerType DistributedDefaultCache
$DDC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedDefaultCache $DDC

#DistributedSearchCache
$DSC = Get-SPDistributedCacheClientSetting -ContainerType DistributedSearchCache
$DSC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedSearchCache $DSC

#DistributedSecurityTrimmingCache
$DTC = Get-SPDistributedCacheClientSetting -ContainerType DistributedSecurityTrimmingCache
$DTC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedSecurityTrimmingCache $DTC

#DistributedServerToAppServerAccessTokenCache
$DSTAC = Get-SPDistributedCacheClientSetting -ContainerType DistributedServerToAppServerAccessTokenCache
$DSTAC.MaxConnectionsToServer = 1
Set-SPDistributedCacheClientSetting -ContainerType DistributedServerToAppServerAccessTokenCache $DSTAC

 After running the script, you’ll need to do a Restart-CacheCluster to restart the AppFabric services.

We closely monitor the environment for next few days and found the web application doesnt try to re-authenticate frequently and the issue is resolved after changing the "MaxConnectionToServer" value to "1".

Recently the following technet article updated with script to fine tune the distributed cache service,

https://technet.microsoft.com/en-us/library/jj219613.aspx

 

 Happy SharePointing :)

Comments

  • Anonymous
    July 02, 2015
    It was very usefull information and keep updating with new topics
    http://staygreenacademy.com/
  • Anonymous
    July 23, 2015
    Thanks for sharing this info. how do we know what value is good for "MaxConnectionToServer" ? at the link mentioned below author has set it to 100 . Still trying to find out how do you know what is good value.
    Also Technet says on a 4 CPU server setting the value to 4 is not good idea but does not explain criteria to find out good value for this parameter. Any thoughts ?

    https://habaneroconsulting.com/insights/sharepoint-2013-distributed-cache-bug
    https://technet.microsoft.com/en-us/library/jj219613.aspx
  • Anonymous
    July 30, 2015
    The comment has been removed
  • Anonymous
    January 17, 2016
    Hi Sajith,
    We are trying integrate OKTA with SharePoint and whenever it is authenticated from OKTA, there is a there is a call to the _trust and throws an exception System.InvalidOperationException: This method can only be called after the authentication event.
    Appreciate your response in resolving this
    Regards
    Ahmed
  • Anonymous
    January 26, 2016
    Hello Ahmed,
    This might be due to the configuration issues. You may have to engage OKTA team or engage MS support on a case. You might need to verify the URN and claims setup while integrating SharePoint with OKTA.