Considerations for NetTcpBinding/NetNamedPipeBinding you may not be aware of

 

NetTcpBinding is a strange beast, and chances are you will encounter several problems in production that you never experienced in the development or staging phases. The information you will see here is otherwise fragmented or hidden in the fine print throughout the MSDN documentation.

Considerations about net.tcp binding

 

Port Sharing

Net.tcp services using a shared port need to run under the SYSTEM or NETWORK SERVICE account, or under an account that belongs to the IIS_IUSRS or local Administrators group. If the account is part of local Administrators, the application hosting the service must run with elevated privileges.

To add new accounts that may use port sharing, edit the <allowAccounts> section of SMSvcHost.exe.config (normally at C:\Windows\Microsoft.NET\Framework\v4.0.30319 for .NET 4.0+ or C:\Windows\Microsoft.NET\Framework\v3.0\Windows Communication Foundation\). Notice that the users/groups must be in SID format. You may use PowerShell to translate a user account into a SID: https://technet.microsoft.com/en-us/library/ff730940.aspx

$objUser = New-Object System.Security.Principal.NTAccount("fabrikam", "kenmyer")
$strSID = $objUser.Translate([System.Security.Principal.SecurityIdentifier])
$strSID.Value
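
As a rough sketch of where the translated SID goes, the relevant part of SMSvcHost.exe.config looks approximately like the fragment below. The first four entries are well-known SIDs (LocalSystem, LocalService, NetworkService, and the local Administrators group); the last entry is a made-up placeholder standing in for whatever value the PowerShell commands above return for your account. Other attributes and sections of the real file are omitted here.

```xml
<configuration>
  <system.serviceModel.activation>
    <net.tcp>
      <allowAccounts>
        <!-- Well-known SIDs: LocalSystem, LocalService, NetworkService, Administrators -->
        <add securityIdentifier="S-1-5-18" />
        <add securityIdentifier="S-1-5-19" />
        <add securityIdentifier="S-1-5-20" />
        <add securityIdentifier="S-1-5-32-544" />
        <!-- Hypothetical SID for the extra account; replace with your own value -->
        <add securityIdentifier="S-1-5-21-1111111111-2222222222-3333333333-1001" />
      </allowAccounts>
    </net.tcp>
  </system.serviceModel.activation>
</configuration>
```

The Net.Tcp Port Sharing Service typically needs to be restarted before it picks up a change to this file.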

If you perform an in-place upgrade from Windows 2008 R2 to Windows 2012 R2, net.tcp sharing services may stop working with an error similar to the one below, because there is a mismatch between .NET Framework and WCF:

Log Name: System
Source: SMSvcHost 4.0.0.0
Date: 10/22/2015 11:49:42 AM
Event ID: 7
Task Category: Sharing Service
Level: Error
Keywords: Classic
User: LOCAL SERVICE
Computer: SERVER1.contoso.local
Description:
A request to start the service failed. Error Code: System.TypeLoadException: Could not load type 'System.Runtime.Diagnostics.ITraceSourceStringProvider' from assembly 'System.ServiceModel.Internals, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'.
at System.ServiceModel.Channels.BinaryMessageEncoderFactory..ctor(MessageVersion messageVersion, Int32 maxReadPoolSize, Int32 maxWritePoolSize, Int32 maxSessionSize, XmlDictionaryReaderQuotas readerQuotas, Int64 maxReceivedMessageSize, BinaryVersion version, CompressionFormat compressionFormat)
at System.ServiceModel.Channels.BinaryMessageEncodingBindingElement.CreateMessageEncoderFactory()
at System.ServiceModel.Channels.ConnectionOrientedTransportChannelListener..ctor(ConnectionOrientedTransportBindingElement bindingElement, BindingContext context)
at System.ServiceModel.Channels.NamedPipeChannelListener..ctor(NamedPipeTransportBindingElement bindingElement, BindingContext context)
at System.ServiceModel.Channels.NamedPipeTransportBindingElement.BuildChannelListener[TChannel](BindingContext context)
at System.ServiceModel.Channels.Binding.BuildChannelListener[TChannel](Uri listenUriBaseAddress, String listenUriRelativeAddress, ListenUriMode listenUriMode, BindingParameterCollection parameters)
at System.ServiceModel.Description.DispatcherBuilder.MaybeCreateListener(Boolean actuallyCreate, Type[] supportedChannels, Binding binding, BindingParameterCollection parameters, Uri listenUriBaseAddress, String listenUriRelativeAddress, ListenUriMode listenUriMode, ServiceThrottle throttle, IChannelListener& result, Boolean supportContextSession)
at System.ServiceModel.Description.DispatcherBuilder.BuildChannelListener(StuffPerListenUriInfo stuff, ServiceHostBase serviceHost, Uri listenUri, ListenUriMode listenUriMode, Boolean supportContextSession, IChannelListener& result)
at System.ServiceModel.Description.DispatcherBuilder.InitializeServiceHost(ServiceDescription description, ServiceHostBase serviceHost)
at System.ServiceModel.ServiceHostBase.InitializeRuntime()
at System.ServiceModel.ServiceHostBase.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Activation.SharingService.StartControlService()
at System.ServiceModel.Activation.SharingService.Start()
at System.ServiceModel.Activation.TcpPortSharing.OnStart(String[] args)
Process Name: SMSvcHost
Process ID: 2208

To resolve this issue, install at least .NET 4.5.2 to bring .NET and WCF back in sync (https://www.microsoft.com/en-us/download/details.aspx?id=42642).

Net.tcp/net.pipe channels are sessionful and may leak sessions if configured differently

This problem is also common with NetNamedPipeBinding (net.pipe). The net.tcp channel is sessionful (that is, it establishes a session between client and service), and it will still create sessions even if you configure the instance mode differently (e.g. PerCall). If a misbehaving client does not close the connection properly, or if reliable sessions are enabled (see Reliable Session later), the service holds a zombie session for as long as the receive timeout is set (10 minutes by default). Clients should instantiate a service proxy (or use the channel factory to get one), perform the operation, and then close or abort the inner channel. Reusing a proxy for the lifetime of the client application is always a bad idea. Behind a load balancer, never hold a proxy between calls unless your service really uses sessions. This snippet shows the correct way to invoke a WCF service from a client application:

  
             var client = new Service1Client();
             try
             {
  
                 var response = client.WhoAmI(); // Call the service
  
                 Console.WriteLine("Response = {0}", response);
                 client.Close(); // Close afterwards
             }
             catch (Exception ex)
             {
                 Console.WriteLine("\n\nError: {0}", ex.Message);
                 var inner = ex.InnerException;
                 while (inner != null)
                 {
                     Console.WriteLine("Inner: {0}", inner.Message);
                     inner = inner.InnerException;
                 }
                 if (client.State != CommunicationState.Closed)
                 {
                     client.Abort(); // If service is not closed at this point, abort
                 }
             }

When WCF sessions are leaking, the client side typically sees occasional timeout errors and communication exceptions that subside only when the service is restarted or when the 10-minute receive timeout elapses without any new connection. The client-side exceptions look like these:

System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue.

System.TimeoutException: This request operation sent to net.tcp://contoso:9090/servicemodelsamples/nettcp did not receive a reply within the configured timeout

However, there will be times when you cannot control all the client applications using the service. In that case, make sure you set the receiveTimeout property of NetTcpBinding to a short time, such as 30 seconds, so that zombie sessions die after 30 seconds instead of the default 10 minutes. Don’t worry: this setting is mostly a misnomer for “session timeout” and is not directly related to the time a request may take to be received. This sample configuration sets receiveTimeout to 30 seconds:

 <bindings>
   <netTcpBinding>
     <binding 
              receiveTimeout="00:00:30">
       <security mode="Transport">
         <transport clientCredentialType="Windows" protectionLevel="EncryptAndSign" />
       </security>
     </binding>
   </netTcpBinding>
 </bindings>

If you have .NET 4.5+, it is easy to troubleshoot the issue using ETL traces. When you see clients stop responding, capture ETL/ETW traces on the service side (server) using a batch file as below (I like to call it get-traces.bat):

@echo off
ECHO These commands will enable tracing:
@echo on
logman create trace "redist_mspartners" -ow -o %temp%\redist_mspartners.etl -p "Microsoft-Windows-Application Server-Applications" 0xffffffffffffffff 0xff -nb 16 16 -bs 1024 -mode Circular -f bincirc -max 4096 -ets
logman update trace "redist_mspartners" -p {7F3FE630-462B-47C5-AB07-67CA84934ABD} 0xffffffffffffffff 0xff -ets
@echo off
echo
ECHO Reproduce your issue and enter any key to stop tracing
@echo on
pause
logman stop "redist_mspartners" -ets
@echo off
echo Tracing has been captured and saved successfully at %temp%\redist_mspartners.etl
pause

Run it only while the problem is happening and only for a short time, because the trace file grows fast and is circular. Open the resulting trace with Message Analyzer and look for “Concurrent sessions ratio” in the summary. In the example below, 30 of 30 possible concurrent sessions are in use. While this was happening, the client was throwing the exceptions described above.

[Image: Message Analyzer summary showing a concurrent sessions ratio of 30/30]

If you have an earlier version of .NET, you may consider using the custom service behavior I discuss here to monitor your service when necessary: https://blogs.msdn.com/b/rodneyviana/archive/2014/10/08/verifying-current-calls-and-sessions-during-runtime.aspx

Identifying session leak for Advanced Users (also valid for net.pipe)

If you are comfortable analyzing dump files in WinDBG, you may use NetExt to verify the runtime throttling counters and to look for mismatches between the service session mode and the channel session mode. To learn how to install NetExt, see: Getting started with NetExt

Open the dump file in WinDBG. Load netext and index the heap:

.load netext
!windex

List all services

!wservice

 0:000> .load netext
 netext version 2.1.0.5000 Oct  5 2015
 License and usage can be seen here: !whelp license
 Check Latest version: !wupdate
 (...)
  
 0:000> !windex
 Starting indexing at 19:52:07 PM
 Indexing finished at 19:52:09 PM
 7,916,949 Bytes in 43,512 Objects
 Index took 00:00:01
 0:000> !wservice
 Address             State   (...)  Calls/Max   Sessions/Max    ConfigName,.NET Type
 00000016d97fd998    Opened  (...)  0n31/0n100     0n30/0n30    "VanillaService.DataContractSample",VanillaService.DataContractSample
  
 1 ServiceHost object(s) found

 

Notice that the number of calls and the number of sessions are very similar, although the service should increase only one of them. If there are both call and session instances, it is an indication that there is a leak. Clicking on the object link shows the details.

 0:000> !wservice 00000016d97fd998
  
 Service Info
 ================================
 Address            : 00000016D97FD998
 Configuration Name : VanillaService.DataContractSample
 State              : Opened
 (...)
 Calls/Max Calls    : 0n31/0n100
 Sessions/Max       : 0n30/0n30 <-- Max session reached
 (...)
 Session Mode       : False  <-- Service level session mode is FALSE
  
 Service Behaviors
 ================================
 Concurrency Mode   : Multiple
 Instance Mode      : PerCall <-- Instancing is not PerSession
 Add Error in Faults: false
 (...)
  
 Service Base Addresses
 ================================
 net.tcp://localhost:9090/servicemodelsamples/
  
 Channels
 ================================
 Address            : 00000016D9889AA8
 Listener URI       : net.tcp://localhost:9090/servicemodelsamples/nettcp
 Binding Name       : https://tempuri.org/:NetTcpBinding
 Aborted            : No
 State              : Opened
 Transaction Type   : No transaction
 Listener State     : Opened
 Timeout settings   : Open [00:01:00] Close [00:01:00] Receive: [00:10:00] Send: [00:01:00]
 Server Capabilities: SupportsServerAuth [Yes] SupportsClientAuth [Yes] SupportsClientWinIdent [Yes]
 Request Prot Level : EncryptAndSign
 Response Prot Level: EncryptAndSign
 Events Raised      : No Event raised
 Handles Called     : OnOpeningHandle OnOpenedHandle 
 Session Mode       : True <-- Tcp Channel is Session oriented by design
 (...)
  
  
 Endpoints
  
 ================================
  
 Address            : 00000016D98770F8
 URI                : net.tcp://localhost:9090/servicemodelsamples/nettcp
 Is Anonymous       : False
 Configuration Name : VanillaService.IDataContractSample
 Type Name          : VanillaService.IDataContractSample
 Listening Mode     : Explicit
 Class Definition   : 00007ffebf71c160 VanillaService.IDataContractSample
 Behaviors          : 00000016d98773c8
 Binding            : 00000016d9871b18
 (...)

The details are very important because we will compare the service session mode with the net.tcp channel session mode (which is sessionful by design). The service session mode is defined by the instance context mode. The WCF default is PerSession, but the service in this dump was set to PerCall. To change it, declare it as an attribute of the service class. See this: https://msdn.microsoft.com/en-us/library/system.servicemodel.servicebehaviorattribute.instancecontextmode(v=vs.110).aspx

Further down, the net.tcp channel session mode is shown as True. This is by channel design: it is sessionful, and there is no way to change this. So, since there is a mismatch between the service session mode (declared by the developer) and the channel session mode (defined by WCF), there will be a leak, even though the client in this scenario closes the proxy properly after making the request.

For this situation, the only remedy is to decrease the receive timeout setting in the net.tcp binding configuration, as mentioned previously. So, if you are using net.tcp and do not mean to leverage sessions, set the receive timeout value to 15 or 30 seconds. If you do not know whether your application requires sessions, it does not. If you control the WCF service on the server side, use the PerSession instancing mode (and still keep the receive timeout low).

Bottleneck on client side

By default, a net.tcp or net.pipe client is limited to 10 concurrent outbound connections. That is a good number if the client is a standalone application. If the WCF client is a web application, the limit of 10 concurrent outbound connections may become a bottleneck. You may increase that value by changing the maxConnections attribute of the net.tcp binding in the client-side configuration. This setting has no effect on the server side. See: https://msdn.microsoft.com/en-us/library/ms731343(v=vs.110).aspx
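
A minimal sketch of such a client-side configuration follows. The binding configuration name, the endpoint name, and the value of 100 are illustrative assumptions, not recommendations; the address and contract are borrowed from the sample service used elsewhere in this post.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- "webClientTcp" is a hypothetical name; 100 is an example value to tune for your workload -->
      <binding name="webClientTcp" maxConnections="100" />
    </netTcpBinding>
  </bindings>
  <client>
    <!-- The endpoint refers to the binding configuration above by name -->
    <endpoint name="NetTcpSample"
              address="net.tcp://contoso:9090/servicemodelsamples/nettcp"
              binding="netTcpBinding"
              bindingConfiguration="webClientTcp"
              contract="VanillaService.IDataContractSample" />
  </client>
</system.serviceModel>
```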

If you are an advanced user, use this command in a dump file to see the outbound calls (requires NetExt):

!wfrom -implement System.ServiceModel.Channels.TransportOutputChannel where((!$implement("*preamble*"))&&($enumname(state)!="Closed")) $a("Address",$addr()),$a("Url",to.uri.m_String),$a("State",$enumname(state)),$a("Open",channelManager.connectionPool.openCount),$a("Max",channelManager.connectionPool.maxCount)

 0:000> !wfrom -implement System.ServiceModel.Channels.TransportOutputChannel where((!$implement("*preamble*"))&&($enumname(state)!="Closed")) $a("Address",$addr()),$a("Url",to.uri.m_String),$a("State",$enumname(state)),$a("Open",channelManager.connectionPool.openCount),$a("Max",channelManager.connectionPool.maxCount)
 Address: 0000000002C29590
 Url: net.tcp://localhost:9090/servicemodelsamples/nettcp
 State: Opened
 Open: 0n30
 Max: 0n30
 Address: 0000000002C98238
 Url: net.tcp://localhost:9090/servicemodelsamples/nettcp
 State: Opened
 Open: 0n30
 Max: 0n30
 Address: 0000000002C9DCC8
 Url: net.tcp://localhost:9090/servicemodelsamples/nettcp
 State: Opened
 Open: 0n30
 Max: 0n30
 (...)
  
 30 Object(s) listed
 36 Object(s) skipped by filter
  

 

Reliable Session

Reliable Session is the WCF implementation of OASIS WS-ReliableMessaging (RM). Reliable session is overkill for the net.tcp binding because net.tcp already implements everything that reliable sessions implement, except for reconnection. Moreover, reliable session uses a different and complex code path at a high performance cost. It brings no added benefit; on the contrary, it opens up a myriad of potential issues. I heard this from the WCF product team while working on a case where reliable session was being used.

The problem is a channel leak caused by the session mode mismatch (as explained before), with the extra hassle of reliable messaging. So, if you do need reliability, please use reconnection logic instead of reliable sessions.

Below is the advanced troubleshooting technique of searching for reliable sessions in a dump file and checking whether the inner channel can be reopened (that is, it is not aborted). This is THE NUMBER ONE source of performance problems when using WCF with net.tcp or net.pipe. NetExt query:

!wfrom -type System.ServiceModel.Channels.ReliableChannelBinder?ChannelSynchronizer* where (($contains($typename(),"_ChannelSynchronizer<"))&&($enumname(state)!="Closed")) $a("Address",$addr()), $a("State",$enumname(state)), $a("Inner Channel State",$enumname(currentChannel.state)),$a("Fault Mode",$enumname(faultMode)), $a("Is Aborted?",currentChannel.aborted)

 0:000> !wfrom -type System.ServiceModel.Channels.ReliableChannelBinder?ChannelSynchronizer* where (($contains($typename(),"_ChannelSynchronizer<"))&&($enumname(state)!="Closed")) $a("Address",$addr()), $a("State",$enumname(state)), $a("Inner Channel State",$enumname(currentChannel.state)),$a("Fault Mode",$enumname(faultMode)), $a("Is Aborted?",currentChannel.aborted)
 Address: 01102AF4
 State: ChannelOpening
 Inner Channel State: Closed
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 1 <--- Leak: Inner channel aborted, Reliable Session will timeout with receiveTimeout
 Address: 01123844
 State: ChannelOpening
 Inner Channel State: Closed
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 1
 Address: 01138E74
 State: ChannelOpening
 Inner Channel State: Closed
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 1
 Address: 01139F30
 State: ChannelOpening
 Inner Channel State: Closed
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 1
 (...)
 Address: 18262A8C
 State: ChannelOpened
 Inner Channel State: Opened
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 0
 Address: 182EFD90
 State: ChannelOpened
 Inner Channel State: Opened
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 0
 Address: 184818C4
 State: ChannelOpened
 Inner Channel State: Opened
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 0
 Address: 185EBB28
 State: ChannelOpened
 Inner Channel State: Opened
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 0
 Address: 18615B84
 State: ChannelOpened
 Inner Channel State: Opened
 Fault Mode: IfNotSecuritySession
 Is Aborted?: 0
  
 100 Object(s) listed
 208 Object(s) skipped by filter

 

Load balancing net.tcp

Preventing a load balancer idle timeout is the only justified reason to use reliable sessions with net.tcp, and only if you are really leveraging sessions. But, again, the same result can be achieved without reliable sessions if the WCF service receive timeout matches the load balancer session idle timeout. Don’t feel encouraged to follow this path.
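
For illustration only, and assuming a load balancer whose session idle timeout is 4 minutes (a made-up figure; check the actual value on your device), a service-side binding that matches it without reliable sessions might look like this sketch. The binding name is hypothetical.

```xml
<bindings>
  <netTcpBinding>
    <!-- "lbFriendlyTcp" is a hypothetical name; 00:04:00 mirrors the assumed load balancer idle timeout -->
    <binding name="lbFriendlyTcp" receiveTimeout="00:04:00">
      <security mode="Transport">
        <transport clientCredentialType="Windows" protectionLevel="EncryptAndSign" />
      </security>
    </binding>
  </netTcpBinding>
</bindings>
```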

The purpose of a load balancer is to provide scalability, so using sessions requires configuring sticky sessions in the load balancer. Most modern load balancers offer this type of sticky session, which is different from HTTP’s. Sticky sessions always match a client connection to the same server, which defeats the purpose of scalability.