TCP delayed ACK combined with Nagle algorithm can badly impact communication performance

Recently I was involved in a network performance issue and wanted to share the root cause we found. The capture below shows an example exchange in which the slow performance occurs:

10.15.28.40: A Windows application server

10.95.2.40: A UNIX application server

No. Time Delta Source Destination Protocol Length Info

   3562 2012-12-12 18:19:53.258784 0.000000 10.15.28.40 10.95.2.40 TCP 377 51122 > 7890 [PSH, ACK] Seq=2654579738 Ack=1586267028 Win=253 Len=323

   3563 2012-12-12 18:19:53.359881 0.101097 10.95.2.40 10.15.28.40 TCP 60 7890 > 51122 [ACK] Seq=1586267028 Ack=2654580061 Win=49640 Len=0

   3564 2012-12-12 18:19:53.359939 0.000058 10.95.2.40 10.15.28.40 TCP 60 7890 > 51122 [PSH, ACK] Seq=1586267028 Ack=2654580061 Win=49640 Len=6

   3566 2012-12-12 18:19:53.550528 0.190589 10.15.28.40 10.95.2.40 TCP 54 51122 > 7890 [ACK] Seq=2654580061 Ack=1586267034 Win=253 Len=0

   3567 2012-12-12 18:19:53.551151 0.000623 10.95.2.40 10.15.28.40 TCP 330 7890 > 51122 [PSH, ACK] Seq=1586267034 Ack=2654580061 Win=49640 Len=276

   3585 2012-12-12 18:19:53.753770 0.202619 10.15.28.40 10.95.2.40 TCP 54 51122 > 7890 [ACK] Seq=2654580061 Ack=1586267310 Win=252 Len=0

At frame #3562, the Windows application server sends a request to the UNIX application server.

At frame #3563, the UNIX server sends an ACK back to the Windows server about 100 msec later (the TCP delayed ACK timer on the UNIX server side is probably around 100 msec).

At frame #3564, shortly after the delayed ACK, the UNIX application's first response segment arrives. It carries only 6 bytes, probably because the application sends the application-layer protocol header and payload separately. The rest of the response is then held back until the other party ACKs this segment.

At frame #3566, the TCP delayed ACK timer on the Windows application server (around 200 msec) expires and Windows sends a delayed ACK back to the UNIX server. The timer expires because Windows receives one TCP segment from the UNIX server and no further segments follow it.

At frame #3567, the UNIX application server sends the rest of the application-layer data (276 bytes) once that ACK is received.

The reason Windows waits ~200 msec before sending frame #3566 is the TCP delayed ACK mechanism. The reason the UNIX application server doesn't send the remaining 276-byte payload until the previous TCP segment it sent (6 bytes) has been ACKed is the Nagle algorithm.

As you can see above, TCP delayed ACK + Nagle + application behavior add ~100 msec + ~200 msec = ~300 msec of delay on average to every transaction. With thousands of transactions taking place during a session between the two application servers, this adds up to a considerable amount of delay.
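To make the arithmetic concrete, here is a small Python sketch that recomputes the waits from the timestamps in the capture above. The function names and the transaction count in the example are only illustrative assumptions; the only real data is the five timestamps copied from the trace.

```python
from datetime import datetime

# Timestamps copied from the capture above, keyed by frame number.
FRAMES = {
    3562: "18:19:53.258784",  # Windows -> UNIX request (323 bytes)
    3563: "18:19:53.359881",  # UNIX -> Windows ACK
    3564: "18:19:53.359939",  # UNIX -> Windows first response segment (6 bytes)
    3566: "18:19:53.550528",  # Windows -> UNIX delayed ACK
    3567: "18:19:53.551151",  # UNIX -> Windows rest of the response (276 bytes)
}

TIME_FORMAT = "%H:%M:%S.%f"


def seconds_between(first, second):
    """Return the elapsed seconds between two frames of the capture."""
    start = datetime.strptime(FRAMES[first], TIME_FORMAT)
    end = datetime.strptime(FRAMES[second], TIME_FORMAT)
    return (end - start).total_seconds()


def estimated_session_delay(transactions):
    """Rough total wait if every transaction behaves like the captured one."""
    return transactions * seconds_between(3562, 3567)


if __name__ == "__main__":
    print("request -> first response bytes:", seconds_between(3562, 3564))
    print("first response -> Windows ACK:  ", seconds_between(3564, 3566))
    print("request -> full response:       ", seconds_between(3562, 3567))
    # 5000 is a hypothetical transaction count, not a figure from the case.
    print("5000 transactions, seconds:     ", estimated_session_delay(5000))
```

For this single transaction the request-to-full-response time comes out at roughly 0.29 seconds, which matches the ~300 msec estimate.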

The solution here was to turn off TCP delayed ACK on the Windows server side, since that was the easiest option to implement in the customer's scenario. Another option would be to turn the Nagle algorithm off in the UNIX-side server application that serves the application running on Windows.
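For the second option, turning Nagle off is normally done per socket by the application itself through the standard TCP_NODELAY socket option. The following is a minimal Python sketch of that idea, not the code of the application in this case (which I don't have); the helper names disable_nagle and nagle_disabled are made up for illustration.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY so small writes are not held back waiting for an ACK."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # An unconnected TCP socket is enough to show the option being toggled.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("before:", nagle_disabled(s))
        disable_nagle(s)
        print("after: ", nagle_disabled(s))
```

A server would apply the same call to each socket returned by accept(). Keep in mind that this trades the delay for more small packets on the wire, so it suits request/response traffic better than bulk transfers.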

The Wikipedia article on the Nagle algorithm also summarizes this interaction:

This algorithm interacts badly with TCP delayed acknowledgments, a feature introduced into TCP at roughly the same time in the early 1980s, but by a different group. With both algorithms enabled, applications that do two successive writes to a TCP connection, followed by a read that will not be fulfilled until after the data from the second write has reached the destination, experience a constant delay of up to 500 milliseconds, the "ACK delay". For this reason, TCP implementations usually provide applications with an interface to disable the Nagle algorithm. This is typically called the TCP_NODELAY option.

If possible an application should avoid consecutive small writes in the first place, so that Nagle's algorithm will not be triggered. The application should keep from sending small single writes and buffer up application writes then send (or with the help of writev() call).
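The advice about avoiding consecutive small writes can be sketched like this in Python. The 6-byte header layout (a 2-byte message type plus a 4-byte payload length) is purely hypothetical and was chosen only because the capture shows a 6-byte segment followed by a 276-byte one; the real protocol of the application in this case is unknown to me.

```python
import socket
import struct

# Hypothetical 6-byte header: 2-byte message type + 4-byte payload length,
# both in network byte order.
HEADER = struct.Struct("!HI")


def build_message(msg_type, payload):
    """Join header and payload into one buffer so they leave in a single write."""
    return HEADER.pack(msg_type, len(payload)) + payload


def send_message(sock, msg_type, payload):
    """One sendall() call instead of separate header and payload writes."""
    sock.sendall(build_message(msg_type, payload))


if __name__ == "__main__":
    # A local socket pair stands in for the real connection.
    sender, receiver = socket.socketpair()
    send_message(sender, 1, b"x" * 276)
    data = receiver.recv(HEADER.size + 276, socket.MSG_WAITALL)
    print(len(data), "bytes received in one message")
    sender.close()
    receiver.close()
```

Because the header and payload are handed to the TCP stack together, there is no small segment left outstanding for Nagle to wait on, and this works whether or not TCP_NODELAY is set.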

What we were observing was exactly the behavior described above. You can find more information on how to turn off TCP delayed ACK on Windows servers by changing the TcpAckFrequency registry value in the following article:

https://support.microsoft.com/kb/823764 Slow performance occurs when you copy data to a TCP server by using a Windows Sockets API program
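As a rough illustration of the registry change, the Python helper below only builds a reg add command line; it does not touch the registry. It assumes, as I understand the KB article, that TcpAckFrequency is a REG_DWORD set to 1 under the per-interface Tcpip parameters key. The function name and the interface GUID in the example are placeholders, so please check the article for your Windows version before running anything.

```python
# Per-interface TCP/IP parameters key (assumed from the KB article).
INTERFACES_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"


def ack_frequency_command(interface_guid, frequency=1):
    """Build (but do not run) a reg.exe command that sets TcpAckFrequency."""
    key = INTERFACES_KEY + "\\" + interface_guid
    return f'reg add "{key}" /v TcpAckFrequency /t REG_DWORD /d {frequency} /f'


if __name__ == "__main__":
    # Placeholder GUID; use the GUID of the real network interface instead.
    print(ack_frequency_command("{00000000-0000-0000-0000-000000000000}"))
```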

Hope this helps

Thanks,

Murat

Comments

  • Anonymous
    January 01, 2003
    TcpAckFrequency in Server 2012 R2 would interest me, too.
  • Anonymous
    September 01, 2014
    Hi Murat,

    We are running into this same issue with one of our applications, but it's no longer possible to change TcpAckFrequency in Server 2012 R2 (it was still possible in Server 2012). The registry value is ignored, and PowerShell says the attribute is read-only. Any help? :(
  • Anonymous
    December 10, 2014
    Me too!
  • Anonymous
    May 05, 2015
    Maybe it's time that software actually is written correctly using the guidelines listed above! Then the issue would no longer exist.
  • Anonymous
    September 11, 2015
    This is an old post, but I agree with Garry. Nagle and delayed ACKs are good things. If you want an application to behave properly with respect to TCP/IP, use send buffers and a TCP RWIN that are an even multiple of the path TCP MSS. If the total payload is an odd number of packets less than the TCP RWIN, the application layer will presumably offer a response, and delayed acknowledgements will not be required. If the total payload is larger than the TCP RWIN, it will be broken into an even number of packets, and delayed acknowledgements will not be required (every even packet is ACK'd).