The consequences of ignoring Nagling and delayed acks
Over most of this week, I’ve discussed how ignoring the underlying network architecture can radically hurt an application. Now it’s time for a war story about how things can go awry if you don’t notice these things.
One of the basic limitations of networking is that you really shouldn’t send a server more data than it is expecting. At a minimum, it’s likely that your connection will be flow-controlled. This is especially important when you’re dealing with NetBIOS semantics. Unlike stream-based sockets (like TCP), NetBIOS requires message-based semantics. This means that if the transmitter sends a buffer that is larger than the buffer that the receiver is prepared to accept, the send will fail. As a result, the SMB protocol has the concept of a “negotiated buffer size”. Typically this buffer size is about 4K.
LAN Manager 1.0 had a bunch of really cool new enhancements to the basic SMB protocol. One of the neatest ones (which was used for its IPC mechanism) was the “transaction” SMB. The idea behind the transaction SMB was to enable large, application-driven transactions (up to 64K :) ). The protocol flow for the transaction SMB went roughly like this:
Client: Request Transaction, sending <n> bytes, receiving <m> bytes
Server: Ok, buffers allocated to receive <n> bytes, go ahead
Client: Sending 4K block 1
Client: Sending 4K block 2
Client: Sending 4K block <n>
<The server does its thing and responds>
Server: Response 4K block 1
Server: Response 4K block 2
Server: Response 4K block 3
Server: Response 4K block <n>
The idea was that the client would “shotgun” the sends to the server asynchronously and as quickly as possible, and the server would do the same with its response. The thinking was that if the transmitter had lots of outstanding sends, the transport would deliver the data as quickly as possible.
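To make the “lots of outstanding sends” idea concrete, here is a minimal, hypothetical user-mode sketch using Winsock overlapped I/O. It is not the actual redirector code (the real client talked to a TDI transport, not a socket); the SendTransaction routine, the block count, and the 4K block size are assumptions used purely to illustrate posting several asynchronous sends before waiting for any of them to complete.

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

#define BLOCK_SIZE  4096            /* the negotiated 4K buffer size */
#define MAX_BLOCKS  16              /* most blocks we keep in flight */

/*
 * Post every 4K block of the transaction before waiting for any of them,
 * so the transport always has more data queued behind the block currently
 * on the wire.  Assumes WSAStartup has been called and s is an overlapped
 * socket; error handling is trimmed for brevity.
 */
int SendTransaction(SOCKET s, const char *data, int totalBytes)
{
    WSAOVERLAPPED overlapped[MAX_BLOCKS];
    WSABUF wsabuf[MAX_BLOCKS];
    int i, blocks = (totalBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;

    if (blocks > MAX_BLOCKS) {
        blocks = MAX_BLOCKS;        /* keep the sketch simple */
    }

    for (i = 0; i < blocks; i++) {
        ZeroMemory(&overlapped[i], sizeof(overlapped[i]));
        overlapped[i].hEvent = WSACreateEvent();
        wsabuf[i].buf = (char *)data + i * BLOCK_SIZE;
        wsabuf[i].len = min(BLOCK_SIZE, totalBytes - i * BLOCK_SIZE);

        /* WSA_IO_PENDING just means the send was queued asynchronously. */
        if (WSASend(s, &wsabuf[i], 1, NULL, 0, &overlapped[i], NULL) == SOCKET_ERROR &&
            WSAGetLastError() != WSA_IO_PENDING) {
            return -1;
        }
    }

    /* Only now do we wait for the outstanding sends to complete. */
    for (i = 0; i < blocks; i++) {
        DWORD bytesSent, flags;
        WSAGetOverlappedResult(s, &overlapped[i], &bytesSent, TRUE, &flags);
        WSACloseEvent(overlapped[i].hEvent);
    }
    return 0;
}

That, at least, was the theory: with all of the blocks posted up front, the transport should be able to keep full windows of data in flight.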
The scheme looks good at this level. But if you were following the discussion in the earlier posts, you should have some red flags raised by now. Let's consider what happens on the wire at the network layer:
Client: Request Transaction, sending <n> bytes, receiving <m> bytes
Server: ACK Request
Server: Ok, buffers allocated to receive <n> bytes, go ahead
Client: ACK Request
Client: Sending 4K block 1, frame 1
Client: Sending 4K block 1, frame 2
Client: Sending 4K block 1, frame 3
Server: ACK Request
Client: Sending 4K block 2, frame 1
Client: Sending 4K block 2, frame 2
Client: Sending 4K block 2, frame 3
Server: ACK Request
Client: Sending 4K block <n>, frame 1
Client: Sending 4K block <n>, frame 2
Client: Sending 4K block <n>, frame 3
Server: ACK Request
<The server does its thing and responds>
Server: Sending 4K block 1, frame 1
Server: Sending 4K block 1, frame 2
Server: Sending 4K block 1, frame 3
Client: ACK Request
Server: Sending 4K block 2, frame 1
Server: Sending 4K block 2, frame 2
Server: Sending 4K block 2, frame 3
Client: ACK Request
Server: Sending 4K block 3, frame 1
Server: Sending 4K block 3, frame 2
Server: Sending 4K block 3, frame 3
Client: ACK Request
Server: Sending 4K block <n>, frame 1
Server: Sending 4K block <n>, frame 2
Server: Sending 4K block <n>, frame 3
Client: ACK Request
Well, that's a lot more traffic, but nothing outrageous. On the other hand, the idea that multiple async sends would fill the pipeline clearly went away – the second send doesn't start until the first is acknowledged. In addition, the sliding window never gets greater than 3K, but that's not the end of the world…
But see what happens when we add in delayed (or piggybacked) acks to the picture… Remember, CIFS uses NetBIOS semantics. That means that every byte of every send must be acknowledged before the next block can be sent.
Client: Request Transaction, sending <n> bytes, receiving <m> bytes
Server: ACK Request
Server: Ok, buffers allocated to receive <n> bytes, go ahead
Client: ACK Request
Client: Sending 4K block 1, frame 1
Client: Sending 4K block 1, frame 2
Client: Sending 4K block 1, frame 3
Server: wait 200ms for server response and ACK Request
Client: Sending 4K block 2, frame 1
Client: Sending 4K block 2, frame 2
Client: Sending 4K block 2, frame 3
Server: wait 200ms for server response and ACK Request
Client: Sending 4K block <n>, frame 1
Client: Sending 4K block <n>, frame 2
Client: Sending 4K block <n>, frame 3
Server: wait 200ms for server response and ACK Request
<The server does it’s thing and responds>
Server: Sending 4K block 1, frame 1
Server: Sending 4K block 1, frame 2
Server: Sending 4K block 1, frame 3
Client: wait 200ms for client request and ACK Request
Server: Sending 4K block 2, frame 1
Server: Sending 4K block 2, frame 2
Server: Sending 4K block 2, frame 3
Client: wait 200ms for client request and ACK Request
Server: Sending 4K block 3, frame 1
Server: Sending 4K block 3, frame 2
Server: Sending 4K block 3, frame 3
Client: wait 200ms for client request and ACK Request
Server: Sending 4K block <n>, frame 1
Server: Sending 4K block <n>, frame 2
Server: Sending 4K block <n>, frame 3
Client: wait 200ms for client request and ACK Request
All of a sudden, an operation that looked really good at the high level protocol overview turned into an absolute nightmare on the wire. Every 4K block in either direction now sits behind a 200ms delayed-ack stall, so it would take over a second just to send and receive 28K of data!
This is the consequence of not understanding how the lower levels behave when you design higher level protocols. If you don’t know what’s going to happen on the wire, design decisions that look good at a high level turn out to be terrible when put into practice. In many ways, this is another example of Joel Spolsky’s Law of Leaky Abstractions – the network layer abstraction leaked all the way up to the application layer.
My solution to this problem when we first encountered it (back before NT 3.1 shipped) was to add a TDI_SEND_NO_RESPONSE_EXPECTED flag to the TdiBuildSend API, which instructs the transport that no response is expected for the request. The transport can then disable delayed acks for the request (if possible). For some transports it's not possible to disable piggyback acks, but for those that can, this is a huge optimization.
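For the curious, here is a rough sketch (not the actual redirector or transport code) of what using that flag looks like from a TDI client's point of view. The SendNoResponseExpected helper is hypothetical; the connection file object, MDL, and completion routine are assumed to be set up elsewhere in the driver, and the usual IRP bookkeeping (setting the owning thread, freeing the IRP in the completion routine) is omitted.

#include <ntddk.h>
#include <tdikrnl.h>

/*
 * Hypothetical helper: issue a TDI send with the "no response expected"
 * hint.  SendComplete is assumed to free the IRP when the send finishes.
 */
NTSTATUS
SendNoResponseExpected(
    PDEVICE_OBJECT TransportDeviceObject,   /* transport driver's device    */
    PFILE_OBJECT ConnectionFileObject,      /* open TDI connection endpoint */
    PMDL Mdl,                               /* describes the data to send   */
    ULONG SendLength,
    PIO_COMPLETION_ROUTINE SendComplete,
    PVOID Context
    )
{
    PIRP irp = IoAllocateIrp(TransportDeviceObject->StackSize, FALSE);

    if (irp == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    /*
     * TDI_SEND_NO_RESPONSE_EXPECTED tells the transport that the higher
     * level protocol will not be sending a reply that an ack could be
     * piggybacked on, so a transport that supports the hint can ack the
     * data immediately instead of waiting out the delayed-ack timer.
     */
    TdiBuildSend(irp,
                 TransportDeviceObject,
                 ConnectionFileObject,
                 SendComplete,
                 Context,
                 Mdl,
                 TDI_SEND_NO_RESPONSE_EXPECTED,
                 SendLength);

    return IoCallDriver(TransportDeviceObject, irp);
}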
Comments
- Anonymous
August 06, 2004
Could you please rewrite the code sample to resolve the performance problems? I am still a little confused as to the best way to resolve the performance issues. A revised sample would greatly improve my understanding. Plus, I never could understand how to properly use "overlapped" IO.
Thanks. - Anonymous
August 06, 2004
Let me see what I can come up with anon1... - Anonymous
August 09, 2004
This problem is also caused by SO_SNDBUF=0.
The real lesson from all this is that you should only set SO_SNDBUF = 0 if you really know what you are doing.
If SO_SNDBUF=0, each send() or WriteFile() must wait for an acknowledgement from the other side before returning. (That is the only way for the protocol to implement retransmissions)
If you want to send large amounts of data quickly using SO_SNDBUF=0, you need to use overlapped IO to post multiple writes. This can get quite complicated.
The only real reason for using SO_SNDBUF=0 is to avoid the extra copying of data from the user buffer to the kernel buffer.
Using SO_SNDBUF=0 because you want to "make sure your data has reached its destination" before returning is a bad practice.
Remember that an ack only means that the data has reached the remote tcp implementation. The data might still not reach its final destination.
As a matter of fact, if you keep SO_SNDBUF at its default, and disable Nagling (TCP_NODELAY=1) you will probably not have any problems at all with delayed acks.
By the way, does TDI_SEND_NO_RESPONSE_EXPECTED really work on tcpip ? If I understand correctly, you set the flag in the sender, and the receiver should respond to it by acking the current packet without delay.
I do not see how that could be implemented on the wire for tcp/ip ? - Anonymous
August 09, 2004
StefanG: I don't know if the TDI_SEND_NO_RESPONSE_EXPECTED works on TCP, that's why I used the caveat of "transports that support it". I know it works on NetBEUI and on SPX, which were the two premier transports at the time (In 1990, TCP/IP was a second tier protocol). - Anonymous
June 13, 2006
Now I want to pop the stack up a bit and talk about messages. At their heart, connection oriented... - Anonymous