Moving lots of data

Recently I've been getting lots of questions about moving large files (or lots of data) between a WCF service and client. The question comes in multiple forms, e.g.

How do I send a file that's many GB in size from the service to the client?
Should I use MTOM to send large files?
I have a huge object graph and I want to send it over the wire, how should I do it?

So let's explore the problems associated with moving lots of data and the solutions offered by WCF.

Problem: Bandwidth Utilization
Sending gigabytes of data uses a lot of bandwidth. That may not be a problem if it happens infrequently on a 100 Mbit or 1 Gigabit LAN, but it is definitely an issue when bandwidth is scarce or is being paid for. In interop scenarios, where messages are encoded as text XML, binary data has to be Base64-encoded, which exacerbates the problem because it inflates the size by one third. There are three solutions to this problem, depending on your scenario. First, compression can really help, especially if the data is text or gets encoded as text (using Base64). Compression/decompression can be implemented using a custom WCF encoder/decoder (and IIS 6 offers built-in response compression). Second, you can avoid text encoding when interop is not required. WCF provides a binary encoding that is far more bandwidth-efficient than the text encoding, especially when sending binary data. Third, if you want interop and you need to send large binary content, you can use MTOM, which lets you send the binary content outside of the SOAP envelope (as a separate part of a multi-part MIME message) without Base64-encoding it.
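
To make the Base64 overhead and the effect of compression concrete, here is a rough sketch in Python. It is not WCF code; the helper names (base64_size, compressed_size) are made up for illustration, and zlib stands in for whatever compression a custom encoder would actually use.

```python
import base64
import os
import zlib


def base64_size(raw: bytes) -> int:
    """Size in bytes of the Base64 text that represents `raw`."""
    return len(base64.b64encode(raw))


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib (DEFLATE) compression."""
    return len(zlib.compress(data))


if __name__ == "__main__":
    # Every 3 raw bytes become 4 Base64 characters: one third larger.
    raw = os.urandom(3_000_000)
    print("raw bytes:       ", len(raw))
    print("Base64 bytes:    ", base64_size(raw))

    # Compressing the Base64 text wins back most of that inflation,
    # even when the underlying bytes are random.
    print("Base64 + zlib:   ", compressed_size(base64.b64encode(raw)))

    # Repetitive text (typical of XML) compresses far better still.
    xml_like = b"<item>value</item>" * 100_000
    print("text bytes:      ", len(xml_like))
    print("text + zlib:     ", compressed_size(xml_like))
```

The exact compressed sizes depend on the data, so treat the printed numbers as indicative only.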

Problem: Memory Utilization
By default, WCF buffers messages to support protocols like WS-ReliableMessaging and WS-Security that require buffered messages. For extremely large messages this can lead to out-of-memory conditions, especially on servers that send or receive several such messages simultaneously. Fortunately, WCF supports streaming on HTTP, TCP, and Named Pipes, allowing you to send infinitely large messages without hitting out-of-memory exceptions (actually, message size is constrained to Int64.MaxValue, or 9,223,372,036,854,775,807, but hopefully that's infinite as far as your app is concerned).
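
The difference between buffering and streaming is easy to see outside of WCF. The sketch below is plain Python, not the WCF streaming API, and copy_stream is a hypothetical helper: it moves data from a source stream to a destination through a small fixed-size buffer, so memory use stays bounded no matter how large the payload is. A buffered transfer would instead read the whole payload into memory first.

```python
import io
from typing import BinaryIO


def copy_stream(src: BinaryIO, dst: BinaryIO, buffer_size: int = 64 * 1024) -> int:
    """Copy src to dst using at most `buffer_size` bytes of memory per read.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        block = src.read(buffer_size)
        if not block:  # an empty read means end of stream
            break
        dst.write(block)
        total += len(block)
    return total


if __name__ == "__main__":
    # In-memory streams stand in for a file and a network connection.
    source = io.BytesIO(b"x" * 1_000_000)
    sink = io.BytesIO()
    copied = copy_stream(source, sink, buffer_size=4096)
    print("copied", copied, "bytes using a 4096-byte buffer")
```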

Problem: Recovering from Failures
So what happens if halfway through sending your 4GB stream the TCP connection fails? Well, your app must catch the exception and recover. If the other side has been processing the stream as it receives it (e.g. saving it to disk), you may be able to have the sending app coordinate with the receiving app to figure out which byte was received last and restart from there.
Another alternative is to use Reliable Messaging to recover from connection failures. RM will detect the failure, automatically re-establish a connection, and resend the failed message. The problem here is that when you're streaming, the entire stream is one message, so resending basically means starting over. Another problem is that RM requires buffering (so it can resend on failure), so it doesn't work with streaming! The solution is to use chunking. This is where the sending application divides the 4GB file into, say, 1 million messages of 4KB each and sends them in buffered mode. The receiving application reconstitutes the file by appending all 1 million messages to form the original 4GB file. Here, RM buffers only a few of the 4KB messages at a time. In case of a failure, RM will automatically resend the one or few failed messages. The chunking/dechunking functionality can be encapsulated in a general-purpose chunking channel that lets the applications program against a Stream and handles the chunking under the covers. This is what my sample chunking channel from PDC does. Note that because of the chunking and the protocol overhead added by RM, the resulting throughput is expected to be lower than direct streaming. However, you are getting reliability, which comes in handy when the connection fails, especially when you're almost done sending the 4GB and don't want to start all over!
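
The chunking idea can be sketched in a few lines of Python. This is only an illustration of the technique, not my chunking channel and not how RM is implemented; every name in it (split_into_chunks, ChunkReceiver, send_with_resend) is hypothetical, and the "network" is just a set of sequence numbers that get dropped on the first attempt. The point it shows is that after a failure only the missing chunks are resent, not the whole payload.

```python
from typing import Dict, Iterator, List, Set, Tuple


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, payload) pairs covering `data` in order."""
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        yield seq, data[offset:offset + chunk_size]


class ChunkReceiver:
    """Collects chunks by sequence number and rebuilds the original bytes."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.received: Dict[int, bytes] = {}

    def accept(self, seq: int, payload: bytes) -> None:
        self.received[seq] = payload  # a duplicate simply overwrites itself

    def missing(self) -> List[int]:
        return [s for s in range(self.total_chunks) if s not in self.received]

    def reassemble(self) -> bytes:
        if self.missing():
            raise ValueError("cannot reassemble: chunks still missing")
        return b"".join(self.received[s] for s in range(self.total_chunks))


def send_with_resend(data: bytes, chunk_size: int,
                     lost_on_first_try: Set[int]) -> Tuple[bytes, List[int]]:
    """Simulate a transfer where some chunks are lost once, then resent.

    For simplicity this demo keeps every chunk in memory; a real sender
    would only hold the few chunks that are not yet acknowledged.
    """
    chunks = dict(split_into_chunks(data, chunk_size))
    receiver = ChunkReceiver(len(chunks))
    for seq, payload in chunks.items():
        if seq not in lost_on_first_try:
            receiver.accept(seq, payload)
    resent = receiver.missing()
    for seq in resent:  # resend only what never arrived
        receiver.accept(seq, chunks[seq])
    return receiver.reassemble(), resent


if __name__ == "__main__":
    # 4 GB split into 4 KB chunks (binary units) is about a million messages.
    print((4 * 1024**3) // (4 * 1024), "chunks for a 4 GB file")
    payload = bytes(range(256)) * 40  # 10,240 bytes
    result, resent = send_with_resend(payload, 1024, lost_on_first_try={3, 7})
    print("intact:", result == payload, "resent chunks:", resent)
```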

To summarize, my recommendations are:
1. Use binary encoding if you have WCF on both sides or MTOM if you need interop.
2. Consider compression if you are forced to use text encoding or if the data itself is highly compressible (e.g. large text data).
3. Use streaming, or if you want reliability, use chunking. One restriction to keep in mind is that to stream, there can be only one parameter (or return value, depending on which direction you're streaming) and it must be of a type that derives from System.IO.Stream.
