Simplifying Overlapped I/O With PPL

Helped by the recent advances in modern languages (such as async/await in C#) and frameworks (such as Node.js), non-blocking programming has firmly entered the zeitgeist of the development community.

On Windows, non-blocking I/O has traditionally been done with the overlapped I/O APIs. Its essence is letting a thread do useful work while one or more I/O requests are in flight, hence the word “overlapped”.

Unfortunately, using overlapped I/O on Windows is hard. Not impossibly hard, but difficult enough that people decide not to deal with it and move on to solving other problems instead. If you’re familiar with overlapped I/O, read right on. If not, you can read a good overview in this MSDN article.

Luckily, overlapped I/O can be made easier with modern C++ and libraries such as the PPL. In this post, I’ll show a simple class called file_reader that can be used to read files asynchronously.

Without further ado, this is how you’d use it:

 file_reader reader(L"c:\\test\\file.txt");
 reader.read_bytes().then([](std::vector<byte> v) {
     std::wcout << L"done reading " << v.size() << L" bytes from the file." << std::endl;
     std::wstring s;
     s.assign(v.begin(), v.end());
     std::wcout << s << std::endl;
 });

The read_bytes function returns a task<std::vector<byte>>, which completes when the entire file has been read. You can also provide a callback that will be invoked asynchronously each time the reader loads a new chunk:

 reader.on_chunk([](byte* buffer, size_t len) {
     std::wstring s;
     s.assign(buffer, buffer + len);
     std::wcout << s << std::endl;
 });

You’ll find the full implementation of file_reader in the source file that I’m attaching to this post (requires Visual Studio 2012).
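
For readers without the attachment, here is a minimal sketch of the same interface shape in portable standard C++. It is not the attached implementation: the class name simple_file_reader, the chunk_size parameter and the use of std::async with std::future in place of a PPL task are all assumptions made for illustration. Note that std::async parks a worker thread on a blocking read, so this imitates the programming model only; it is not true overlapped I/O.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <future>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using byte = unsigned char;  // stand-in for the Windows `byte` typedef

// Hypothetical stand-in for the post's file_reader (assumed names throughout).
class simple_file_reader {
public:
    using chunk_callback = std::function<void(const byte*, std::size_t)>;

    explicit simple_file_reader(std::string path, std::size_t chunk_size = 4096)
        : path_(std::move(path)), chunk_size_(chunk_size) {
        assert(chunk_size_ > 0);
    }

    // Register a callback that runs on the worker thread for every chunk read.
    void on_chunk(chunk_callback cb) { on_chunk_ = std::move(cb); }

    // Start reading on a worker thread. The future becomes ready once the
    // whole file has been read, or carries an exception if the open fails.
    std::future<std::vector<byte>> read_bytes() const {
        // Copy the state into the lambda so the task does not depend on
        // the lifetime of this object.
        return std::async(std::launch::async,
            [path = path_, chunk_size = chunk_size_, cb = on_chunk_] {
                std::ifstream in(path, std::ios::binary);
                if (!in) throw std::runtime_error("cannot open " + path);

                std::vector<byte> all;
                std::vector<char> chunk(chunk_size);
                for (;;) {
                    in.read(chunk.data(),
                            static_cast<std::streamsize>(chunk.size()));
                    const std::size_t got =
                        static_cast<std::size_t>(in.gcount());
                    if (got == 0) break;  // end of file (or read error)
                    const byte* p =
                        reinterpret_cast<const byte*>(chunk.data());
                    if (cb) cb(p, got);
                    all.insert(all.end(), p, p + got);
                }
                return all;
            });
    }

private:
    std::string path_;
    std::size_t chunk_size_;
    chunk_callback on_chunk_;
};
```

Calling reader.read_bytes().get() plays the role of waiting on the task. Unlike a PPL continuation attached with then, get() blocks the caller, and a future obtained from std::async also blocks in its destructor, so keep it alive in a variable if you want the read to proceed in the background.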

Now, when would you use the asynchronous file reader? To be clear, asynchrony will not let you read the file any faster. As I mentioned above, the key is allowing the thread to do other work while the I/O operation is in progress.
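
To make "doing other work while the I/O is in progress" concrete, here is a small sketch in portable standard C++. The names overlap_result and read_while_working are made up for this example, and std::async stands in for a real overlapped read. The caller starts the read, keeps itself busy until the result is ready, and only then collects the bytes.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <future>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical result type: the file contents plus a count of how many
// units of unrelated work the caller finished while the read was in flight.
struct overlap_result {
    std::vector<char> bytes;
    unsigned long work_units;
};

overlap_result read_while_working(const std::string& path) {
    // Kick off the read on another thread. If the file cannot be opened,
    // this sketch simply yields an empty vector.
    auto pending = std::async(std::launch::async, [path] {
        std::ifstream in(path, std::ios::binary);
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    });
    assert(pending.valid());

    // The calling thread is free to do something else in the meantime.
    // Incrementing a counter is a placeholder for that work.
    unsigned long work_units = 0;
    while (pending.wait_for(std::chrono::milliseconds(0)) !=
           std::future_status::ready) {
        ++work_units;
    }

    return {pending.get(), work_units};
}
```

Polling in a loop keeps the sketch short. With PPL you would more likely attach a continuation with then and let the thread return to its normal work, for example the message loop of a GUI application.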

When is it useful? One example would be writing server-side code, where blocked threads can wreak havoc on the server’s scalability. An asynchronous file reader avoids blocking entirely. Another example would be a GUI application, where you want to move the I/O off the main thread to maintain the application’s responsiveness.

Finally, let’s not forget Gustafson’s law. If we cannot solve the problem faster, let’s increase the size of the problem! In the attached sample I’m using the file_reader to read from multiple files asynchronously. I also implemented a serial file reader and compared the two.
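
The shape of such a comparison can be sketched in portable standard C++. This is not the attached sample: read_whole_file, read_files_serial and read_files_concurrent are hypothetical names, and std::async stands in for file_reader. The serial version reads one file after another, while the concurrent version starts every read before waiting on any of them.

```cpp
#include <cassert>
#include <fstream>
#include <future>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: blocking read of an entire file into memory.
std::vector<char> read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Serial baseline: each read starts only after the previous one finished.
std::vector<std::vector<char>> read_files_serial(
        const std::vector<std::string>& paths) {
    std::vector<std::vector<char>> results;
    results.reserve(paths.size());
    for (const auto& p : paths) results.push_back(read_whole_file(p));
    assert(results.size() == paths.size());
    return results;
}

// Concurrent version: start all reads first, then collect the results
// in the original order.
std::vector<std::vector<char>> read_files_concurrent(
        const std::vector<std::string>& paths) {
    std::vector<std::future<std::vector<char>>> pending;
    pending.reserve(paths.size());
    for (const auto& p : paths)
        pending.push_back(std::async(std::launch::async, read_whole_file, p));

    std::vector<std::vector<char>> results;
    results.reserve(paths.size());
    for (auto& f : pending) results.push_back(f.get());
    assert(results.size() == paths.size());
    return results;
}
```

Timing both functions over the same list of paths, for instance with std::chrono::steady_clock, gives a rough version of the comparison. This sketch uses one thread per file, which true overlapped I/O avoids, so its numbers will not necessarily match those of the attached sample.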

It turns out that when reading local files, asynchrony is a mixed bag – a lot depends on the file and the buffer size. However, the asynchronous reader always beats the synchronous reader when it comes to reading remote files – on my laptop, up to twice as fast when reading two files at a time.

If you want to learn more about asynchronous APIs for modern C++, take a look at the Casablanca project, where we’re taking asynchrony to a whole new level. You will find APIs for authoring and consuming REST services, as well as a more comprehensive approach to file I/O.

Artur Laksberg
Visual C++ Team

reader.cpp

Comments

  • Anonymous
    July 22, 2012
    Boost.Asio?

  • Anonymous
    July 25, 2012
    Shouldn't reading get faster if you have more hardware devices? The same way parallel_for_each should saturate multiple cores for processing, non-blocking I/O should be able to read from many disks (in a san or nas or jbod). We just tend to be using the equivalent of a single core system on our local machines.

  • Anonymous
    July 25, 2013
    extreamly nice