Using SqlDataReader’s new async methods in .Net 4.5

Artigo
04/20/2012

With .Net 4.5, ADO.NET introduced two asynchronous methods to the SqlDataReader class: ReadAsync and NextResultAsync. These two methods allow you to move to the next row or result set asynchronously and represent a much more fine-grained level of asynchronous access to data compared to .Net 4.0 (which only had asynchronous command execution). However, for the .Net 4.5 we have gone a step further and introduced two new column-level asynchronous methods: IsDBNullAsync and GetFieldValueAsync<T>.

Flow of Data

To understand the need for different levels of asynchronous methods, we need to have a look at how SQL Server sends data to clients like ADO.NET.

Figure 1 - Row data sent from SQL Server via the TDS protocol

SQL Server sends data to the client using the Tabular Data Stream (TDS) protocol which, as shown in the diagram above, separates each result set with a “DONE” token, each row with a “ROW” token and each column has both a header and actual data (note that this is a simplification of TDS useful for understanding ADO.NET, if you want to learn more about TDS you can get the full specification on the MSDN). So, when you call NextResult, ADO.NET reads the rest of the current result set and the “DONE” token afterwards, and then returns true if there is more data available. Similarly, calling Read will read off the rest of the current row and the next “ROW” token and getting the value for a column will cause the data for any preceding column to also be read. For getting column values there is also multiple behaviors for what to do with the previous columns’ data: in “Default” mode (also called “non-sequential” mode) the previous columns’ data is stored and you can go back and re-read that column from SqlDataReader’s data storage; whereas in “SequentialAccess” mode the data for the previous columns is discarded and you will get an exception if you try to re-read it.

For the new asynchronous methods in .Net 4.5, their behavior is exactly the same as with the synchronous methods, except for one notable exception: ReadAsync in non-sequential mode. The difference is that ReadAsync will not just read up to the “ROW” token of the current row, but will also read and parse all of the column data for that row as well. By doing this, ADO.NET only pays the overhead to call into its parsing code once per row, instead of paying it every time a new column is read.

Asynchronous column access

An important guidance for writing asynchronous APIs is to ensure that asynchronous methods do large or “chunky” operations, which doesn’t seem to mesh well with column-level asynchronous access until you look closely at how they work. Consider a table that has a varchar column with about 2000 characters per row. This means that you can fit about four of these columns per packet (assuming that you are using the default packet size, and ignoring any headers or other overhead). If you are reading the row via the Read method, or you are in sequential mode, then getting the column synchronously will block the thread for every fifth row. For an application or web server, having every fifth row block a thread can end up becoming a serious scalability issue. The same issue can occur if you are ignoring this 2000 character column, but checking if the next column in the row is null: IsDBNull needs to read the header for the target column, which means that it will need to read past any previous columns (such as our 2000 character column). This can become even worse if you have a column that is megabytes, or even gigabytes, in size, since you will be typically using sequential mode, and you can partially read such a column (using methods like GetStream), which means that IsDBNull will need to read off the rest of the row synchronously.

When to use what

The new asynchronous methods that SqlDataReader exposes make it very easy to create a scalable application, but they only provide a benefit if their synchronous counterpart would have blocked a thread, and that only happens if ADO.NET requires more data from the network.

So to choose whether or not to call an asynchronous method, you need to consider how much data the method is likely to use and therefore if it is likely to require a new network packet. You can calculate the probability of needing a new packet by comparing the size of the data you are reading with the maximum packet size, or you can use the following guidelines:

Default (non-sequential) mode vs SequentialAccess – It is important to decide before you create a SqlDataReader whether you want to use non-sequential or sequential access mode. In most cases using the Default (non-sequential) access mode is the better choice, as it allows an easier programming model (you can access any column in any order) and you will get better performance using ReadAsync. However, since non-sequential access mode has to store the data for the entire row, it can cause issues if you are reading a large column from the server (such as varbinary(MAX), varchar(MAX), nvarchar(MAX) or XML). In this case using sequential access mode will allow you to stream these large columns rather than having to buffer the entire column into memory.

NextResult vs. NextResultAsync – Whenever possible, you should be using NextResultAsync() to move between result sets. This will allow leftover rows in the old result set to be read off asynchronously, as well as other TDS tokens (which were excluded from figure 1 for simplicity) which can sit between result sets.

Read vs ReadAsync – It is also a good idea to call ReadAsync: in non-sequential mode this will read in all column data, which can potential span multiple packets, allowing faster access to the column values. In sequential mode ADO.NET will need to finish reading the data for the current row (if it hasn’t been read entirely), and there is the potential for some TDS tokens to sit between rows which can then be read asynchronously.

IsDBNull and GetFieldValue<T> vs IsDBNullAsync and GetFieldValueAsync<T> - If you had previously called ReadAsync and are using non-sequential access, then calling the synchronous versions of these methods will provide the best performance since the column data has already been read and processed (so calling the asynchronous method just adds the overhead of wrapping the value in a Task). However, if you called Read in non-sequential access mode, or if you are using sequential access mode, then the decision is much harder as you need to consider how much data you need to read to get to your desired column and how much data that column may contain. If you’ve read the previous column, and the target column is small (like a Boolean, a DateTime or a numeric type) then you may want to consider using a synchronous method. Alternatively, if the target column is large (like a varbinary(8000)) or you need to read past large columns, then using an asynchronous method is much better. Finally, if the target column is massive (like a varbinary(MAX), varchar(MAX), nvarchar(MAX) or XML) then you should consider the new GetStream, GetTextReader or GetXmlReader methods instead.

Test early, test often

Finally, please be aware that the recommendations above are based on the workings of ADO.NET and what we have observed in our tests. You need to consider what type of data your application is using to see at what level (result set, row or column) it makes sense to call asynchronous methods. The best way to check this is to build up a suite of performance and scalability tests such that you can accurately measure and tune your application.

And the best way to do this is with the Visual Studio 2012 and .Net 4.5.

Happy Coding!

Daniel Paoliello
Software Development Engineer
ADO.NET Managed Providers and DataSet Team

Comments

Anonymous
April 20, 2012
The comment has been removed
Anonymous
April 20, 2012
Nice post. Thanks.
Anonymous
April 20, 2012
The comment has been removed
Anonymous
April 20, 2012
@sich: Indeed, the addition of these async methods to SqlClient and to the ADO.NET provider model will allow us to add support for the new async pattern to Entity Framework. The EF team is working on this at the moment.
Anonymous
April 20, 2012
Wow, I feel so ignorant right now. Great article, thanks guys. Please write another blog post with use cases and code samples. Thanks again!
Anonymous
April 22, 2012
Thank You The given information is very effective i will keep updated with the same <a href ="http://www.sweetball.in">industrial automation </a>
Anonymous
April 23, 2012
To what level will the new async methods support cancellation?
Anonymous
April 23, 2012
I hope to never have to use these directly! Seems way too granular. I'm sure you had a specific reason to add them but I can't tell you the last time I had a scenario that would benefit from reading an individual column (let alone testing for null) on its own task.
Anonymous
April 24, 2012
@Phil, judging by the documentation at least for SqlDataReader, all the Async methods support cancellation. I.e. there is an overload that accepts a cancellation token. msdn.microsoft.com/.../w9y9ttex(v=vs.110).aspx One thing I was wondering is why you would ever need to read over large fields as is mentioned above? I guess with something like an ORM where you care selecting entire entities it would be useful. Otherwise, you could just do a projection and select only the columns that you need. On a related note, it would be nice if EF had lazy loaded properties for fields that for example map to varbinary(max) columns.
Anonymous
April 24, 2012
@Jon: Thanks for the feedback on lazy loaded properties. This is something EF already has in the backlog for future releases, but I would like to encourage you to vote for it (if you haven't already) in our UserVoice site: data.uservoice.com/.../1051513-lazy-loading-at-property-column-level. In the meanwhile there is a workaround that works pretty well: you can "split" the mapping of the table into two different entities (think Customer and CustomerPhoto) with a 1:0..1 assocition among them. Then if you don't need the BLOB column, you can just query for the main entity, i.e. Customer. If you need to load the blob, you can use eager loading (i.e. Include) or regular lazy loading to load the related table (i.e. CustomerPhoto). On a side note, since we now have streams support in SqlClient, we could in theory add support for stream properties in EF for BLOBs. Not something that we are planning for the next release, but a very interesting option for the future.

Compartilhar via