Does ADO.NET update batching really do something?
I’ve gotten this same question several times in the last few weeks, so I thought it would be a good idea to put together some details on the topic. Update batching is a new ADO.NET 2.0 feature that, when enabled, introduces an optimization to the process of executing a DataAdapter.Update call (which basically takes the changes from a DataTable and pushes them into a database using the INSERT/UPDATE/DELETE commands).
Without batching, Update walks through each change in the DataTable and for each one of them it executes the corresponding command (obtained through the InsertCommand, UpdateCommand and DeleteCommand properties of the adapter). Each execution is separated from the next one, so your application keeps going back and forth over the network between the client and the server.
With batching, we can take a bunch of these operations and send them in a single round trip. This optimizes the operation in a number of ways, the most notorious one being having less network round-trips, which saves quite a bit of overhead in most (although not all) cases.
Batching implementation strategy #1
One way of implementing batching is to concatenate a bunch of SQL statements together. For example, if you have to do an insert and an update, you can simply put a batch together like this (in T-SQL):
INSERT INTO Person (id, name) VALUES (1, 'John');
UPDATE Person SET name = 'Juan' WHERE id=2
That will execute both statements in a single shot. However, the thing gets tricky if you want to do it in a generic way. For starters, we have to use parameters, because the data-adapter needs to replace the values for INSERT/UPDATE/DELETE for each particular DataRow. So this becomes:
INSERT INTO Person (id, name) VALUES (@id, @name);
UPDATE Person SET name = @name WHERE id=@id
Now this doesn’t really work, because we’re using the same parameter names, but we actually want different values (the INSERT line wants the values from one DataRow, and the UPDATE line from another DataRow). So we have to rename the parameters so they are unique. Renaming parameters means that before the adapter executes any command, it has to parse the SQL statement (at least enough parsing to isolate parameter markers) and replace the original parameter names with new ones so their names don’t clash. That’s bad, parsing SQL statements is problematic and the operation consumes CPU and causes more allocations.
There are also two more subtle issues with this approach:
1. It turns out that some databases such as SQL Server have an upper limit to the number of parameters that you can pass to a single parametrized statement invocation. In the case of SQL Server this is around 2100. That means that your batch cannot sum up more than those parameters total (not per statement). Since each (INSERT/UPDATE/DELETE) command can have a different number of parameters, it’s hard to predict how many rows you’ll be able to process.
2. The second side-effect has to do with plan caching. Database servers usually cache query plans for efficiency. Batching can cause trouble here. For example, let’s say you have the two commands we used above for INSERT and UPDATE setup in a data-adapter. In one call, you have 1 updated row and 2 inserted rows, so your batch becomes (simplified version):
UPDATE Person SET name = @name1 WHERE id=@id1;
INSERT INTO Person (id, name) VALUES (@id2, @name2);
INSERT INTO Person (id, name) VALUES (@id3, @name3);
Now, if right after that you get another request with the exact same statements, but with a table that has 1 insert followed by 1 update, the batch will be:
INSERT INTO Person (id, name) VALUES (@id1, @name1);
UPDATE Person SET name = @name2 WHERE id=@id2;
That is, for every permutation of changes in a DataTable we’ll produce a different batch. That will bloat the server’s procedure cache introducing inefficiencies. This is the approach that we used for batching in for both SqlClient and OracleClient during Whidbey beta 1. In beta 2 OracleClient still uses this technique (mapped to its PL-SQL equivalent), but SqlClient doesn’t.
Batching implementation strategy #2
The other way of doing batching is “simpler”, but happens at a lower level. I’m going to describe the SQL Server implementation here, but other database servers might have a similar feature. At the client-server protocol level (TDS), each command is executed against the server using a “request”. For parametrized and batched statements, this request is an “RPC request” (nothing to do with the DCE RPC, just same name). It turns out that in SQL Server we can put one RPC request on the network, and right before putting the “we’re done” mark on the wire, we can actually put a “here comes another RPC” mark. This way, we can keep sending RPCs over the wire all together. Once we sent all of them, we switch to listen for responses, and we get the responses for each RPC in the order we sent them. The protocol has the appropriate infrastructure that allows us to distinguish when a response for a given RPC ends and the next starts.
This means that we get rid of the following problems:
- We can send multiple request for statement execution in a single network round-trip, which was the original goal
- No need to parse statements and rename parameters, because each statement goes in a different request
- No procedure cache bloat, because regardless of the order of execution, each row has an RPC of its own
- No limit to the number of rows that can be updated with a single batch, because we don’t depend on lots of parameters for a single RPC
There are some minor specific drawbacks to RPC-level batching, such as paying a little bit of extra cost in the server per-RPC execution, but the benefits above in general out-weigh those.
Sure, but is it really batching?
With all this background in place, now I can tackle the initial question: does batching really work? How can I see it working?
If you have a SqlDataAdapter and set the UpdateBatchSize property to 0 (meaning all changes in a single round-trip) or a value greater than 1 (indicating the number of rows per batch), we’ll enable batching. The first thing you’ll want to do is use SQL Profiler to see how the statements are being batched. However, you’ll find out that you still see one RPC execution for each DataRow that needs an INSERT/UPDATE/DELETE.
For example, let’s take this trivial code snippet that uses batching:
DataTable t = new DataTable();
t.Columns.Add("a", typeof(int));
t.Rows.Add(1);
t.Rows.Add(2);
// don't hardcode connection strings in your apps, this is just a quick sample :)
// pre-create the table with "CREATE TABLE t (a INT)"
using(SqlConnection conn = new SqlConnection("server=tcp:localhost; integrated security=true; database=test")) {
conn.Open();
SqlDataAdapter da = new SqlDataAdapter(null, conn);
da.InsertCommand = new SqlCommand("INSERT INTO t VALUES (@a)", conn);
da.InsertCommand.Parameters.Add("@a", SqlDbType.Int, 0, "a");
da.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
da.UpdateBatchSize = 10;
da.Update(t);
}
The filtered SQL profiler output will look like this:
What’s happening there is that you effectively have a separate RPC for each statement, so the profiler is accurate. Batching is actually happening at a lower level in SQL Server. If you really want to see batching working, you’ll have to use a lower-level tool such as “netmon” or some other network monitor. Using netmon you would be able to see that all the RPCs are effectively sent together, and then all the responses are also sent together by the server. For example, for this trivial program that does 2 inserts in a single batch, here is the netmon trace for the packet that contains the execution:
00000000 00 06 5B BD 30 E5 00 30 6E 4C 31 63 08 00 45 00 ..[.0..0nL1c..E.
00000010 01 20 B9 EC 40 00 80 32 11 13 9D 37 79 43 9D 37 <....@..2...7yC.7>
00000020 7A FA E9 BA 57 A2 00 00 01 0D 12 1F 05 99 83 57 z...W..........W
00000030 47 89 AF 31 33 AF 50 18 42 C8 36 94 00 00 03 01 G..13.P.B.6.....
00000040 00 DB 00 00 01 00 FF FF 0A 00 00 00 00 00 E7 32 ...............2
00000050 00 09 04 D0 00 34 32 00 49 00 4E 00 53 00 45 00 .....42.I.N.S.E.
00000060 52 00 54 00 20 00 49 00 4E 00 54 00 4F 00 20 00 R.T...I.N.T.O...
00000070 74 00 20 00 56 00 41 00 4C 00 55 00 45 00 53 00 t...V.A.L.U.E.S.
00000080 20 00 28 00 40 00 61 00 29 00 00 00 E7 0C 00 09 ..(.@.a.).......
00000090 04 D0 00 34 0C 00 40 00 61 00 20 00 69 00 6E 00 <...4..@.a...i.n>.
000000A0 74 00 02 40 00 61 00 00 26 04 04 01 00 00 00 80 t..@.a..&.......
000000B0 FF FF 0A 00 00 00 00 00 E7 32 00 09 04 D0 00 34 .........2.....4
000000C0 32 00 49 00 4E 00 53 00 45 00 52 00 54 00 20 00 2.I.N.S.E.R.T...
000000D0 49 00 4E 00 54 00 4F 00 20 00 74 00 20 00 56 00 I.N.T.O...t...V.
000000E0 41 00 4C 00 55 00 45 00 53 00 20 00 28 00 40 00 A.L.U.E.S...(.@.
000000F0 61 00 29 00 00 00 E7 0C 00 09 04 D0 00 34 0C 00 a.)..........4..
00000100 40 00 61 00 20 00 69 00 6E 00 74 00 02 40 00 61 @.a...i.n.t..@.a
00000110 00 00 26 04 04 02 00 00 00 01 02 03 04 05 06 07 ..&.............
00000120 07 06 90 CE 9B 4E 42 21 15 E4 56 DD 44 E4 .....NB!..V.D.
As you can see, both inserts are in the same network packet. No need to understand the whole thing – it’s enough to notice that there are two insert statements in the payload.I hope this helps clarify doubts around the topic. If you have further questions feel free to post them here and we’ll take a look.
Pablo Castro
Program Manager - ADO.NET Team
Microsoft Corp.
Disclaimer: This posting is provided "AS IS" with no warranties, and confers
Comments
Anonymous
May 19, 2005
Do you plan to implement batching for
OracleClient just like SqlClient
by release timeframe ?Anonymous
May 19, 2005
Ricky, Yes, we will support Batching in Oracle client in RTM. Our implementation will be based on strategy#1 mentioned in the blog. As pointed by Pablo in the article above " ..OracleClient still uses this technique (mapped to its PL-SQL equivalent)".
Sorry for the confusion, if any.
Thanks!
Sushil Chordia [MS]Anonymous
May 19, 2005
I like the reduced round trips but the fact that the data is sent over the wire in string form seems to be inefficient. Is there a way in ado.net 2 to send a group of inserts (1000+) over in binary form? In binary form an int stays and int (4 bytes). I know in sql server 2000 I have to use bulk insert and that too requires everything to be string.Anonymous
May 19, 2005
Nice one PabloAnonymous
May 20, 2005
Hi Paul,
Actually, only string data is sent to the server in string form. Other data is sent in their original format; that is, if you give us an int in a SqlParamter value where the parameter type is set to SqlDbType.Int, then we put an int on the wire (4 bytes). Same thing for all other types.
-pabloAnonymous
May 22, 2005
Looks good - any reason why we can't access the AddBatchCommand and ExecuteBatchRPCCommand? It would be very useful to have this level of functionality available without having to go through the DataAdapter.Anonymous
May 25, 2005
Pablo Castro, Microsoft's Program Manager of the ADO.NET Team, has a good post last week that helps clarify...Anonymous
August 14, 2005
Thanks Pablo, for the very informative post.
Yes, we've reduced network round trips. But how do I enable an all-or-nothing batch transaction across these multiple RPCs? Is it my imagined problem? All I need to do is to call BeginTransaction() on the connection? Or is there something more to it?Anonymous
September 09, 2005
Hi Pablo,
I havent seen a very dramatic improvement in performance. Here's what I observed.
BatchUpdateSize = 100
For 3000 rows
with 4 columns being updated takes 3 Mins
With out setting the batchupdatesize it takes 3:40 seconds.
Its just 40 seconds. I was expecting it to be really fast.Anonymous
September 20, 2005
Nice to see your solution but how about if I have 3 insert sql to different tables and I want to batch it in 1 round trip to the DB?
1. Insert into Table1 with 3 parameters
2. Insert into Table2 with 4 parameters
1. Insert into Table3 with 10 parameters
Is this possible with the sample you gave? Or do I still need to concat these sqls with ";"?Anonymous
September 26, 2005
Chris, We dont have batching at command level for .Net Framework version 2.0. Batching is only supported at the Adapter level. Meaning all the update, insert and delete commands on the adapter are the only ones that get batched. Hope this helps.Anonymous
September 27, 2005
That helped a lot. Thanks!Anonymous
February 16, 2006
Quick question:
Assuming the application is a web based and assuming a thin client architecture.
Are the batching RPC calls originating from the web-based client or from the application host server (the web server).
ThanksAnonymous
February 28, 2006
Thanks.
This clears up the confusion caused after following your webcast which demoed the new features of ADO.NET 2.0 - standalone DataTables, binary serialization, db agnostic access, batching etc.
In my own tests, I saw 3 updates in Sql profiler after setting UpdateBatchSize to 100 unlike your example in the webcast which showed only 1 so I thought that either batching was not working or some other parameter needed to be set to turn batching on.
This really cleared it up. Thanks again.Anonymous
March 02, 2006
Hi,
Is there any way to get the Row on which an error occurred after/during processing of a batch.
ie. I am doing a batch insert of 2000 records and record 30 has an error (eg. data is too long for field). In the RowUpdated event it tells me of the error but the DataRow is always the last one in the batch? I have also tried looking at the datatable AFTER the update has been done and ALL the rows have their 'error flag' set?
Can I find out which row/s really had the error?
thanks.Anonymous
May 05, 2006
The following links to .NET resources have been collated over time with the assistance of colleagues.&nbsp;...Anonymous
May 20, 2006
dear sir i want to change the update record by record done by the gridview control and make it bayched - i mean group of records together in one ransaction
can i di thisAnonymous
May 29, 2006
I had a quick look at the doco but couldn't find a way to accomplish this with my own custom collections. Is this possible?
eg something like the following:
SqlCommand cmd = new SqlCommand();
cmd.BeginBatch();
foreach (Item i in Coll c)
{
cmd.Command = "insert into t1 ... ";
}
cmd.ExecuteBatch();Anonymous
July 07, 2006
An excellent post by Pablo Castro of&nbsp; the ADO.NET team explaining the mechanics of how batch updates...Anonymous
September 08, 2007
The comment has been removedAnonymous
May 31, 2009
PingBack from http://outdoorceilingfansite.info/story.php?id=22796Anonymous
June 13, 2009
PingBack from http://firepitidea.info/story.php?id=969