Conor vs. Updates – Talk (for those of you in Austin)
I am giving a talk tomorrow (Tuesday) night at the Austin PASS group on how Update Queries are processed/optimized in SQL Server.
The talk will be at the Microsoft office in Austin (which recently moved).
More details can be found at the CACTUS web page. (I don’t know if they have my talk announcement up there yet, but hopefully it will be there soon)
(I believe that I will be giving a longer version of this talk at the national PASS conference in Seattle in November, for those of you not lucky enough to be in Austin :)).
I hope you see you there!
Conor
Comments
Anonymous
August 17, 2010
The comment has been removedAnonymous
August 20, 2010
"It depends". If the semantics of your application need all-or-nothing, then do txns at that scope. If the desired semantics is to get reasonable performance but allow some work to commit before the whole job is done, smaller transactions mean that you hold locks for less time and generally promote concurrency. I don't know enough about your app to know whether the application NEEDS isolation. I can tell you that most batch jobs have other notions of backup/recovery (ETL may go into a staging table and the whole thing can just be dropped at the logical level instead of worrying about txns). I would start with the business requirement and then work towards the "how does it get implemented w.r.t. txns?" question.Anonymous
August 20, 2010
While not in this particular code situation, we were running another code transformation process where we had 4 million rows of mainframe source code in a table. My stored procedures did updates to translate all occurrences of (for example) VAR-ABC to VarAbc. Based on what you were saying in the meeting, this would have generated two spoolers to force the updates to operate sequentially. In this case, since the updates were occurring to different columns of the same table, so 'sequential' would occur by default. When I tried creating Oracle stored procedures that did updates on hundreds of thousands of rows, it failed with a message to the effect that I had exceeded my rollback segment (or some term to that effect). To complete this update, I had to create a cursor, scroll through 1000 records applying my updates, then commit the 1000 update transaction. On the one hand, I was horrified at the bother, on the other hand, I could see that this probably forced better performance. On a developer machine, one doesn't worry about concurrency or deadlocks. The local instance was, in effect, the production instance. Based on the fact that each row update creates a new record, however, this means our 4 million row table was putting 40 million rows in tempdb, under the assumption that there is no periodic commit. So the question becomes: is it best to have fewer columns in the table and run separate stored procedures on separate tables to generate (for example) the SQL schema and the VB classes? Better to begintrans and commit around each group of closely related updates (i.e., updates on string fields followed by updates on numeric fields followed by updates on date fields)? Or, better to reduce rowcounts if possible? We could do that down to the IMS table level, say 250 database columns with 20 repeating subgroups. Sometimes someone can look at stuff like this and know instantly how they would do it better. On the one hand, you may not be able to remark on that in the capacity of a Microsoft employee, on the other, the approach I described may be as efficient as any alternative.Anonymous
August 20, 2010
Personally, I would probably do this in batches. It likely would improve performance and also would give me the chance to restart the operation mid-way. However, it is not a requirement in SQL Server to do so. A lof of other factors would play into my ultimate decision, including load on the source and target machines, how long the operation took on the hardware, etc. It sounds like batching would be very reasonable in this case given that the Oracle equivalent was batching.