T-SQL Update Operations: An Answer For Jared
Jared Rypka-Hauer poses a question here:
As it appears (mostly from looking at how triggers work), a T-SQL update operation reads the row to be edited, concatenates the changes, then writes the whole thing to the table... but is that really how it works?
This description is correct as far as it goes, but a couple of steps are missing.
When SQL Server makes an update, it will first lock the data with the least aggressive strategy acceptable to the optimizer (this might be a row lock, a page lock, or a table lock). It will then write the updates to the transaction log, and write the appropriate pages of the transaction log to disk (which includes both a "before" and "after" image of the updated page -- the source for the virtual inserted and deleted tables available inside triggers). Only then will it write the table pages to disk.
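To make the virtual inserted and deleted tables concrete, here's a minimal trigger sketch. The dbo.Customers table and its columns are hypothetical, but the general pattern is this join of the "before" image (deleted) to the "after" image (inserted):

-- AFTER UPDATE trigger comparing the before (deleted) and after (inserted) images
CREATE TRIGGER trg_Customers_Update
ON dbo.Customers
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Report only the rows whose Email value actually changed in this update
    SELECT d.CustomerID,
           d.Email AS OldEmail,
           i.Email AS NewEmail
    FROM deleted AS d
    JOIN inserted AS i
      ON i.CustomerID = d.CustomerID
    WHERE ISNULL(d.Email, N'') <> ISNULL(i.Email, N'');
END;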
Jared's particular concern is an update statement in an ORM facility:
Can you help settle a debate for me? I'm arguing with someone in a forum about whether an ORM framework should allow for subsets of columns within a generated class.
My contention is that a "row" (or in the case of ORM, a "record") should handle the entire row as an atomic entity, and that, as he refers to them, "selective updates" aren't a great idea.
My contention would be that, given you've already logically factored your data into classes, you're probably on the right track, Jared. It's important to note that in my original post, which spawned your question, I was discussing how to avoid expensive post-processing on data in an UPDATE statement. In that case, we're attempting to update every column in the table, but we don't want to run the expensive function on a column if its data hasn't changed. The net effect of this could be interpreted as a "selective update", but I don't think that classification is completely accurate.
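For illustration, here's roughly the shape of the UPDATE I have in mind; the dbo.Documents table, the staging source, and the dbo.fn_ExpensiveTransform function are all hypothetical names:

-- Update every column, but skip the expensive function when the source data is unchanged
UPDATE d
SET d.Title    = s.Title,
    d.Body     = s.Body,
    d.BodyHash = CASE
                     WHEN d.Body = s.Body THEN d.BodyHash   -- unchanged: keep the old value
                     ELSE dbo.fn_ExpensiveTransform(s.Body) -- changed: recompute
                 END
FROM dbo.Documents AS d
JOIN dbo.DocumentStaging AS s
  ON s.DocumentID = d.DocumentID;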
The other issue to consider here is what constitutes "an atomic entity" in your database. This will be a function of the extent to which you've normalized your schema -- I've got a database in which "an atomic entity" spans six tables; all of those tables must be updated correctly in order to correctly apply this particular unit of work.
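As a rough sketch of what I mean by a single unit of work spanning several tables (the table and column names here are hypothetical):

-- All of the tables that make up the entity succeed or fail together
DECLARE @OrderID int;
SET @OrderID = 42;

BEGIN TRANSACTION;
BEGIN TRY
    UPDATE dbo.Orders      SET Status      = 'Shipped'  WHERE OrderID = @OrderID;
    UPDATE dbo.OrderDetail SET ShippedDate = GETDATE()  WHERE OrderID = @OrderID;
    -- ...updates to the remaining tables in the entity...
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Any failure rolls everything back, so the entity is never half-updated
    ROLLBACK TRANSACTION;
    RAISERROR('Entity update failed; all changes rolled back.', 16, 1);
END CATCH;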
So, Jared, with these caveats, I'd say you're on the right track in your design (and your "debate" in the forum). Please let me know if you need any clarification.
-wp
Comments
Anonymous
February 21, 2006
Ward...
Just a note, first off... it's Rypka-Hauer, not Rypka-Howard. Common mistake, no worries. Just figured I'd point it out.
(Thanks, Jared, and apologies; I've corrected the post. -wp)
As far as the debate on the forum, the "other side" wants to be able to tell the ORM framework to pass only the subset of instance data that's changed since the last write (or since the last read), or, alternatively, to be able to commit only a specific set of instance fields based on an argument to the save() method.
There are only two ways I can see to do this. The first, which I guess I'd prefer if either were to be adopted, is to tell the framework to associate a specific class with a specific subset of columns on instantiation. To clarify: you could say Factory.createRecord("Table","field1,field2,field3") and get a whole object built from some, but not all, of the columns in Table. That might be nice in a situation where you wanted to have a User object and an Address object built from {firstname,lastname,email,password} and {address1,address2,city,state,zip}, respectively, and yet maintain that data in one table.
The only other way I can see to make it work effectively is to build at least a minimal state machine into each object so that there's a facility for tracking changes since the last read or save. That sounds like a maintenance headache, and it creates "fat" objects that have to do extra work at a low level for each higher-level operation.
For the work involved in making such a scheme come out both performant and maintainable, I just don't see the value. His perspective is that if you're working on sensitive data you need to be able to work on a minimal set of columns per table. My point is along the lines of "If it's that important it should be in a different table altogether." However, he's also trying to make the point that you should only read from and write to the columns that you actually HAVE to for that one particular situation.
Since you've clarified that, so far as SQL Server is concerned, updating a table with new data in all columns or in a subset is effectively the same (in that it re-writes the whole row to the table anyway), you've helped me make my closing arguments in the case. ;)
I really do appreciate the quick response.
Thanks,
Jared Rypka-Hauer
Anonymous
February 22, 2006
It seems to me that "the other side" is missing the fact that if we pass all of the data to the backend, then the backend can determine which data has changed and how to react. This strikes me as a cleaner approach than building multiple CREATE transactions with different scopes.
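As a sketch of that approach (the dbo.Users table, its columns, and the procedure name are hypothetical), the caller always passes the full record and the backend decides whether a write is actually needed:

-- The ORM passes every column; the WHERE clause makes the write a no-op when nothing changed
CREATE PROCEDURE dbo.SaveUser
    @UserID    int,
    @FirstName nvarchar(50),
    @LastName  nvarchar(50),
    @Email     nvarchar(256)
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.Users
    SET FirstName = @FirstName,
        LastName  = @LastName,
        Email     = @Email
    WHERE UserID = @UserID
      AND (FirstName <> @FirstName
        OR LastName  <> @LastName
        OR Email     <> @Email);   -- NULL-aware comparisons omitted for brevity
END;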
Since SQL Server is going to lock the whole record for the update, as we've noted, the backend locking profile should be the same for each approach.
Anonymous
February 22, 2006
The comment has been removed
Anonymous
February 22, 2006
It's a marvelously multi-faceted universe, isn't it? :)
Thanks for your participation in the blog!
-wp