CASE To The Rescue Again: Order Of Operations In SQL Server
You might remember way back when, when we discussed the fact that sometimes we inherit systems -- and the design decisions which spawn them -- which don't quite match how we'd approach things. A recent exchange on one of the Microsoft SQL Server discussion lists highlights this wisdom, and also offers up a nifty programming insight.
David wrote:
This works great:
SELECT [BinaryName]
, [SimpleVersion]
, CONVERT(float, [MetricValue]) as ChurnValue
FROM [BRATDB].[dbo].[BinaryBuildMetricView]
WHERE MetricName='ChurnDensity'
ORDER BY SimpleVersion, ChurnValue DESC
But I’m only look for ‘ChurnDensity’ records on a specific BinaryName. When I change the WHERE line to:
WHERE MetricName='ChurnDensity' AND BinaryName='wininet.dll'
I get the following error:
Msg 8114, Level 16, State 5, Line 1
Error converting data type nvarchar to float.
It seems like the query optimizer is getting all the MetricValues for a BinaryName and trying to CONVERT all of them instead of doing a query to narrow down the MetricValue column first to only contain ‘ChurnDensity’ values (which will cause the CONVERT to succeed) and then filtering that list by BinaryName.
When I asked David for more insight into the contents of the [MetricValue] column, he wrote:
If MetricName=’ChurnDensity’, its matching MetricValue will always be CONVERT-able to a float (I enforce that outside the database, plus I’ve manually verified that every ChurnDensity’s value can be converted in my current data set).
Now, if MetricName <> ‘ChurnDensity’, chances are, it’ll be some string that can’t be converted…which is why I want to avoid selecting those records before the CONVERT happens.
I know this begs the question why values of different types are being shoved into MetricValue(nvarchar)…but let’s just let sleeping dogs lie on that…
So, poor David is stuck with a design I hope no reader of this blog would countenance -- what an old boss of mine used to derisively call a "multi-use" column. This column is so multi-use that the underlying type of the column's contents is different, depending on the contents of another column.
Umachandar replied to David's query with a wonderfully complete discussion of his solution:
You cannot guarantee that the CONVERT happens last. It depends on the query plan. There are cases in SQL2000 also where a query that does these types of operations will work some time. The only way to change order in which expressions are evaluated is to use a CASE expression. So in your example, you should modify the SELECT list like:
,CASE MetricName WHEN 'ChurnDensity' THEN CONVERT(float, [MetricValue]) END as ChurnValue
This will ensure that the expression if evaluated for rows that doesn’t have MetricName = ‘ChurnDensity’ does produce run-time error.
Note that even though the ANSI SQL standard describes how various parts of the SELECT statement are evaluated, lot of it is left to implementation on how to produce the results. SQL Server 2005 is even more aggressive in this area to better match computed column expression for example. See the “Behavior Changes to Database Engine Features” topic in SQL Server 2005 BOL ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/instsql9/html/65eaafa1-9e06-4264-b547-cbee8013c995.htm.
I can't add much to this except to thank both David and Umachandar for consenting to my blogging of their exchange. Hopefully this little trick will come in handy if you find yourself inheriting a multi-use column.
You'd never design one of those yourself, would you? :-)
Comments
Anonymous
January 01, 2003
Thanks for your question, Spazecaze. I've posted an answer here. -wpAnonymous
February 21, 2006
Can you help settle a debate for me? I'm arguing with someone in a forum about whether an ORM framework should allow for subsets of columns within a generated class.
My contention is that a "row" (or in the case of ORM, a "record") should handle the entire row as an atomic entity, and that, as he refers to them, "selective updates" aren't a great idea. One of my arguments to support this is the fact that the entire row is locked, opened, altered, committed, and released... but I'm not sure if that's the case or not and I've spent about 2 hours researching online to find it.
Finally Google brought me to your blog. :)
So... the question is: As it appears (mostly from looking at how triggers work), a T-SQL update operation reads the row to be edited, concatenates the changes, then writes the whole thing to the table... but is that really how it works?
Thanks in advance,
JAnonymous
February 21, 2006
Thanks for your question, Jared. I've answered here.Anonymous
February 11, 2008
So how would you go about designing a database that allows for end user defined fields? For instance, a system is built to allow users to enter data into fields on a form. These fields are defined by a different user as being either date, int or string. So you could have x number of y type fields on a form. How would you store each data type in a different typed column and retrieve the correct value from the correct column? It is analogous to designing a database to store metadata.