Advanced Performance Tuning – 2 :: Which side of the operator
If you have read my earlier blog post on the Importance of data-types, you might be wondering, “What the heck is going on here?”
I had already written in that post the reason behind the slow performance – the EmpCountry column in the Employee table is CHAR(2) which cannot be directly compared with NCHAR(2); hence, SQL Server is forced to CONVERT the EmpCountry column values to NCHAR(2) before it can perform the comparison. Again, SQL Server does not know what the column values will look like after converting the Non-Unicode character values to Unicode. Hence, it converts each column value to Unicode and then compares the result with the value passed. Hence, the Index Scan.
What I want you to note here is the CONVERT_IMPLICIT operation that SQL Server performs; it applies this operation against each value of EmpCountry in the table. Wouldn’t it have been much easier if SQL Server was to convert the @Country parameter to CHAR(2) and then do the comparison? Well, probably… probably not. There is a huge difference between Unicode and Non-Unicode character sets; why will the designers of SQL Server take so much risk to convert the user defined value to another character set and then run into the risk of doing an inappropriate comparison?
Also, the conversion is not always so easy. Ok, lets take some examples.
Scenario 1:
Assume that the CEO of the company wants information about Employees who have joined the Organization less than an year ago. Most of the developers that I have come across, will prefer writing the query as below:
SELECT <Column List>
FROM <Table Name>
WHERE DATEDIFF(d, Join_Date, GETDATE()) < 365
Can this function be Reversed Engineered to get a Join_Date which is exactly 365 days earlier than the Current Date? Well, yes, and its very easy to do so as well.
Will SQL Server do it? No, because SQL Server has not been instructed to do so. Why will SQL Server modify the query [to: WHERE Join_Date > DATEADD(d, –365, GETDATE())]? Although it looks like an easy change considering the simplicity of the query, but in real life, queries will not be so simple. Also, such a change in the query can cause implemented Business Logic to change, and there by provide incorrect results.
The result of this query? Obviously a Table Scan will follow (considering that this is a one time query and there will be no Indexes on the Join_Date), and the query will take minutes to complete.
Scenario 2:
Assume that the Head of the Payroll Team wants information about Employees who have paid more than, say, 10,000 USD as Tax. The database has a function fn_CalculateTaxPaid that takes the Annual Salary as input and provides the amount of Tax that this accounts to. You will probably now write the query as:
SELECT <Column List>
FROM <Table Name>
WHERE fn_CalculateTaxPaid (Annual_Gross) > 10000
Think: Is there a way to Reverse Engineer the fn_CalculateTaxPaid function, and calculate the Annual_Gross that corresponds to a Tax of 10,000 USD? Probably yes, but is that easy? I do not think so.
Now, lets complicate matters a bit further. Lets say, the fn_CalculateTaxPaid function takes the EmpID as input, and calculates the Tax based on his/her Annual_Gross, Other_Income, Deduction1, Deduction2, etc. Now the Query will look like:
SELECT <Column List>
FROM <Table Name>
WHERE fn_CalculateTaxPaid (EmpID) > 10000
Now, is there a way to Reverse Engineer the function to get a list of EmpIDs that have paid a Tax more then 10,000 USD? I hope you are getting the point now.
So, now, what will SQL Server do when it encounters such a query. Obviously, a Table Scan or an Index Scan or a Clustered Index Scan will follow (depending on whether there are supporting Indexes and the type of the Indexes) and the function will be applied against each row of the table, the resulting value will be compared with the input, and the result set formed.
Conclusion:
What I am trying to explain in this post is, use of Functions, Calculations, or for that matter, any operation on the Left of the Operator will cause bad execution plans. In Scenario 1, the operator was Greater Than ( > ), while in the second, the operator was Less Than ( < ). And in both cases, we had asked SQL Server to perform an operation to the Left of the Operator, which caused the queries to choose bad execution plans. If we can avoid the use of Functions or Calculations on the Left of the Operator, we can use the existing Indexes to generate a cost effective Execution Plan, there by decreasing the overall Execution Time (which might mean shorter tea-breaks for some – like me ).
Disclaimer: All information provided here is my personal opinion and is neither verified nor approved by Microsoft before it is published. All information, and code samples, if any, is provided "AS IS" with no warranties and confers no rights.
Comments
Anonymous
August 21, 2011
What does the side of the operator have to do with it? Shouldn't the conclusion be that you should prefer to do calculations on the parameters of your query rather than on the contents of the tables you're searching?Anonymous
August 21, 2011
Correct Peter. Calculations must always be done on the parameters and not on the contents of the table. Side of the operator is just to make it simplier to understand, - as we all write queries in the form "select Col1 from Table1 where Col2 = value" and not as "select Col1 from Table1 where value = Col2". Hence avoiding calculation on the left of the operator is th same as avoiding calculations on the contents of the table.