T-SQL Misconceptions - JOIN ON vs. WHERE

Hello all,

Last week while I was onsite, I was discussing with a customer about some T-SQL querying misconceptions, one particularly caught his interest.
In the subject of JOINs, I was asked if there are there any known issues using search arguments in the ON predicate instead of a WHERE predicate?

It depends! Winking smile

If we are talking about INNER JOINs, as a rule-of-thumb I would say use only JOIN predicates in the ON clause.
Refrain from using search arguments in the ON clause, and use the WHERE clause instead.
Performance wise, there is no difference if the generated plan is the same, like in the example below. But imagine you are reviewing someone else code, wouldn’t you read it better like this?

image

On OUTER JOINs the case is quite different, meaning that it’s not just a cosmetic change, if you will. JOINS can be optimized, as we will see next. One possible optimization with OUTER JOINs is that adding references to the table in the right side of a JOIN to the WHERE clause will convert the OUTER JOIN to an INNER JOIN, just like in the example below:

image

In terms of time, there was an expected difference between both examples (124ms and 144ms respectively) with these small tables, favorable to the INNER JOIN, but that will also depend on the physical JOIN in use.
One other argument from the discussion was that the WorkOrderID columns were part of the PK on both tables, and that allowed the query optimizer to perform this JOIN transform optimization, in the sense that the search predicate was SARGable.
Well, watch as a similar query goes thru the same optimization even when the search predicate is not a SARG.

image

With a different spin to this exercise, we can also observe other optimizations. Make notice of the following example:

image

In the first query, as seen before, the LEFT JOIN was transformed into an INNER JOIN. In the second query, the LEFT JOIN was transformed into a RIGHT JOIN.
In a LEFT JOIN all data is returned from the left table. On the right table, the matching data is returned in addition to NULL values where a matching record exists in the left table, but not in the right table.
The opposite is true for a RIGHT JOIN. As such, we can change either the order of the tables in the JOIN statement, or the type of JOIN itself, and it will still yield the same results.
This optimization weighs the inputs and decides based on the estimated row count from each.
In this case we have 72591 rows from table WorkOrder and 18554 rows from table WorkOrderRouting. Therefore it is cheaper to get the rows from the larger table and match the ones from the smaller, or to put it in other words, the RIGHT JOIN is cheaper than the LEFT JOIN.

For the sake of argument, lets check how the plan would turn out if this optimization hadn’t been done, by adding a join hint that will not allow the optimizer to do its job fully. Notice below how by not optimizing and keeping the JOIN in the statement, the LEFT JOIN would be more costly, and even leverage on parallelism.

image

That was an interesting talk I though I should share.

Until next time!

Disclaimer: I hope that the information on these pages is valuable to you. Your use of the information contained in these pages, however, is at your sole risk. All information on these pages is provided "as -is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Ezequiel. Further, Ezequiel shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages.

Comments

  • Anonymous
    February 13, 2012
    The comment has been removed

  • Anonymous
    February 18, 2012
    Thanks Eduardo. Be sure to check in from time to time :-)

  • Anonymous
    January 21, 2013
    nice food for thought

  • Anonymous
    January 21, 2015
    I work w/ tables that have millions of rows in MS SQL 2008. I have frequently observed massive performance gains when moving criteria such as value checks from the where clause to the "On" in a join. I have no idea why this is, does anyone?

  • Anonymous
    February 29, 2016
    The comment has been removed

  • Anonymous
    February 29, 2016
    I typically leave the criteria in the ON then nicely format everything so it reads well. Also If I want to change from an Inner to an Outer Join in the future, I can do that by just changing the join type, no changes to the where.