Share via


Transact-SQL: Join Order


Background

You can divide the task of "executing a query process around T-SQL" into two groups: (1) Processes that occur in the relational engine, and (2) Processes that occur in the storage engine.

The relational engine is responsible for several processes: (1) parsing the query text, (2) processing the information using the Query Optimizer, which is responsible to generate the Execution plan, and (3) sending the plan (in a binary format) to the storage engine.

The storage engine retrieves or updates the underlying data. The storage engine includes processes such as locking, index maintenance, and transactions.

This article discusses the execution plan, which is generated by the Query Optimizer, while executing queries with Join operations. We will focus on one specific issue: in what order our Join operations will be executed, and how we can control it. We will show you techniques that you can put in your toolkit!

Our Case studies

We will discuss several different specific cases, from the simple query to more complex situations where the execution plan will depend on a lot of variables, such as the table's size and number of rows, the availability of statistics, indexes, key distribution, etc. In the last paragraph, we will try to summarize the golden rules that explain most of the common issues that you'll face. 

Default behavior of simple JOIN queries

For this demonstration, we will create three tables. We will populate two tables with a very low number of rows (100 and 500) and the third table with a large number of rows (1000000).

/************************************************************/
/*********        Join  Order, By Ronen Ariely       *********/
/************************************************************/
USE master ;
GO
CREATE DATABASE  JoinOrderDB
ON
( NAME  = Sales_dat,
    FILENAME = 'G:\SQL\Databases\JoinOrderDB.mdf',
    SIZE = 2 GB,
    MAXSIZE = 3GB,
    FILEGROWTH = 250MB )
LOG ON
( NAME  = Sales_log,
    FILENAME = 'G:\SQL\Databases\JoinOrderDB.ldf',
    SIZE = 1GB,
    MAXSIZE = 2GB,
    FILEGROWTH = 100MB ) ;
GO
  
use JoinOrderDB
GO
  
/**************************************************** DDL - Create  tables */
IF OBJECT_ID('T1', 'U') IS NOT NULL  DROP TABLE  T1;
IF OBJECT_ID('T2', 'U') IS NOT NULL  DROP TABLE  T2;
IF OBJECT_ID('T3', 'U') IS NOT NULL  DROP TABLE  T3;
   
CREATE TABLE  T1 (
    ID INT  IDENTITY CONSTRAINT  PK_T1 PRIMARY  KEY,
    MyValue bigint
)
CREATE TABLE  T2 (
    ID INT  IDENTITY CONSTRAINT  PK_T2 PRIMARY  KEY,
    MyValue bigint
)
CREATE TABLE  T3 (
    ID INT  IDENTITY CONSTRAINT  PK_T3 PRIMARY  KEY,
    MyValue bigint
)
GO
   
/**************************************************** DML - Populate Tables*/
SET NOCOUNT ON;
  
insert T1 (MyValue)
select top  100 CAST(CHECKSUM(NEWID()) AS  bigint)
from dbo.Numbers
GO
  
insert T2 (MyValue)
select top  500 CAST(CHECKSUM(NEWID()) AS  bigint)
from dbo.Numbers
GO
   
insert T3 (MyValue)
select top  1000000 CAST(CHECKSUM(NEWID()) AS  bigint)
from dbo.Numbers
GO

Let's examine the influence of the Join order in the query, to the Join operation order in the execution plan. First, we will check the JOIN using two small tables.

Please execute the next queries and examine the Execution plan.

SELECT T1.ID, T2.MyValue
FROM T2
INNER  JOIN T1
ON T1.ID = T2.ID
GO
SELECT T1.ID, T2.MyValue
FROM T1
INNER  JOIN T2
ON T1.ID = T2.ID
GO

Note

>> You can repeat the execution of each query in this article with the setting "SET SHOWPLAN_ALL ON". This causes Microsoft SQL Server not to execute Transact-SQL statements. Instead, SQL Server returns detailed information about how the statements are executed and provides estimates of the resource requirements for the statements.

Notice that while joining the two small tables of the same size order, the join order in the execution plan fits to the join order in our query. Will it be the same if we join tables that have a substantial difference in their amount of rows?

SELECT T1.ID, T3.MyValue
FROM T3
INNER  JOIN T1
ON T1.ID = T3.ID
GO
SELECT T1.ID, T3.MyValue
FROM T1
INNER  JOIN T3
ON T1.ID = T3.ID
GO

It's obvious that in this case the Query Optimizer chooses to use the small table first, and in both cases we get the same Execution plan, regardless of the join order in the query. Let's now join three tables together and examine the execution plan:

select T1.MyValue, T2.MyValue, T3.MyValue
from T3
    join T2 on T2.ID = T3.ID
    join T1 on T1.ID = T2.ID
GO

Notice that the Query Optimizer has changed the join order from T3->T2->T1 into T1->T2->T3. This behavior fits our preview conclusion in Demo_01, that the execution order fits to the number of rows in our tables and our join operations are ordered from the smallest to the largest SET.

Basic Controlling of the Join order

The query optimizer should be intelligent enough to figure out the best order in which to execute the steps necessary to translate your logical instructions into a physical result. It uses the stored statistics that the database maintains on each table to determine what to do (what type of join logic to use for example), as well as what order to perform the operations in, in order to minimize Disk IOs and the processing costs.

In some cases, we will need to have the Query Optimizer keep our query order of operations.

Query Optimizer Rules and Query Hints

The query optimizer uses different rules when exploring different plans to evaluate. For example: SelectToFilter, AppIdxToApp , JoinToIndexOnTheFly, JoinCommute, and so on. We can turn rules off using the rule name and the QUERYRULEOFF hint. The rule called JoinCommute is responsible to the join order in our case, and we can actually turn it off.

Note

The execution plan doesn’t expose the ability to selectively enable or disable rules available to the optimizer.  We can, however, use a couple of undocumented DBCC commands and the undocumented dynamic management view (sys.dm_exec_query_transformation_stats) to explore the way the optimizer uses rules. (Check the "Resources and More Information" section, Inside the Optimizer: Constructing a Plan, by Paul White). The execution plan doesn’t expose the ability to selectively enable or disable rules available to the optimizer. 

Force Order Hint

It specifies that the join order indicated by the query syntax is preserved during query optimization. Using FORCE ORDER does not affect a possible role-reversal behavior of the query optimizer. We can examine the use of the "FORCE ORDER" hint on our queries.

Let's execute the same query, which uses the big table first, but in this case we will use the "FORCE ORDER" hint:

SELECT T1.ID, T3.MyValue
FROM T3
INNER  JOIN T1
ON T1.ID = T3.ID
OPTION (FORCE ORDER)
GO

Using Force Order in our three-tables join:

select T1.MyValue, T2.MyValue, T3.MyValue
from T3
    join T2 on T2.ID = T3.ID
    join T1 on T1.ID = T2.ID
OPTION (FORCE ORDER)
GO

As expected, using the Force Order hint, both queries executed in the order that the queries were written.

Complex JOIN queries & Controlling the Join order

In this section, we are going to examine the behavior of three tables in relation to one another. We will start with DDL + DML, and then we will move to discuss several JOIN cases related to those tables.

/************************************************************/
/*********        Join  Order, By Ronen Ariely       *********/
/************************************************************/
 
 
/**************************************************** DDL - Create  tables */
IF OBJECT_ID('Entities', 'U') IS NOT NULL  DROP TABLE  T1;
IF OBJECT_ID('Countries', 'U') IS NOT NULL  DROP TABLE  T2;
IF OBJECT_ID('EntityOrders', 'U') IS NOT NULL  DROP TABLE  T3;
GO
 
CREATE TABLE  Entities
     (
        EID   INT  NOT NULL IDENTITY PRIMARY  KEY,
        ENAME NVARCHAR(10) NOT NULL
     ) 
GO
CREATE TABLE  Countries
     (
        CID    INT  NOT NULL IDENTITY PRIMARY  KEY,
        CNAME  VARCHAR(50) NOT NULL
     ) 
GO
CREATE TABLE  EntityOrders(
        EOID      INT                NOT NULL IDENTITY PRIMARY  KEY,
        EntityID  INT                NOT NULL,
        CountyID  INT                NOT NULL,
        EOPrice   DECIMAL(18,3) NOT NULL
);
GO
-- Foreign Key Constraints
ALTER TABLE   EntityOrders
ADD CONSTRAINT   FK_Entity FOREIGN  KEY(EntityID) REFERENCES Entities(EID);
GO
ALTER TABLE   EntityOrders
ADD CONSTRAINT   FK_Country FOREIGN  KEY(CountyID) REFERENCES Countries(CID);
GO
 
/**************************************************** DML - Populate Tables*/
INSERT INTO  Entities (ENAME) VALUES ('E-1'),('E-2'),('E-3');
INSERT INTO   Countries (CNAME) VALUES ('IL'), ('US'), ('FR');
INSERT INTO  EntityOrders(EntityID, CountyID,EOPrice) VALUES (1, 1, 20),(2, 3, 200);      
GO

Query 1: Execute a simple Join query using the relations:

-- Here we try to use: [Entities] JOIN [EntityOrders] JOIN [Countries]
SELECT *
FROM   Entities AS a
       INNER JOIN  EntityOrders AS  b ON  a.EID       = b.EntityID
       INNER JOIN  Countries    AS  c ON  b.CountyID  = c.CID;
-- but we get: [EntityOrders] JOIN [Entities] JOIN [Countries]

Query 2: Execute the same Join query with a QUERYRULEOFF JOINCOMMUTE hint:

SELECT *
FROM   Entities AS a
       INNER JOIN  EntityOrders AS  b ON  a.EID       = b.EntityID
       INNER JOIN  Countries    AS  c ON  b.CountyID  = c.CID
OPTION ( QUERYRULEOFF JoinCommute);
-- Now we get: [Entities] JOIN [EntityOrders] JOIN [Countries]

Query 3: Execute the same Join query with a FORCE ORDER hint:

SELECT *
FROM   Entities AS a
       INNER JOIN  EntityOrders AS  b ON  a.EID       = b.EntityID
       INNER JOIN  Countries    AS  c ON  b.CountyID  = c.CID
OPTION ( FORCE ORDER);
-- Now we get: [Entities] JOIN [EntityOrders] JOIN [Countries]

* The execution plan is the same as in the preview query.

Query 4: Execute the same Join query with a JOIN type hint:

SELECT *
FROM   Entities AS a
       INNER JOIN  EntityOrders AS  b ON  a.EID       = b.EntityID
       INNER JOIN  Countries    AS  c ON  b.CountyID  = c.CID
OPTION (MERGE JOIN);
-- Now we get: [Countries] JOIN [Entities] JOIN [EntityOrders]

Query 5+6: Execute the same Join query using Brackets and Filter to control the JOIN order:

SELECT *
FROM Entities AS a
INNER JOIN (
    select * 
    from EntityOrders as b
    INNER JOIN Countries  AS  c ON  b.CountyID  = c.CID
    where c.CID < 10
    ) T on  a.EID = T.EntityID
-- Now we get: [EntityOrders] JOIN [Countries] JOIN [Entities]

* in this query we use a filter which actually do not filter any row out. It is only used for re-ordering the JOIN order.

Same solution can be done using this query:

SELECT *
FROM  
    Entities AS  a
    INNER JOIN  EntityOrders AS  b ON  a.EID       = b.EntityID
    INNER JOIN  (
        select * 
        from Countries c
        where c.CID < 10 
        ) T ON  b.CountyID  = T.CID

Query 7: Execute the same Join query using GROUP BY to control the JOIN order:

SELECT *
FROM  
    Entities AS  a
    INNER JOIN  EntityOrders AS  b ON  a.EID = b.EntityID
    INNER JOIN  (
        select MIN(c.CID) CID,CNAME
        from Countries c
        group by  c.CNAME
    ) T ON  b.CountyID  = T.CID

We can force the optimizer to process a derived table by simply placing a GROUP BY in your derived table. The optimizer is then obligated to resolve the GROUP BY aggregate before it can consider resolving the join between the two tables.

* We can find many more ways to influence the order of operations. As long as we control the understanding of order of operations on the server, then we have a way to change this order. To start out, it would be good to look up the order in which Query Optimizer basically build the execution plan:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause
  6. ORDER BY clause

Note

* This article is based on SQL Server 2014 Enterprise edition, using a small virtual machine with 2 CPUs and 4 GB of shared memory, on Windows Server 2012 R2.

* We can't always predict the join order, and you might get different results from those shown in this article!

Conclusions and General Rule of Thumb

We saw that the execution plan depends on lot of variables, such as table size and number of rows, availability of statistics, indexes, key distribution, etc. Moreover, we have a lot of options to change order of operation or even the whole Execution plan by using hints and/or writing the same query with a slightly different format.

This list summarize several options we can put in our toolkit for future use:

  • By default, JOINs are done in left to right order.
  • Tables with less amount of rows pushed to the beginning of the process.
  • In extreme situations with complex JOIN query, The Optimizer may fail to find the best solution, and it build an Execution Plan with order of operation almost similar to the original query.

Since the Execution plan depended on huge amount of variables which part of them we can control or know but lot of them we have no way to control or even know about, this is impossible to write rules that will be applied always. Remember that same query on different database can result in different execution plan, and different order of joins. We can however write some General Rule of Thumb (Golden rules) which work in most cases:

  • We can try to change the order of JOINs in the query.
  • We can control the Join Order using hints as QUERYRULEOFF JOINCOMMUTE, FORCE ORDER, etc.
  • We can use different Join hints to control the type of join, which have side effect of changing the join operations order as well.
  • We can control order of operation by using Brackets, Filters, GROUP BY, and so on.

Resources and More information

Code

Resource

More information

See Also