Scalability LEFT JOIN vs MERGE

RJ 286 Reputation points
2025-01-28T17:04:24.43+00:00

Hi there,

I'm UPSERTING AND handling 70 + millions of rows of UPSERT.

I'm the only developer on the DB.

GP_Gen5_6

LEFT JOIN DURING UNIT TEST OF ALL RECORDS 12 mins

SCALED UP OVERNIGHT PROCESS

GP_Gen5_20

History of existing MERGE process same operation (multiple dates) - 25mins

Yesterday evening I replaced MERGE with LEFT JOIN code and thought my 12 min time should come down to may be 6 mins. However LEFT JOIN code was 18 mins. (still faster than MERGE). but i thought increased compute power will make it even faster than my unit test.

i expected option 1 JOIN/LEFT JOIN to go much faster THAN UNIT TEST at scaled up compute. Am i wrong? am i missing anything?

I noticed when I run left join few times during my unit testing, it seems to be 50% faster than overnight MERGE SCALED UP. However overnight process scales up the compute power i noticed LEFT JOIN still takes the same time as lower compute.

Option 1

Use LEFT JOIN

UPDATE T

C1 = Stg.C1

From StagingTbl Stg

JOIN TargetTbl T on T.ID = STG.ID

INSERT INTO TARGETTBL

SELECT STG.* FROM StagingTbl Stg

LEFT JOIN TargetTbl T on T.ID = STG.ID

WHERE T.ID IS NULL

Option 2

use MERGE

Azure SQL Database
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,686 questions
{count} votes

Accepted answer
  1. Mahesh Kurva 2,345 Reputation points Microsoft Vendor
    2025-01-28T21:58:06.77+00:00

    Hi @RJ,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    It sounds like you're dealing with a complex and large-scale data operation. Let's break down your situation:

    • Compare the execution plans of the unit test and the scaled-up environment to identify any differences.
    • Use monitoring tools to check for resource contention and ensure that the increased compute power is being utilized effectively.
    • Ensure that the data is evenly distributed across the nodes to avoid bottlenecks.
    • Verify that the necessary indexes are in place to optimize the JOIN operations.
    • Consider breaking down the operation into smaller batches to reduce the load on the system.

    Hope this helps. Do let us know if you any further queries.

     If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Erland Sommarskog 116.1K Reputation points MVP
    2025-01-28T22:34:12.88+00:00

    It's impossible to give an answer specific to the question of what is going on without any knowledge about the tables, indexes and query plans.

    What I can say are general details.

    SQL is a declarative language. This means that you state what you want to achieve with your query. The optimizer attempts to figure out the most efficient way to run the query. This is by no means by a perfect model. The optimizer works from statistics about the data that have been produced by sampling the data. From this information, the optimizer makes an estimate of what is the best plan. A complex query has many possible plans, and there is no way that the optimizer can evaluate all - that would take too much time. So the optimizer has to take shortcuts, and it may also time out.

    It's important to understand that there is no such rule like "LEFT JOIN is faster than MERGE" or similar. One construct may be faster for one data set, but the reverse apply for a different data set.

    Indexing is very important, as they have help SQL Server get to the data faster.

    Generally, as Manesh says, you need to look at query plans to get an understanding of how SQL Server is running the query.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.