Adventures in TSQL: Using CTEs when performing DELETE operations
In most applications these days there will always be some form of backend database. So hopefully over the coming months I thought it would be a good idea to share some TSQL bits, starting with using Common Table Expressions (CTEs) for managing DELETE operations.
It seems in quite a few applications of late there has been the requirement to batch process, in a distinct order, a series of entries from a table; namely a processing queue. Without delving in broker services this is easily managed using standard TSQL. The requirement is to delete a predefined number of entries from a processing queue, and selecting the deleted entries so they can be used to manage downstream processing.
Table Definition
Before getting into the guts of TSQL here is the definition for the TABLE that will be used during the discussion, and a script to populate the table:
CREATE TABLE [dbo].[ActivityQueue]
(
[ActivityId] int IDENTITY(1,1) PRIMARY KEY NOT NULL,
[ActivityType] int NOT NULL,
[ActivityMessage] xml NOT NULL
);
GO
DECLARE @idx int = 0;
WHILE (@idx < 100000)
BEGIN
INSERT INTO [dbo].[ActivityQueue] ([ActivityType], [ActivityMessage])
VALUES (@idx, '<process>Just a test message</process>');
SET @idx = @idx + 1;
END
GO
Deletion without Ordering
If one was not concerned with the order by which data is pulled from a queue the OUPUT clause of a DELETE operation easily solves the requirement; as show here:
DECLARE @processSize int = 1000;
DELETE TOP(@processSize) [dbo].[ActivityQueue]
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage];
If one looks at the query plan for this operation it is a simple Clustered Index scan and delete operation:
This leads to the simple statistics:
- Table 'ActivityQueue'. Scan count 1, logical reads 18
However if one needs to delete from the activity table in a specified order things get a little messier.
Deletion with Ordering
The normal process when one needs to delete entries from a table in a defined order is to first select the required entities, placing them into a temporary table. The temporary table is then used to perform the DELETE operation by performing an INNER JOIN:
DECLARE @processSize int = 1000;
INSERT INTO @queue ([ActivityId], [ActivityType], [ActivityMessage])
SELECT TOP(@processSize) [ActivityId], [ActivityType], [ActivityMessage]
FROM [dbo].[ActivityQueue] WITH (UPDLOCK)
ORDER BY [ActivityId] ASC;
DELETE [dbo].[ActivityQueue]
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage]
FROM [dbo].[ActivityQueue] AS AQ
INNER JOIN @queue AS QU ON QU.[ActivityId] = AQ.ActivityId;
The INSERT operation first gives us the following query plan:
Followed by the DELETE and SELECT operation:
Looking at the statistics for both operations we get:
- Table '#17F790F9'. Scan count 0, logical reads 2055
- Table 'ActivityQueue'. Scan count 1, logical reads 18
- Table 'ActivityQueue'. Scan count 0, logical reads 3000
- Table '#17F790F9'. Scan count 1, logical reads 14
As you can see this is a massive jump in read operations. If we also have to select the processed records in the correct order we will end up scanning the temporary table once more:
SELECT [ActivityId], [ActivityType], [ActivityMessage]
FROM @queue
ORDER BY [ActivityId] ASC;
So how can CTEs help in this process?
Deletion using CTEs
With the addition of a CTE we have the option of combining the use of the OUPTPUT statement with the definition of which entities to delete:
DECLARE @processSize int = 1000;
WITH [QueueData] ([ActivityId], [ActivityType], [ActivityMessage])
AS (
SELECT TOP(@processSize) [ActivityId], [ActivityType], [ActivityMessage]
FROM [dbo].[ActivityQueue]
ORDER BY [ActivityId] ASC
)
DELETE [QueueData]
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage];
The CTE allows us to define which entities we are deleting along with the necessary ORDER BY clause. If one looks at the resulting query plan you will see we are back to the simple initial case:
As you may have already determined this also means our IO operations are also back down:
- Table 'ActivityQueue'. Scan count 1, logical reads 18
Although CTEs are commonly demonstrated to solve the expanding hierarchy problem they can also be used to solve other TSQL challenges.
Written by Carl Nolan