T-SQL: Delete All Rows From a Table Except Top(N) Rows
Introduction
How to delete all rows from a table except the Top (N) rows, for example, leaving only the top (1000) rows? This article is the right answer to such questions and others like this one. All Code samples in this article are downloadable from this link.
Problem
To explore this scenario we need a test database. The following code creates a sample table and inserts just five rows into it.
CREATE TABLE ErrorLog
(
ErrorLogId BIGINT ,
ErrorDate DATETIME
);
GO
INSERT dbo.ErrorLog
( ErrorLogId, ErrorDate )
VALUES ( 1, '2015-01-01' ),
( 2, '2015-01-02' ),
( 3, '2015-01-03' ),
( 4, '2015-01-04' ),
( 5, '2015-01-05' )
GO
The problem is to delete all rows but does not touch the top (2) rows ( based on the ErrorDate values). The script to solve this problem would be the following:
DELETE
FROM dbo.ErrorLog
WHERE
ErrorLogId NOT IN ( SELECT TOP ( 2 )
ErrorLogId
FROM dbo.ErrorLog
ORDER BY ErrorDate )
Problem solved.
But is this a good - efficient - solution?
As illustrated in the next picture, this query scans the index two times. This approach is fine for smaller tables but could be a problem with huge tables What solution could offer better performance?
Solutions
Huge Table
Deleting data from a huge table takes a long time to complete. In such a situation, we have two alternatives:
Truncate Table
In this solution, we can insert the remaining data into a temporary table, truncate the original and finally insert data back in from the temporary table. The following script illustrates the approach:
SELECT TOP 2 *
INTO #temp
FROM dbo.ErrorLog
ORDER BY ErrorDate
--fast delete
TRUNCATE TABLE dbo.ErrorLog
INSERT dbo.ErrorLog
SELECT *
FROM #temp ;
Delete in Chunks
Truncate table has some restrictions. One important ( and unfortunately quite common) limitation is that we cannot truncate a table which is referenced by a Foreign Key constraint. If we cannot use the truncate table, the delete command is the alternative to use. In order to reduce the transaction size, we can use an old technique and apply the delete in segments. The following script illustrates this approach:
SELECT TOP 2
*
INTO #temp
FROM dbo.ErrorLog
ORDER BY ErrorDate
-- delete in chunks
DECLARE @RowCount INT = 1;
WHILE @RowCount > 0
BEGIN
BEGIN TRANSACTION;
DELETE TOP ( 25000 )
dbo.ErrorLog
WHERE ErrorLogId NOT IN ( SELECT ErrorLogId
FROM #temp );
SET @RowCount = @@ROWCOUNT;
COMMIT TRANSACTION;
END
Small Table
Back to the problem section, we saw one solution to solve this issue for small tables. If a user is using SQL Server 2012 or higher version, he can use OFFSET N ROWS in Order BY clause that helps in achieving better performance in this problem. The following code show this solution:
;WITH cte AS
(
SELECT ErrorLogId
FROM dbo.ErrorLog
ORDER BY ErrorDate
OFFSET 2 ROWS
)
DELETE cte ;
As illustrated in the next picture, this query scans the index one time which leads to better query performance. Because of existing this new feature, we could get rid of NOT IN. So, we can directly delete the final rows.
Conclusion
This problem shows how OFFSET N ROWS as an extra feature introduced in SQL Server 2012 can impact our queries. Also, we saw three traditional solutions for solving this problem.