Share via


T-SQL: Remove Duplicate Rows From A Table Using Query Without Common Table Expression

Introduction

Another TechNet article by sqlsaga discusses removing duplicates from a table using Common Table Expression.

That article can be accessed here: How to Remove Duplicates from a Table in SQL Server.

This article explores our possibilities of refraining ourselves from using 'with common table expression' to achieve the same result.

This article shows how to remove duplicate rows from a table in SQL Server without a common table expression.

Prerequisite

If we take the same example of the table used in the article, How to Remove Duplicates from a Table in SQL Server as follows:

CREATE ``TABLE People ``(


 Name varchar(50) NOT NULL,
 City varchar(30) NOT NULL,
 [State] char(2) NOT NULL
);
  
INSERT INTO  People(Name, City, [State])
VALUES
  ('John', 'Dallas','TX'),
  ('Mark', 'Seattle','WA'),
  ('Nick', 'Phoenix','AZ'),
  ('Laila', 'San Jose','CA'),
  ('Samantha', 'Tulsa','OK'),
  ('Bella', 'San Antonio','TX'),
  ('John', 'Dallas','TX'),
  ('John', 'Dallas','TX'),
  ('Mark', 'Seattle','WA'),
  ('Nick', 'Tempe','FL'),
  ('John', 'Dallas','TX');
  
SELECT * FROM People;

The aforementioned script will create the table, People, with 11 records, where 5 records are duplicates (if we consider similar Name field) or 4 records are duplicates (if we consider all of Name, City, State fields). Therefore the aforementioned Select query returns the following rows:

Query To Delete Duplicates Without Common Table Expression

You can execute the following query to delete the aforementioned 5 duplicate records from the table People:

-- Delete any duplicates which have the same Name field in the table


DELETE DUP 
FROM
( 
 SELECT ROW_NUMBER() OVER (PARTITION BY Name  ORDER BY  Name ) AS Val 
 FROM People
) DUP 
WHERE DUP.Val > 1;
 
-- Or else you can also use the following query 
-- to delete duplicates having same Name, City and State fields in the People table
DELETE DUP 
FROM
( 
 SELECT ROW_NUMBER() OVER (PARTITION BY Name, City, State ORDER  BY Name, City, State ) AS  Val 
 FROM Peole
) DUP 
WHERE DUP.Val > 1;

The aforementioned query (or queries) uses an inner query to construct a view over the People table based on ROW_NUMBER within a partitioning of a result set. Now, if you run the select statement on People table once again you can find that the duplicates have been removed effectively as follows:

Dealing With Real World Tables To Remove Duplicates

MS SQL Tables in the real world will have primary key column most of the times, unlike the example that was presented above. In case your tables do not have the primary key column(s) then you should ask yourself whether the table structure is formed correctly and revisit the design of your table structures to amend it. Now, if we take the same People table into consideration with a small amendment, having an identity primary key column, we can remove duplicates in another way.

Prerequisite

Create Table With Identity Column

Let's create the People table with an identity column, which will also serve as the primary key of the table as follows:

CREATE TABLE  People(
   RowID int not null  identity(1,1) primary key,
  Name varchar(50) NOT NULL,
   City varchar(30) NOT NULL,
  [State] char(2) NOT NULL
);
  
-- Use the same code as in the previous script to populate the table
  
SELECT * FROM People

Removing Duplicates From A Table With Identity Primary Key Column

Execute the following code and all your duplicates are gone forever:

DELETE FROM  People

  WHERE  RowId NOT IN (SELECT MIN(RowId) FROM  People GROUP  BY Name, City, [State]);

And now if you use the select statement on the table, People you will find that the duplicates have been removed effectively by a one-liner query shown above.

References