Find duplicate rows in SQL

Question

Jonathan Brotto 420

I don't mind which approach is used, but while I was looking for a PO in my database I found duplicate rows. I'd like a way to catch these.

Yitzhak Khabinsky 26,486 Reputation points

2024-07-30T20:10:11.51+00:00

When asking a question, you need to provide a minimal reproducible example:

(1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements.

(2) What you need to do, i.e. the logic, and your attempted implementation of it in T-SQL.

(3) Desired output based on the sample data in #1 above.

(4) Your SQL Server version (SELECT @@version;)

All within the question as text, no images.

Jonathan Brotto 420 Reputation points

2024-07-31T17:29:07.1966667+00:00

I want to be able to return rows 2 and 4 in this situation.
Erland Sommarskog 120.1K Reputation points MVP

2024-07-31T21:08:35.75+00:00

Did you try the second query in my answer below? You would need to add the columns Column1 to Column7 to the PARTITION BY clause.
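The screenshot with rows 2 and 4 is not part of the thread, so the following is only a sketch under an assumption: that "rows 2 and 4" means the second copy of each duplicated row. Erland's second query (COUNT(*) OVER) returns every copy, including the first; to return only the extra copies, one option is the ROW_NUMBER numbering from his first query with a SELECT instead of a DELETE. The table #PoLines, its three columns (standing in for Column1 to Column7) and the RowId column are hypothetical, and DROP TABLE IF EXISTS assumes SQL Server 2016 or later.

```sql
-- Hypothetical sample table: RowId plays the role of "row 1, row 2, ..."
DROP TABLE IF EXISTS #PoLines;
CREATE TABLE #PoLines (
    RowId   int         NOT NULL PRIMARY KEY,
    Column1 varchar(10) NOT NULL,
    Column2 int         NOT NULL,
    Column3 date        NOT NULL
);

INSERT INTO #PoLines (RowId, Column1, Column2, Column3)
VALUES (1, 'PO-100', 5, '2024-07-01'),
       (2, 'PO-100', 5, '2024-07-01'),   -- copy of row 1
       (3, 'PO-200', 3, '2024-07-02'),
       (4, 'PO-200', 3, '2024-07-02'),   -- copy of row 3
       (5, 'PO-300', 1, '2024-07-03');   -- unique

-- Number the rows inside each group of identical values (lowest RowId gets rn = 1);
-- every row with rn > 1 is an extra copy.
WITH numbering AS (
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY Column1, Column2, Column3
                                      ORDER BY RowId)
    FROM #PoLines
)
SELECT RowId, Column1, Column2, Column3
FROM numbering
WHERE rn > 1
ORDER BY RowId;
```

With this sample data the query returns RowId 2 and 4. In the real table, the PARTITION BY list would hold all the columns that must match for two rows to count as duplicates.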
Pradeep M 6,655 Reputation points Microsoft External Staff

2024-08-02T05:48:02.31+00:00

Hi Jonathan Brotto,

I wanted to check if you had a chance to review the response provided by Erland Sommarskog/Viorel to your question. If you found the answer helpful, please consider clicking the "Accept answer" button and upvoting it. This will help other members in the Microsoft Q&A community find useful information. If you have any questions or need further assistance, feel free to reach out. Thank you.

Accepted answer

Answer 1

Erland Sommarskog 120.1K MVP

What do you mean "catch"?

Anyway, to delete duplicate rows, this is the typical solution:

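; WITH numbering AS (
     -- keycol: the column(s) that define a duplicate
     -- something: the column that decides which row to keep (rn = 1 survives)
     SELECT *, rn = row_number() OVER(PARTITION BY keycol ORDER BY something DESC)
     FROM  tbl
)
DELETE numbering
WHERE rn > 1;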
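A self-contained sketch of the same pattern, assuming a hypothetical temp table #Orders where OrderNo plays the role of keycol and LoadedAt plays the role of something. Deleting through the CTE removes the underlying rows of #Orders, because the CTE reads from that single table. DROP TABLE IF EXISTS assumes SQL Server 2016 or later.

```sql
-- Hypothetical table: OrderNo defines a duplicate, LoadedAt decides which row to keep
DROP TABLE IF EXISTS #Orders;
CREATE TABLE #Orders (
    OrderNo  varchar(10) NOT NULL,
    Amount   int         NOT NULL,
    LoadedAt datetime2   NOT NULL
);

INSERT INTO #Orders (OrderNo, Amount, LoadedAt)
VALUES ('PO-100', 10, '2024-07-01'),
       ('PO-100', 12, '2024-07-05'),   -- most recent PO-100 row, gets rn = 1
       ('PO-200',  7, '2024-07-02');

-- Newest row per OrderNo gets rn = 1; all older rows are deleted
WITH numbering AS (
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY OrderNo ORDER BY LoadedAt DESC)
    FROM #Orders
)
DELETE numbering
WHERE rn > 1;

SELECT OrderNo, Amount, LoadedAt
FROM #Orders
ORDER BY OrderNo;
```

After the DELETE, #Orders holds one row per OrderNo: PO-100 with Amount 12 and PO-200 with Amount 7.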

In the PARTITION BY clause, you list the column(s) where you don't want duplicates. In the ORDER BY clause you have the criteria for which row to keep. I added DESC, since commonly it is a date column, and you want to keep the most recent row.

If you only want to view duplicate rows, you can do:

; WITH cnts AS (
     -- cnt: number of rows sharing the same keycol value(s)
     SELECT *, cnt = COUNT(*) OVER(PARTITION BY keycol)
     FROM tbl
)
SELECT * FROM cnts WHERE cnt > 1;
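A runnable sketch of the viewing query, assuming a hypothetical temp table #Lines where a duplicate means the same PoNumber, Item and Qty. Listing several columns in PARTITION BY is how you would apply it to Column1 to Column7. Unlike GROUP BY, the window COUNT keeps every individual row, so all copies are shown. DROP TABLE IF EXISTS assumes SQL Server 2016 or later.

```sql
-- Hypothetical table: a duplicate means the same PoNumber, Item and Qty
DROP TABLE IF EXISTS #Lines;
CREATE TABLE #Lines (
    PoNumber varchar(10) NOT NULL,
    Item     varchar(10) NOT NULL,
    Qty      int         NOT NULL
);

INSERT INTO #Lines (PoNumber, Item, Qty)
VALUES ('PO-100', 'Bolt', 5),
       ('PO-100', 'Bolt', 5),   -- same values as the row above
       ('PO-100', 'Nut',  5),   -- differs in Item, not a duplicate
       ('PO-200', 'Bolt', 2);

-- cnt = how many rows share the same (PoNumber, Item, Qty)
WITH cnts AS (
    SELECT *, cnt = COUNT(*) OVER (PARTITION BY PoNumber, Item, Qty)
    FROM #Lines
)
SELECT PoNumber, Item, Qty, cnt
FROM cnts
WHERE cnt > 1;
```

Both PO-100 / Bolt / 5 rows are returned, each with cnt = 2.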

Jonathan Brotto 420 Reputation points

2024-07-30T20:39:17.9766667+00:00

By "catch" I mean finding duplicates within the same table. For example, row 3 and row 5 contain the same values in the database. Would the approach work for that?
Erland Sommarskog 120.1K Reputation points MVP

2024-07-30T20:45:53.3366667+00:00

I have no idea, since I don't know what you mean by that. If one row is 3 and one row is 5, they have to be different, at least in the row number.

But try the queries.
Yitzhak Khabinsky 26,486 Reputation points

2024-07-30T20:51:14.39+00:00

Please provide a minimal reproducible example as described in my original comment. Otherwise, it will take an eternity before we are able to help you, if at all.
LiHongMSFT-4306 31,471 Reputation points

2024-07-31T02:05:58.1766667+00:00

Hi @Jonathan Brotto

Have you tried Erland's query? It may solve your issue.

If it is still not solved, please post more details.
Jonathan Brotto 420 Reputation points

2024-08-01T14:08:24.9133333+00:00

Is there a more memory-efficient way of doing this?
(15 rows affected)

Msg 1105, Level 17, State 2, Line 9

Could not allocate space for object 'dbo.SORT temporary run storage: 143718385909760' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

Completion time: 2024-08-01T09:30:01.9365124-04:00

Viorel 120.8K

Check if this alternative is more efficient:

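-- YourTable is a placeholder; TABLE is a reserved word and fails unquoted.
-- Grouped columns are listed instead of *, which fails if any column is missing from the GROUP BY.
select top(100) Column1, Column2, Column3, Column4, Column5, Column6, Column7, count(*) as number
from YourTable
group by Column1, Column2, Column3, Column4, Column5, Column6, Column7
having count(*) > 1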
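A small self-contained run of the GROUP BY approach, assuming a hypothetical temp table #Dups with three of the Column1 to Column7 columns. Note the difference from the window-function queries in Answer 1: GROUP BY returns one row per duplicated combination with its count, not the individual rows. DROP TABLE IF EXISTS assumes SQL Server 2016 or later.

```sql
-- Hypothetical table using three of the ColumnN columns
DROP TABLE IF EXISTS #Dups;
CREATE TABLE #Dups (
    Column1 varchar(10) NOT NULL,
    Column2 int         NOT NULL,
    Column3 int         NOT NULL
);

INSERT INTO #Dups (Column1, Column2, Column3)
VALUES ('A', 1, 1), ('A', 1, 1), ('A', 1, 1),   -- three copies
       ('B', 2, 2), ('B', 2, 2),                -- two copies
       ('C', 3, 3);                             -- unique

-- One output row per duplicated combination, with the number of copies
SELECT Column1, Column2, Column3, COUNT(*) AS number
FROM #Dups
GROUP BY Column1, Column2, Column3
HAVING COUNT(*) > 1
ORDER BY number DESC;
```

The result has two rows: A / 1 / 1 with number 3 and B / 2 / 2 with number 2. To see the individual rows behind each group, the window-function version is the simpler choice.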

Erland Sommarskog 120.1K Reputation points MVP

2024-08-01T20:55:24.2066667+00:00

My guess is that Viorel's suggestion will meet the same fate.

Where is the database? On your laptop? On your server? The gist of the error message is that you need a bigger disk for tempdb. (Unless some smart person has set an upper limit for the space on tempdb.)

How big is the table? Post the output from "sp_spaceused yourtable".
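To check whether tempdb has an upper size limit, one possibility is to look at its file settings. This is a sketch that reads the catalog view tempdb.sys.database_files; size and max_size are stored in 8 KB pages, so dividing by 128 gives MB. It only reports the settings and does not change anything.

```sql
-- Size and growth settings of the tempdb files
SELECT name,
       type_desc,                          -- ROWS = data file, LOG = log file
       size / 128.0 AS size_mb,            -- size is in 8 KB pages
       CASE max_size
            WHEN -1 THEN NULL              -- -1: grows until the disk is full
            ELSE max_size / 128.0          -- 0 means growth is not allowed
       END AS max_size_mb,
       growth,                             -- pages or percent, see next column
       is_percent_growth
FROM tempdb.sys.database_files;
```

If max_size_mb is small for the data files, or growth is 0, the limit is a setting rather than the disk itself.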

Answer 2

Olaf Helper 46,301

And how do you define duplicates?

It can't be by the primary key, because that key is unique.
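A short sketch of that point, assuming a hypothetical temp table #Po with an Id primary key: grouping on all columns including the key never finds a duplicate, while grouping on the remaining columns does. DROP TABLE IF EXISTS assumes SQL Server 2016 or later.

```sql
-- Hypothetical table: Id is the primary key, so it is unique by definition
DROP TABLE IF EXISTS #Po;
CREATE TABLE #Po (
    Id       int         NOT NULL PRIMARY KEY,
    PoNumber varchar(10) NOT NULL,
    Vendor   varchar(20) NOT NULL
);

INSERT INTO #Po (Id, PoNumber, Vendor)
VALUES (1, 'PO-100', 'Contoso'),
       (2, 'PO-100', 'Contoso'),   -- same values as Id 1 apart from the key
       (3, 'PO-200', 'Fabrikam');

-- Including the key: every group has exactly one row, so nothing is returned
SELECT Id, PoNumber, Vendor, COUNT(*) AS number
FROM #Po
GROUP BY Id, PoNumber, Vendor
HAVING COUNT(*) > 1;

-- Excluding the key: Ids 1 and 2 form one duplicate group
SELECT PoNumber, Vendor, COUNT(*) AS number
FROM #Po
GROUP BY PoNumber, Vendor
HAVING COUNT(*) > 1;
```

The first query returns no rows; the second returns PO-100 / Contoso with number 2.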

Jonathan Brotto 420 Reputation points

2024-07-31T13:37:54.6433333+00:00

I want to be able to return rows 2 and 4 in this situation.
