Поделиться через


SQL Script: Finding duplicates on supposedly unique values

Unsupported: Use at your own risk

Sometimes you load data into metaverse that is supposed to be unique and you do not want to use FindMVEntries technique due to performance impact.

This SQL scripts allow you to find out directly from DB if you have unexpected duplicates so you can talk to data store owners to go and fix their data.

The following example looks for repeated mailNickName attribute values in metaverse. Just replace “mailNickName” with “SAMAccountName”, “uid” or whatever attribute you have that should be unique:

SELECT mailNickName, COUNT(mailNickName) AS NumOccurrences

FROM FIMSynchronizationService.dbo.mms_metaverse with (nolock)

where object_type = 'person'

GROUP BY mailNickName

HAVING ( COUNT(mailNickName) > 1 )

order by NumOccurrences desc

You could also use this technique to find the opposite, that is, records with a particular value that occur exactly once:

SELECT email

FROM FIMSynchronizationService.dbo.mms_metaverse with (nolock)

GROUP BY email

HAVING ( COUNT(email) = 1 )

However, the previous queries give just details about the number of records that you have duplicates or unique. Finding out more details about the metaverse records require more elaborated SQL query.

In the following example, we first find duplicated sAMAccountName values and store in temporary “tblDups” table, that then we use to join with full metaverse (“MV”) and show “displayName” and “object_id” attributes.

select tblDups.sAMAccountName, MV.displayName, MV.object_id

FROM

(

SELECT sAMAccountName, COUNT(sAMAccountName) AS NumOccurrences

FROM FIMSynchronizationService.dbo.mms_metaverse with (nolock)

where object_type = 'person'

GROUP BY sAMAccountName

HAVING ( COUNT(sAMAccountName) > 1 )

) as tblDups

INNER JOIN

FIMSynchronizationService.dbo.mms_metaverse AS MV WITH (nolock)

ON

tblDups.sAMAccountName = MV.sAMAccountName

 

 

You can also add WHERE sentences at the end of the query to find specific values, such as those sAMAccountNames duplicates that belong to HR department:

select tblDups.sAMAccountName, MV.displayName, MV.object_id

FROM

(

SELECT sAMAccountName, COUNT(sAMAccountName) AS NumOccurrences

FROM FIMSynchronizationService.dbo.mms_metaverse with (nolock)

where object_type = 'person'

GROUP BY sAMAccountName

HAVING ( COUNT(sAMAccountName) > 1 )

) as tblDups

 

INNER JOIN

FIMSynchronizationService.dbo.mms_metaverse AS MV WITH (nolock)

ON

tblDups.sAMAccountName = MV.sAMAccountName

WHERE MV.department = 'HR'