How to display duplicate documents in SharePoint?

Samuel Santos
2024-01-16T19:39:55.0966667+00:00
I need to identify and delete duplicate documents in my SharePoint document libraries, but the site has no option for finding these duplicates.
Tags: SharePoint Server, SharePoint, SharePoint Server Management

Accepted answer
  Yanli Jiang - MSFT (Microsoft Vendor)
    2024-01-17T09:37:30.7066667+00:00

    Hi @Samuel Santos,

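    If you are using SharePoint Online, you can use PowerShell with the PnP PowerShell module to find duplicate files across the site's document libraries and export them to a CSV file. The script computes an MD5 hash of each file's content, so files are reported as duplicates when their content hashes match, even if their names differ.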

    #Parameters
    $SiteURL = "https://tenant.sharepoint.com/sites/Amy12345"
    $Pagesize = 2000
    $ReportOutput = "C:\Users\spadmin\Desktop\Duplicates.csv"
     
    #Connect to SharePoint Online site
    Connect-PnPOnline $SiteURL -Interactive
      
    #List to store results (List.Add avoids re-copying an array on every +=)
    $DataCollection = [System.Collections.Generic.List[object]]::new()

    #Create the MD5 hasher once and reuse it for every file
    $MD5 = [System.Security.Cryptography.MD5]::Create()
     
    #Get all visible, non-empty document libraries, skipping system libraries
    $DocumentLibraries = Get-PnPList | Where-Object {$_.BaseType -eq "DocumentLibrary" -and $_.Hidden -eq $false -and $_.ItemCount -gt 0 -and $_.Title -notin @("Site Pages", "Style Library", "Preservation Hold Library")}
     
    #Iterate through each document library
    ForEach($Library in $DocumentLibraries)
    {    
        #Get all items page by page (loading the name/path fields used below), keep files only
        $global:counter = 0
        $Documents = @(Get-PnPListItem -List $Library -PageSize $Pagesize -Fields ID, File_x0020_Type, FileLeafRef, FileRef -ScriptBlock `
            { Param($items) $global:counter += $items.Count; Write-Progress -PercentComplete ($global:counter / ($Library.ItemCount) * 100) -Activity `
                 "Getting Documents from Library '$($Library.Title)'" -Status "Getting Documents data $global:counter of $($Library.ItemCount)" } | Where-Object {$_.FileSystemObjectType -eq "File"})
       
        $ItemCounter = 0
        #Iterate through each document
        ForEach($Document in $Documents)
        {
            #Get the File from Item
            $File = Get-PnPProperty -ClientObject $Document -Property File
     
            #Download the file content and compute its MD5 hash
            $Stream = $File.OpenBinaryStream()
            Invoke-PnPQuery
            $HashCode = [System.BitConverter]::ToString($MD5.ComputeHash($Stream.Value))
            $Stream.Value.Dispose()
      
            #Collect data
            $DataCollection.Add([PSCustomObject]@{
                FileName = $File.Name
                HashCode = $HashCode
                URL      = $File.ServerRelativeUrl
                FileSize = $File.Length
            })
            $ItemCounter++
            Write-Progress -PercentComplete ($ItemCounter / $Documents.Count * 100) -Activity "Collecting data from Documents $ItemCounter of $($Documents.Count) from $($Library.Title)" `
                         -Status "Reading Data from Document '$($Document['FileLeafRef'])' at '$($Document['FileRef'])'"
        }
    }
    $MD5.Dispose()

    #Get Duplicate Files by Grouping Hash code
    $Duplicates = $DataCollection | Group-Object -Property HashCode | Where-Object {$_.Count -gt 1} | Select-Object -ExpandProperty Group
    Write-Host "Duplicate Files Based on File Hashcode:"
    $Duplicates | Format-Table -AutoSize
    
    #Export the duplicates results to CSV
    $Duplicates | Export-Csv -Path $ReportOutput -NoTypeInformation
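
    To see the duplicate test on its own, without connecting to SharePoint, here is a minimal local sketch. It hashes a few in-memory sample "files" with an MD5 hasher from `[System.Security.Cryptography.MD5]::Create()`, formats each hash with `BitConverter` the same way as the script, and groups on it. The file names and contents are made-up sample data standing in for downloaded file content.

```powershell
# Local sketch (no SharePoint connection): hash each sample file's bytes with
# MD5 and group on the hash. Names and contents below are made-up sample data.
$Md5 = [System.Security.Cryptography.MD5]::Create()

$Samples = @(
    @{ Name = 'Report.docx';        Text = 'quarterly figures' },
    @{ Name = 'Report - Copy.docx'; Text = 'quarterly figures' },
    @{ Name = 'Notes.txt';          Text = 'meeting notes' }
)

$Rows = foreach ($Sample in $Samples) {
    # Stand-in for the file content the script downloads
    $Bytes = [System.Text.Encoding]::UTF8.GetBytes($Sample.Text)
    [pscustomobject]@{
        FileName = $Sample.Name
        # Same "XX-XX-..." hex format the script writes to the report
        HashCode = [System.BitConverter]::ToString($Md5.ComputeHash($Bytes))
        FileSize = $Bytes.Length
    }
}
$Md5.Dispose()

# Rows whose hashes match are duplicates, whatever their names are
$Duplicates = $Rows |
    Group-Object -Property HashCode |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Group

$Duplicates | Format-Table -AutoSize
```

    Here Report.docx and Report - Copy.docx are grouped together because their bytes are identical, while Notes.txt is not reported. A match only tells you the copies are the same; which one to keep is still your decision.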
    

    The result in my test: [screenshot]

    Then choose which of the duplicate files to delete, based on your needs. Hope this helps. For reference: https://www.sharepointdiary.com/2019/04/sharepoint-online-find-duplicate-files-using-powershell.html
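
    For the deletion step, one possible approach is sketched below. It assumes you work from the CSV the script exports (columns FileName, HashCode, URL, FileSize), keeps the first file listed in each hash group, and sends the other copies to the site recycle bin with `Remove-PnPFile -Recycle` so they can still be restored. The function names `Get-DuplicateCopies` and `Remove-DuplicateCopies` and the "keep the first one" rule are hypothetical choices for illustration; review the list before deleting anything.

```powershell
# Hypothetical helper: given rows shaped like the exported CSV
# (FileName, HashCode, URL, FileSize), keep the first row in each hash group
# and return the remaining copies as deletion candidates.
function Get-DuplicateCopies {
    param([object[]]$Rows)
    $Rows |
        Group-Object -Property HashCode |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group | Select-Object -Skip 1 }
}

# Hypothetical helper: recycle each candidate file.
# Assumes an existing Connect-PnPOnline session; with -DryRun it only prints.
function Remove-DuplicateCopies {
    param([object[]]$Copies, [switch]$DryRun)
    foreach ($Copy in $Copies) {
        if ($DryRun) {
            Write-Host "Would recycle: $($Copy.URL)"
        } else {
            # -Recycle moves the file to the recycle bin instead of deleting it permanently
            Remove-PnPFile -ServerRelativeUrl $Copy.URL -Recycle -Force
        }
    }
}
```

    For example, after connecting with Connect-PnPOnline, `Remove-DuplicateCopies -Copies (Get-DuplicateCopies (Import-Csv "C:\Users\spadmin\Desktop\Duplicates.csv")) -DryRun` lists what would be removed; drop `-DryRun` to actually recycle the files.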

