Hi @44785390,
Thanks for reaching out to us. We are very pleased to assist you.
We can use PnP PowerShell to find duplicate files in SharePoint Online.
- Preparations:
- If you have not installed PnP PowerShell before, please install the module first; a one-line install command is shown right after this list. Note: Microsoft is providing this information as a convenience to you. PnP PowerShell is a community-driven project that is not controlled by Microsoft, and Microsoft cannot make any representations regarding the quality, safety, or suitability of the software. Please make sure that you completely understand the risk before installing it.
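If you need it, this is the standard way to install the module from the PowerShell Gallery (a per-user install, so no elevation is required). Also note that current versions of PnP PowerShell require you to register your own Entra ID application and pass its client ID to Connect-PnPOnline; that is what the $ClientId parameter in the script below stands for.
#Install the PnP.PowerShell module from the PowerShell Gallery (run once per machine)
Install-Module -Name PnP.PowerShell -Scope CurrentUser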
The following PowerShell script scans all files in all document libraries of a site, extracts the File Name, File Hash, and Size of each file for comparison, and exports a CSV report with all of the data. Please don't forget to replace the parameters at the top with your own values. Because the hash is computed from the file contents, every file is downloaded during the scan, so the script can take a while on large sites.
#Parameters
$SiteURL = "https://contoso.sharepoint.com/sites/24Dec"
$Pagesize = 2000
$ReportOutput = "C:\Duplicates.csv"
$ClientId = "7cda65de-xxxxx-b2-a485-bf6e2a70a909"
#Connect to SharePoint Online site
Connect-PnPOnline $SiteURL -ClientId $ClientId -Interactive
#Array to store results
$DataCollection = @()
#Get all Document libraries
$DocumentLibraries = Get-PnPList | Where-Object {$_.BaseType -eq "DocumentLibrary" -and $_.Hidden -eq $false -and $_.ItemCount -gt 0 -and $_.Title -notin @("Site Pages", "Style Library", "Preservation Hold Library")}
#Iterate through each document library
ForEach($Library in $DocumentLibraries)
{
    #Get all documents from the library, showing progress as pages are retrieved
    $global:counter = 0
    $Documents = Get-PnPListItem -List $Library -PageSize $Pagesize -Fields ID, File_x0020_Type -ScriptBlock `
        { Param($items) $global:counter += $items.Count; Write-Progress -PercentComplete ($global:Counter / ($Library.ItemCount) * 100) -Activity `
        "Getting Documents from Library '$($Library.Title)'" -Status "Getting Documents data $global:Counter of $($Library.ItemCount)";} | Where-Object {$_.FileSystemObjectType -eq "File"}

    $ItemCounter = 0
    #Iterate through each document
    Foreach($Document in $Documents)
    {
        #Get the File from the item
        $File = Get-PnPProperty -ClientObject $Document -Property File

        #Get the file hash: OpenBinaryStream downloads the file content, then MD5 hashes it
        $Bytes = $File.OpenBinaryStream()
        Invoke-PnPQuery
        $MD5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
        $HashCode = [System.BitConverter]::ToString($MD5.ComputeHash($Bytes.Value))

        #Collect data
        $Data = New-Object PSObject
        $Data | Add-Member -MemberType NoteProperty -Name "FileName" -Value $File.Name
        $Data | Add-Member -MemberType NoteProperty -Name "HashCode" -Value $HashCode
        $Data | Add-Member -MemberType NoteProperty -Name "URL" -Value $File.ServerRelativeUrl
        $Data | Add-Member -MemberType NoteProperty -Name "FileSize" -Value $File.Length
        $DataCollection += $Data

        $ItemCounter++
        Write-Progress -PercentComplete ($ItemCounter / ($Library.ItemCount) * 100) -Activity "Collecting data from Documents $ItemCounter of $($Library.ItemCount) from $($Library.Title)" `
            -Status "Reading Data from Document '$($Document['FileLeafRef'])' at '$($Document['FileRef'])'"
    }
}
#Get Duplicate Files by Grouping Hash code
$Duplicates = $DataCollection | Group-Object -Property HashCode | Where-Object {$_.Count -gt 1} | Select-Object -ExpandProperty Group
Write-Host "Duplicate Files Based on File Hashcode:"
$Duplicates | Format-Table -AutoSize

#Group Based on File Name
$FileNameDuplicates = $DataCollection | Group-Object -Property FileName | Where-Object {$_.Count -gt 1} | Select-Object -ExpandProperty Group
Write-Host "Potential Duplicates Based on File Name:"
$FileNameDuplicates | Format-Table -AutoSize

#Group Based on File Size
$FileSizeDuplicates = $DataCollection | Group-Object -Property FileSize | Where-Object {$_.Count -gt 1} | Select-Object -ExpandProperty Group
Write-Host "Potential Duplicates Based on File Size:"
$FileSizeDuplicates | Format-Table -AutoSize
#Export the full scan results (all files, including duplicate candidates) to CSV
$DataCollection | Export-Csv -Path $ReportOutput -NoTypeInformation
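The Export-Csv line above writes the full inventory so you can review all three groupings later. If you would rather keep only the confirmed (hash-based) duplicates in the report, a minimal variation using the $Duplicates variable the script already builds would be:
#Export only the hash-based duplicates instead of the full inventory
$Duplicates | Sort-Object HashCode | Export-Csv -Path $ReportOutput -NoTypeInformation
#Close the connection when finished
Disconnect-PnPOnline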
If you have any questions, please do not hesitate to contact me.
Moreover, if the issue is resolved successfully, please click "Accept Answer" so that we can better archive the case and other community members who encounter the same issue can benefit from it.
Your kind contribution is much appreciated.