Copying many files to OneDrive for Business - preventing sync errors
Over the years I have collected a large number of files that I keep hoarding for all sorts of good and not-so-good reasons: whitepapers, investigations, memos, projects, scripts, Visual Studio solutions, and so on. In the bad old days these files were on a file server, and I would use offline files to sync them to my laptop. Then Groove (now: OneDrive for Business) came along, but when I first tried it I was overwhelmed with sync errors and quickly gave up. Instead, I have been using Work Folders for years now. This is a solution that syncs files with a file server using SSL and Federation Services, while encrypting local copies with EFS. However, with the newly updated sync engine I decided to give OneDrive for Business a second chance.
Skipping all false starts, this is what ended up working well. I started by setting up a staging area from which I move (not copy!) files into the OneDrive folder. This allows me to check and vet everything before committing to the sync. Robocopy is the tool to use in this case. Note the flags: copy subfolders including empty ones (/E), restartable mode (/Z), skip hidden and system files (/XA:SH), copy data, attributes, and timestamps of folders (/DCOPY:DAT), and log what we are doing to both a file and the console (/LOG, /TEE).
- Copy all files to a temp folder, keeping timestamps etc: robocopy "\\fileserver.contoso.com\users\foobar" "c:\temp\staging" /E /Z /XA:SH /DCOPY:DAT /LOG:onedrivecopy.log /TEE
Make sure this staging folder is on the same partition as the destination folder. A move within one partition is just a rename, so permissions and timestamps are kept; a move across partitions is really a copy followed by a delete, which may not preserve them.
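A quick way to check this up front is to compare the roots of both paths. This is only a minimal sketch: Test-SameRoot is a hypothetical helper name, and comparing roots is an assumption that ignores mounted folders and junctions, which can place another volume underneath the same drive letter.

```powershell
# Hypothetical helper: compares the root (drive letter or UNC share) of two paths.
# Assumption: no mount points or junctions redirect either path to another volume.
function Test-SameRoot {
    param(
        [Parameter(Mandatory)] [string] $PathA,
        [Parameter(Mandatory)] [string] $PathB
    )
    $rootA = [System.IO.Path]::GetPathRoot([System.IO.Path]::GetFullPath($PathA))
    $rootB = [System.IO.Path]::GetPathRoot([System.IO.Path]::GetFullPath($PathB))
    # Drive letters are case-insensitive on Windows.
    return [string]::Equals($rootA, $rootB, [System.StringComparison]::OrdinalIgnoreCase)
}

# Staging and destination paths as used in this post.
Test-SameRoot -PathA 'c:\temp\staging' -PathB 'c:\users\foobar\Onedrive - Microsoft'
```

If this returns False, pick a staging folder on the drive that holds the OneDrive folder before running robocopy.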
The next problem is this: https://support.microsoft.com/en-us/kb/3125202. While the sync engine is solid, the backend is a variation of SharePoint, which imposes restrictions on the filenames that can be used. With more than 10,000 files, including some weirdly named stuff, I wanted to prevent any problems in this area. Naturally, I wrote a PowerShell script to do the heavy lifting. I will discuss details later, but for now:
- Validate files: .\Scan-OnedriveFilesForProblems.ps1 -SourcePath 'c:\temp\staging' -DestinationPath 'c:\users\foobar\Onedrive - Microsoft'
The script scans for files and folders that will cause problems: a path that is too long (more than 256 characters), illegal characters, illegal names, path elements starting with a space, and more. It does not actually copy or move anything. An edited sample of the errors I got:
Path                                                          Status
----                                                          ------
C:\temp\staging\_OLD_Work Folders\badfolder\ hasspace.txt     InvalidSpace
C:\temp\staging\_OLD_Work Folders\badfolder\..hasperiods.txt  PeriodProblem
C:\temp\staging\_try.4.5.1\System.IdentityModel.Tokens.Vag    TooLong
C:\temp\staging\_try.4.5.1\lib\net45\System.IdentityModel.    TooLong
C:\temp\staging\_Server 2008 Introduction to Public Key i     TooLong
C:\temp\staging\_chronization for DFSR-replicated SYSVOL      TooLong
C:\temp\staging\_ory on a computer with a different hardw     TooLong
C:\temp\staging\(DFS) technologies in Windows Server 2012     Exception
[... plenty more ...]
Encountered 86 problematic files or folders
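To give an idea of the kind of checks behind these status values, here is a minimal sketch. It is not the script's actual IsValidOnedrivePath function: Test-PathSketch is a hypothetical name, the status IllegalCharacter is made up for illustration, and the character set and period rules are my assumptions. The KB article linked above has the real rules.

```powershell
# Hypothetical sketch, NOT the IsValidOnedrivePath function from the real script.
function Test-PathSketch {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $MaxPathLength = 256
    )
    # Assumed set of disallowed characters; check the KB article for the real list.
    $illegalChars = '[":*<>?|#%]'

    if ($Path.Length -gt $MaxPathLength) { return 'TooLong' }

    # Inspect every path element after the drive, e.g. 'temp', 'staging', ' hasspace.txt'.
    $elements = $Path -split '[\\/]' | Select-Object -Skip 1
    foreach ($element in $elements) {
        # Leading or trailing whitespace in a folder or file name.
        if ($element -match '^\s|\s$')      { return 'InvalidSpace' }
        # Assumed rule: leading, trailing, or consecutive periods.
        if ($element -match '^\.|\.\.|\.$') { return 'PeriodProblem' }
        if ($element -match $illegalChars)  { return 'IllegalCharacter' }
    }
    return 'OK'
}

# Try it on two of the sample paths above plus one harmless, made-up path.
'C:\temp\staging\ok.txt',
'C:\temp\staging\_OLD_Work Folders\badfolder\ hasspace.txt',
'C:\temp\staging\_OLD_Work Folders\badfolder\..hasperiods.txt' |
    ForEach-Object { [PSCustomObject]@{ Path = $_; Status = Test-PathSketch -Path $_ } }
```

Returning the first problem found keeps the output to one line per path, which matches the shape of the table above.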
I went ahead and fixed all errors until the output showed zero problems. The final step:
- Move (not copy!) files and folders from the staging folder to the destination OneDrive for Business folder. If you are the risk-avoiding type, start off with one top-level folder and see how OneDrive likes it before moving everything. In my case, all was fine.
So let's have a look at the PowerShell script that does the work. You can find it on GitHub as Scan-OnedriveFilesForProblems.ps1. Save it locally, and unblock it if required (for example with Unblock-File, if you downloaded it with Internet Explorer). Roughly, this is how it works:
- Get the source path from the command line. If none is specified, try reading the OneDrive path from the registry.
- Calculate the maximum allowable virtual path length for the future destination: 256 + source.length - destination.length
- Traverse the source folder recursively while making sure to catch all exceptions.
- Check the validity of each path using the function IsValidOnedrivePath. Check the script on GitHub for further details.
- If there is a problem, output it as a PSCustomObject into the pipeline for further processing by the user.
- Output all exceptions that happened while traversing the source folder into the pipeline. Typically, these include folder paths that are too long for PowerShell to process.
- Show the number of problem cases, outside of the pipeline.
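The path length calculation in the second step is easiest to see with numbers. The following is a small worked example using the paths from this post; the variable names are my own choice for illustration and may differ from those in the script.

```powershell
# Paths as used earlier in this post; variable names are illustrative assumptions.
$SourcePath      = 'c:\temp\staging'                       # 15 characters
$DestinationPath = 'c:\users\foobar\Onedrive - Microsoft'  # 36 characters
$OneDriveLimit   = 256

# A file keeps its relative path when moved, so only the prefix length changes.
$MaxPathLength = $OneDriveLimit + $SourcePath.Length - $DestinationPath.Length
$MaxPathLength   # 256 + 15 - 36 = 235

# Cross-check with a made-up file: swap the source prefix for the destination prefix.
$sourceFile = $SourcePath + '\docs\report.txt'
$destFile   = $DestinationPath + $sourceFile.Substring($SourcePath.Length)
$destFile.Length - $sourceFile.Length   # 21, the difference between the two prefixes
```

In other words, any path in the staging folder longer than 235 characters would end up longer than 256 characters once it is moved to this destination.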
This is the main loop:
Get-ChildItem -Path $SourcePath -Recurse -ErrorAction SilentlyContinue -ErrorVariable +FileError | ForEach-Object {
    $PathState = IsValidOnedrivePath -path $_.FullName -MaxPathLength $MaxPathLength
    if ($PathState -ne "OK")
    {
        [PSCustomObject] @{
            Path = $_.FullName
            Status = $PathState
        }
    }
}
I'll confess that the part that took me the most time was catching all exceptions raised by Get-ChildItem. A typical exception is a folder path that is too long for this cmdlet. What I ended up doing was to collect the errors raised by Get-ChildItem in a dedicated variable named FileError, while making sure that new errors are added instead of replacing the previous ones. This is the plus sign (+) in front of FileError. This bit of magic gives me a list of all errors that I can dump after all files are processed:
foreach ($err in $FileError)
{
    [PSCustomObject] @{
        Path = $err.TargetObject
        Status = "Exception"
    }
}
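If you want to see the effect of the plus sign in isolation, here is a small, self-contained sketch that has nothing to do with OneDrive. It asks Get-Item for two paths that should not exist (random GUID names under the temp folder, my own choice for the demo) and compares the error variable with and without the plus sign. DemoError is a hypothetical variable name.

```powershell
# Two paths that should not exist: random GUID names under the temp folder.
$missingA = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$missingB = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())

# Start clean in case the variable is left over from an earlier run.
Remove-Variable -Name DemoError -ErrorAction Ignore

# Without '+', every call overwrites the variable with its own errors.
Get-Item -Path $missingA -ErrorAction SilentlyContinue -ErrorVariable DemoError
Get-Item -Path $missingB -ErrorAction SilentlyContinue -ErrorVariable DemoError
$replacedCount = $DemoError.Count

# With '+', the errors of every call are appended to the variable.
Remove-Variable -Name DemoError
Get-Item -Path $missingA -ErrorAction SilentlyContinue -ErrorVariable +DemoError
Get-Item -Path $missingB -ErrorAction SilentlyContinue -ErrorVariable +DemoError
$appendedCount = $DemoError.Count

# Same shape of output as the loop above: one object per collected error.
foreach ($err in $DemoError) {
    [PSCustomObject] @{
        Path   = $err.TargetObject
        Status = "Exception"
    }
}
```

Comparing $replacedCount and $appendedCount shows the difference: the first run keeps only the error of the last call, the second keeps both.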
That's it. Enjoy!