Inspecting pending outbound changes between two DFSR replicas

Well, as promised, this one is a technical post.

Much has been written about DFSR, and there is not much I can add about its inner workings or ways of troubleshooting it. But let’s assume for a moment that our DFSR deployment is fine and working as expected.

The specific scenario at hand is: how do I get a list of the files that have not yet replicated out from DFSR replica A to DFSR replica B? More specifically: I have uploaded a new set of files to a replicated folder on server DFSRSRV01 and I need to know which files have not yet replicated to DFSRSRV02, which holds a replica of the replicated folder in question.

If we were not dealing with DFSR, our options would be rather limited. We would end up either using some sort of folder comparison utility (which would be quite expensive in terms of network utilization, especially in a low-bandwidth scenario), or generating on each side a listing of the folder with a hash for each file and comparing those listings afterwards. Either way, this will be slow, might require performing actions on both servers hosting the replicated folder, and will be expensive in either network or CPU terms.

Now back to the DFSR world. DFSR members maintain version chain vectors, which contain information about the files and their respective versions residing on the server in question. Here is an excerpt from the DFSR spec on MSDN:

DFS-R takes a three-tiered approach to file replication:

  1. Version chain vectors are retrieved from a server to determine which file versions are known to the server but not to the client. The protocol requires that a server ensures that the global version sequence numbers (GVSNs) of all replicated files and file metadata that it maintains in persistent storage (that is, saved to disk) are eventually included in its version chain vector, such that the state of a server's knowledge can be determined by examining the version chain vectors alone.
  2. Updates, which summarize file metadata, are retrieved from a server. The client uses the version chain vector received from the server to limit the set of updates that are retrieved from the server. To retrieve all updates known to the server but not to the client, it is sufficient to request updates with a GVSN range over the version chain vector received from the server less the version chain vector maintained by the client. The updates contain file system information about the replicated files but not about the file data. The information includes the coordinates of the file in terms of a unique identifier (UID) identifying the file across different versions of the file, the GVSN (identifying a particular version of the file on a particular machine), a reference to the file's parent directory in terms of a UID for the parent resource (directories are treated as files), and a file name.
  3. File data is retrieved if a client determines that the file data corresponding to a received update must be downloaded in order for the client to synchronize with the server.

A bit more terminology before we dig in:

Global Version Sequence Numbers (GVSN): A GVSN is a pair: Machine identifier and version sequence number (VSN). Although two machines might assign the same VSN, because they have different machine identifiers, the associated GVSNs differ. A GVSN is used to identify a unique version of a unique resource. In other words, no two different resources ever get assigned the same GVSN, and no two different updates to the same resource ever get assigned the same GVSN.

Unique identifier (UID): A pair consisting of a GUID and a version sequence number to identify each resource uniquely. The UID is used to track the object for its entire lifetime through any number of times that the object is modified or renamed.

In other words: the UID uniquely identifies an object (no matter on which server it resides). If the object is changed (i.e. the file is edited), its UID will not change. What will happen is that the server on which the change was performed will update the GVSN of the object. Other DFSR servers will learn that a new version is available by exchanging version chain vectors and will ask the originating server to send the change to them.

But hey! This is all nice theory. How do we actually talk to DFSR? Fortunately, the DFSR service exposes its interfaces via WMI (see http://msdn.microsoft.com/en-us/library/bb540028(v=VS.85).aspx for details).

So how does all this help us in achieving our goal ? If we have two servers (DFSRSRV01 and DFSRSRV02) both participating in replication group ReplGroup and effectively hosting replicated folder ReplFolder, what we can do is this:

  1. Retrieve version chain vector on DFSRSRV02 for ReplFolder. This will tell us what versions of the files (GVSNs) are on DFSRSRV02. In order to perform this we have to query DFSRSRV02 for the following:
    1. Find the ReplicationGroupGuid by querying the DfsrReplicationGroupConfig class for the instance that has ReplicationGroupName set to ReplGroup.
    2. Query for instances of DfsrReplicatedFolderConfig class that have ReplicationGroupGuid that corresponds to the guid retrieved from previous step.
    3. Loop through the DfsrReplicatedFolderConfig instances from step 2 and select the one that has ReplicatedFolderName property set to “ReplFolder” and obtain its ReplicatedFolderGuid (now we have the GUID of the replicated folder we are interested in)
    4. Get the instance of DfsrReplicatedFolderInfo class that has ReplicatedFolderGuid property equal to the one we found in step 3
    5. Call GetVersionVector method to obtain the version chain vector of DFSRSRV02 for the replicated folder in question.
  2. Back on DFSRSRV01:
    1. Retrieve all the DfsrIdRecordInfo instances that have ReplicatedFolderGuid set to the GUID found in step 1.3.
    2. For each record, check whether its GVSN has already been replicated to DFSRSRV02, based on the version chain vector we obtained in step 1.5, and extract the full path of the file if the record has not been replicated yet.
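
Steps 1.1 through 1.5 can be sketched roughly as follows. This is only an illustration, not the code from the full script: the function name Get-TargetVersionVector is made up, the root\microsoftdfs namespace is the one used in the queries later in this post, and the class and property names are the ones listed in the steps above.

```powershell
# Sketch only. Get-TargetVersionVector is a hypothetical name; the namespace,
# classes and properties are assumed to be exactly as listed in steps 1.1-1.5.
function Get-TargetVersionVector {
    param(
        [string]$ComputerName,          # e.g. DFSRSRV02
        [string]$ReplicationGroupName,  # e.g. ReplGroup
        [string]$ReplicatedFolderName   # e.g. ReplFolder
    )
    $ns = 'root\microsoftdfs'

    # Step 1.1: replication group name -> ReplicationGroupGuid
    $rgQuery = "SELECT * FROM DfsrReplicationGroupConfig WHERE ReplicationGroupName='$ReplicationGroupName'"
    $rg = Get-WmiObject -Namespace $ns -ComputerName $ComputerName -Query $rgQuery

    # Steps 1.2 and 1.3: folders of that group, filtered by folder name
    $rfQuery = "SELECT * FROM DfsrReplicatedFolderConfig WHERE ReplicationGroupGuid='$($rg.ReplicationGroupGuid)'"
    $rf = Get-WmiObject -Namespace $ns -ComputerName $ComputerName -Query $rfQuery |
        Where-Object { $_.ReplicatedFolderName -eq $ReplicatedFolderName }

    # Step 1.4: runtime info object for the replicated folder
    $infoQuery = "SELECT * FROM DfsrReplicatedFolderInfo WHERE ReplicatedFolderGuid='$($rf.ReplicatedFolderGuid)'"
    $info = Get-WmiObject -Namespace $ns -ComputerName $ComputerName -Query $infoQuery

    # Step 1.5: the vector comes back in the VersionVector OUT parameter
    [pscustomobject]@{
        ReplicatedFolderGuid = $rf.ReplicatedFolderGuid
        VersionVector        = $info.GetVersionVector().VersionVector
    }
}

# Example call (needs a reachable DFSR member, so it is left commented out):
# Get-TargetVersionVector -ComputerName DFSRSRV02 -ReplicationGroupName ReplGroup -ReplicatedFolderName ReplFolder
```

The sketch returns the replicated folder GUID together with the vector because step 2 needs both.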

 

Important side note: the DfsrReplicatedFolderInfo class has two quite interesting methods: GetOutboundBacklogFileCount and GetOutboundBacklogFileIdRecords. While GetOutboundBacklogFileCount returns the correct number of records that are pending to be sent to DFSRSRV02, the GetOutboundBacklogFileIdRecords method is limited to 100 records. The limit is hard-coded and cannot be changed. There is a very good reason for this, and it is performance: scanning the whole DFSR database can have a serious performance impact on servers hosting a large number of files replicated via DFSR.

And while the limitation of the GetOutboundBacklogFileIdRecords method is there to prevent you from shooting yourself in the foot, there are circumstances when a full listing of the differences is needed, and this is the reason step 2 is described the way it is (it does not rely on the GetOutboundBacklogFileIdRecords method).
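
For the count alone, a minimal sketch could look like the one below. The wrapper name Get-OutboundBacklogCount is hypothetical, and the OUT parameter name BacklogFileCount is taken from the MSDN method description as I understand it, so double-check it against the documentation. The version vector passed in is assumed to be the one obtained from DFSRSRV02.

```powershell
# Sketch only. Get-OutboundBacklogCount is a hypothetical wrapper; the
# BacklogFileCount OUT parameter name is an assumption based on the MSDN docs.
function Get-OutboundBacklogCount {
    param(
        [string]$SourceServer,          # e.g. DFSRSRV01
        [string]$ReplicatedFolderGuid,  # GUID of the replicated folder
        [string]$VersionVector          # vector retrieved from the target server
    )
    $query = "SELECT * FROM DfsrReplicatedFolderInfo WHERE ReplicatedFolderGuid='$ReplicatedFolderGuid'"
    $info  = Get-WmiObject -Namespace 'root\microsoftdfs' -ComputerName $SourceServer -Query $query

    # The source compares its own records against the target's vector
    return $info.GetOutboundBacklogFileCount($VersionVector).BacklogFileCount
}
```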

 

Now back to the drawing board: what does this version chain vector contain, and how can we deal with what it returns? If you look at the GetVersionVector documentation on MSDN, you will see that it has an OUT parameter called (surprise! surprise!) VersionVector, of type string. If you look at the actual data in it, you will see something resembling the following:

{60D3A388-C51B-469A-AE7F-E5BCE9DEF2D9} |-> (9, 8009] {A89132B1-89AE-4877-8425-957FFA968767} |-> (9, 22627] (22628, 22649] (22650, 22687] (22688, 22692]

Now back to the DFSR spec, section 2.2.1.4.1 that states the following:

FRS_VERSION_VECTOR
An entry of a version chain vector.

 typedef struct _FRS_VERSION_VECTOR {
    GUID dbGuid;
    DWORDLONG low;
    DWORDLONG high;
} FRS_VERSION_VECTOR;

dbGuid: The GUID for the database originating the versions in the interval (low, high).
low: Lower bound for VSN interval.
high: Upper bound for VSN interval. The value of this member MUST be greater than the value of the low member.
The number indicated by "low" is excluded from the version chain vector. The number indicated by "high" is included in the version chain vector. Thus, (low, high] indicates a half-open interval of unsigned integers. The GVSNs that are included in this entry are the following: { (dbGuid, low+1), …, (dbGuid, high) }.
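
To make the half-open interval concrete, here is a tiny sketch of the membership test it implies. Test-VsnInInterval is a hypothetical helper name, and the bounds 9 and 8009 are taken from the sample data above.

```powershell
# Hypothetical helper: is a VSN covered by a single (low, high] entry?
function Test-VsnInInterval {
    param(
        [uint64]$Vsn,
        [uint64]$Low,
        [uint64]$High
    )
    # "low" is excluded, "high" is included
    return ($Vsn -gt $Low) -and ($Vsn -le $High)
}

Test-VsnInInterval -Vsn 9    -Low 9 -High 8009   # False: low itself is excluded
Test-VsnInInterval -Vsn 10   -Low 9 -High 8009   # True: low+1 is the first covered VSN
Test-VsnInInterval -Vsn 8009 -Low 9 -High 8009   # True: high is included
Test-VsnInInterval -Vsn 8010 -Low 9 -High 8009   # False: beyond high
```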

 

Now let’s reorder the version vector a bit and use the info from the spec:

{60D3A388-C51B-469A-AE7F-E5BCE9DEF2D9} |-> (9, 8009]
{A89132B1-89AE-4877-8425-957FFA968767} |-> (9, 22627] (22628, 22649] (22650, 22687] (22688, 22692]

We have here GUIDs and for each GUID we have a set of pairs of integers. Sounds familiar ? Yep… What we actually have here is a set of FRS_VERSION_VECTOR entries:

Entry 1:
dbGUID: {60D3A388-C51B-469A-AE7F-E5BCE9DEF2D9}
low: 9
high: 8009

Entry 2:  
dbGUID: {A89132B1-89AE-4877-8425-957FFA968767}
low: 9
high: 22627

Entry 3:
dbGUID: {A89132B1-89AE-4877-8425-957FFA968767}
low: 22628
high: 22649

Entry 4:
dbGUID: {A89132B1-89AE-4877-8425-957FFA968767}
low: 22650
high: 22687

Entry 5:
dbGUID: {A89132B1-89AE-4877-8425-957FFA968767}
low: 22688
high: 22692

So what we have here (based on the version vector above) are 5 intervals that describe the GVSNs of the file records DFSRSRV02 already has. What is left is to scan DFSRSRV01's DfsrIdRecordInfo records belonging to the replicated folder we are interested in and check, for each record, whether its GVSN falls into one of the intervals above. If it does not, the record has not yet replicated to DFSRSRV02.

With the information above I asked myself: What would be the best way to code it ? And the answer was simple: PowerShell. Think about it: messing with WMI in C# or VBScript is quite frustrating while with PowerShell it’s a breeze.

A tip you might find useful: parsing the version vector can be a bit challenging. I ended up using regular expressions to parse it. The way I do it is:

    function BuildVVHashFromVV
    {param([string]$VersionVector)

        # One match per "{dbGuid} |-> (low, high] (low, high] ..." group
        $vvParts = ($VersionVector | Select-String -Pattern "{.*?}\s.{3}(\s+\(\d*,\s\d*\])+" -AllMatches).Matches
        $vvHash =  @{}

        foreach ($vvPart in $vvParts)
        {
            $guid    = ($vvPart.Value | Select-String -Pattern "{.*?}").Matches[0].Value
            $vvPairs = ($vvPart.Value | Select-String -Pattern "\(\d*,\s\d*\]" -AllMatches).Matches
            $vArr = @()
            $vvPairs | % {
                # The spec declares low/high as DWORDLONG (64-bit unsigned), so [int] could overflow
                $bounds = ($_.Value | Select-String -Pattern "\d+" -AllMatches).Matches
                $low    = [uint64]$bounds[0].Value
                $high   = [uint64]$bounds[1].Value
                # Flatten the pairs: low1, high1, low2, high2, ...
                $vArr += @($low, $high)
            }

            $vvHash[$guid] = $vArr
        }
        return $vvHash
    }

 

As you can see, I build a hashtable where the dbGUID is the key and the version pairs are flattened into an array so it can be easily traversed (you can assume that the GVSN pairs are sorted in the version vector). And later this is how I scan the records I am interested in:

    function GetFullDFSRDiff
    {param([string]$dfsrServer, [string]$replFolderGUID, [string]$versionVector)

        # GetOutboundBacklogFileCount and FileFromIdrecord are helpers from the full script
        $vvHash = BuildVVHashFromVV $versionVector
        $backLogFileCount2 = 0
        try
        {
            $backLogFileCount = GetOutboundBacklogFileCount $dfsrServer $replFolderGUID $versionVector
            if ($backLogFileCount -gt 0)
            {
                $wmiQuery = "SELECT * FROM DfsrIdRecordInfo WHERE ReplicatedFolderGuid='" + $replFolderGUID + "'"
                Write-Debug "Executing WMI query ""$wmiQuery"" on $dfsrServer"
                $dfsrIdRecords = Get-WmiObject -Namespace "root\microsoftdfs" -Query $wmiQuery -ComputerName $dfsrServer -ErrorAction Stop
                foreach ($dfsrIdRecord in $dfsrIdRecords)
                {
                    # Split the GVsn into the database GUID and the VSN that follows "-v"
                    $guid     = ($dfsrIdRecord.GVsn | Select-String -Pattern "{.*?}").Matches[0].Value
                    $version  = [uint64](($dfsrIdRecord.GVsn | Select-String -Pattern "-v\d+").Matches[0].Value).Replace("-v","")
                    $isInSync = $false

                    if ($vvHash.ContainsKey($guid))
                    {
                        $vvArray = $vvHash[$guid]
                        # Even index = low (excluded), odd index = high (included)
                        for ($i=0; $i -lt $vvArray.Count; $i+=2)
                        {
                            if (($version -gt $vvArray[$i]) -and ($version -le $vvArray[$i+1]))
                            {
                                $isInSync = $true
                                break
                            }
                        }
                    }
                    # Checked outside the ContainsKey block: a GUID missing from the vector
                    # means the target knows none of that database's versions
                    if (!$isInSync)
                    {
                        $file = FileFromIdrecord $dfsrIdRecord
                        Write-Output $file
                        $backLogFileCount2++
                    }
                }
                Write-Host "Total pending outbound from API: $backLogFileCount"
                Write-Host "Total pending outbound parsed: $backLogFileCount2"
            }
            else
            {
                Write-Host "There are no pending outbound changes for the given replicated folder"
            }
        }
        catch {
            Write-Host "$($_.Exception.GetType().Name): $($_.Exception.Message)"
        }
    }
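
To try the interval check in isolation, without WMI or a DFSR server, here is a small self-contained sketch that runs against the sample version vector shown earlier. It is a compact variant, not the code from the full script: ConvertTo-IntervalTable and Test-GvsnKnown are hypothetical names, the GVSN strings are made up for illustration, and the "{guid}-v<number>" format is assumed from the regular expressions used above.

```powershell
# Sample vector from earlier in the post; the GVSN strings tested below are made up.
$sampleVector = '{60D3A388-C51B-469A-AE7F-E5BCE9DEF2D9} |-> (9, 8009] ' +
                '{A89132B1-89AE-4877-8425-957FFA968767} |-> (9, 22627] (22628, 22649] (22650, 22687] (22688, 22692]'

# Hypothetical helper: builds a table of dbGuid -> list of (low, high] intervals
function ConvertTo-IntervalTable {
    param([string]$VersionVector)
    $table = @{}
    # One match per "{guid} |-> (low, high] (low, high] ..." group
    $groupPattern = '(\{[^}]+\})\s*\|->((?:\s*\(\d+,\s*\d+\])+)'
    foreach ($group in [regex]::Matches($VersionVector, $groupPattern)) {
        $intervals = foreach ($pair in [regex]::Matches($group.Groups[2].Value, '\((\d+),\s*(\d+)\]')) {
            [pscustomobject]@{
                Low  = [uint64]$pair.Groups[1].Value
                High = [uint64]$pair.Groups[2].Value
            }
        }
        $table[$group.Groups[1].Value] = @($intervals)
    }
    return $table
}

# Hypothetical helper: does the vector already cover this GVSN ("{guid}-v<vsn>")?
function Test-GvsnKnown {
    param([hashtable]$IntervalTable, [string]$Gvsn)
    $m = [regex]::Match($Gvsn, '^(\{[^}]+\})-v(\d+)$')
    if (-not $m.Success) { throw "Unexpected GVSN format: $Gvsn" }
    $vsn = [uint64]$m.Groups[2].Value
    foreach ($interval in $IntervalTable[$m.Groups[1].Value]) {
        if (($vsn -gt $interval.Low) -and ($vsn -le $interval.High)) { return $true }
    }
    return $false   # also the result when the dbGuid is missing from the vector
}

$table = ConvertTo-IntervalTable $sampleVector
Test-GvsnKnown $table '{A89132B1-89AE-4877-8425-957FFA968767}-v22627'  # True: high bound is included
Test-GvsnKnown $table '{A89132B1-89AE-4877-8425-957FFA968767}-v22628'  # False: falls into a gap
Test-GvsnKnown $table '{60D3A388-C51B-469A-AE7F-E5BCE9DEF2D9}-v9000'   # False: beyond 8009
```

Keeping each interval as an object with Low and High properties, instead of a flattened array, is purely a readability choice for the sketch.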

You can also download the full script from here. The way to use it:

PS C:\Users\guyte\DFSR> .\Show-DfsrDifferences.ps1 -source ipsw2k8r2dc01 -target ipsw2k8dc01 -rg TestReplGroup -FullList | ft -Property FullPath

Note of caution:

  1. If you do decide to run the script on a high-volume DFSR server, do it during off-peak hours and monitor the server's performance while it is running.
  2. Without the -FullList option, the script will default to the GetOutboundBacklogFileIdRecords method and will not return more than 100 records.

Enjoy !

-GuyTe

Comments

  • Anonymous
    January 01, 2003
    Jordan, I have updated the script. It should now work with backslashes in replication group names. Either re-download it or just add the following right after "#region Main Script Body": $replicationGroupName = $replicationGroupName.Replace("\","\\") Looks like there is even a KB on this: support.microsoft.com/.../242507

  • Anonymous
    January 01, 2003
    @Morphius: Can you please share the full error ? @Jordan, I'm looking into it. Will let know when I have this fixed. -GuyTe

  • Anonymous
    October 18, 2010
    Funny that you should write this just two weeks before I needed it (and nothing similar comes up in the typical web searches).  Thanks!  

  • Anonymous
    October 25, 2010
    Unfortunately this does not work with our DFS setup. Running it either with or without the FullList switch causes the following message: "ManagementException : Quota Violation". Is there something I can change to make this run? Thanks

  • Anonymous
    December 30, 2010
    I also get the Quota Violation error.  My replicated folder contains over 1.9 million files. I tested with another replicated folder that has around 250,000 files and that worked fine.

  • Anonymous
    March 10, 2011
    I also get quota violation *DFSR groep has a lot of files, over 2 million: ManagementException :  Quota violation ManagementException :  Quota violation

  • Anonymous
    January 31, 2012
    If you get Quota Violation try changing the default value for FullList to false [switch]$FullList=$false It seems it was left enabled.