USN Rollbacks and Active Directory Replication Issues

I recently worked on an interesting issue where certain Distribution Groups (DGs) in Active Directory (AD) were not replicating properly to the Exchange 5.5 Directory Service (DS). After we added several members to a DG in AD, the changes did not replicate to the 5.5 server. One particularly problematic DG, let's call it Execs, had 58 members in AD and only 23 in the 5.5 DS, even after several replication cycles. We had already gone through the basics of Active Directory Connector (ADC) troubleshooting, some of which are listed in article 253841, but the problem persisted. After weeks of spinning our wheels, we decided to look more closely at the Update Sequence Numbers (USNs) of the problem DG.

Before adding another member in AD, we checked the USNCreated and USNChanged values and found the following:

USNCreated: 345530

USNChanged: 11240563

Nothing particularly strange so far. After adding another member on the AD side the values had changed to:

USNCreated: 307801

USNChanged: 3438089

The first odd thing I noticed was that both values had decreased! What was happening here? This is really weird because under normal circumstances 1) USNCreated does not change and 2) USNChanged only increments, and usually only by a bit (on busier servers it may increment by more, but you can still tell that the new number is part of the same sequence). For instance, in my test environment, with a single Exchange server and a single Global Catalog (GC), adding a member to a DG causes USNChanged to increment by at most 2 or 3.

What was even more surprising was that after making this one change, the DG's membership successfully replicated to the 5.5 DS. We decided this was probably just luck, since we had previously added members to this same DG and it did not replicate. I think the difference this time was the Domain Controller (DC) that we made the change on. Before we made the DG change, msExchServer1HighestUSNVector looked like this (see the footnote section):

"OURDCA: 5225033"

"OURDCB: 11333798"

"OURDCC: 11269307"

"OURDCD: 3039867" <<----- DC we made changes on

"OURDCE: 72170045"

"OURDCW: 11316411"

"OURDCX: 22501269"

"OURDCY: 6993025"

"OURDCZ: 21918680"

After we added a member to Execs, its USNChanged was 3438089, higher than the 3039867 stored for OURDCD, and so the next time the ADC polled AD the changes replicated to Exchange 5.5. Given the USNChanged for Execs before adding the member (11240563), I speculated that the DC responsible for replicating the change to the 5.5 directory was one of the following:

"OURDCB: 11333798"

"OURDCC: 11269307"

"OURDCW: 11316411"

because they seemed to have sequences in the same range as Execs. I also speculated that since their USNs were all higher than 11240563, Execs was not going to replicate to 5.5 until its USN exceeded the high water mark for one of these 3 DCs. Graham McIntyre, our resident ADC guru, thinks the problem was that changes to the DG were not making it to the AD bridgehead which is the endpoint for the connection agreement responsible for the DG.
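The comparison I was speculating about can be sketched in a few lines of Python. This is only a simplified model of the high-water-mark check as described in this post, not the ADC's actual implementation; the function names `parse_vector` and `would_be_picked_up` are made up for illustration, and the numbers are the ones quoted above.

```python
import re

# Entries copied from the msExchServer1HighestUSNVector listing in this post.
VECTOR_ENTRIES = [
    "OURDCA: 5225033",
    "OURDCB: 11333798",
    "OURDCC: 11269307",
    "OURDCD: 3039867",
    "OURDCE: 72170045",
    "OURDCW: 11316411",
    "OURDCX: 22501269",
    "OURDCY: 6993025",
    "OURDCZ: 21918680",
]


def parse_vector(entries):
    """Turn 'DCNAME: usn' strings into a {dc_name: usn} dictionary."""
    marks = {}
    for entry in entries:
        match = re.fullmatch(r"\s*(\S+)\s*:\s*(\d+)\s*", entry)
        if match is None:
            raise ValueError(f"unrecognized vector entry: {entry!r}")
        marks[match.group(1)] = int(match.group(2))
    return marks


def would_be_picked_up(marks, dc_name, usn_changed):
    """Simplified model (an assumption, not the ADC's code): a change is
    noticed only if the object's USNChanged is above the high-water mark
    stored for the DC being polled."""
    return usn_changed > marks[dc_name]


marks = parse_vector(VECTOR_ENTRIES)

# After the change on OURDCD, Execs had USNChanged 3438089.
print("OURDCD", would_be_picked_up(marks, "OURDCD", 3438089))

# Before the change, Execs had USNChanged 11240563.
for dc in ("OURDCB", "OURDCC", "OURDCW"):
    print(dc, would_be_picked_up(marks, dc, 11240563))
```

Under this model the first check passes (3438089 is above 3039867) and the other three fail, which matches what we saw: nothing replicated until the change was made on OURDCD.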

But I still wondered why the USNChanged and USNCreated values got reset on OURDCD, so I took a flight of stairs down to talk to our Active Directory folks (Exchange runs on top of Windows, so naturally we sit a floor above the Windows Active Directory team :)). They told me that one possible cause is a USN rollback, an occurrence that they have seen a few times recently. USN rollbacks are described in detail in article 875495, and there are 2 ways to detect them:

  1. Applying the fix described in the article and looking out for the listed 2095, 1113, 1115 and 2103 events.
  2. Running repadmin /showutdvec * dc=<domain name>,dc=<domain suffix> and then looking at the output to determine whether, for any given DC A, some other remote DC B has a higher watermark USN for A than A has for itself. (If the difference in USN is only slight, it could be a timing issue rather than a rollback: repadmin ran on A first, A's high-water-mark USN incremented, the changes replicated to B, and then repadmin ran on B.)

Ultimately, option 1 was going to be hard to justify because our customer follows a rigorous change control process before applying any fix. Most enterprise customers do, even for fixes that are known to definitively solve a specific problem (and this fix was only going to detect a problem we thought the customer might have). That left us with option 2. Unfortunately, with 44 DCs, the repadmin output (44 x 44 entries) was going to be much harder to go through than the simple example in the article:

Repadmin /showutdvec dc1 dc=contoso,dc=com
Site1\DC1 @ USN 10 @ Time 2004-08-04 15:07:15
Site2\DC2 @ USN 24805 @ Time 2004-08-04 15:06:59

Repadmin /showutdvec dc2 dc=contoso,dc=com
Site1\DC1 @ USN 50 @ Time 2004-08-04 15:07:15
Site2\DC2 @ USN 24805 @ Time 2004-08-04 15:06:59

where DC1 has clearly experienced a rollback, since DC2 has a higher USN for DC1 (50) than DC1 has for itself (10). Like a good boy scout, I wrote a Perl script, rollbackchecker.pl, to parse the output and detect possible rollbacks. Running the script showed several entries that looked like this:

---------- rollback -----------------

DC OURDCD may have experienced a USN rollback. It's Self Highest USN = 8280087 but the remote DC OURDCA has a USN = 8288138 for OURDCD that is higher than what OURDCD has for itself

---------- rollback -----------------

DC OURDCD may have experienced a USN rollback. It's Self Highest USN = 8280087 but the remote DC OURDCB has a USN = 8290610 for OURDCD that is higher than what OURDCD has for itself
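The Perl script itself is not reproduced in this post. For readers who want to try the same check, here is a minimal Python sketch of the idea. It assumes the output lines look like the example from the article (`Site\DC @ USN n @ Time ...`); real repadmin output may differ. The names `parse_utd_vector` and `find_possible_rollbacks` and the `tolerance` parameter are hypothetical and were chosen for this illustration.

```python
import re

# Matches lines such as: Site1\DC1 @ USN 10 @ Time 2004-08-04 15:07:15
# (format assumed from the article's example output).
LINE_RE = re.compile(r"^\s*(?:.+\\)?(?P<dc>[^\\\s]+)\s+@\s+USN\s+(?P<usn>\d+)")


def parse_utd_vector(output):
    """Return {dc_name: usn} for one DC's repadmin /showutdvec output."""
    vector = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            vector[match.group("dc").upper()] = int(match.group("usn"))
    return vector


def find_possible_rollbacks(outputs, tolerance=0):
    """outputs maps the DC a command was run against to its output text.

    Reports (dc, self_usn, remote_dc, remote_usn) whenever a remote DC
    holds a USN for dc that exceeds dc's own value by more than tolerance.
    A small tolerance can hide the timing differences mentioned earlier.
    """
    vectors = {dc.upper(): parse_utd_vector(text) for dc, text in outputs.items()}
    findings = []
    for dc, own_vector in vectors.items():
        self_usn = own_vector.get(dc)
        if self_usn is None:
            continue  # this DC did not report a USN for itself
        for remote, remote_vector in vectors.items():
            if remote == dc:
                continue
            remote_usn = remote_vector.get(dc)
            if remote_usn is not None and remote_usn - self_usn > tolerance:
                findings.append((dc, self_usn, remote, remote_usn))
    return findings


# The two-DC example from the article.
SAMPLE = {
    "dc1": r"""Site1\DC1 @ USN 10 @ Time 2004-08-04 15:07:15
Site2\DC2 @ USN 24805 @ Time 2004-08-04 15:06:59""",
    "dc2": r"""Site1\DC1 @ USN 50 @ Time 2004-08-04 15:07:15
Site2\DC2 @ USN 24805 @ Time 2004-08-04 15:06:59""",
}

for dc, self_usn, remote, remote_usn in find_possible_rollbacks(SAMPLE):
    print(f"{dc} may have rolled back: self USN {self_usn}, "
          f"but {remote} has USN {remote_usn} for it")
```

With 44 DCs you would collect one output per DC into the dictionary and let the nested loop do the 44 x 44 comparison. Any hit is only a lead to investigate, not proof of a rollback.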

We had indeed made our change to Execs on OURDCD. According to the article, the most common sources of USN rollbacks are:

  1. Virtualized hosting environments, including but not limited to Microsoft Virtual Server 2005 and EMC VMware
  2. Software that backs up and restores an Active Directory operating system installation or a hard disk volume that contains the installation (including but not limited to Norton Ghost)
  3. Advanced disk subsystems that can selectively copy a volume that contains an Active Directory operating system installation that was saved in the past

On further questioning the customer said they may have done a system state restore on OURDCD because it had experienced ‘hardware issues’. The article further states that there are only 2 ways to recover from a rollback.

  1. Use the Active Directory Installation Wizard (dcpromo.exe) to remove and then reinstall Active Directory (If you are not interested in the changes made on the problem DC)
  2. Restore the system state from a good recent backup using a supported method.

Our customer decided to dcpromo down OURDCD and then promote it back to a DC, and since then they have not experienced DG replication issues.

This customer’s rollback manifested as an ADC replication issue but broken AD replication can affect Exchange in numerous ways. In fact, to say that Exchange relies on AD is to grossly understate it. The moral of the story is, you should avoid doing any of the listed things that can cause a USN rollback.

Footnote:

For those not familiar with msExchServer1HighestUSNVector, it is an attribute that the ADC uses to store the high-watermark USN for every DC with which it replicates. The ADC periodically polls the AD for objects with higher USNs than the last highest USN that successfully replicated with the 5.5 DS and replicates the new changes. There is a corresponding msExchServer2HighestUSNVector that is used to track replication changes from Exchange 5.5 to the AD. The exact mechanics of this process are described in article 253840.

Published Tuesday, November 29, 2005 7:01 PM by jasperk