Replication error 8464 after schema upgrade

Problem Environment

Contoso has a very large branch-office deployment of Active Directory. Each branch office is an Active Directory site, and each DC is a GC because of potentially unreliable WAN links.

Any time a schema upgrade updates the Partial Attribute Set (PAS), replication fails for numerous partitions until all hub DCs have been updated. The last time they upgraded the schema, it was several months before all DCs received the updated PAS.

Details

After the schema was upgraded for Exchange 2007, Active Directory replication failed with error 8464:

“Synchronization attempt failed because the destination DC is currently waiting to synchronize new partial attributes from source. This condition is normal if a recent schema change modified the partial attribute set. The destination partial attribute set is not a subset of source partial attribute set.”

They were getting this message because the DCs in their hub site had not received the updated Partial Attribute Set (PAS) for the affected partitions.

It had been several days since the Schema was upgraded, and given enough time, this problem would eventually correct itself.  Our goal was to understand why it took so long for all DCs to reach replication convergence, and to decrease the time it takes for all domain controllers to receive the updated PAS.

To visualize their replication environment, I had them run our AD Topology Diagrammer tool (ADTD, sometimes referred to by its former name, AD Map). The Visio diagram revealed one hub site and over a thousand branch-office sites. Each branch office has replication connections to the hub site only.

Picture provided courtesy of the "Windows Server 2003 Active Directory Branch Office Environment" whitepaper

Cause

Replication of the updated PAS did not occur because of the following:

Each of the bridgehead domain controllers in the hub site has over 200 replication connections for each partition. The customer has a DC in each branch office, each office is its own site, and each DC is a GC. The sheer number of replication connections interfered with the timely update of the PAS. This environment would be a great candidate for RODCs (since there would not be any outbound connections from the branch office sites), but now was not the time to talk topology redesign.

Solution

Our immediate goal was to get the PAS updated quickly on each DC in the hub site for both domain partitions reported in the events.

First we had to identify which GCs have the updated PAS. The following repadmin.exe command dumps the PAS from every GC in the forest for the partition specified in the DN path *. The resulting output shows which GCs haven’t received the update yet:

“repadmin /showattr gc: dc=corp,dc=contoso,dc=com /gc /atts:partialattributeset >partialattributeset-Corp.txt”

* Since “dc=corp,dc=contoso,dc=com” is specified in the above command, it will dump the PAS for the “Corp” domain partition.

The following command would be used to dump the PAS for the “Branches” domain partition:

“repadmin /showattr gc: dc=branches,dc=corp,dc=contoso,dc=com /gc /atts:partialattributeset >partialattributeset-Branches.txt”

The value of interest in the output is the number listed after the "v1.cAttrs =" text. A GC that reports a lower number than its peers has not yet received the updated PAS.

If you want to check a single GC, you could run:

repadmin /showattr DCName dc=corp,dc=contoso,dc=com /gc /atts:partialattributeset
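With over a thousand GCs, reading the dump files by eye gets tedious. The following Python sketch shows one way to pull the cAttrs value for each DC out of a saved dump and list the DCs that are behind. It assumes that repadmin prints a header line containing "against full DC <name>" before each DC's output, and that the count appears after "v1.cAttrs =". Both patterns are assumptions about the output layout, so adjust them to match what your files actually contain. The function names are hypothetical.

```python
import re

# Assumption: repadmin prints a header such as
# "Repadmin: running command /showattr against full DC <name>"
# before each DC's output. Adjust the pattern if your dump differs.
DC_HEADER = re.compile(r"against full DC (\S+)")

# The attribute count follows "v1.cAttrs =" (matched case-insensitively).
CATTRS = re.compile(r"v1\.cAttrs\s*=\s*(\d+)", re.IGNORECASE)


def parse_pas_counts(text):
    """Map each DC name found in the dump to its reported cAttrs value."""
    counts = {}
    current_dc = None
    for line in text.splitlines():
        header = DC_HEADER.search(line)
        if header:
            current_dc = header.group(1)
            continue
        match = CATTRS.search(line)
        if match and current_dc is not None:
            counts[current_dc] = int(match.group(1))
    return counts


def lagging_gcs(counts):
    """Return the DCs whose cAttrs is below the highest value seen, sorted by name."""
    if not counts:
        return []
    newest = max(counts.values())
    return sorted(dc for dc, n in counts.items() if n < newest)
```

To use it, read partialattributeset-Corp.txt into a string, pass it to parse_pas_counts, and hand the result to lagging_gcs. The sketch treats the highest cAttrs value in the file as the updated PAS, which only holds if at least one GC in the dump has already received the update.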

During this issue we came up with several methods to get these servers updated.  I will list out each of the methods (A, B, and C) with the top choice listed first.

A. Set up a replication connection, disable inbound and outbound replication, and force replication:

1. Create a manual replication connection to a DC that already has the updated partial attribute set (unless you already have a connection to this DC).

“repadmin /add <Naming Context> <Destination DC> <Source DC> [/readonly if needed]”

2. Disable inbound and outbound replication (one very fast way to clear out the queue).

“repadmin /options dc_name +DISABLE_INBOUND_REPL”

“repadmin /options dc_name +DISABLE_OUTBOUND_REPL”

3. Force replication with the DC over the newly created connection.

“repadmin /replicate <dc-with-low-cAttr> <DC-with-high-cAttr> <DNpath> /force [/readonly if needed]”

4. Run the repadmin command to check the inbound replication queue.

“repadmin /queue”

You should see just the one item queued, although I have seen a few more replication requests sneak in on a very busy DC.

5. You should be able to re-enable inbound and outbound replication immediately.

“repadmin /options dc_name -DISABLE_INBOUND_REPL”

“repadmin /options dc_name -DISABLE_OUTBOUND_REPL”
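When several hub DCs need the same treatment, it can help to generate the Method A command list per DC and review it before running anything. The following Python sketch only builds the command strings from the steps above and does not execute them. The function name, its parameters, and the DC names in the usage note are hypothetical. The repadmin syntax is copied from the steps as written.

```python
def method_a_commands(naming_context, stale_dc, good_dc,
                      add_connection=True, read_only=False):
    """Return the Method A repadmin commands, in order, as strings.

    stale_dc is the DC with the low cAttrs value; good_dc already has
    the updated PAS. Set add_connection=False when a connection to
    good_dc already exists, and read_only=True when /readonly is needed.
    """
    ro = " /readonly" if read_only else ""
    commands = []
    if add_connection:
        # Step 1: manual connection to the DC with the updated PAS.
        commands.append(f"repadmin /add {naming_context} {stale_dc} {good_dc}{ro}")
    commands += [
        # Step 2: disable replication in both directions to clear the queue.
        f"repadmin /options {stale_dc} +DISABLE_INBOUND_REPL",
        f"repadmin /options {stale_dc} +DISABLE_OUTBOUND_REPL",
        # Step 3: force replication from the good DC.
        f"repadmin /replicate {stale_dc} {good_dc} {naming_context} /force{ro}",
        # Step 4: check the inbound queue (run on the stale DC itself).
        "repadmin /queue",
        # Step 5: re-enable replication.
        f"repadmin /options {stale_dc} -DISABLE_INBOUND_REPL",
        f"repadmin /options {stale_dc} -DISABLE_OUTBOUND_REPL",
    ]
    return commands
```

For example, method_a_commands("dc=corp,dc=contoso,dc=com", "HUBDC1", "HUBDC2") returns seven commands for a stale DC named HUBDC1 and a good source named HUBDC2. Printing the list rather than running it keeps a person in the loop for step 4, where the queue should be checked before replication is re-enabled.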

B. Rehost the partition:

1. “repadmin /rehost <rehosted GC hostname> <DN path of rehosted naming context> <fully qualified DNS name of good DC hosting a writable copy of the domain partition>”

If the command returns a “replication operation was preempted” error, then perform the following steps:

2. Run repadmin /rehost again, specifying the name of both the target GC and the source DC (the good DC hosting a writable copy of the rehosted partition).

“repadmin /rehost <rehosted GC hostname> <DN path of rehosted naming context> <fully qualified DNS name of good DC hosting a writable copy of the domain partition>”

3. Press Ctrl+C to stop the rehost command before the preemption error occurs.

4. Rerun the repadmin /rehost command, which should complete normally.

“repadmin /rehost <rehosted GC hostname> <DN path of rehosted naming context> <fully qualified DNS name of good DC hosting a writable copy of the domain partition>”

5. Run repadmin /showreps /v against the destination DC and confirm replication of the read-only partition completed.

C. Move extraneous replication connections off the DC that needs to be updated:

1. This can be accomplished by moving the DC to a test site (so that it has fewer replication connections). OR:

2. Temporarily configure this server so that it is not a preferred bridgehead server, so that the KCC removes the connections.

 

As always, let me know if you have any questions!

Justin

Comments

  • Anonymous
    July 28, 2011
    Hi Justin, thanks for the detailed explanation! Thumbs up! Regards, Matthias
