Exchange 2007 /newCMS or /recoverCMS fails when installing on Windows 2003 clusters.

When running Exchange 2007 setup with /newCMS or /recoverCMS on a Windows 2003 cluster node, the following error may appear in the setup output.

[ERROR] The computer account '<CMSName>' was created on the domain controller
'\\PDCEmulator.domain.com, but has not replicated to the desired domain controller (LocalDC.domain.com) after waiting approximately 60 seconds. Please wait for the account to replicate and re-run setup /newcms.

When running Exchange 2007 /newCMS or /recoverCMS, you cannot specify which domain controller to use.  As a result, Exchange setup chooses a domain controller in the same Active Directory site where Exchange is being installed.

By default, the Windows 2003 cluster service always creates Kerberos-enabled machine accounts on the PDC emulator when the PDC emulator is available, regardless of the Active Directory site membership of the nodes.

This error generally results from two situations:

  • Exchange nodes are in Active Directory Site B, the PDC Emulator is in Active Directory Site A.
    • In this case the cluster service creates the machine account on the PDC emulator in ADSiteA.
    • Exchange setup, using a domain controller in ADSiteB, does not see the machine account because it has not yet replicated to the chosen domain controller in ADSiteB.
    • Depending on the AD site links used for replication, it could take a significant amount of time for the two AD sites to replicate and converge.
    • This error will continue until the two sites have replicated and the machine account can be found on the domain controller that Exchange is using.
  • Exchange nodes are in Active Directory Site A, the PDC Emulator is in Active Directory Site A.
    • In this case the Active Directory site may be flat and contain several domain controllers.
    • Here the error is thrown because of intra-site replication latency between the domain controller that Exchange is using and the PDC emulator.
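
The two situations above can be summarized as a small decision function.  This is only an illustrative sketch: the Topology type, its field names, and the returned labels are hypothetical and are not part of Exchange setup or the cluster service.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    """Hypothetical description of where the relevant machines live."""
    exchange_site: str         # AD site of the Exchange cluster nodes
    pdc_emulator_site: str     # AD site of the PDC emulator
    dcs_in_exchange_site: int  # domain controllers in the Exchange site


def replication_exposure(topology: Topology) -> str:
    """Return which kind of replication latency can trigger the setup error."""
    if topology.exchange_site != topology.pdc_emulator_site:
        # The account is created on the PDC emulator in another site and must
        # cross a site link before the DC chosen by setup can see it.
        return "inter-site"
    if topology.dcs_in_exchange_site > 1:
        # Same site, but setup may pick a DC other than the PDC emulator.
        return "intra-site"
    # Assumption: a single DC in the site is the PDC emulator itself,
    # so there is nothing to replicate.
    return "none"
```

For example, nodes in ADSiteB with the PDC emulator in ADSiteA fall into the "inter-site" case, while a flat site with several domain controllers falls into the "intra-site" case.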

When this failure occurs, even after replicating AD sites you can expect a second failure to occur during the setup process.  If you are monitoring Cluster Administrator, you will see that the core Exchange resources are created without issue.  When setup creates the resource for the first database instance and attempts to bring it online, the following error is thrown:

[ERROR] Setup cannot continue because the RPC server is unavailable. This could be
due to DNS information for clustered mailbox server '<CMSName>' has not finished
replicating. Run Setup again after DNS replication has completed. You can verify
that DNS replication has completed by running "nslookup <CMSName>".

At a casual glance it would appear that you are having a name resolution issue that is preventing setup from continuing, and that may actually be the case.  Usually, though, when this error occurs after the previously mentioned error, it is actually the result of an access denied on a local RPC call to create the first database instance.  Let’s explore why this happens and take a look at some supporting information.

When the first error regarding replication is encountered, re-running the setup command will fail until replication occurs.  If the person running setup waits, or replication is forced, the setup command will find the machine account on the local domain controller and will be allowed to continue.  Here are sample LDP dumps of the initial account creation on the PDC emulator and of the account replicated to the local domain controller.

  • LDP dump of machine account on PDC emulator.

Expanding base 'CN=Mail,CN=Computers,DC=domain,DC=com'...
Result <0>: (null)
Matched DNs:
Getting 1 entries:
>> Dn: CN=Mail,CN=Computers,DC=domain,DC=com
5> objectClass: top; person; organizationalPerson; user; computer;
1> cn: Mail;
1> distinguishedName: CN=Mail,CN=Computers,DC=domain,DC=com;
1> instanceType: 0x4 = ( IT_WRITE );
1> whenCreated: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
1> whenChanged: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
1> uSNCreated: 20888;
1> uSNChanged: 20893;
1> name: Mail;
1> objectGUID: a5d09e3e-2d7d-4e3d-8bf9-30b423ead057;
1> userAccountControl: 0x1020 = ( UF_PASSWD_NOTREQD | UF_WORKSTATION_TRUST_ACCOUNT
);
1> badPwdCount: 0;
1> codePage: 0;
1> countryCode: 0;
1> badPasswordTime: 01/01/1601 00:00:00 UNC ;
1> lastLogoff: 01/01/1601 00:00:00 UNC ;
1> lastLogon: 01/01/1601 00:00:00 UNC ;
1> localPolicyFlags: 0;
1> pwdLastSet: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
1> primaryGroupID: 515;
1> objectSid: S-1-5-21-2075556647-3556310751-2339872061-1111;
1> accountExpires: 09/14/30828 02:48:05 UNC ;
1> logonCount: 0;
1> sAMAccountName: MAIL$;
1> sAMAccountType: 805306369;
1> objectCategory: CN=Computer,CN=Schema,CN=Configuration,DC=domain,DC=com;
1> isCriticalSystemObject: FALSE;
2> dSCorePropagationData: 03/26/2008 15:52:12 Eastern Standard Time Eastern
Daylight Time; 30650/29691/8424 20284:77:2544 UNC;

  • LDP dump of replicated machine account to the local domain controller.

Expanding base 'CN=Mail,CN=Computers,DC=domain,DC=com'...
Result <0>: (null)
Matched DNs:
Getting 1 entries:
>> Dn: CN=Mail,CN=Computers,DC=domain,DC=com
5> objectClass: top; person; organizationalPerson; user; computer;
1> cn: Mail;
1> distinguishedName: CN=Mail,CN=Computers,DC=domain,DC=com;
1> instanceType: 0x4 = ( IT_WRITE );
1> whenCreated: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
1> whenChanged: 03/26/2008 15:30:50 Eastern Standard Time Eastern Daylight Time;
1> uSNCreated: 16994;
1> uSNChanged: 16994;
1> name: Mail;
1> objectGUID: a5d09e3e-2d7d-4e3d-8bf9-30b423ead057;
1> userAccountControl: 0x1020 = ( UF_PASSWD_NOTREQD | UF_WORKSTATION_TRUST_ACCOUNT
);
1> codePage: 0;
1> countryCode: 0;
1> localPolicyFlags: 0;
1> pwdLastSet: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
1> primaryGroupID: 515;
1> objectSid: S-1-5-21-2075556647-3556310751-2339872061-1111;
1> accountExpires: 09/14/30828 02:48:05 UNC ;
1> sAMAccountName: MAIL$;
1> sAMAccountType: 805306369;
1> objectCategory: CN=Computer,CN=Schema,CN=Configuration,DC=domain,DC=com;
1> isCriticalSystemObject: FALSE;
2> dSCorePropagationData: 03/26/2008 15:52:48 Eastern Standard Time Eastern
Daylight Time; 30650/29691/8424 20876:77:2544 UNC;

If you are looking at these and saying they are almost exactly the same, you are correct.  Please note the following are the same:

  • whenCreated: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
  • objectGUID: a5d09e3e-2d7d-4e3d-8bf9-30b423ead057
  • pwdLastSet: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time

In this specific case, we’re interested in pwdLastSet.

After replicating the machine account to the local domain controller you can re-run the setup.com /newCMS or setup.com /recoverCMS commands.  When you do, the command will now continue and no longer throw a replication error.  If you are watching Cluster Administrator, you will notice that the network name and IP address created from the first attempt are deleted and a new network name and IP address are created.  This is where the circumstances for the second failure are introduced.

When the network name is deleted and recreated, cluster goes back to the PDC emulator and hijacks the account that was previously created there.  When it does, the Kerberos password on the account is updated.  This results in a different Kerberos password on the PDC emulator (and synced into cluster) than what exists on the local domain controller.  You can verify this with LDP.

  • LDP dump on PDC emulator after replicating and rerunning setup.

Expanding base 'CN=Mail,CN=Computers,DC=domain,DC=com'...
Result <0>: (null)
Matched DNs:
Getting 1 entries:
>> Dn: CN=Mail,CN=Computers,DC=domain,DC=com
5> objectClass: top; person; organizationalPerson; user; computer;
1> cn: Mail;
1> distinguishedName: CN=Mail,CN=Computers,DC=domain,DC=com;
1> instanceType: 0x4 = ( IT_WRITE );
1> whenCreated: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
1> whenChanged: 03/30/2008 10:35:42 Eastern Standard Time Eastern Daylight Time;
1> uSNCreated: 16994;
1> uSNChanged: 24911;
1> name: Mail;
1> objectGUID: a5d09e3e-2d7d-4e3d-8bf9-30b423ead057;
1> userAccountControl: 0x1020 = ( UF_PASSWD_NOTREQD | UF_WORKSTATION_TRUST_ACCOUNT
);
1> codePage: 0;
1> countryCode: 0;
1> localPolicyFlags: 0;
1> pwdLastSet: 03/30/2008 10:34:44 Eastern Standard Time Eastern Daylight Time;
1> primaryGroupID: 515;
1> objectSid: S-1-5-21-2075556647-3556310751-2339872061-1111;
1> accountExpires: 09/14/30828 02:48:05 UNC ;
1> sAMAccountName: MAIL$;
1> sAMAccountType: 805306369;
1> dNSHostName: MAIL.domain.com;
8> servicePrincipalName: exchangeMDB/Mail.domain.com; exchangeMDB/Mail;
exchangeRFR/Mail.domain.com; exchangeRFR/Mail;
MSClusterVirtualServer/Mail.domain.com; MSClusterVirtualServer/MAIL;
HOST/Mail.domain.com; HOST/MAIL;
1> objectCategory: CN=Computer,CN=Schema,CN=Configuration,DC=domain,DC=com;
1> isCriticalSystemObject: FALSE;
2> dSCorePropagationData: 03/26/2008 15:52:48 Eastern Standard Time Eastern
Daylight Time; 30650/29691/8424 20876:77:2544 UNC;

  • LDP dump on local domain controller after replicating and rerunning setup.

Expanding base 'CN=Mail,CN=Computers,DC=domain,DC=com'...
Result <0>: (null)
Matched DNs:
Getting 1 entries:
>> Dn: CN=Mail,CN=Computers,DC=domain,DC=com
5> objectClass: top; person; organizationalPerson; user; computer;
1> cn: Mail;
1> distinguishedName: CN=Mail,CN=Computers,DC=domain,DC=com;
1> instanceType: 0x4 = ( IT_WRITE );
1> whenCreated: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
1> whenChanged: 03/26/2008 15:30:50 Eastern Standard Time Eastern Daylight Time;
1> uSNCreated: 16994;
1> uSNChanged: 16994;
1> name: Mail;
1> objectGUID: a5d09e3e-2d7d-4e3d-8bf9-30b423ead057;
1> userAccountControl: 0x1020 = ( UF_PASSWD_NOTREQD | UF_WORKSTATION_TRUST_ACCOUNT
);
1> codePage: 0;
1> countryCode: 0;
1> localPolicyFlags: 0;
1> pwdLastSet: 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time;
1> primaryGroupID: 515;
1> objectSid: S-1-5-21-2075556647-3556310751-2339872061-1111;
1> accountExpires: 09/14/30828 02:48:05 UNC ;
1> sAMAccountName: MAIL$;
1> sAMAccountType: 805306369;
1> dNSHostName: MAIL.domain.com;
8> servicePrincipalName: exchangeMDB/Mail.domain.com; exchangeMDB/Mail;
exchangeRFR/Mail.domain.com; exchangeRFR/Mail;
MSClusterVirtualServer/Mail.domain.com; MSClusterVirtualServer/MAIL;
HOST/Mail.domain.com; HOST/MAIL;
1> objectCategory: CN=Computer,CN=Schema,CN=Configuration,DC=domain,DC=com;
1> isCriticalSystemObject: FALSE;
2> dSCorePropagationData: 03/26/2008 15:52:48 Eastern Standard Time Eastern
Daylight Time; 30650/29691/8424 20876:77:2544 UNC;

Taking a look at our comparison points from before:

  • whenCreated
    • SiteA DC - 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time
    • SiteB DC - 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time
  • objectGUID
    • SiteA DC - a5d09e3e-2d7d-4e3d-8bf9-30b423ead057
    • SiteB DC - a5d09e3e-2d7d-4e3d-8bf9-30b423ead057
  • pwdLastSet
    • SiteA DC - 03/30/2008 10:34:44 Eastern Standard Time Eastern Daylight Time
    • SiteB DC - 03/26/2008 15:30:41 Eastern Standard Time Eastern Daylight Time

As you can see, the object is the same (based on objectGUID), but the Kerberos password stored on the DC in SiteA and the DC in SiteB is not the same.  The password on the SiteA DC is newer than the one on the SiteB DC.  This makes sense: when cluster Kerberos enables the new network name, it does not delete and recreate the computer account, hence the whenCreated time and objectGUID did not change.  It does, however, hijack the existing account and update the Kerberos password, which results in pwdLastSet being updated.  Therefore, when running setup a second time, we fail with the RPC / DNS error (which is really access denied).
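
The attribute comparison described here can also be done mechanically.  The following is a minimal sketch that assumes the LDP output from each domain controller has been saved as text; parse_ldp_dump and compare_dumps are hypothetical helper names, not part of LDP or any Microsoft tool.

```python
import re

# Matches LDP attribute lines such as "1> pwdLastSet: 03/26/2008 15:30:41 ...;"
ATTRIBUTE_LINE = re.compile(r"^\d+>\s*(\w+):\s*(.*?);?\s*$")


def parse_ldp_dump(text: str) -> dict:
    """Collect attribute name -> value from saved LDP output.

    Continuation lines of wrapped values are ignored in this sketch.
    """
    attributes = {}
    for line in text.splitlines():
        match = ATTRIBUTE_LINE.match(line.strip())
        if match:
            attributes[match.group(1)] = match.group(2)
    return attributes


def compare_dumps(pdc_dump: str, local_dump: str,
                  names=("whenCreated", "objectGUID", "pwdLastSet")) -> dict:
    """Return {attribute: bool}; True means both DCs hold the same value."""
    pdc = parse_ldp_dump(pdc_dump)
    local = parse_ldp_dump(local_dump)
    # An attribute missing from the PDC dump is reported as a mismatch.
    return {name: name in pdc and pdc[name] == local.get(name)
            for name in names}
```

A False result for pwdLastSet alongside True for objectGUID and whenCreated corresponds to the state shown above: the same account, with a newer password on one domain controller.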

If you again force replication between domain controllers or allow time for replication to complete naturally, the Kerberos passwords will sync between domain controllers.  This time, when re-running setup, setup resumes from a different watermark, and the database instance is successfully created and brought online.
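
If you prefer to wait for convergence rather than guess, the check can be wrapped in a simple polling loop.  This sketch assumes you supply two callables (hypothetical read_pdc and read_local) that return the current pwdLastSet value from each domain controller, for example by reading it with a tool of your choice; the timeout and poll interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_convergence(read_pdc: Callable[[], str],
                         read_local: Callable[[], str],
                         timeout_seconds: float = 900.0,
                         poll_seconds: float = 15.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until both readers return the same non-empty value.

    Returns True once the values match, or False if the timeout
    elapses first.  The values are always compared at least once.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        pdc_value = read_pdc()
        local_value = read_local()
        if pdc_value and pdc_value == local_value:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

Only when this returns True would it make sense to re-run setup; a False result means the domain controllers still disagree and the access denied condition is likely to recur.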

The moral of this story is that when the PDC emulator is not in the same Active Directory site as the Exchange nodes, or you have a flat AD site with multiple domain controllers (with some intra-site replication latency), you can expect that Exchange setup will fail at least two times when creating the clustered resources.

There are three potential ways to work around this issue.

  • Use Windows 2008 instead of Windows 2003.
    • Windows 2008 will use local domain controllers to Kerberos enable objects and update existing objects.
    • Note:  If you are in a flat AD site you may still run into this issue due to intra-site AD replication latencies.
  • Shut the PDC emulator down while running setup.
    • By shutting the PDC emulator down you make it unavailable for cluster service use. 
    • When the cluster service cannot find the PDC emulator, it reverts back to using a local domain controller to Kerberos enable objects and update existing objects.
  • Virtually make the PDC emulator unavailable by putting a hosts file entry in place.
    • This is by far the easiest solution since it can be controlled from the node where setup is being run.
    • In the hosts file, place an entry for the PDC emulator using a completely unresponsive IP address.
    • Here is an example hosts file:

# Copyright (c) 1993-2006 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host

168.0.0.1    pdcEmulator.company.com
168.0.0.1    pdcEmulator

    • The IP address used is completely unavailable / not responsive on the network.
    • Entries are made for both the FQDN and the NetBIOS name of the PDC emulator.
    • When the PDC emulator is made unresponsive, cluster will revert to using a local domain controller to Kerberos enable objects and update existing objects.
    • You may notice that the network name resource takes longer to come online.  It will eventually come online if no other circumstances prevent it from doing so.
    • You need to remove the hosts file entry when setup has completed successfully.
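
The hosts file entries above, and their later removal, can be generated with a short helper.  This is a sketch only: the function names are hypothetical, and it assumes the NetBIOS name equals the first label of the FQDN, which you should confirm for your environment.

```python
def pdc_blackhole_entries(fqdn: str, unresponsive_ip: str) -> list:
    """Build hosts-file lines for the FQDN and the short (NetBIOS) name.

    Assumption: the NetBIOS name is the first label of the FQDN.
    """
    short_name = fqdn.split(".", 1)[0]
    return [f"{unresponsive_ip}    {fqdn}",
            f"{unresponsive_ip}    {short_name}"]


def strip_entries(hosts_text: str, entries: list) -> str:
    """Return the hosts-file text with the temporary lines removed."""
    kept = [line for line in hosts_text.splitlines()
            if line.strip() not in entries]
    return "\n".join(kept) + "\n"
```

Calling pdc_blackhole_entries with the FQDN and address from the example yields the two lines shown in the sample file; strip_entries can be applied to the file contents after setup has completed successfully.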
