Part 8: Datacenter Activation Coordination: Stop! In the Name of DAG…
Sometimes, even when following a specific process, it takes only one mistake to send the entire process off course. Recently I’ve worked with several customers who were unable to complete their datacenter switchover. Let’s explore several examples of what happened…
====================================================================================
In the first example, we have a four member database availability group (DAG). Two members are deployed in the primary datacenter along with the witness server, and the other two members are installed in a remote datacenter with an alternate witness server. Each datacenter is an Active Directory site with a defined subnet. In this example, AD site Exchange-A is the primary datacenter and AD site Exchange-B is the remote datacenter. Here is an example network diagram:
In preparation for testing the witness server, MBX-1, MBX-2, and the router are powered down. This leaves MBX-3 and MBX-4 in a lost quorum state in the remote datacenter. The administrator starts the datacenter switchover process with Stop-DatabaseAvailabilityGroup, as shown in this example:
Stop-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange-B -ConfigurationOnly:$TRUE -Confirm:$FALSE
WARNING: Active Directory couldn't be updated in Exchange-A site(s) affected by the change to 'DAG'. It won't be completely usable until after Active Directory replication occurs.
An error caused a change in the current set of domain controllers.
+ CategoryInfo : NotSpecified: (0:Int32) [], ADServerSettingsChangedException
+ FullyQualifiedErrorId : 372697AD
Next, the cluster service is stopped on MBX-3 and MBX-4.
Stop-service clussvc
To complete the switchover, Restore-DatabaseAvailabilityGroup is used.
Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite:Exchange-B
WARNING: The operation wasn't successful because an error was encountered. You may find more details in log file
"C:\ExchangeSetupLogs\DagTasks\dagtask_2012-08-12_14-07-52.764_restore-databaseavailabilitygroup.log".
Unable to get the status of the cluster service on server 'MBX-2'. Error: 'Cannot open Service Control Manager on computer 'MBX-2'. This operation might require other privileges.'
+ CategoryInfo : InvalidArgument: (:) [Restore-DatabaseAvailabilityGroup], FailedToGetServiceStatusForNodeException
+ FullyQualifiedErrorId : A9B129A5,Microsoft.Exchange.Management.SystemConfigurationTasks.RestoreDatabaseAvailabilityGroup
The command returns an error indicating that it cannot contact server MBX-2 in order to determine the status of the Cluster service. Why is the task attempting to contact a server in the primary site that is down? Using Get-DatabaseAvailabilityGroup to review the properties of the DAG shows us why:
Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,startedmailboxservers,stoppedmailboxservers
Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-4.exchange.msft, MBX-3.exchange.msft}
StartedMailboxServers : {MBX-1.exchange.msft, MBX-2.exchange.msft}
We can examine StoppedMailboxServers and note that MBX-3 and MBX-4 are on the stopped list when they should be on the started servers list. This happened because in this instance the administrator stopped the wrong Active Directory site. When using Stop-DatabaseAvailabilityGroup, the administrator should have specified site Exchange-A but accidentally specified Exchange-B. This means the restore task is attempting to force the Cluster service on either MBX-1 or MBX-2 online and subsequently evict MBX-3 and MBX-4 from the cluster.
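One way to guard against this mistake is to list each DAG member's Active Directory site before running Stop-DatabaseAvailabilityGroup. The following is a minimal sketch, assuming it is run from the Exchange Management Shell and that the DAG is named DAG as in this example. It only reads from Active Directory and changes nothing.

```powershell
# Sketch: list every DAG member together with its Active Directory site.
# Assumes the Exchange Management Shell; 'DAG' is the DAG name used in this post.
$dag = Get-DatabaseAvailabilityGroup -Identity DAG

$dag.Servers | ForEach-Object {
    # Each entry in Servers is a directory object; Name is the short server name.
    Get-ExchangeServer -Identity $_.Name
} | Sort-Object Name | Format-Table Name, Site -AutoSize
```

The servers that must stay in service (MBX-3 and MBX-4 here) should not appear in the site that is passed to the ActiveDirectorySite parameter of Stop-DatabaseAvailabilityGroup.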
If this mistake is made, how do you fix it? The first step that needs to be done is to correct the stopped and started servers list. To do this, first stop the correct set of servers.
Stop-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange-A -ConfigurationOnly:$TRUE -Confirm:$FALSE
WARNING: Active Directory couldn't be updated in Exchange-A site(s) affected by the change to 'DAG'. It won't be completely usable until after Active Directory replication occurs.
An error caused a change in the current set of domain controllers.
+ CategoryInfo : NotSpecified: (0:Int32) [], ADServerSettingsChangedException
+ FullyQualifiedErrorId : 372697AD
Next, use Get-DatabaseAvailabilityGroup to confirm that all four servers in the DAG now appear on the StoppedMailboxServers list.
Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers
Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-2.exchange.msft, MBX-1.exchange.msft, MBX-4.exchange.msft, MBX-3.exchange.msft}
StartedMailboxServers : {}
The second step requires starting the servers in the remote datacenter. Start-DatabaseAvailabilityGroup can be used to do this, as shown in the following example:
Start-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange-B
WARNING: Active Directory couldn't be updated in Exchange-A site(s) affected by the change to 'DAG'. It won't be completely usable until after Active Directory replication occurs.
An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API 'OpenByNames('MBX-3.exchange.msft', 'MBX-4.exchange.msft') failed for each server. Specific exceptions: 'An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed..', 'An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MBX-4.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed..'.' failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : BA1A902A,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup
An error caused a change in the current set of domain controllers.
+ CategoryInfo : NotSpecified: (0:Int32) [], ADServerSettingsChangedException
+ FullyQualifiedErrorId : 372697AD
The failures that are displayed are expected because the Cluster service is not started on the nodes at this time. Using Get-DatabaseAvailabilityGroup, we note that the servers listed are now correct on both the StartedMailboxServers and StoppedMailboxServers lists.
Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers
Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-2.exchange.msft, MBX-1.exchange.msft}
StartedMailboxServers : {MBX-3.exchange.msft, MBX-4.exchange.msft}
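The check performed by eye above can also be scripted. The sketch below is one possible approach, not part of Exchange: Test-DagServerLists is a hypothetical helper name, and it assumes the two lists contain server FQDNs as shown in the output above.

```powershell
# Sketch: compare the StartedMailboxServers / StoppedMailboxServers lists with
# the servers that are expected to survive the switchover.
# Test-DagServerLists is a hypothetical helper, not an Exchange cmdlet.
function Test-DagServerLists {
    param(
        [string[]]$Started,                                  # e.g. $dag.StartedMailboxServers
        [string[]]$Stopped,                                  # e.g. $dag.StoppedMailboxServers
        [Parameter(Mandatory = $true)]
        [string[]]$ExpectedStarted                           # short names of the surviving servers
    )

    # The lists hold FQDNs (MBX-3.exchange.msft); compare on the short name only.
    $startedShort = @($Started | Where-Object { $_ } | ForEach-Object { ($_ -split '\.')[0].ToUpper() })
    $stoppedShort = @($Stopped | Where-Object { $_ } | ForEach-Object { ($_ -split '\.')[0].ToUpper() })
    $expected     = @($ExpectedStarted | ForEach-Object { $_.ToUpper() })

    # Surviving servers missing from the started list.
    $missing      = @($expected | Where-Object { $startedShort -notcontains $_ })
    # Surviving servers that are still marked as stopped.
    $wrongStopped = @($expected | Where-Object { $stoppedShort -contains $_ })
    # Servers on the started list that are not expected to be there.
    $unexpected   = @($startedShort | Where-Object { $expected -notcontains $_ })

    return ($missing.Count -eq 0 -and $wrongStopped.Count -eq 0 -and $unexpected.Count -eq 0)
}

# Example use from the Exchange Management Shell (assumed, not run here):
# $dag = Get-DatabaseAvailabilityGroup -Identity DAG
# Test-DagServerLists -Started $dag.StartedMailboxServers -Stopped $dag.StoppedMailboxServers -ExpectedStarted 'MBX-3','MBX-4'
```

A result of False suggests the lists still need correcting before Restore-DatabaseAvailabilityGroup is attempted.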
The third step is to ensure the Cluster service is stopped on each node, which can be accomplished by using Stop-Service.
Stop-Service ClusSvc
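To confirm the service state on both surviving members from one console, something like the following could be used. This is a sketch that assumes Windows PowerShell (where Get-Service accepts the ComputerName parameter) and that MBX-3 and MBX-4 are reachable from the machine running it.

```powershell
# Sketch: show the Cluster service state on the surviving DAG members.
# Assumes Windows PowerShell and network access to MBX-3 and MBX-4.
Get-Service -Name ClusSvc -ComputerName MBX-3, MBX-4 |
    Format-Table MachineName, Name, Status -AutoSize
```

Both servers should report Stopped before Restore-DatabaseAvailabilityGroup is run.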
The last step is to use Restore-DatabaseAvailabilityGroup. This cmdlet will complete the datacenter switchover process by forcing the Cluster service to start and by evicting the nodes on the StoppedMailboxServers list.
Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange-B -Confirm:$FALSE
WARNING: The Exchange Trusted Subsystem is not a member of the local Administrators group on specified witness server dc-2.exchange.msft.
This completes the datacenter switchover for the database availability group. The procedure can now continue with database activation and changes required for client access.
====================================================================================
In the second example we have a four-member DAG. Two members are in the primary datacenter with the witness server, and two members are in a remote datacenter with an alternate witness server configured. Both datacenters are in the same Active Directory site. Here is an example network diagram:
In preparation for testing the witness server, MBX-1, MBX-2, and the router are powered down. This leaves MBX-3 and MBX-4 in a lost quorum state in the remote datacenter. So the administrator starts the datacenter switchover process by issuing Stop-DatabaseAvailabilityGroup, as illustrated in the following example:
Stop-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -ConfigurationOnly:$TRUE -Confirm:$FALSE
WARNING: Active Directory couldn't be updated in Exchange site(s) affected by the change to 'DAG'. It won't be completely usable until after Active Directory replication occurs.
Next, the Cluster service is stopped on MBX-3 and MBX-4.
Stop-service ClusSvc
To complete the activation, Restore-DatabaseAvailabilityGroup is issued.
Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE
WARNING: The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2012-08-12_16-57-27.326_restore-databaseavailabilitygroup.log". Unable to form quorum for database availability group 'DAG'. Please try the operation again, or run the Restore-DatabaseAvailabilityGroup cmdlet and specify the site with servers known to be running.
+ CategoryInfo : InvalidArgument: (:) [Restore-DatabaseAvailabilityGroup], DagTaskQuorumNotAchievedException
+ FullyQualifiedErrorId : C7FE0CB9,Microsoft.Exchange.Management.SystemConfigurationTasks.RestoreDatabaseAvailabilityGroup
The command returns an error indicating that a quorum cannot be formed because no servers are known to be running. Why has this occurred? Using Get-DatabaseAvailabilityGroup we can review the properties of the DAG:
Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers
Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-1.exchange.msft, MBX-4.exchange.msft, MBX-3.exchange.msft, MBX-2.exchange.msft}
StartedMailboxServers : {}
Specifically, we are interested in StoppedMailboxServers. In this example, all four DAG members appear in the StoppedMailboxServers list. Why is that? In our scenario, all Exchange servers are in the same Active Directory site. The administrator issued the Stop-DatabaseAvailabilityGroup command with the ActiveDirectorySite parameter when the MailboxServer parameter should have been used instead. The MailboxServer parameter lets the administrator stop individual servers rather than every server in the site.
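For reference, a stop that targets only the failed servers in a single-site DAG might look like the sketch below. It assumes, as in this example, that MBX-1 and MBX-2 are the powered-down servers, and it keeps the ConfigurationOnly parameter because those servers cannot be contacted.

```powershell
# Sketch: in a single-site DAG, mark only the failed servers as stopped.
# MBX-1 and MBX-2 are the powered-down servers in this example.
foreach ($server in 'MBX-1', 'MBX-2') {
    Stop-DatabaseAvailabilityGroup -Identity DAG -MailboxServer $server -ConfigurationOnly:$true -Confirm:$false
}
```

Run this way, MBX-3 and MBX-4 are never added to the StoppedMailboxServers list.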
If this mistake is made, you can recover from it fairly easily. The first step is to fix the started and stopped mailbox server lists. You can use Start-DatabaseAvailabilityGroup to correct this.
Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-3
An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API 'OpenByNames('MBX-3.exchange.msft') failed for each server. Specific exceptions: 'An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed..'.' failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : CE668F87,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup
Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-4
An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API 'OpenByNames('MBX-3.exchange.msft', 'MBX-4.exchange.msft') failed for each server. Specific exceptions: 'An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed..','An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MBX-4.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed..'.' failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : BB89A63D,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup
The failures that are displayed are expected because the Cluster service is not started on the DAG members at this time. We can use Get-DatabaseAvailabilityGroup to verify that the StartedMailboxServers and StoppedMailboxServers lists are correct.
Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers
Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-1.exchange.msft, MBX-2.exchange.msft}
StartedMailboxServers : {MBX-4.exchange.msft, MBX-3.exchange.msft}
The second step is to ensure that the Cluster service is stopped on MBX-3 and MBX-4.
Stop-Service ClusSvc
The last step is to run the Restore-DatabaseAvailabilityGroup command. This will complete the datacenter switchover process by forcing the Cluster service to start and by evicting the nodes on the stopped servers list.
Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE
WARNING: The Exchange Trusted Subsystem is not a member of the local Administrators group on specified witness server dc-2.exchange.msft.
This completes the datacenter switchover for the database availability group. The procedure can now continue with database activation and changes required for client access.
====================================================================================
In the last example we have a four-member DAG. Two members are installed in a primary datacenter with the witness server, and two members are installed in a remote datacenter with an alternate witness server configured. Both datacenters are in the same Active Directory site. The same situation described in this example can occur when multiple Active Directory sites are used, but in my experience, this problem most commonly occurs with just a single Active Directory site. Here is an example network diagram:
In preparation for testing the witness server, MBX-1, MBX-2, and the router are powered down. This leaves MBX-3 and MBX-4 in a lost quorum state in the remote datacenter. So, the administrator starts the datacenter switchover process with Stop-DatabaseAvailabilityGroup:
Stop-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE
Next, the cluster service is stopped on MBX-3 and MBX-4.
Stop-service ClusSvc
Finally, Restore-DatabaseAvailabilityGroup is issued.
Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE
WARNING: The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2012-08-12_16-57-27.326_restore-databaseavailabilitygroup.log". Unable to form quorum for database availability group 'DAG'. Please try the operation again, or run the Restore-DatabaseAvailabilityGroup cmdlet and specify the site with servers known to be running.
+ CategoryInfo : InvalidArgument: (:) [Restore-DatabaseAvailabilityGroup], DagTaskQuorumNotAchievedException
+ FullyQualifiedErrorId : C7FE0CB9,Microsoft.Exchange.Management.SystemConfigurationTasks.RestoreDatabaseAvailabilityGroup
As with the previous examples, the problem occurred because the administrator issued the Stop-DatabaseAvailabilityGroup command against the whole site and all servers were added to the stopped servers list. This is verified with Get-DatabaseAvailabilityGroup.
Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers
Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-1.exchange.msft, MBX-4.exchange.msft, MBX-3.exchange.msft, MBX-2.exchange.msft}
StartedMailboxServers : {}
The extent of the issue is realized when we attempt to correct the started and stopped mailbox server lists and proceed with the switchover process. As with the previous examples, we use Start-DatabaseAvailabilityGroup with the MailboxServer parameter to start the individual servers in the remote datacenter.
Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-3
An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API 'OpenByNames('MBX-3.exchange.msft') failed for each server. Specific exceptions: 'An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed..'.' failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : CE668F87,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup
Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-4
An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API 'OpenByNames('MBX-3.exchange.msft', 'MBX-4.exchange.msft') failed for each server. Specific exceptions: 'An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed..','An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MBX-4.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed..'.' failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : BB89A63D,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup
The failures that are displayed are expected because the Cluster service is not started on the DAG members. Using Get-DatabaseAvailabilityGroup, we note that the servers are correct on both the StartedMailboxServers and StoppedMailboxServers lists.
Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers
Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-1.exchange.msft, MBX-2.exchange.msft}
StartedMailboxServers : {MBX-4.exchange.msft, MBX-3.exchange.msft}
Using Stop-Service, we stop the Cluster service on each server.
Stop-Service ClusSvc
Finally, the last step should be to run Restore-DatabaseAvailabilityGroup to force quorum on the remaining servers and evict the servers on the stopped servers list.
Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE
WARNING: The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2012-08-12_17-55-16.974_restore-databaseavailabilitygroup.log". Couldn't start the Cluster service on 'MBX-3'. Service state: Stopped. Try forcing the cluster to start without quorum by running "net start clussvc /fq" from a command prompt on that node.
+ CategoryInfo : InvalidArgument: (:) [Restore-DatabaseAvailabilityGroup], FailedToStartClusSvcException
+ FullyQualifiedErrorId : 6CD04940,Microsoft.Exchange.Management.SystemConfigurationTasks.RestoreDatabaseAvailabilityGroup
As shown above, Restore-DatabaseAvailabilityGroup failed because it could not start the Cluster service on MBX-3 using force quorum. The error suggests that the administrator should attempt to manually start the service with /forcequorum.
net start clussvc /fq
System error 1058 has occurred. The service cannot be started, either because it is disabled or because it has no enabled devices associated with it.
After attempting to manually start the Cluster service with /forcequorum, the above error is displayed, which indicates that the Cluster service cannot be started because it is disabled.
When reviewing Service Control Manager, we note that the Cluster service on the remaining members is set to Disabled.
When reviewing the system event log, we see the following event at or about the time the Stop-DatabaseAvailabilityGroup was issued.
Log Name: System
Source: Service Control Manager
Date: 8/12/2012 10:30:07 AM
Event ID: 7040
Task Category: None
Level: Information
Keywords: Classic
User: SYSTEM
Computer: MBX-3.exchange.msft
Description:
The start type of the Cluster Service service was changed from auto start to disabled.
This is where the extent of the mistake is exposed. Stop-DatabaseAvailabilityGroup was not only run against servers that should not have been stopped, but it was also run without the ConfigurationOnly parameter. When the cmdlet is run without the ConfigurationOnly parameter, any servers that are being stopped that are accessible will have their Cluster service forcibly cleaned up. This in turn prevents Restore-DatabaseAvailabilityGroup from being successful.
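The two observations above (the Disabled start type and event 7040) can also be gathered from a PowerShell prompt. The following is a minimal sketch, assuming it is run locally on one of the surviving members such as MBX-3. It is read-only.

```powershell
# Sketch: check whether the Cluster service was disabled on a surviving node.
# Run locally on MBX-3 or MBX-4; nothing here changes configuration.
Get-WmiObject -Class Win32_Service -Filter "Name='ClusSvc'" |
    Format-List Name, StartMode, State

# Look for Service Control Manager event 7040 (start type changed) in the System log.
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 7040 } -MaxEvents 20 -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -like '*Cluster Service*' } |
    Format-List TimeCreated, Message
```

A StartMode of Disabled together with a 7040 event timed near the Stop-DatabaseAvailabilityGroup run points to the cmdlet having been run without the ConfigurationOnly parameter.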
In order to overcome this situation the administrator must re-establish the Cluster and then proceed with database activation. The first step is to ensure that the Cluster service is completely cleaned up from the DAG members in the remote datacenter.
Windows 2008:
Cluster Node /force
Attempting to clean up node '' ...
Clean up successfully completed.
Windows 2008 R2:
Import-Module FailoverClusters
Clear-ClusterNode -Force -Verbose -Confirm:$FALSE
VERBOSE: Performing operation "Clear-ClusterNode" on Target "MBX-3".
VERBOSE: Clearing cluster node MBX-3.
The second step is to use Active Directory Users and Computers to locate the DAG’s CNO. Right-click the CNO and select RESET, and then right-click the CNO and select disable. Allow sufficient time for the disabled account to replicate around Active Directory.
The third step is to manually create the cluster. There are three methods to manually create the cluster.
Windows 2008 and Windows 2008 R2 utilizing Failover Cluster Manager:
Launch Failover Cluster Manager.
In the upper right corner select “Create a cluster…”
In the “Before you begin” dialog, select Next.
On the “Selected Server” dialog enter the server names of all servers in the remote datacenter. In our example, we will add MBX-3 and MBX-4. Select the Add button after each server name. Select Next when completed.
On the “Validation Warning” select NO. Select Next when completed.
On the “Access Point for Administering the Cluster” in the “Cluster Name:” field, enter the name of the DAG. In our example we will use DAG (creative eh?). In the networks dialog enter the IP address assigned to the DAG in the remote datacenter (if you are not sure you can use Get-DatabaseAvailabilityGroup | fl name,databaseavailabilitygroupipaddresses to list the IP addresses assigned to the DAG). Select Next when complete.
On the “Confirmation” select Next.
At this time the Cluster service should be configured on both servers. On the “Summary” select Finish.
The last step is to use the Exchange Management Shell and run the following command:
Set-DatabaseAvailabilityGroup -Identity DAG
Running this command without specifying any values ensures that the DAG settings from Active Directory are applied to the new cluster.
Windows 2008 Command Line:
Cluster.exe DAGNAME /create /nodes:"NODE1 NODE2 NODE3" /ipaddress:"IP/Subnet"
C:\>cluster.exe DAG /create /nodes:"MBX-3 MBX-4" /ipAddress:"192.168.1.20/24"
4% Initializing Cluster DAG.
9% Validating cluster state on node MBX-3.
13% Searching the domain for computer object DAG
18% Verifying computer object DAG in the domain
22% Configuring computer object DAG as cluster name object
27% Validating installation of the Microsoft Failover Cluster Virtual Adapter on node MBX-3.
31% Validating installation of the Cluster Disk Driver on node MBX-3.
36% Configuring Cluster Service on node MBX-3.
40% Validating installation of the Microsoft Failover Cluster Virtual Adapter on node MBX-4.
45% Validating installation of the Cluster Disk Driver on node MBX-4.
50% Configuring Cluster Service on node MBX-4.
54% Starting Cluster Service on node MBX-3.
54% Starting Cluster Service on node MBX-4.
59% Forming cluster DAG.
63% Adding cluster common properties to DAG.
68% Creating resource types on cluster DAG.
72% Creating group 'Cluster Group'.
72% Creating group 'Available Storage'.
77% Creating IP Address resource 'Cluster IP Address'.
81% Creating Network Name resource 'DAG'.
86% Searching the domain for computer object DAG
90% Verifying computer object DAG in the domain
95% Configuring computer object DAG as cluster name object
100% Bringing resource group 'Cluster Group' online.
Windows 2008 R2 PowerShell:
Import-Module FailoverClusters
New-Cluster -Name DAGNAME -Node NODE1,NODE2,NODE3 -StaticAddress IPAddress -NoStorage
[PS] C:\>Import-Module FailoverClusters
[PS] C:\>New-Cluster -Name DAG -Node MBX-3,MBX-4 -StaticAddress 192.168.1.20 -NoStorage
Report file location: C:\Windows\cluster\Reports\Create Cluster Wizard DAG on 2013.08.12 At 12.09.55.mht
Name
----
DAG
At this time, the started and stopped mailbox server lists are accurate, and the Cluster service for the DAG has been re-established. To ensure the configuration is correct the administrator can run Set-DatabaseAvailabilityGroup. This will ensure that the DAG configuration in Active Directory matches the cluster configuration.
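Before moving on to database activation, it may be worth confirming that the rebuilt cluster and the DAG agree on membership. The sketch below assumes Windows 2008 R2 (for the FailoverClusters module) and a session that also has the Exchange cmdlets loaded. The property names shown are those returned when Get-DatabaseAvailabilityGroup is run with the Status parameter.

```powershell
# Sketch: confirm the re-created cluster and the DAG agree on membership.
# Assumes Windows 2008 R2 with the FailoverClusters module and the Exchange cmdlets available.
Import-Module FailoverClusters

# Cluster view: MBX-3 and MBX-4 should be listed and Up.
Get-ClusterNode -Cluster DAG | Format-Table Name, State -AutoSize

# Exchange view: live status read from the cluster rather than only from Active Directory.
Get-DatabaseAvailabilityGroup -Identity DAG -Status |
    Format-List Name, OperationalServers, PrimaryActiveManager
```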
This completes the datacenter switchover for the database availability group. The procedure can now continue with database activation and changes required for client access.
====================================================================================
This blog post covers three common scenarios I see where administrators make mistakes when using Stop-DatabaseAvailabilityGroup. When used incorrectly, the cmdlet can have unintended results and the steps outlined here can be used to work around them.
========================================================
Datacenter Activation Coordination Series:
Part 1: My databases do not mount automatically after I enabled Datacenter Activation Coordination (https://aka.ms/F6k65e)
Part 2: Datacenter Activation Coordination and the File Share Witness (https://aka.ms/Wsesft)
Part 3: Datacenter Activation Coordination and the Single Node Cluster (https://aka.ms/N3ktdy)
Part 4: Datacenter Activation Coordination and the Prevention of Split Brain (https://aka.ms/C13ptq)
Part 5: Datacenter Activation Coordination: How do I Force Automount Concensus? (https://aka.ms/T5sgqa)
Part 6: Datacenter Activation Coordination: Who has a say? (https://aka.ms/W51h6n)
Part 7: Datacenter Activation Coordination: When to run start-databaseavailabilitygroup to bring members back into the DAG after a datacenter switchover. (https://aka.ms/Oieqqp)
Part 8: Datacenter Activation Coordination: Stop! In the Name of DAG... (https://aka.ms/Uzogbq)
Part 9: Datacenter Activation Coordination: An error cause a change in the current set of domain controllers (https://aka.ms/Qlt035)
========================================================
Comments
Anonymous
January 01, 2003
@JK:
Yes - what you experienced is exactly what's supposed to happen. Now - get-databaseavailabilitygroup by itself should have worked just fine since it should read what is from the active directory.
If you used -status I could see how you might experience some errors.
TIMMCMIC
Anonymous
January 01, 2003
I have read all nine parts thus far, but I must have forgotten about that one. :/ Though I did remember how DACP worked (as you outlined in parts 1 and 4) which helped me out in the process. So I thank you for that as well, good sir! :)
Anonymous
January 01, 2003
@Aleksandar... I'm glad you enjoyed it.
TIMMCMIC
Anonymous
January 01, 2003
@Paul... Thanks! I have that scenario covered in my blog blogs.technet.com/.../part-3-datacenter-activation-coordination-how-do-i-force-automount-consensus.aspx Thanks for reading!
TIMMCMIC
Anonymous
January 01, 2003
@Radhakrishnan....
I'm glad you found this helpful.
TIMMCMIC
Anonymous
January 01, 2003
Holy cow...scenario three sounds like a complete nightmare! I came across something similar to scenario one while preparing for our annual disaster recovery test (where I had to create a split-brain environment...been meaning to e-mail you about my findings on that) and discovered that if I lose the DACP-bit (and the PAM) at the recovery data center that I can simply run a Start-DAG to get things going again. I found this out by trial and error when I had to restart the Exchange services in a lab. Great blog, like usual!
Anonymous
January 01, 2003
@Hemant...
The alternate file share witness would be utilized if you go over the site failover procedure.
TIMMCMIC
Anonymous
January 01, 2003
Great story!!!
Anonymous
January 01, 2003
It works for us in our company. Thank you Tim for your grt contribution..
Anonymous
June 23, 2014
Thanks Tim, great post really it helped in our last DR activity.
Anonymous
July 02, 2014
I recently went through this process and the article was a big help. One note, on Windows 2012, when you recreate the cluster and add the nodes to the site, cluster service will grab local disks on the nodes and assign them as cluster resources. This will prevent Exchange from finding them. The disk resources have to be deleted from the cluster and then marked as online in Disk Manager on each node.
Anonymous
July 10, 2014
why we have to stop the databases in case of DR? is it possible to move only the db to DR server and switch the FSW on DR server?
Anonymous
July 28, 2014
thanks for this insight into "anything that can go wrong will go wrong". I have one query. We ran through a two datacentre node majority (3/2) switchover recently. It all worked, but after the stop-databaseavailabilitygroup against the primary site get-databaseavailabilitygroup at the secondary always failed code 0x46 0r 0x6d9. After a bit of digging it looked like the cluster services on the recovery side had stopped themselves because the cluster had lost quorum, so get-databaseavallabilitygroup could not work. AD showed the three nodes had stopped ok and the restart-dag worked after the usual failures as the cluster service started. Should get-databaseavailabilitygroup work between the stop and the restart ?
Anonymous
August 25, 2014
Hi Tim,
Thanks a lot for these Great posts. One question, we have four member DAG, hence FSW will come into picture for the additional vote as per n/2 +1 method. ( 3 nodes should be available for the cluster to work). Now if two nodes and the FSW in the primary DC went offline. The quorum lost and the cluster will be down.
Now we have two votes available MBX-3 and MBX-4 online. Can we bring the alternet FSW configured in the DR DC into picture for the additional vote which is required to bring this cluster online. This will make 3 votes online and the cluster should come online. Why to do the entire datacenter switchover process?
I know I am missing something here. Hope you will help me find out that.
Anonymous
April 29, 2015
Hello Tim
Thanks for the great post.
I have similar query and wanted confirm with you.
We have 2 AD site. each site contain 3 Exchange 2010 servers and each server has mbx, cas,HT roles installed on it.
while preparing DAG Site failover document, we ran Stop-DatabaseAvailabilityGroup without -configurationonly switch
I can see cluster service on all three servers on DR site
and then ran Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange
do we still have to create cluster again
Thanks,
Sandip
Anonymous
November 12, 2015
Great post from your hands again. I loved the complete article.
By the way nice writing style you have. I never felt like boring while reading this article.
I will come back & read all your posts soon. Regards, Lucy.
Anonymous
December 20, 2015
Thanks Tim... You'r blogs always awsone. But i've a questions based on Second scenario
DAG001 and DAG002 (DAG001 is in 10.1.0.0/16 and DAG002 is in 10.2.0.0/16) both are in the same site Assume Site A,and Each DAG contains 4 Exchange servers. Both DAGs are in same site but they are in Isolated N/w.
DAG001 databases dsitributed between 2 subnets and the same for dag002(We already enabled DAC mode). Considering the current infrastructure if my DAG001 goes down(Entire Subnet 10.1.0.0/16, even FSW which is tie breaker also went down) if i run restore-databaseavailbilitygroup will it affect to DAG002?
Anonymous
January 18, 2016
@Pabbin:
I'm not sure I understand the question.
The important thing is that if this is ONE DAG in ONE SITE that you run commands based on the mailbox server and not site.
If these are two DAGs stopping / restoring one dag does not impact the other.
TIMMCMIC