Exchange 2010: Restore-DatabaseAvailabilityGroup fails to evict nodes with error 0x46
Restore-DatabaseAvailabilityGroup is one of the cmdlets used in the datacenter switchover process. Its purpose is to read the DAG’s list of stopped servers and evict those servers from the DAG’s underlying cluster. In this scenario, the list typically includes all DAG members in the failed primary datacenter. Evicting them shrinks the DAG and its cluster, and because the cluster now has fewer members, fewer servers are required to maintain quorum and perform DAG operations.
Restore-DatabaseAvailabilityGroup performs two steps:
1) Starts the Cluster service on a surviving node in the second datacenter using /forceQuorum.
2) Forcibly evicts each server listed on the stopped servers list.
I have worked support cases where this eviction process fails with an exception. In these cases, Restore-DatabaseAvailabilityGroup issued the eviction while the Cluster service was still initializing, even though the Service Control Manager reported the service as started. While the Cluster service is initializing, it cannot process eviction requests, so the eviction commands failed. For a few customers the error is consistently reproducible, which makes a workaround necessary for Restore-DatabaseAvailabilityGroup to succeed.
Note: Customers should upgrade to Exchange 2010 Service Pack 1 (SP1) before following these instructions. These instructions work only with Exchange 2010 SP1.
Before SP1, the Cluster service had to be in a stopped state before Restore-DatabaseAvailabilityGroup could be used. With SP1, the Cluster service no longer needs to be stopped before proceeding.
The following error may be reported when running:
restore-databaseAvailabilityGroup –site <DRSite>
WARNING: Server 'PrimarySiteServer' was marked as stopped in database availability
group 'DAG' but couldn't be removed from the cluster. Error: A server-side
database availability group administrative operation failed. Error: The
operation failed. CreateCluster errors may result from incorrectly configured
static addresses. Error: An error occurred while attempting a cluster
operation. Error: Cluster API
'"EvictClusterNodeEx(node.domain.com) failed with 0x46.
Error: The remote server has been paused or is in the process of being
started"' failed. [Server: DRSiteServer.domain.com]
WARNING: The operation wasn't successful because an error was encountered. You
may find more details in log file
"C:\ExchangeSetupLogs\DagTasks\dagtask_2010-09-02_14-54-39.766_restore-databaseavailabilitygroup.log".
The error 0x46 (decimal 70) translates to:
ERROR_SHARING_PAUSED winerror.h
# The remote server has been paused or is in the process of
# being started.
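To decode a Win32 error code such as this one yourself, the following PowerShell sketch converts the hexadecimal value to decimal and asks .NET for the matching system message. It is a minimal sketch: the variable name is illustrative, and the lookup through System.ComponentModel.Win32Exception returns the Windows message text only when run on Windows.

```powershell
# 0x46 is a hexadecimal literal; PowerShell evaluates it to the decimal value 70.
$code = 0x46
"0x{0:X} = {0} (decimal)" -f $code

# Ask .NET for the system message associated with this Win32 error code
# (the Windows text is returned only when this runs on Windows).
(New-Object System.ComponentModel.Win32Exception $code).Message
```

On Windows, running net helpmsg 70 from a command prompt performs the same message lookup.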
On further review, the Service Control Manager reported the Cluster service as started, and Failover Cluster Manager could connect to the Cluster service. Despite the error message, the attempt to start the Cluster service by using /forceQuorum had succeeded.
The solution, therefore, is simply to re-run Restore-DatabaseAvailabilityGroup; the stopped DAG members will then be evicted successfully.
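Because the eviction failures are reported as warnings rather than terminating errors (as in the output above), the re-run can be scripted by capturing those warnings with the -WarningVariable common parameter. The following PowerShell sketch is an illustration, not a supported procedure: the function name Invoke-RestoreWithRetry is hypothetical, the DAG name and site are placeholders, and the three attempts with a 60-second pause are arbitrary values meant to give the Cluster service time to finish initializing. Some readers report that re-running alone did not resolve the problem for them, so review the DagTasks log if every attempt still reports warnings.

```powershell
# Hypothetical helper: re-runs Restore-DatabaseAvailabilityGroup until it completes without warnings.
# The attempt count and delay are illustrative assumptions, not documented values.
function Invoke-RestoreWithRetry {
    param(
        [string]$DagName,            # placeholder, e.g. 'DAG'
        [string]$Site,               # Active Directory site of the surviving datacenter, e.g. 'DRSite'
        [int]$MaxAttempts = 3,
        [int]$DelaySeconds = 60
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # Eviction failures such as the 0x46 error surface as warnings, so collect them
        # with the -WarningVariable common parameter instead of relying on try/catch.
        $null = Restore-DatabaseAvailabilityGroup -Identity $DagName -ActiveDirectorySite $Site -WarningVariable restoreWarnings

        if (-not $restoreWarnings) {
            return $attempt          # no warnings: the stopped members were evicted
        }

        Write-Host "Attempt $attempt reported $($restoreWarnings.Count) warning(s)."
        if ($attempt -lt $MaxAttempts) {
            # Give the Cluster service time to finish initializing before trying again.
            Start-Sleep -Seconds $DelaySeconds
        }
    }

    return 0                         # still failing after MaxAttempts; review the DagTasks log
}
```

For example, Invoke-RestoreWithRetry -DagName DAG -Site DRSite returns the number of the attempt that completed without warnings, or 0 if every attempt reported warnings.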
Comments
Anonymous
January 01, 2003
@SKING: Most likely you are hitting a fairly common timing issue. If you look at the properties of the Cluster service, on the recovery actions tab, you'll see that the default action is restart. Every time the Cluster service attempts to start after a lost quorum condition, it will eventually be killed (terminated), and after every termination the restart interval increases. While the service is terminated, it appears to be in a stopped state. Then, by the time you get around to running the command, the Service Control Manager has issued a restart on it... and the cycle continues. The answer, though, is that the Cluster service must be stopped on all remaining nodes before running Restore-DatabaseAvailabilityGroup.
TIMMCMIC
Anonymous
January 01, 2003
@RLeonard...
No, a server is not considered down simply because it is not in the cluster. There can be, and have been, scenarios where nodes in a remote location are down but are still members.
TIMMCMIC
Anonymous
January 01, 2003
@Jack: Thanks.
TIMMCMIC
Anonymous
September 02, 2011
Great tip... I just hit this one on my SP1 RU5 DAG.
Anonymous
June 12, 2012
Has this still not been resolved in Exchange 2010 SP2? Or even Rollup 2? I can reproduce this exact problem in our production environment, albeit intermittently.
Anonymous
November 14, 2012
Negative, ghost rider. We have been experiencing the same issue since last Friday, 11/9/12. Re-running the command didn't do the trick. Here we are on SP2 with Rollup 2; my, how far we have come.
Anonymous
April 22, 2013
Hi Tim, we are running Exchange 2010 SP2 Rollup 5 v2 and are having issues with our BCM datacentre failover testing. Can you confirm that we need to stop the Cluster services in the secondary datacenter before running Restore-DatabaseAvailabilityGroup? We notice that sometimes these services restart automatically afterwards and sometimes they don't. We are not sure exactly whether we need to run this command in our version of Exchange, or what should happen. Thanks,
Anonymous
February 24, 2014
What if the other node is not up/available? Shouldn't it be considered to be in a 'stopped state'? My cluster log in C:\ExchangeSetupLogs\DagTasks still shows the down server as 'Up'. I'm on 2010 SP2 (this is in an SRM test environment).