Exchange 2010: Cluster core resources, the replication service, and active manager…
Every Exchange 2010 server has a process internal to the replication service known as Active Manager. The Active Manager is responsible for all database mount, dismount, and move operations that occur in Exchange 2010.
When a server is a standalone server, Active Manager is configured as a Standalone Active Manager.
When a server is a member of a Database Availability Group (DAG), Active Manager is configured as either:
- PAM – Primary Active Manager
- SAM – Secondary Active Manager
The Active Manager status in a DAG is determined by the node that owns the cluster core resources. If a node owns the cluster core resources group, this node is then known as the Primary Active Manager (PAM). All other nodes successfully participating in the cluster and not owning the cluster core resources are Secondary Active Managers.
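The role rule described above can be restated as a small PowerShell function. This is only an illustrative sketch: Get-ActiveManagerRole and its parameters are hypothetical names, not part of Exchange, and the function works on plain server names (sample values here) rather than querying a live DAG.

```powershell
# Hypothetical helper (not an Exchange cmdlet): restates the role rule in code.
function Get-ActiveManagerRole {
    param(
        [string]$Server,               # server whose role we want to know
        [string[]]$DagMembers = @(),   # empty when the server is not in a DAG
        [string]$ClusterGroupOwner     # node that owns the cluster core resources
    )

    # A server that is not a DAG member runs a Standalone Active Manager.
    if ($DagMembers -notcontains $Server) { return 'Standalone' }

    # The DAG member that owns the cluster core resources is the PAM.
    if ($Server -eq $ClusterGroupOwner) { return 'PAM' }

    # Every other participating DAG member is a SAM.
    return 'SAM'
}

# Sample values: four DAG members, with DAG-3 owning the cluster core resources.
$members = 'DAG-1', 'DAG-2', 'DAG-3', 'DAG-4'
foreach ($m in $members) {
    '{0}: {1}' -f $m, (Get-ActiveManagerRole -Server $m -DagMembers $members -ClusterGroupOwner 'DAG-3')
}
```

With these sample values only DAG-3 is reported as PAM. Changing the owner name changes the result without any other input changing, which is the point of the rule: the role follows ownership of the cluster core resources.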
Let’s take a look at an example database availability group.
DAGName: DAG
DagMembers: DAG-1,DAG-2,DAG-3,DAG-4
Running Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryActiveManager shows which machine currently owns the cluster core resources and is acting as the PAM.
Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager
Name : DAG
PrimaryActiveManager : DAG-3
Using cluster.exe we can also confirm the owner of the cluster core resources group:
cluster.exe DAG.domain.com group
Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-3           Online
Using the cluster command line, the cluster core resources can be moved to another DAG member and the PAM will subsequently change.
cluster.exe DAG.domain.com group "cluster group" /moveto:DAG-4
Moving resource group 'cluster group'...
Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-4           Online
Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager
Name : DAG
PrimaryActiveManager : DAG-4
Remember that Active Manager runs inside the Microsoft Exchange Replication service, which is installed on every Exchange 2010 Mailbox role server. This is important: if the replication service on a DAG member is not started, but that DAG member owns the cluster core resources, database mount, dismount, and move operations will fail.
Here is an example…
Currently the cluster core resources are owned by node DAG-4, which is successfully participating in the cluster DAG. Using the Services control panel, the Microsoft Exchange Replication service on server DAG-4 was stopped. Using the commands above, we can confirm that DAG-4 is still seen as the PAM.
Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager
Name : DAG
PrimaryActiveManager : DAG-4
cluster dag.domain.com group
Listing status for all available resource groups:
Group                Node            Status
-------------------- --------------- ------
Cluster Group        DAG-4           Online
Available Storage    DAG-1           Offline
Using test-replicationHealth and test-serviceHealth we can see that the replication service on node DAG-4 is unavailable.
Server     Check            Result     Error
------     -----            ------     -----
DAG-4      ClusterService   Passed
DAG-4      ReplayService    *FAILED*   The Microsoft Exchange Replication service is not running on s...
DAG-4      DagMembersUp     Passed
Role : Mailbox Server Role
RequiredServicesRunning : False
ServicesRunning : {IISAdmin, MSExchangeADTopology, MSExchangeIS, MSExchangeMailboxAssistants, MSExchangeMailSubmission, MSExchangeRPC, MSExchangeSA, MSExchangeSearch, MSExchangeServiceHost, MSExchangeThrottling, MSExchangeTransportLogSearch, W3Svc, WinRM}
ServicesNotRunning : {MSExchangeRepl}
At this time a dismount operation on a database was issued using the Dismount-Database command. An error is immediately returned:
Dismount-Database DAG-DB0
Confirm
Are you sure you want to perform this action?
Dismounting database "DAG-DB0". This may result in reduced availability for mailboxes in the database.
[Y] Yes [A] Yes to All [N] No [L] No to All [?] Help (default is "Y"): y
Couldn't dismount the database that you specified. Specified database: DAG-DB0; Error code: An Active Manager operation failed. Error: The Microsoft Exchange Replication service may not be running on server DAG-4.domain.com. Specific RPC error message: Error 0x6d9 (There are no more endpoints available from the endpoint mapper) from cli_MountDatabase.
+ CategoryInfo : InvalidOperation: (DAG-DB0:ADObjectId) [Dismount-Database], InvalidOperationException
+ FullyQualifiedErrorId : D64CA7E2,Microsoft.Exchange.Management.SystemConfigurationTasks.DismountDatabase
This error occurs because the server that is designated as the Primary Active Manager does not have its replication service running (and therefore Active Manager is not running). Stopping the replication service does not automatically arbitrate Active Manager functions to another DAG member.
To fix this error:
- Start the replication service on the machine that is designated as the Primary Active Manager (preferred).
- Move the cluster core resources to another DAG member, promoting that server to the Primary Active Manager (least preferred, since it does not address why the replication service is stopped on a running DAG member).
It is important that the replication service be monitored on all DAG members to ensure it remains functional.
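One way to act on this is a small check that flags any DAG member whose replication service is not running. The sketch below is illustrative, not a supported monitoring solution. Get-StoppedReplicationMember is a hypothetical helper name, and it assumes it is given objects with MachineName and Status properties, such as the service objects returned by Get-Service. The sample input is made up to mirror the scenario in this article.

```powershell
# Hypothetical helper: returns the machine names whose replication
# service is in any state other than Running.
function Get-StoppedReplicationMember {
    param([object[]]$Service)

    foreach ($s in $Service) {
        # Status is compared as text so that both real service objects
        # and simple sample objects work.
        if ("$($s.Status)" -ne 'Running') { $s.MachineName }
    }
}

# Sample input that mimics the scenario above (replication stopped on DAG-4).
$sample = @(
    New-Object PSObject -Property @{ MachineName = 'DAG-3'; Status = 'Running' }
    New-Object PSObject -Property @{ MachineName = 'DAG-4'; Status = 'Stopped' }
)
Get-StoppedReplicationMember -Service $sample
```

Against live servers, the input could come from something like Get-Service -Name MSExchangeRepl -ComputerName DAG-1,DAG-2,DAG-3,DAG-4 (MSExchangeRepl is the service name shown in the Test-ServiceHealth output above; the -ComputerName parameter is assumed to be available, as it is in Windows PowerShell). Any server the helper returns would then be a candidate for the preferred fix, starting the replication service on that server.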
*Updated – 5/30/2010 – Corrected the commandlet for testing services –> test-serviceHealth instead of test-serverHealth.
*Updated – 6/22/2011 – Corrected table formatting of output.
Comments
Anonymous
January 01, 2003
@Turbomcp Article updated. TIMMCMIC
Anonymous
January 01, 2003
@JFM When the node that owns the cluster core resources fails, the cluster service automatically arbitrates them over to another node thereby promoting the node to be the PAM. TIMMCMIC
Anonymous
January 01, 2003
@Justin: Apologies for the delay in responding. When it comes to the PAM, we are actually talking about the group in cluster called the "Cluster Group". By default, when you reboot the node that owns the cluster group, cluster moves it to another node automatically. Should you want to move the group beforehand, you can do so through two methods:
Windows 2008:
Cluster DAGNAME.fqdn group "Cluster Group" /moveto:NODENAME
Windows 2008 / Windows 2008 R2:
Open PowerShell
Import-Module FailoverClusters
Move-ClusterGroup -name "Cluster Group" -node NODENAME -cluster DAGNAME
TIMMCMIC
Anonymous
January 01, 2003
@Greg: It sounds like you do not have automountconsensus and possibly have DAC enabled. See my blog series on DAC if that's the case and if not post back. TIMMCMIC
Anonymous
January 01, 2003
@Sureshbabu...
Simply put quorum is V/2+1 (where V is the number of votes in a cluster). If you do not have the correct number of votes immediately available, then you do not have quorum.
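As a rough illustration of that formula, the sketch below assumes the division is integer division, so the result is a simple majority of the votes. Get-QuorumVotesNeeded is a hypothetical helper name, not a real cmdlet, and the vote counts are sample values.

```powershell
# Hypothetical helper: votes needed for quorum, computed as floor(V/2) + 1.
function Get-QuorumVotesNeeded {
    param([int]$Votes)

    # Integer part of V/2, plus one.
    return [int][math]::Floor($Votes / 2) + 1
}

Get-QuorumVotesNeeded -Votes 4   # 3
Get-QuorumVotesNeeded -Votes 5   # 3
```

Under that assumption a cluster with 4 votes and one with 5 votes both need 3 votes available to have quorum.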
TIMMCMIC
Anonymous
January 01, 2003
@ Hi all.... It appears your comment did not get completely posted. Let me know how I can assist. TIMMCMIC
Anonymous
January 01, 2003
@LMK:
The PAM determination when DAC is enabled is a combination of two things.
1) Does the cluster have quorum?
2) Are the rules of DAC met?
In the case you describe, when the first site comes back up the cluster has quorum. Fortunately the rules of DAC were not met - which means that no PAM can be promoted.
All servers in the primary site enter an unknown state for active manager.
TIMMCMIC
Anonymous
January 01, 2003
@Monika: No problem. TIMMCMIC
Anonymous
January 01, 2003
@Pankaj... Thanks TIMMCMIC
Anonymous
January 01, 2003
@Mosh: No - no downtime is required when arbitrating the PAM between nodes. TIMMCMIC
Anonymous
January 01, 2003
@Erik Bo: Great question. Essentially, Active Manager, which runs within the replication service, controls a lot whether a DAG is involved or the server is standalone.
Active Manager on a standalone server controls:
- Database mount
- Database dismount
Active Manager on a DAG controls:
- Database mount
- Database dismount
- Database autodismount
- Database move
Essentially, when you issue a mount request, the request is sent to Active Manager; Active Manager checks certain things and then issues the request to the IS. This is an example from a standalone server. Hope that helps. TIMMCMIC
Anonymous
January 01, 2003
I have a question: my DAG sometimes switches the PAM over to the DR site! This is really strange behavior; it should move the PAM role to any DAG node in the same site, right? Then I have to move the PAM manually with the command! Could you please advise?
Anonymous
January 01, 2003
If you have 2 sites and one site goes dark (including the Domain Controllers and DAG servers are down) and therefore the DAG loses quorum - I assume since the DAG has to be reestablished in the remaining site, that the PAM will be reassigned as appropriate to a remaining DAG member. Furthermore, on the switchback, if one of the failed DAG members had the PAM role, the code is smart enough to detect that there is a new/existing PAM.
Does this make sense?
Anonymous
January 01, 2003
@JFM... There are very few reasons that are legitimate for worrying about the owner of the cluster core resources (PAM) and this is not one of them. Whenever the PAM role changes between servers the PAM reviews the mount status of each database to ensure that no move actions were in process and that all is well across the DAG. In this instance the PAM would detect that the databases were / are owned on a node that is no longer valid (since the cluster service is non-functional) and would begin the best copy / move process to another node. If it was required to worry about where the PAM was owned in this specific instance you could see how a single point of failure would be introduced - which would not be good. TIMMCMIC
Anonymous
May 27, 2010
Hi great article as always:) just small typo in test-serverHealth should be Test-ServiceHealth here: Using test-replicationHealth and test-serverHealth we can see that the replication service on node DAG-4 is unavailable. Thanks again for all your efforts bringing interesting stuff every week/day
Anonymous
September 19, 2010
Thanks for posting about Active Manager.
Anonymous
December 05, 2010
Yeah, great article - thanks! Just 1 question: What business does the Replication Service (and the Standalone Active Manager) undertake in a standalone Exchange Server configuration? Kind regards
Anonymous
July 03, 2012
Good Morning, I have one issue.
Anonymous
December 07, 2012
Awesome article!! One quick question: I'm new to administering Exchange and my PAM is currently on a server that I'd like to reboot. Is it safe to move the PAM role to the other server during production without experiencing any sort of outage? Thanks, Justin
Anonymous
February 18, 2013
If the PAM fails, is there a way to force one of the SAM members to become the PAM? In case the PAM physically fails without any way to put it back in production fast enough. Thanks!
Anonymous
February 18, 2013
@TIMMMCMIC I currently have a 2-member DAG in production with 1 mailbox database. What if the PAM is also hosting the active database? Would the cluster service be able to move the PAM to the second member and then move the active database to it? Or maybe I should always make sure that the PAM is my second MBX server with the database copy. Thank you, JFM
Anonymous
July 09, 2013
I checked it; it's a really good article for us...:)
Anonymous
September 17, 2013
Is downtime required when moving the PAM manually from one node to another?
Anonymous
September 23, 2013
I am working on a DR test. I have brought up one virtual Exchange Server and one virtual domain controller on an offline VM Host. I moved the cluster resources to the Exchange server but the PAM does not seem to move with it. When I run anything the PAM is involved in, it tries to retrieve the PAM from a different Exchange server that I don't intend on restoring.
Anonymous
September 23, 2013
Let me add that when I check the "Cluster Group" resources, they show online and owned by the Exchange server I am restoring to...
Anonymous
June 18, 2014
When the PAM goes offline, another server assumes the role of PAM. How do the active databases residing on the prior PAM fail over, and which server takes over those databases?
Thanks
Hardik
Anonymous
October 14, 2014
Nice article. Could you please explain quorum in detail?