Exchange Server: Fail Over and Dynamic Quorum for a Two Node DAG Cluster
Background
Most of the available information is based on clusters with three or more nodes and didn't really seem to apply to a two-node DAG cluster, where things sometimes keep falling apart.
The Test Environment:
Two Exchange 2013 SP1 servers: NodeA, NodeB
One FileShare Witness: WitA
Two DCs
AD Site: Default-First-Site
Domain\Forest: Single
OS: Windows Server 2012 R2
Test Scenarios:
Witness Stays Active

| Seq. | NodeA | NodeB (PAM\Active DBs) | Witness | Quorum | DAG | Recovery |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Up | Up | Up | Quorum | Online | - |
| 2 | Down | Up | Up | Quorum | Online | - |
| 3 | Down | Down | Up | No Quorum | Offline | - |
| 4a | Down | Up | Up | Quorum | Online | Auto |
| 4b | Up | Down | Up | No Quorum | Offline | Manual |
| 5 | Up | Up | Up | Quorum | Online | Auto |

PAM Stays Active

| Seq. | NodeA | NodeB (PAM\Active DBs) | Witness | Quorum | DAG | Recovery |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Up | Up | Up | Quorum | Online | - |
| 2 | Down | Up | Up | Quorum | Online | - |
| 3 | Down | Up | Down | No Quorum | Offline | - |
| 4a | Down | Up | Up | Quorum | Online | Auto |
| 4b | Up | Up | Down | Quorum | Online | Auto |
| 5 | Up | Up | Up | Quorum | Online | Auto |

PAM Takeover

| Seq. | NodeA | NodeB | Witness | Quorum | DAG | Recovery |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Up | Up(PAM) | Up | Quorum | Online | - |
| 2 | Up(PAM) | Down | Up | Quorum | Online | - |
| 3 | Up(PAM) | Down | Down | No Quorum | Offline | - |
| 4a | Up(PAM) | Down | Up | Quorum | Online | Auto |
| 4b | Up(PAM) | Up | Down | Quorum | Online | Auto |
| 5 | Up | Up | Up | Quorum | Online | Auto |

PAM Takeover-Down

| Seq. | NodeA | NodeB | Witness | Quorum | DAG | Recovery |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Up | Up(PAM) | Up | Quorum | Online | - |
| 2 | Up(PAM) | Down | Up | Quorum | Online | - |
| 3 | Up(PAM) | Down | Down | No Quorum | Offline | - |
| 4 | Down(PAM) | Down | Down | No Quorum | Offline | - |
| 5a | Up(PAM) | Down | Up | Quorum | Online | Auto |
| 5b | Up(PAM) | Up | Down | Quorum | Online | Auto |
| 5c | Down(PAM) | Up | Up | No Quorum | Offline | Manual |
| 6 | Up | Up | Up | Quorum | Online | Auto |

Test Scenario Descriptions:
- NodeA, NodeB, and the Witness are up, and we have quorum.
- NodeA goes down; all DBs are mounted on NodeB, the Witness is up, and we have quorum.
- The Witness goes down: we lose quorum, NodeB loses the cluster and the DAG, and the databases dismount.
  - NodeA comes up and forms quorum with NodeB; DBs come online.
  - The Witness comes up and forms quorum with NodeB; DBs come online.
- NodeB goes down: the Witness is alone, we lose quorum, and no DB\Mailbox server is in service.
  - NodeB comes up and forms quorum with the Witness; DBs come online.
  - NodeA comes up with the Witness up but does not form quorum; DBs stay offline (manual /forcequorum required).
- The last server comes up, checks with the other votes, and adds itself to the quorum.
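The recovery behaviour in these scenarios can be sketched as a small state model. This is only a toy reading of the test results, not how the cluster service is implemented: the class `DagModel`, its `apply` method, and the `replay` helper are hypothetical names, and the "last node standing" rule is an assumption inferred from the observed outcomes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

NODES = ("NodeA", "NodeB")
VOTERS = NODES + ("Witness",)
VOTES_NEEDED = 2  # majority of the three voters


@dataclass
class DagModel:
    """Toy model of the observed results; all names here are hypothetical."""
    up: Set[str] = field(default_factory=lambda: set(VOTERS))
    online: bool = True
    last_standing: Optional[str] = None  # last node up when quorum was lost

    def apply(self, voter: str, is_up: bool) -> str:
        """Bring one voter up or down and return the resulting DAG state."""
        if is_up:
            self.up.add(voter)
        else:
            self.up.discard(voter)
        enough_votes = len(self.up) >= VOTES_NEEDED

        if self.online:
            self.online = enough_votes
            if not self.online:
                # Remember which node held the latest data when quorum was lost.
                remaining = self.up & set(NODES)
                self.last_standing = next(iter(remaining)) if remaining else voter
        else:
            # Assumed rule: automatic recovery needs a majority that includes
            # the last node standing; otherwise an admin must force quorum.
            self.online = enough_votes and self.last_standing in self.up
            if self.online:
                self.last_standing = None
        return "Online" if self.online else "Offline"


def replay(events: List[Tuple[str, bool]]) -> List[str]:
    """Apply (voter, is_up) events to a fresh model and collect the DAG states."""
    dag = DagModel()
    return [dag.apply(voter, is_up) for voter, is_up in events]


# NodeA fails, then NodeB fails, then NodeA returns while NodeB is still down.
print(replay([("NodeA", False), ("NodeB", False), ("NodeA", True)]))
```

In this model, the replay ends with the DAG still offline even though NodeA and the Witness together hold two votes, which corresponds to the case where a manual /forcequorum is required. Replacing the last event with `("NodeB", True)` brings the model back online.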
Note that if DAC mode is enabled and this is a cross-site scenario, there are additional factors to consider for activation, such as the boot time of the PAM and the witness servers. We will cover that some other time.
NOTE: We have not covered all scenarios here, as the list can go on.
Key Takeaways\Interesting Finds:
- If the PAM is the last man standing and goes down, service is not restored automatically even if we later have the required two votes. This avoids a tie, and the server coming up now may have old data. It's up to the admin to decide whether to force quorum with old data or wait until the PAM server (the last man standing, with the latest data) comes online.
- Dynamic Quorum doesn't have much of a role to play in this scenario, when we have a witness and the last two nodes online. You can refer to my earlier question and understanding on this point here.
Basically, when we have three voters, one of which is a witness, the quorum requirement stays frozen at two votes, irrespective of nodes\witness going down. Also, the dynamic weight for all three votes stays fixed at 1, even when servers are down.
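The vote arithmetic behind that frozen requirement can be checked with a few lines. This is a minimal sketch that assumes a plain majority rule with every voter weighted 1, as described above; the helper names `votes_needed` and `has_majority` are made up for illustration. It covers the vote count only, not the last-man-standing condition from the first takeaway.

```python
from itertools import product

VOTERS = ("NodeA", "NodeB", "Witness")
WEIGHT = 1  # per the observation above, every vote keeps a weight of 1


def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_majority(up) -> bool:
    """True when the voters that are up hold a majority of all votes."""
    total = WEIGHT * len(VOTERS)
    return WEIGHT * len(up) >= votes_needed(total)


# List every up/down combination of the three voters and its vote count.
for states in product((True, False), repeat=len(VOTERS)):
    up = [voter for voter, is_up in zip(VOTERS, states) if is_up]
    verdict = "majority" if has_majority(up) else "no majority"
    print(f"{len(up)} of {len(VOTERS)} up ({', '.join(up) or 'none'}): {verdict}")
```

Because the total stays at three votes, the threshold stays at two no matter which voter is down, so any single surviving voter is always short of a majority.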
References
How does Dynamic Quorum work for a two-node DAG: