Exchange 2013: How Does Dynamic Quorum Work for a Two-Node DAG?
Background
A two-node DAG with a file share witness (FSW) server. One of the nodes is down (I have deliberately left it that way); the cluster has quorum and all services are online.
I am trying to understand this: if a node's State is Down, isn't the Dynamic Quorum Group Manager supposed to kick in and set that server's DynamicWeight to 0?
If it is not doing so, is this by design, or is something wrong that I need to fix?
The test that triggered this question is listed under References. And yes, we are talking about the Windows Server 2012 R2 feature as used by Exchange Server 2013.
The Test Environment
- Two Exchange 2013 SP1 servers: NodeA, NodeB
- One file share witness: WitA
- Two DCs
- AD site: Default-First-Site
- Domain/forest: single
- OS: Windows Server 2012 R2
Troubleshooting info below:
PS C:\Windows\system32> Get-ClusterNode | ft name, dynamicweight, state, nodeweight,id -AutoSize
Name  DynamicWeight State NodeWeight Id
----  ------------- ----- ---------- --
exch1             1 Down           1  1
exch2             1 Up             1  2
PS C:\Windows\system32> (Get-Cluster).WitnessDynamicWeight
1
PS C:\Windows\system32> Get-ClusterResource
Name                         State  OwnerGroup    ResourceType
----                         -----  ----------    ------------
Cluster IP Address           Online Cluster Group IP Address
Cluster Name                 Online Cluster Group Network Name
File Share Witness (\\fs1... Online Cluster Group File Share Witness
Validation test: Quorum Configuration
Description: Validate that the current quorum configuration is optimal for the cluster.
Validating cluster quorum settings.
Witness Type: File Share Witness
Witness Resource: \\fs1.contoso.com\dag1.contoso.com
Cluster managed voting: Enabled
Voter Name | State | Assigned Vote | Current Vote
File Share Witness (\\fs1.contoso.com\dag1.contoso.com) (\\fs1.contoso.com\dag1.contoso.com) | Online | 1 | 1
exch1 | Down | 1 | 1
exch2 | Up | 1 | 1
This quorum model will be able to sustain failures of 1 node(s) if the file share witness remains available and 0 node(s) when the file share witness goes offline or fails.
This quorum configuration can be changed using the Configure Cluster Quorum wizard. This wizard can be started from the Failover Cluster Manager console by selecting the cluster name in the left hand pane, then in the right "actions" pane selecting "More Actions..." and then selecting "Configure Cluster Quorum Settings...".
When all servers were up
Votes required for quorum = nodes/2 + 1 = 2/2 + 1 = 2, and we have 3 votes (2 nodes + 1 witness).
With one server gone, I expected quorum to be recalculated as 1/2 + 1 = 1 (using integer division). But the cluster is still counting 3 votes: 1 down server + 1 up server + 1 witness.
Ideally, after some time I should be able to lose the witness too and still maintain quorum (contrary to what the validation test says).
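To make the arithmetic above concrete, here is a simplified Python model of vote counting. It is a sketch under stated assumptions, not the cluster's actual algorithm: each voter carries a weight of 0 or 1 (standing in for DynamicWeight and WitnessDynamicWeight), quorum means the reachable voters hold a strict majority of the total counted weight, and the "expected" weights (0 for the down node and the witness) are the hypothetical adjustment described above, not output captured from the cluster.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Voter:
    name: str
    weight: int  # vote the cluster currently counts (0 or 1); assumed stand-in for DynamicWeight
    up: bool     # whether the voter is currently reachable


def votes_required(voters: List[Voter]) -> int:
    """Strict majority of all counted votes: total // 2 + 1."""
    total = sum(v.weight for v in voters)
    return total // 2 + 1


def has_quorum(voters: List[Voter]) -> bool:
    """True if the reachable voters hold at least a strict majority."""
    present = sum(v.weight for v in voters if v.up)
    return present >= votes_required(voters)


def mark_down(voters: List[Voter], name: str) -> List[Voter]:
    """Return a copy of the voter list with `name` marked as unreachable."""
    return [replace(v, up=False) if v.name == name else v for v in voters]


# What the cluster reports: exch1 is down, yet every voter still has weight 1.
observed = [Voter("exch1", 1, False), Voter("exch2", 1, True), Voter("witness", 1, True)]

# Hypothetical expectation: weight removed from the down node and the witness.
expected = [Voter("exch1", 0, False), Voter("exch2", 1, True), Voter("witness", 0, True)]

if __name__ == "__main__":
    for label, voters in (("observed", observed), ("expected", expected)):
        print(
            f"{label}: {votes_required(voters)} vote(s) required, "
            f"quorum={has_quorum(voters)}, "
            f"quorum without witness={has_quorum(mark_down(voters, 'witness'))}"
        )
```

Under these assumptions the observed weights need 2 votes and lose quorum once the witness also fails (the same limit the validation report gives), while the hypothetical weights need only 1 vote and survive losing the witness.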
Solution
Split-brain syndrome is prevented by always requiring a majority of the DAG members (and, for DAGs with an even number of members, the DAG witness server) to be available and interacting for the DAG to be operational.
All DAGs with an even number of members must use a witness server.
Hence a three-node cluster behaves differently from a two-node one: an Exchange 2013 DAG effectively forces you to always have a witness server.
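A rough sketch of why the member count matters, assuming the classic static node-and-file-share-majority model: one vote per member, plus one witness vote only when the member count is even. The function name `dag_vote_summary` is hypothetical, and dynamic quorum adjustments are deliberately ignored.

```python
def dag_vote_summary(members: int) -> dict:
    """Static vote arithmetic for a DAG with `members` nodes (one vote each)."""
    if members < 1:
        raise ValueError("a DAG needs at least one member")
    uses_witness = members % 2 == 0             # even member count -> witness gets a vote
    total = members + (1 if uses_witness else 0)
    majority = total // 2 + 1                   # strict majority needed for quorum
    return {
        "uses_witness": uses_witness,
        "total_votes": total,
        "majority": majority,
        "tolerated_failures": total - majority,  # voters (nodes or witness) that can be lost
    }


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, dag_vote_summary(n))
```

In this model a two-member DAG only reaches three votes because of the witness, so it can lose either one node or the witness but not both, whereas a three-member DAG gets the same three votes from its nodes alone.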
You can specify only a name for the DAG and leave the Witness server and Witness directory fields empty. In this scenario, the task will search for a Client Access server that doesn't have the Mailbox server role installed. It will automatically create the default witness directory and share on that Client Access server and configure the DAG to use that server as its witness server.
You can override the quorum configuration using Windows Server 2012 Failover Cluster Manager; however, using it to modify a DAG is not recommended unless you are performing a datacenter (DC) switchover and/or are being assisted by Microsoft Premier Support.
Now back to the point:
When we are left with 2 nodes and 1 witness server for Exchange HA, the dynamic quorum functionality effectively stops adjusting votes. Since 2 nodes/2 + 1 = 2 votes, we need at least 2 votes to have quorum.
So suppose dynamic quorum did trigger and removed 2 votes: 1 from the witness and 1 from NodeB.
The new formula would then be 1 node/2 + 1 = 1 vote, which would allow us to lose both the witness and NodeB, leaving NodeA as the last man standing, as in this article.
However, this scenario brings the split-brain problem back into a two-node cluster. If NodeA's site is fully disconnected while NodeB and the witness can still talk to each other, they form quorum and NodeB mounts the database, while NodeA, needing only its own vote, also keeps running. That is undesirable.
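The split-brain argument can be checked with the same strict-majority rule. In this hypothetical sketch each side of the partition evaluates quorum against the vote weights it believes are in effect: NodeA uses the adjusted weights from the scenario above (only its own vote counts), while NodeB and the witness still use the original three votes. The names, weights, and the per-side "view" are illustrative assumptions, not cluster output.

```python
from typing import Dict, Set


def partition_has_quorum(weights: Dict[str, int], reachable: Set[str]) -> bool:
    """Strict-majority check for one side of a network partition."""
    total = sum(weights.values())
    present = sum(w for name, w in weights.items() if name in reachable)
    return present >= total // 2 + 1


static = {"NodeA": 1, "NodeB": 1, "Witness": 1}
# Hypothetical: dynamic quorum removed the votes of NodeB and the witness.
adjusted = {"NodeA": 1, "NodeB": 0, "Witness": 0}

side_a = {"NodeA"}             # NodeA's site, fully disconnected
side_b = {"NodeB", "Witness"}  # NodeB and the witness can still talk


def split_brain(view_a: Dict[str, int], view_b: Dict[str, int]) -> bool:
    """Both sides believing they have quorum is the split-brain condition."""
    return partition_has_quorum(view_a, side_a) and partition_has_quorum(view_b, side_b)


if __name__ == "__main__":
    print("adjusted view on NodeA:", split_brain(adjusted, static))  # both sides have quorum
    print("static votes everywhere:", split_brain(static, static))   # only NodeB's side does
```

Keeping all three votes means only the side holding two of them can keep quorum, which is the behavior the author concludes dynamic quorum preserves for a two-node DAG.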
Conclusion
Hence, contrary to what one might expect, dynamic quorum keeps the vote count at 3 in a 2-node + 1-witness scenario, which in turn keeps everything running as long as 2 votes are available, just as in the Exchange 2010 / Windows Server 2008 days.
References
- TechNet Forum Post
- All DAGs with an even number of members must use a witness server
- Override the quorum configuration using Windows2012 Failover Cluster Manager
- Exchange-2013-dag-dynamic-quorum-part2
- Microsoft Exchange Server 2010: Clustering for high availability
- Read more on the test which triggered this question: FailOver and Dynamic Quorum for a Two Node DAG Cluster?
Credits
This was originally posted here by the initial author.