Configure DAG in Exchange 2016
In Exchange 2010 on Windows Server 2008 R2, choosing the quorum model for a DAG cluster required detailed analysis and was somewhat complicated.
The quorum model was fixed: during the initial configuration of the DAG we had to plan for an odd or an even number of members, which determined whether the quorum used node majority alone or node majority together with the file-share witness.
In this article we will look at the quorum features in Windows Server 2012 R2 and how an Exchange 2016 DAG makes use of them.
With a Windows Server 2012 R2 cluster we can create a DAG with no administrative access point.
Such a DAG does not require a cluster IP address or a cluster name object (CNO), and no computer account is created for it in Active Directory. This makes the DAG easier to manage.
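As a minimal sketch of creating such a DAG, the command below passes an IP address of "none" so that no administrative access point is created. The names DAG1, FS01 and C:\DAG1 are placeholders chosen for illustration and do not come from this article.

```powershell
# Hypothetical names: DAG1 (DAG), FS01 (witness server), C:\DAG1 (witness directory).
# Passing IPAddress.None asks for a DAG without a cluster administrative access point.
New-DatabaseAvailabilityGroup -Name 'DAG1' `
    -WitnessServer 'FS01' `
    -WitnessDirectory 'C:\DAG1' `
    -DatabaseAvailabilityGroupIpAddresses ([System.Net.IPAddress]::None)
```

This is run from the Exchange Management Shell; mailbox servers are then added as members in the usual way.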
Dynamic Quorum:
A quorum is the mechanism a cluster relies on to decide whether it has enough votes to keep operating.
The quorum prevents split-brain syndrome on the Windows cluster, where two partitions of the cluster each believe they own the clustered service, and so keeps the cluster-dependent service consistently available.
Windows Server 2012 introduced a new feature called dynamic quorum.
How does Dynamic Quorum work?
Dynamic quorum, introduced in Windows Server 2012, automatically recalculates and reassigns votes when nodes fail (tracked through the DynamicWeight and NodeWeight properties of each node), depending on whether an odd or even number of votes remains among the cluster nodes, so that the service stays available. When a node rejoins the cluster, it regains its quorum vote automatically.
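The arithmetic behind this can be sketched with a small helper. Get-QuorumMajority is a hypothetical function written for illustration, not a cluster cmdlet; it only computes how many votes form a majority for a given vote total.

```powershell
# Hypothetical helper: number of votes needed for a majority of $TotalVotes.
function Get-QuorumMajority {
    param([int]$TotalVotes)
    # Majority is more than half of the votes: floor(total / 2) + 1
    return [int][math]::Floor($TotalVotes / 2.0) + 1
}

# With 4 voting nodes, 3 votes are needed to keep quorum.
Get-QuorumMajority -TotalVotes 4
# After one node fails and dynamic quorum removes its vote, 3 votes remain
# and only 2 are needed, so the cluster can tolerate a further failure.
Get-QuorumMajority -TotalVotes 3
```

With a fixed vote count the requirement would stay at 3 after the first failure; recalculating the total is what lets the cluster survive sequential failures.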
A DAG requires quorum in order to mount databases after a failure and keep the email service available.
The Exchange DAG itself is not aware of this quorum configuration. When we create a DAG in Exchange 2016, the underlying cluster exposes many properties, and Exchange uses only the ones it requires.
We can look at all the properties of the cluster by running the commands below:
Get-Cluster | fl *
Get-ClusterNode | fl *
Get-ClusterResource | fl *
Get-ClusterNetwork | fl *
Get-ClusterQuorum | fl *
We can check whether dynamic quorum is enabled by running the command below:
Get-Cluster | fl dynamic*
https://exchangequery.files.wordpress.com/2016/05/dd1.png
The dynamic weight and node weight values can be seen by running the command below:
Get-ClusterNode | ft name, dynamicweight, nodeweight, state -AutoSize
These are the default values when a DAG is configured on Windows Server 2012:
https://exchangequery.files.wordpress.com/2016/05/dd.png?w=680
The dynamic weight is the vote that dynamic quorum currently assigns to a node. It is adjusted automatically as nodes fail one after another, and the quorum is recalculated accordingly.
The node weight can also be edited manually. If an administrator sets a node's NodeWeight to zero, dynamic quorum will not assign a dynamic vote to that node and will always exclude it from the vote count.
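A minimal sketch of removing a node's vote manually is shown below. The node name MBX-DR-01 is a placeholder for illustration only.

```powershell
# Requires the FailoverClusters module on a cluster node.
Import-Module FailoverClusters

# Hypothetical node name: MBX-DR-01. Setting NodeWeight to 0 removes its vote.
(Get-ClusterNode -Name 'MBX-DR-01').NodeWeight = 0

# Confirm the result; the node should now show NodeWeight 0.
Get-ClusterNode | ft Name, DynamicWeight, NodeWeight, State -AutoSize
```

Setting the value back to 1 returns the node to the set of nodes that dynamic quorum manages.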
There is another option called **LowerQuorumPriorityNodeID**.
When both sites have an equal number of nodes, assigning this property to a node in the DR site means that node loses its vote first when the sites lose contact with each other, which ensures that the main site stays up.
This can be used to force the start of the cluster on the site which has fewer votes than the one which has failed.
We can change the value by running the command below, where the value is the ID of the node (the Id property returned by Get-ClusterNode):
(Get-Cluster).LowerQuorumPriorityNodeID = <NodeID>
This helps in scenarios where a particular set of servers must continue running before the DR site is activated.
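The property is exposed by the FailoverClusters module as LowerQuorumPriorityNodeID and expects a numeric node ID. A short sketch of looking up the ID and assigning it follows; MBX-DR-01 is a hypothetical DR-site node name.

```powershell
Import-Module FailoverClusters

# List the nodes with their IDs.
Get-ClusterNode | ft Name, Id, State -AutoSize

# Hypothetical node name: MBX-DR-01. Use its ID as the lower-priority node.
$drNodeId = [int](Get-ClusterNode -Name 'MBX-DR-01').Id
(Get-Cluster).LowerQuorumPriorityNodeID = $drNodeId

# A value of 0 means no node is preferred to lose its vote.
(Get-Cluster).LowerQuorumPriorityNodeID
```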
Last Man Standing server:
Dynamic quorum allows the cluster to survive and keep the service operational until only the last node remains available.
Below are the conditions for the last man standing behaviour:
- It works only if the cluster has already achieved quorum.
- It works only if the cluster experiences sequential shutdowns or failures.
- It does not work with a 2-node DAG that has its FSW in either of the 2 sites.
Cluster Network Thresholds:
The cluster network thresholds can be modified if the network between the sites is not very reliable and the link flaps occasionally.
We can view the current values by running the command below:
Get-Cluster | fl *subnet*
https://exchangequery.files.wordpress.com/2016/05/dd2.png
We can then adjust **CrossSubnetDelay** and **CrossSubnetThreshold** to values that stay close to the defaults. Increasing them to very large numbers in order to mask an unhealthy network is not recommended.
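As a sketch, the values can be changed through the cluster object. The numbers below are illustrative assumptions only, not recommendations from this article; the delay is in milliseconds and the threshold is the number of missed heartbeats.

```powershell
Import-Module FailoverClusters

# Illustrative values only: heartbeat every 2000 ms across subnets,
# and tolerate 10 missed heartbeats before the node is considered down.
(Get-Cluster).CrossSubnetDelay = 2000
(Get-Cluster).CrossSubnetThreshold = 10

# Review the result.
Get-Cluster | fl *subnet*
```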
Dynamic Witness:
Up to Exchange 2010, we could decide whether the file-share witness took part in the quorum only during the initial configuration. If the FSW was shut down, the cluster could not bring the resource online.
From Windows Server 2012 R2, dynamic witness adjusts this automatically and assigns the witness a vote only when one is required.
Windows Server 2012 R2 configures the witness for Exchange (the file-share witness) so that its vote, the witness dynamic vote, is assigned automatically depending on whether there is an odd or even number of votes among the cluster nodes.
It takes one of 2 possible values according to the DAG setup:
If we have an even number of node votes, the witness dynamic vote is 1.
For an odd number of node votes, the witness dynamic vote is always 0.
If a server is in the last man standing state, the witness vote is adjusted and removed. This allows the last server to keep running and keep the Exchange service available.
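The even/odd rule above can be expressed as a small helper. Get-WitnessDynamicVote is a hypothetical function for illustration, not a cluster cmdlet, and it covers only the two cases described here, not the last man standing adjustment.

```powershell
# Hypothetical helper: witness vote for a given number of node votes.
function Get-WitnessDynamicVote {
    param([int]$NodeVotes)
    # Even number of node votes: the witness votes, making the total odd.
    # Odd number of node votes: the witness does not vote.
    if ($NodeVotes % 2 -eq 0) { return 1 } else { return 0 }
}

Get-WitnessDynamicVote -NodeVotes 4   # witness vote is 1
Get-WitnessDynamicVote -NodeVotes 3   # witness vote is 0
```

In both cases the total number of votes ends up odd, which is what avoids a tie between two halves of the cluster.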
We can check the WitnessDynamicWeight by running the command below:
(Get-Cluster).WitnessDynamicWeight
https://exchangequery.files.wordpress.com/2016/05/cc.png
The witness.log file is still created as before, but the witness is used only according to the 2 options above.
https://exchangequery.files.wordpress.com/2016/05/dde.png
File-share witness settings are managed with the Set-DatabaseAvailabilityGroup cmdlet.
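A short sketch of setting and then reviewing the witness is shown below. DAG1, FS01 and C:\DAG1 are placeholder names assumed for illustration.

```powershell
# Hypothetical names: DAG1 (DAG), FS01 (witness server), C:\DAG1 (witness directory).
Set-DatabaseAvailabilityGroup -Identity 'DAG1' `
    -WitnessServer 'FS01' `
    -WitnessDirectory 'C:\DAG1'

# -Status queries live cluster information in addition to the stored configuration.
Get-DatabaseAvailabilityGroup -Identity 'DAG1' -Status |
    fl Name, WitnessServer, WitnessDirectory, WitnessShareInUse
```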
Below is an example of how the quorum appears for an even number of nodes. We can check the quorum by running the command below:
Get-ClusterQuorum | fl
https://exchangequery.files.wordpress.com/2016/05/dde1.png
DAG File System Type:
Since ReFS is the recommended file system for Exchange 2016 database disks, we need to change the file system type to ReFS in the DAG configuration.
We can check the filesystem by running the below command
Get-DatabaseAvailabilityGroup | fl Name, Filesystem
https://exchangequery.files.wordpress.com/2016/05/c1.png
We can modify the filesystem by running the below command
Set-DatabaseAvailabilityGroup <DAG> -FileSystem ReFS
If we use mount points with separate disks provisioned for the databases, logs and index, the mount point root volume can be NTFS while the disks themselves should be ReFS.
We also need to disable the ReFS integrity feature on all database volumes formatted with ReFS. This reduces the extra I/O that integrity checking adds for an I/O-intensive application like Exchange and avoids an unnecessary performance cost.
We can check whether integrity is enabled by running the command below:
Get-FileIntegrity 'J:\Databases\M1D1.edb'
https://exchangequery.files.wordpress.com/2016/05/c.png?w=680
We need to run the command below to disable integrity:
Set-FileIntegrity 'J:\Databases\M1D1.edb' -Enable $false
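When a volume holds several databases, the same setting can be applied in a loop. This is a sketch that assumes all database files sit under J:\Databases, the folder used in the example above; adjust the path for other layouts.

```powershell
# Assumed layout: every .edb file lives somewhere under J:\Databases.
$edbFiles = Get-ChildItem -Path 'J:\Databases' -Filter '*.edb' -Recurse

foreach ($file in $edbFiles) {
    # Turn off ReFS integrity streams for this database file.
    Set-FileIntegrity -FileName $file.FullName -Enable $false
}

# Review the result for each file.
$edbFiles | ForEach-Object { Get-FileIntegrity -FileName $_.FullName }
```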
Note:
ReFS is not supported for OS volumes, so the Exchange binaries should be installed on an NTFS volume.
An example of automatic cross-site failover configuration:
In the example below, failover is automatic: if either site fails, the database copies are activated in the surviving site. Both Active/Active and Active/Passive configurations will work.
https://exchangequery.files.wordpress.com/2016/05/test.png
If the configuration is Active/Passive, monitor connectivity to both sites from the network side at all times. Block connections to the DR site at the firewall level for as long as the main site is active.
Schedule a script to send an email notification when the main site is down.
When the main site is down, unblock the connections to the DR site and block the connections to the main site.
The DR site copies will be activated automatically, since the FSW is in Azure or in a 3rd site.
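The notification script mentioned above could look roughly like the sketch below, run as a scheduled task from a server outside the main site. All server names, addresses and the SMTP relay are hypothetical placeholders, and a simple ping test is only an assumption about what "down" means in a given environment.

```powershell
# Hypothetical values: replace with the real main-site servers and mail settings.
$mainSiteServers = 'MBX-MAIN-01', 'MBX-MAIN-02'
$smtpServer      = 'smtp-dr.example.com'
$notifyTo        = 'exchange-admins@example.com'
$notifyFrom      = 'dag-monitor@example.com'

# Collect the servers that do not answer two ping attempts.
$down = @($mainSiteServers | Where-Object {
    -not (Test-Connection -ComputerName $_ -Count 2 -Quiet)
})

# Alert only when every main-site server is unreachable.
if ($down.Count -eq $mainSiteServers.Count) {
    Send-MailMessage -To $notifyTo -From $notifyFrom `
        -Subject 'Main site unreachable' `
        -Body ("No response from: " + ($down -join ', ')) `
        -SmtpServer $smtpServer
}
```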
A DAG that uses dynamic quorum supports the last man standing behaviour and is much more reliable.
An example of switchover configuration:
If we do not have the file-share witness placed in a third site or in Azure, a manual switchover must be carried out in order to activate the DR site.
https://exchangequery.files.wordpress.com/2016/05/test1.png?w=680
If DAC mode is enabled, the DR site can be activated with the commands below, the same as in Exchange 2013:
Stop-Service ClusSvc
Stop-DatabaseAvailabilityGroup
Restore-DatabaseAvailabilityGroup
To restore service to the primary datacenter we can use the commands below:
Start-DatabaseAvailabilityGroup
Set-DatabaseAvailabilityGroup <DAGName> -WitnessServer <ServerName>
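The commands above are shown without parameters. A sketch with the parameters filled in follows, keeping the same order. DAG1, the Active Directory site names MainSite and DRSite, and the witness servers FS-DR-01 and FS01 are all hypothetical placeholders.

```powershell
# Check that DAC mode is enabled (expected value: DagOnly).
Get-DatabaseAvailabilityGroup -Identity 'DAG1' | fl Name, DatacenterActivationMode

# --- Activate the DR site ---
# Run on each DAG member in the DR site: stop the Cluster service.
Stop-Service ClusSvc

# Mark the unreachable main-site members as stopped (Active Directory only).
Stop-DatabaseAvailabilityGroup -Identity 'DAG1' -ActiveDirectorySite 'MainSite' -ConfigurationOnly

# Shrink the DAG to the DR-site members and use an alternate witness.
Restore-DatabaseAvailabilityGroup -Identity 'DAG1' -ActiveDirectorySite 'DRSite' `
    -AlternateWitnessServer 'FS-DR-01' -AlternateWitnessDirectory 'C:\DAG1'

# --- Restore service to the primary datacenter ---
Start-DatabaseAvailabilityGroup -Identity 'DAG1' -ActiveDirectorySite 'MainSite'
Set-DatabaseAvailabilityGroup -Identity 'DAG1' -WitnessServer 'FS01'
```

The -ConfigurationOnly switch is used because the main-site servers cannot be contacted during the outage, so only the Active Directory state is updated.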
Very important notes:
- By default, a DAG created in Exchange 2016 on Windows Server 2012 R2 has dynamic quorum and dynamic witness enabled. Microsoft recommends keeping these default settings unless there is a specific requirement to tune the configuration. If we are not sure about these parameters, it is better to consult Microsoft support before modifying them.
- Dynamic quorum works only if the cluster has already achieved quorum.
- If the node with a weight of “one” shuts down unexpectedly, the node with a weight of “zero” cannot form the cluster.
- If we set LowerQuorumPriorityNodeID to point to a specific node, that node will always have a node weight of “zero” and cannot form the cluster if it is the only node available.
- Automatic cross-site failover will not work unless the file-share witness is placed in Azure or in a 3rd site. If automatic failover is required, a 3rd site or Azure for the FSW is mandatory.
- If the file-share witness is placed in either of the 2 sites, we need to perform a switchover (manual intervention) to activate the DR site in case of a failure.
- When backing up an IP-less DAG, only IP-less backup is supported, so make sure the chosen backup solution supports it.
- Dynamic quorum will not work in a DAG configured for third-party replication.
- Do not let databases grow beyond 2 TB.
- There is no need to manage the cluster networks, since the cluster has no administrative access point. A separate heartbeat network for the DAG cluster is no longer needed, as every network is treated as a heartbeat network.
Thanks & Regards
Sathish Veerapandian
MVP - Office Servers & Services