Hyper-V and Failover Cluster (Domain Requirements)
I've often seen questions in the forums about Hyper-V, Failover Clustering, and domain requirements, including where to put your Domain Controller. I originally posted this on my blog ( http://kristiannese.blogspot.com ) but would like to share it here as well.
If you plan for Failover Clustering in Windows Server 2008 R2, you also have to dive into Active Directory and install a Domain Controller. Why? And what if you plan to run your DC as an HA VM?
Why do you need a Domain?
Systems running Windows Server 2008 R2 Failover Cluster services must be members of a domain. This ensures a common authorization framework for services as they fail over from one node to another. It also means that the clients accessing the services of the Failover Cluster can participate in this same authorization framework.
It is recommended that the cluster nodes be member servers and NOT domain controllers.
(Active Directory is already ‘Highly Available’ by design and does not need something like Failover Clustering to be HA.)
When you create a cluster, the process also creates a Cluster Name Object for the cluster in Active Directory, so the account that creates the cluster needs to be a Local Administrator on the nodes and have permission to create computer objects in Active Directory.
Run your only DC as an HA VM?
No. Period.
I have to stress that if your entire cluster shuts down, you're in serious trouble.
You might not be able to start the cluster service or your VMs, and then you are stuck.
Since your VMs are placed on shared storage, access to that storage is granted through your cluster, and your cluster won’t come online to play, you might as well call it a day.
But do not panic, you only need to place your DC VM outside your cluster. You can even run it as a VM in Hyper-V Manager on one of your nodes, but do not make it HA or place it on shared storage. Also make sure to configure the Automatic Start Action, so your DC boots up with the host.
It's always best practice to have at least a second domain controller as well, so you are able to support the rest of your infrastructure that requires Active Directory to function. It's a good idea to place this on a dedicated machine, outside your virtual environment.
Example:
My lab:
- 2 identical nodes with the Hyper-V role enabled
- Both nodes are member servers of my domain
- Cluster configured with CSV
- iSCSI with MPIO enabled
- Quorum: File Share Witness
- One Domain Controller (as a VM on the cluster :) )
I've simulated the following scenario:
- The entire cluster shuts down
- Both nodes come online again
- Now what?
(OK, I have to admit that I cheated a bit so I could demonstrate the stage AFTER you are able to log on to your hosts. Since the DC was powered off, both nodes had some trouble logging in. And if you are wondering what options you have if you log on locally, well, here is the answer):
http://1.bp.blogspot.com/_GSTYPgZsHyQ/TOVEs9AqnMI/AAAAAAAAAC4/Exgecxb_4Hc/s320/cluster_fail.png
Anyhow, we are now logged in to both nodes, and the cluster service is in the ‘stopped’ state.
Let’s try to start it on both nodes:
http://3.bp.blogspot.com/_GSTYPgZsHyQ/TOVE8zjMv9I/AAAAAAAAAC8/kzqXxTSHZRU/s320/cluster_fail3.png
OK! So far, so good.
Now, let’s try to start up the Failover Cluster Manager Console:
http://2.bp.blogspot.com/_GSTYPgZsHyQ/TOVFNM4pcCI/AAAAAAAAADA/_cWH5oDc8iA/s320/cluster_fail4.png
http://1.bp.blogspot.com/_GSTYPgZsHyQ/TOVFZ_f9uFI/AAAAAAAAADE/p4IiG6SDS20/s320/cluster_fail5.png
The console is empty. There is no cluster to manage, so we have to try to add our cluster.
As the error message indicates, we have a DNS lookup problem. That makes sense, since the only DC is powered off.
If we run the command ‘cluster node’ on both hosts, each one reports that everything is fine as far as it is concerned, but neither knows that the other node is ‘joining’ as well.
http://3.bp.blogspot.com/_GSTYPgZsHyQ/TOVFpfKG-2I/AAAAAAAAADI/bcrwhxo0dVQ/s320/cluster_fail2_cold.png
http://4.bp.blogspot.com/_GSTYPgZsHyQ/TOVFrZSGX3I/AAAAAAAAADM/SgsfP9zZoHo/s320/cluster_fail2_stone.png
(When you tell the Cluster Service to start in a Windows Server 2008 R2 Failover Cluster, it starts immediately. It then sends notifications to the other nodes that it wants to join the cluster, and it calculates the number of ‘votes’ needed to achieve ‘quorum’. Since there is no DNS name resolution between the nodes in this example, both nodes stay in a ‘joining’ state. They just wait for each other. If both nodes in this example and the witness could come online, the cluster would achieve quorum and go on its way.)
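The vote counting mentioned above can be sketched in a few lines of Python. This is only a simplified model of a majority quorum with a file share witness, assuming one vote per node and one for the witness; the function names are made up for illustration and are not part of any cluster API. It also ignores the name resolution problem, which is what actually kept the nodes from finding each other in this lab.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the configured votes."""
    return total_votes // 2 + 1


def has_quorum(nodes_online: int, witness_online: bool,
               total_nodes: int = 2, has_witness: bool = True) -> bool:
    """Return True if the votes that can see each other form a majority.

    Assumption: every node has one vote and the witness (if configured)
    has one vote. Defaults mirror the two-node lab with a witness.
    """
    total_votes = total_nodes + (1 if has_witness else 0)
    online_votes = nodes_online + (1 if has_witness and witness_online else 0)
    return online_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    # Two nodes plus a witness gives three votes, so two are needed.
    print(votes_needed(3))
    # One isolated node without the witness cannot form the cluster.
    print(has_quorum(nodes_online=1, witness_online=False))
```

In this model a single node that can reach the witness would have two of three votes, while a node that can reach neither its partner nor the witness keeps waiting.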
OK, so we have a DNS issue.
I know the IP addresses of my cluster, the nodes, and the witness share, and I know that the first thing the DNS client does is check the local hosts file (c:\windows\system32\drivers\etc\hosts), so I will add the DNS names of the involved servers there.
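If you want to confirm which names fail to resolve before changing anything, a small Python check like the one below can help. It is only a sketch: the function name is hypothetical, and the server names are the ones from this lab (COLD, STONE, CLASH, SCVMM), so replace them with your own.

```python
import socket
from typing import Callable, Iterable, List


def unresolved(names: Iterable[str],
               resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return the names that the resolver could not turn into an address."""
    failed = []
    for name in names:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Names taken from this lab; adjust for your own environment.
    for name in unresolved(["COLD", "STONE", "CLASH", "SCVMM"]):
        print("Cannot resolve:", name)
```

The resolver is passed in as a parameter only so the function can be tried out without a real DNS server.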
http://2.bp.blogspot.com/_GSTYPgZsHyQ/TOVGQ3Rm7AI/AAAAAAAAADQ/7Yutrtg4NwI/s320/cluster_fail7.png
(COLD and STONE are nodes in the cluster, CLASH is the cluster name, and SCVMM has the File Share Witness)
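For reference, a hosts file entry is simply an IP address followed by whitespace and the name. The Python sketch below builds such lines for the four names above. The IP addresses are placeholders from the 192.0.2.0/24 documentation range and are NOT the addresses used in my lab, and the function name is made up for illustration.

```python
import ipaddress
from typing import Dict, List


def hosts_lines(entries: Dict[str, str]) -> List[str]:
    """Build hosts file lines ("<ip><tab><name>") from a name-to-IP mapping."""
    lines = []
    for name, ip in entries.items():
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        lines.append(f"{ip}\t{name}")
    return lines


if __name__ == "__main__":
    # Placeholder addresses (assumption); use the real ones for your servers.
    lab = {
        "COLD": "192.0.2.11",
        "STONE": "192.0.2.12",
        "CLASH": "192.0.2.10",
        "SCVMM": "192.0.2.20",
    }
    print("\n".join(hosts_lines(lab)))
```

The printed lines would then be appended to the hosts file on each node by hand, from an elevated editor.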
Now I'm able to ping the servers by name, so let's try to run the cluster node command again:
http://1.bp.blogspot.com/_GSTYPgZsHyQ/TOVGuQLTyKI/AAAAAAAAADU/TEsWow0NMIM/s320/cluster_fail6.png
OK, looking good.
Let us try to add the cluster in Failover Cluster Manager again:
Are we saved?
What happens if we try to bring one of our VMs online?
Nothing. You can't.
If we take a look at the event log on one of our nodes, it shows some important information right here:
http://4.bp.blogspot.com/_GSTYPgZsHyQ/TOVHmo9um5I/AAAAAAAAADc/qQhuqyQL9dI/s320/cluster_fail8.png
So, after all this struggle you are still unable to start your VMs. The moral?
Please plan your cluster configuration, including where you want to place your Domain Controller. This would have been easily solved if we had a Domain Controller outside the cluster. And since I already have that, I would like to show what happens after I boot this machine.
http://1.bp.blogspot.com/_GSTYPgZsHyQ/TOVH4CJ8-pI/AAAAAAAAADg/ni4wpCO_g1w/s320/cluster_fail9.png
This article was written by Kristian Nese ( http://kristiannese.blogspot.com )