Configuring Virtual Machines to run in separate Resource Hosting Subsystem (RHS) Processes
I was recently working with a customer who was facing deadlock issues with their Windows Server 2008 R2 Hyper-V cluster. When this happened, the RHS process was terminated and a new one spawned, which meant the virtual machines it hosted were recycled. As a result, my customer decided to isolate each virtual machine in its own RHS process.
My task was to work out how to do this without causing downtime to the virtual machines and hosts. The customer wanted a simple way to set this up and also a way to find out which virtual machines were associated with which RHS process.
In this post I will discuss the steps I took to configure and confirm the isolation of virtual machines in their own RHS processes, and how this can be achieved without any downtime.
To understand what the RHS process is and how it is used I suggest you take a look at this article. It gives a nice explanation of the RHS process and how it fits into Failover Clustering. This article shows how you can quickly understand the cause of the RHS instability.
For this testing I configured my lab as a three-node cluster running a few virtual machines.
Step 1: Check the VM is running in the default RHS process
The first step was to work out which RHS process the VM was running in. For this example I was looking at a single VM (W2K8R2-Node1).
Below are extracts from Task Manager screenshots from all three cluster nodes showing how many RHS processes are running.
W2K8-CN1 – Cluster Node
W2K8-CN2 – Cluster Node
W2K8-CN3 – Cluster Node
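As an alternative to Task Manager screenshots, the per-node count can be collected from PowerShell. The following is a minimal sketch, not the procedure used in this post: `Measure-RhsProcess` is a hypothetical helper that groups rhs.exe process records by node, and the commented collection loop assumes Windows PowerShell (whose `Get-Process` still has `-ComputerName`) with remote access to every node. `[pscustomobject]` needs PowerShell 3.0 or later, which is newer than the version that ships with Windows Server 2008 R2.

```powershell
# Hypothetical helper: count rhs.exe processes per cluster node.
# Input records need a Node (node name) and an Id (process ID) property.
function Measure-RhsProcess {
    param(
        [Parameter(Mandatory = $true)]
        [object[]]$RhsProcess
    )
    $RhsProcess |
        Group-Object -Property Node |
        Sort-Object -Property Name |
        ForEach-Object {
            [pscustomobject]@{
                Node       = $_.Name
                RhsCount   = $_.Count
                ProcessIds = ($_.Group | ForEach-Object { $_.Id } | Sort-Object) -join ', '
            }
        }
}

# Collecting real data (assumption: Windows PowerShell, remote access to each node):
# $rhs = foreach ($node in Get-ClusterNode) {
#     Get-Process -Name rhs -ComputerName $node.Name |
#         Select-Object @{ Name = 'Node'; Expression = { $node.Name } }, Id
# }
# Measure-RhsProcess -RhsProcess $rhs | Format-Table -AutoSize
```

With the default configuration, each node would be expected to show an `RhsCount` of two.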
The screenshot below confirms the VM and VM configuration have the default settings for the resource monitor process. The checkbox to run the resource in a separate Resource Monitor has not been checked.
This is as expected, as by default there are only two RHS processes running on each node. One RHS process is used to run cluster resources and the other is used for storage.
Running the following PowerShell command lists the RHS process that the VM and VM configuration are running in.
#Get a list of VMs and correlate to RHS.exe process on a per VM basis
Get-ClusterResource |?{$_.OwnerGroup -like "W2K8R2-Node1"} | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize
This is the output of the above command. From this I can see that both the VM and VM configuration are running in the RHS process with ID 2092 on host W2K8-CN2.
Note: The VM and VM configuration are both listed. This means that if we want to separate the VM entirely, we also need to reconfigure the VM configuration to run in its own RHS process.
Step 2: Change the VM and VM configuration to run in their own processes
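The `ft` output above is good for reading but awkward to reuse in a script. Below is a minimal sketch of a hypothetical `Get-RhsProcessMap` helper that returns one object per RHS process, keyed by node and process ID (process IDs are only unique per node), and lists the resources inside it. It assumes the FailoverClusters module, that resource objects expose `OwnerNode` and `MonitorProcessId` as used above, that node objects convert to their names as strings (the `-like` filter above relies on the same behaviour for groups), and PowerShell 3.0 or later. It selects the group with `Get-ClusterGroup | Get-ClusterResource` rather than the `-like` filter; both should return the same resources.

```powershell
# Hypothetical helper: one output object per RHS process, listing the
# resources it hosts. Assumes the FailoverClusters module and resource
# objects exposing OwnerNode and MonitorProcessId (PowerShell 3.0+).
function Get-RhsProcessMap {
    param(
        # Name of the VM's cluster group, used when -Resource is not supplied.
        [string]$GroupName,
        # Optional pre-fetched resource objects (also handy for testing).
        [object[]]$Resource
    )
    if (-not $Resource) {
        Import-Module FailoverClusters
        $Resource = @(Get-ClusterGroup -Name $GroupName | Get-ClusterResource)
    }
    # Process IDs are only unique per node, so group on node + process ID.
    $Resource |
        Group-Object -Property { "$($_.OwnerNode)/$($_.MonitorProcessId)" } |
        ForEach-Object {
            [pscustomobject]@{
                Node          = [string]$_.Group[0].OwnerNode
                RhsProcessId  = $_.Group[0].MonitorProcessId
                ResourceCount = $_.Count
                Resources     = ($_.Group | ForEach-Object { $_.Name }) -join '; '
            }
        }
}
```

For example, `Get-RhsProcessMap -GroupName 'W2K8R2-Node1' | Format-Table -AutoSize` should, at this stage, show the VM and VM configuration together under process 2092 on W2K8-CN2.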
By running the following command I was able to change the VM and VM configuration to run in their own RHS process. I chose to do this via PowerShell as this would need to be done for a lot of virtual machines and the GUI would not be practical.
# Set VM to run in its own RHS process.
Get-ClusterResource | ?{$_.OwnerGroup -like "W2K8R2-Node1"} | %{$_.SeparateMonitor='True'}
Checking in the GUI, I can see the relevant tick box has now been set.
Running the following PowerShell command again lists the RHS process that the VM and VM configuration are running in.
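Because this has to be repeated for many virtual machines, a pipeline-friendly function may be easier than editing the group filter for each VM. A minimal sketch follows; `Set-VmRhsIsolation` is a hypothetical name, and it assumes the resource type names are exactly `Virtual Machine` and `Virtual Machine Configuration` and are exposed through `ResourceType.Name` on each resource object. It supports `-WhatIf` so the change can be previewed first.

```powershell
# Hypothetical helper: set SeparateMonitor on VM-related resources only.
# Assumes the type names 'Virtual Machine' and 'Virtual Machine Configuration'
# are available as ResourceType.Name on each resource object.
function Set-VmRhsIsolation {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [object]$Resource,
        # $true isolates the resource; $false returns it to the default process.
        [bool]$Enabled = $true
    )
    process {
        $type = $Resource.ResourceType.Name
        if ($type -ne 'Virtual Machine' -and $type -ne 'Virtual Machine Configuration') {
            return  # skip non-VM resources such as disks or IP addresses
        }
        if ($PSCmdlet.ShouldProcess($Resource.Name, "Set SeparateMonitor to $Enabled")) {
            # Assign a real Boolean rather than a string such as 'True'.
            $Resource.SeparateMonitor = $Enabled
        }
    }
}
```

`Get-ClusterResource | Set-VmRhsIsolation -WhatIf` would list the resources it intends to change without changing them; dropping `-WhatIf` applies the setting. The resources still stay in their current RHS process until they are next started.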
#Get a list of VMs and correlate to RHS.exe process on a per VM basis
Get-ClusterResource |?{$_.OwnerGroup -like "W2K8R2-Node1"} | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize
This is the output of the above command. Both the VM and VM configuration are still running in the RHS process with ID 2092 on host W2K8-CN2. Setting the VM and VM configuration to run in their own RHS process therefore has no immediate impact on the running VM or VM configuration; the new setting only applies the next time the resources are started.
Step 3: Live migrate the VM to spawn new RHS processes
In order to start the resources in new RHS processes, the resource group needs to be moved to another node. Fortunately, a VM can be live migrated, so it can start in new RHS processes without downtime.
Note: Both the VM and VM configuration should start in their own new process.
Running the following PowerShell command once the VM has been live migrated lists the RHS process that the VM and VM configuration are running in.
#Get a list of VMs and correlate to RHS.exe process on a per VM basis
Get-ClusterResource |?{$_.OwnerGroup -like "W2K8R2-Node1"} | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize
This is the output of the above command. I can see that both the VM and VM configuration are now running in their own processes.
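To check isolation across the whole cluster rather than reading the table by eye, the sketch below flags any resource that has `SeparateMonitor` set but still shares its RHS process with another resource on the same node. `Test-RhsIsolation` is a hypothetical helper; it assumes the same resource properties as the commands above and PowerShell 3.0 or later.

```powershell
# Hypothetical check: for every resource with SeparateMonitor set, report
# whether any other resource on the same node shares its RHS process.
# Assumes OwnerNode, SeparateMonitor and MonitorProcessId on each object.
function Test-RhsIsolation {
    param(
        [Parameter(Mandatory = $true)]
        [object[]]$Resource
    )
    foreach ($r in ($Resource | Where-Object { $_.SeparateMonitor })) {
        # Other resources on the same node reporting the same process ID.
        $sharing = @($Resource | Where-Object {
            [string]$_.OwnerNode -eq [string]$r.OwnerNode -and
            $_.MonitorProcessId -eq $r.MonitorProcessId -and
            $_.Name -ne $r.Name
        })
        [pscustomobject]@{
            Resource     = $r.Name
            Node         = [string]$r.OwnerNode
            RhsProcessId = $r.MonitorProcessId
            Isolated     = ($sharing.Count -eq 0)
        }
    }
}
```

It could be run as `Test-RhsIsolation -Resource @(Get-ClusterResource)`. In the state described in Step 2 (check box set, both resources still in process 2092) the VM and VM configuration would be reported with `Isolated` set to False; after the live migration they should both report True.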
I can also see four RHS processes on W2K8-CN1, the node the VM was live migrated to.
Step 4: What happens to the RHS process after Live Migration?
We need to determine what happens to the RHS processes once the virtual machine has been live migrated to another node in the cluster.
The VM was live migrated to another host, and this is what I saw on the new host: the process IDs are different from those on the previous node.
The screenshots below show four RHS processes on W2K8-CN3, which is as you would expect because the VM had been live migrated to that node. The previous host, W2K8-CN1, still has four RHS processes. This implies that the live migration spawned new processes on the destination host, but nothing removed the isolated RHS processes from the original host.
RHS processes on new server (W2K8-CN3)
RHS processes on previous server (W2K8-CN1)
I needed to determine what was running in those RHS processes, and the following command shows just that.
#Get a list of VMs and correlate to RHS.exe process on a per cluster basis
Get-ClusterResource | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize
This is the output. Although the two processes created for the VM and VM configuration on W2K8-CN1 (3160 and 3424) are still running, they are not hosting any resources.
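Idle processes like 3160 and 3424 can be found by comparing the rhs.exe processes on each node with the process IDs the cluster reports for its resources. This is a sketch under assumptions: `Find-IdleRhsProcess` is hypothetical, and the process records need `Node` and `Id` properties (for example, gathered per node with `Get-Process -Name rhs -ComputerName <node>` in Windows PowerShell). A default RHS process on a node that currently owns no resources would also be reported, so the output needs a sanity check.

```powershell
# Hypothetical helper: return rhs.exe processes whose ID is not reported as
# the MonitorProcessId of any cluster resource on the same node.
# $RhsProcess records need Node and Id; $Resource records need OwnerNode
# and MonitorProcessId.
function Find-IdleRhsProcess {
    param(
        [Parameter(Mandatory = $true)][object[]]$RhsProcess,
        [Parameter(Mandatory = $true)][object[]]$Resource
    )
    # Build a lookup of "node/processId" pairs that currently host resources.
    $inUse = @{}
    foreach ($r in $Resource) {
        $inUse["$($r.OwnerNode)/$($r.MonitorProcessId)"] = $true
    }
    # Anything not in the lookup is an RHS process with no resources in it.
    $RhsProcess | Where-Object { -not $inUse.ContainsKey("$($_.Node)/$($_.Id)") }
}
```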
Step 5: Live migrate the VM back to the previous host
I live migrated the virtual machine back to W2K8-CN1, where the first processes were spawned to isolate the VM and VM configuration.
The screenshots below show that both hosts still have four RHS processes running.
RHS processes on W2K8-CN1
RHS processes on W2K8-CN3
I ran the following command to check the RHS processes for the VM and VM configuration.
#Get a list of VMs and correlate to RHS.exe process on a per VM basis
Get-ClusterResource |?{$_.OwnerGroup -like "W2K8R2-Node1"} | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize
I can see that the processes originally spawned on W2K8-CN1 are reused for the VM and VM configuration.
Step 6: Remove the isolation and see what happens
The following command should remove the isolation, but it does not work:
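Reuse can be confirmed by capturing the process assignments before a migration and comparing them afterwards. A sketch with a hypothetical `Compare-RhsAssignment` function follows. Windows can recycle process IDs, so a matching ID on the same node is strong evidence rather than proof; comparing process start times as well would make the check stricter.

```powershell
# Hypothetical helper: compare resource-to-RHS assignments captured before
# and after a live migration. Records need Name, OwnerNode and MonitorProcessId.
function Compare-RhsAssignment {
    param(
        [Parameter(Mandatory = $true)][object[]]$Before,
        [Parameter(Mandatory = $true)][object[]]$After
    )
    # Index the earlier snapshot by resource name.
    $old = @{}
    foreach ($r in $Before) {
        $old[$r.Name] = "$($r.OwnerNode)/$($r.MonitorProcessId)"
    }
    foreach ($r in $After) {
        $now = "$($r.OwnerNode)/$($r.MonitorProcessId)"
        [pscustomobject]@{
            Resource    = $r.Name
            Before      = $old[$r.Name]
            After       = $now
            # Same node and same process ID (IDs can be recycled, so not proof).
            SameProcess = ($old[$r.Name] -eq $now)
        }
    }
}
```

A snapshot could be taken with `$before = @(Get-ClusterGroup -Name 'W2K8R2-Node1' | Get-ClusterResource | Select-Object Name, OwnerNode, MonitorProcessId)`; `Select-Object` copies the current values so they cannot change underneath the comparison. Repeat after migrating back to get `$after`, then run `Compare-RhsAssignment -Before $before -After $after`.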
# Set VM to run in default RHS process, i.e. remove isolation
Get-ClusterResource | ?{$_.OwnerGroup -like "W2K8R2-Node1"} | %{$_.SeparateMonitor='False'}
Instead, I had to use the cluster.exe command line to clear the isolation setting. This has to be done for both the VM and the VM configuration.
Remove resource isolation for VM
cluster res "Virtual Machine W2K8R2-Node1" /prop SeparateMonitor=0
Remove resource isolation for VM configuration
cluster res "Virtual Machine Configuration W2K8R2-Node1" /prop SeparateMonitor=0
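One likely explanation for the PowerShell command failing, assuming `SeparateMonitor` is exposed as a Boolean on the resource object: PowerShell converts any non-empty string to `$true`, so assigning the string `'False'` sets the property to true, while `'True'` in Step 2 worked for the same reason. The snippet below demonstrates that conversion rule; the commented-out lines are an untested alternative to cluster.exe that assigns a real Boolean instead.

```powershell
# PowerShell rule: converting a string to [bool] yields $true for any
# non-empty string, so 'False' becomes $true. Only '' becomes $false.
$fromStringFalse = [bool]'False'   # $true
$fromStringEmpty = [bool]''        # $false
$fromBoolean     = [bool]$false    # $false
$fromInteger     = [bool]0         # $false

# Untested alternative to cluster.exe (assumes SeparateMonitor is a Boolean):
# Get-ClusterGroup -Name 'W2K8R2-Node1' | Get-ClusterResource |
#     ForEach-Object { $_.SeparateMonitor = $false }
```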
We can now see the check box has been cleared in the resource settings.
Step 7: Live migrate to see if processes are removed
Live migrate to another node and notice what happens to the RHS processes on the source server.
The number of RHS processes on the original host stays at four after the VM has been live migrated.
RHS Process on W2K8-CN1
RHS processes on W2K8-CN3 (where the VM was live migrated to): the count has changed to two.
Step 8: Live migrate back to original node
I live migrated the VM back to the original node. It should now be using only the default RHS processes, and the number of RHS processes running should drop from four to two.
RHS processes on W2K8-CN1
Note: We now have only two RHS processes
RHS processes on all hosts
W2K8-CN1
W2K8-CN2
W2K8-CN3
The following command checks what is running in each RHS process.
#Get a list of VMs and correlate to RHS.exe process on a per cluster basis
Get-ClusterResource | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize
This is the output. All VMs and VM configurations are running in the default RHS processes.
Conclusion
From the above testing I determined that the virtual machines can be isolated to their own RHS processes without causing any downtime. The VM can be live migrated from node to node and it will remain isolated from the default RHS process. The VM configuration also needs to be isolated.
More testing is needed on the RHS processes that remain running without hosting any VM or VM configuration. I suspect this is by design of the cluster, and that an RHS process will only be terminated if the thorough health check (previously IsAlive) fails.
This in itself poses a question: because nothing is running inside the process, no health check will run against it, and therefore the process will never be terminated.
Scaling this up to a hundred virtual machines means there will be 200 RHS processes for the VMs alone (one for each VM and one for each VM configuration), split over the cluster nodes. There could be more, because an RHS process spawned when a VM is live migrated is not terminated when the VM is live migrated again.
As you can see, further testing is needed, but this at least proves there is a way to achieve isolation without downtime. If running many RHS processes does not affect performance, this approach may be feasible. Do these additional RHS processes affect the cluster itself, especially the health checks and the logging of information?
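The estimate above can be written as a lower bound, assuming two default RHS processes per node (as observed in this lab) plus one process for each isolated VM and one for its VM configuration; idle processes left behind after live migration only add to it. `Get-MinimumRhsProcessCount` is a hypothetical helper for this arithmetic.

```powershell
# Hypothetical helper: lower bound on RHS processes across the cluster,
# ignoring idle processes left behind by live migration.
function Get-MinimumRhsProcessCount {
    param(
        [Parameter(Mandatory = $true)][int]$NodeCount,
        [Parameter(Mandatory = $true)][int]$IsolatedVmCount,
        # Default processes per node; two were observed in this lab.
        [int]$DefaultPerNode = 2
    )
    # Each isolated VM needs one process for the VM resource and one for
    # the VM configuration resource.
    ($NodeCount * $DefaultPerNode) + (2 * $IsolatedVmCount)
}

Get-MinimumRhsProcessCount -NodeCount 3 -IsolatedVmCount 100   # 206
```

For the three-node lab with a hundred isolated VMs this gives 3 × 2 + 100 × 2 = 206: the 200 isolated processes plus the six default ones.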
Depending on how my customer decides to proceed I will post an update with any further testing scenarios and results.
Aeval
Premier Field Engineer - Failover Clustering & Virtualisation