CROSS POST: How Shared VHDX Works on Server 2012 R2

In the not far back point in time, there was a blog done by Matthew Walker that we felt needed to also be on the AskCore site as well due to the nature and the popularity of the article.  So we are going to cross post it here.  Please keep in mind that the latest changes/updates will be in the original blog post.

CROSS POST: How Shared VHDX Works on Server 2012 R2
https://blogs.technet.com/b/askpfeplat/archive/2015/06/01/how-shared-vhdx-works-on-server-2012-r2.aspx

Hi, Matthew Walker here, I’m a Premier Field Engineer here at Microsoft specializing in Hyper-V and Failover Clustering. In this blog I wanted to address creating clusters of VMs using Microsoft Hyper-V with a focus on Shared VHDX files.

From the advent of Hyper-V we have supported creating clusters of VMs, however the means of adding in shared storage has changed. In Windows 2008/R2 we only supported using iSCSI for shared volumes, with Windows Server 2012 we added the capability to use virtual fibre channel, and SMB file shares depending on the workload, and finally in Windows Server 2012 R2 we added in shared VHDX files.

Shared Storage for Clustered VMs:

Windows Version

2008/R2

2012

2012 R2

ISCSI

Yes

Yes

Yes

Virtual Fibre Channel

No

Yes

Yes

SMB File Share

No

Yes

Yes

Shared VHDX

No

No

Yes

So this provides a great deal of flexibility when creating clusters that require shared storage with VMs. Not all clustered applications or services require shared storage so you should review the requirements of your app to see. Clusters that might require shared storage would be file server clusters, traditional clustered SQL instances, or Distributed Transaction Coordinator (MSDTC) instances. Now to decide which option to use. These solutions all work with live migration, but not with items like VM checkpoints, host based backups or VM replication, so pretty even there. If there is an existing infrastructure with iSCSI or FC SAN, then one of those two may make more sense as it works well with the existing processes for allocating storage to servers. SMB file shares work well but only for a few workloads as the application has to support data residing on a UNC path. This brings us to Shared VHDX.

Available Options:

Hyper-V Capability

Shared VHDX used

ISCSI Drives

Virtual Fibre Channel Drives

SMB Shares used in VM

Non-Shared VHD/X used

Host based backups

No

No

No

No

Yes

Snapshots/Checkpoints

No

No

No

No

Yes

VM Replication

No

No

No

No

Yes

Live Migration

Yes

Yes

Yes

Yes

Yes

Shared VHDX files are attached to the VMs via a virtual SCSI controller so show up in the OS as a shared SAS drive and can be shared with multiple VMs so you aren’t restricted to a two node cluster. There are some prerequisites to using them however.

Requirements for Shared VHDX:

2012 R2 Hyper-V hosts
Shared VHDX files must reside on Cluster Shared Volumes (CSV)
SMB 3.02

It may be possible to host a shared VHDX on a vender NAS if that appliance supports SMB 3.02 as defined in Windows Server 2012 R2, just because a NAS supports SMB 3.0 is not sufficient, check with the vendor to ensure they support the shared VHDX components and that you have the correct firmware revision to enable that capability. Information on the different versions of SMB and capabilities is documented in a blog by Jose Barreto that can be found here.

Adding Shared VHDX files to a VM is relatively easy, through the settings of the VM you simply have to select the check box under advanced features for the VHDX as below.

image

For SCVMM you have to deploy it as a service template and select to share the VHDX across the tier for that service template.

image

And of course you can use PowerShell to create and share the VHDX between VMs.

PS C:\> New-VHD -Path C:\ClusterStorage\Volume1\Shared.VHDX -Fixed -SizeBytes 30GB

PS C:\> Add-VMHardDiskDrive -VMName Node1 -Path C:\ClusterStorage\Volume1\Shared.VHDX -ShareVirtualDisk

PS C:\> Add-VMHardDiskDrive -VMName Node2 -Path C:\ClusterStorage\Volume1\Shared.VHDX -ShareVirtualDisk

Pretty easy right?

At this point you can setup the disks as normal in the VM and add them to your cluster, and install whatever application is to be clustered in your VMs and if you need to you can add additional nodes to scale out your cluster.

Now that things are all setup let’s look at the underlying architecture to see how we can get the best performance from our setup. Before we can get into the shared VHDX scenarios first we need to take a brief stint on how CSV works in general. If you want a more detailed explanation please refer to Vladimir Petter’s excellent blogs starting with this one.

image

This is a simplified diagram of the way we handle data flow for CSV, the main points here are to realize that access to the shared storage in this clustered environment is handled through the Cluster Shared Volume File System (CSVFS) filter driver and supporting components, this system handles how we access the underlying storage. Because CSV is a clustered file system we need to have this orchestration of file access. When possible I/O travels a direct path to the storage, but if that is not possible then we will redirect over the network to a coordinator node. The coordinator node shows up in the Failover Cluster manager as the owner for the CSV.

With Shared VHDX we also have to have orchestration of shared file access, to achieve this with Shared VHDX all I/O requests are centralized and funneled through the coordinator node for that CSV. This results in I/O from VMs on hosts other than the coordinator node being redirected to the coordinator. This is different from a traditional VHD or VHDX file that is not shared.

First let’s look at this from the perspective of a Hyper-V compute cluster using a Scale-Out File Server as our storage. For the following examples I have simplified things by bringing it down to two nodes and added in a nice big red line to show the data path from the VM that currently owns our clustered workload. For my example I making some assumptions, one is that the workload being clustered is configured in an Active/Passive configuration with a single shared VHDX file and we are only concerned with the data flow to that single file from one node or the other. For simplicity I have called the VMs Active and Passive just to indicate which one owns the Shared VHDX in the clustered VMs and is transferring I/O to the storage where the shared VHDX resides.

image

So we have Node 1 in our Hyper-V cluster accessing the Shared VHDX over SMB and connects to the coordinator node of the Scale-Out File Server cluster (SOFS), now let’s move the active workload.

image

So even when we move the active workload SMB and the CSVFS drivers will connect to the coordinator node in the SOFS cluster, so in this configuration our performance is going to be consistent. Ideally you should have high speed connects between your SOFS nodes and on the network connections used by the Hyper-V compute nodes to access the shares. 10 Gb NICs or even RDMA NICs. Some examples of RDMA NICs are Infiniband, iWarp and RDMA over Converged Ethernet (RoCE) NICs.

Now as we change things up a bit, we will move the compute onto the same servers that are hosting the storage

image

As you can see the access to the VHDX is sent through the CSVFS and SMB drivers to access the storage, and everything works like we expect as long as the active VM of the clustered VMs is on the same node as the coordination node of the underlying CSV, so now let’s look at how the data flows when the active VM is on a different node.

image

Here things take a different path than we might expect, since SMB and CSVFS are an integral part of ensuring proper orchestrated access to the Shared VHDX we send the data across the interconnects between the cluster nodes rather than straight down to storage, this can have a significant impact on your performance depending on how you have scaled your connections.

If the direct access to storage is a 4Gb fibre connect and the interconnect between nodes is a 1Gb connection there is going to be a serious difference in performance when the active workload is not on the same node that owns the CSV. This is exacerbated when we have 8Gb or 10Gb bandwidth to storage and the interconnects between nodes is only 1Gb. To help mitigate this behavior make sure to scale up your cluster interconnects to match using options such as 10 Gb NICs, SMB Multi-channel and/or RDMA capable devices that will improve your bandwidth between the nodes.

One final set of examples to address concerns about scenarios where you may have an application active on multiple clustered VMs that are accessing the same Shared VHDX file. First let’s go back to the separate compute and storage nodes.

image

And now to show how it goes with everything all together in the same servers.

image

So we can even implement a scale out file server or other multi-access scenarios using clustered VMs.

So the big takeaway here is more about understanding the architecture to know when you will see certain types of performance, and how to set proper expectations based on where and how we access the final storage repository for the shared VHDX. By moving some of the responsibility for handling access to the VHDX to SMB and CSVFS we get a more flexible architecture and more options, but without proper planning and an understanding of how it works there can be some significant differences in performance based on what type of separation there is between the compute side and the storage side. For the best performance ensure you have high speed and high bandwidth interconnects from the running VM all the way to the final storage by using 10 Gb or RDMA NICs, and try to take advantage of SMB Multi-Channel.

--- Matthew Walker

Comments

  • Anonymous
    September 28, 2015
    In the not far back point in time, there was a blog done by Matthew Walker that we felt needed to also