

Everything you wanted to know about SR-IOV in Hyper-V Part 6

Summarising the series so far, we’ve answered “Why SR-IOV?”, looked at hardware and firmware dependencies, and run through the user interface and PowerShell cmdlets to configure SR-IOV. Next on the agenda is how SR-IOV and Live Migration co-operate. In addition, there’s a video covering the configuration aspects of SR-IOV from the previous part, which also shows Live Migration in action.

A goal very early on in Windows “8” planning was that we should not consider features which are incompatible with mobility scenarios such as Live Migration. (Actually, SR-IOV has been in the pipeline for considerably longer than that, as you can probably tell from the WinHEC 2008 deck I linked to previously; it was even talked about at WinHEC 2006.)

That goal poses somewhat of a problem when hardware is assigned to a virtual machine. Let’s ignore SR-IOV for a moment and take a step back a few years into the initial development and prototyping of the feature. You may be familiar with the term “Discrete Device Assignment”. This is where a fully-fledged PCI Express device is assigned to a virtual machine. Discrete assignment, from a software engineering perspective, can be viewed in some ways as a stepping stone towards SR-IOV support. However, discrete assignment is fraught with issues in several areas:

  • Security
  • Usability & Mobility
  • Scalability

From a security perspective, a virtual machine with significant uncontrolled access to hardware is too risky to entertain in a shipping product, while still providing a production support statement. It would be extremely difficult to secure the VM to the extent that it was unable to cause side effects to other partitions, thus breaking a critical security tenet.

From a scalability perspective, it is difficult to describe assigning a device that takes an entire PCI Express slot to a single virtual machine as scalable. It would be very expensive to scale to tens or hundreds of VMs in this manner, if indeed you could find a server with that many PCI Express slots.

From a usability perspective, it is difficult to describe this as a common user scenario. Granted, there are niche exceptions which do have some merit, but several things would be lost in the process. The most important of these is Live Migration support, shortly followed by state changes (apart from running or shutdown), then snapshots, and probably more as well. The reason is that it is extremely difficult to save the state of a piece of arbitrary hardware inside a running VM, and subsequently restore it to a running state on a different platform. Further, even if we could temporarily halt the VM with hardware state intact (as would be necessary during the Live Migration black-out), without an absolutely identical configuration on the target at all levels (not just hardware), the chances of successful restoration are non-existent.

Hence, we considered discrete assignment not very useful except to an extremely niche segment of our user base who aren’t concerned about security, scalability or mobility. Instead, we focused on something that addresses all of these concerns, namely SR-IOV. Security is built in at all levels. Only a single PCI Express slot is needed to support many virtual machines. And we support mobility (live and quick migration), state changes and snapshots.

So you may be wondering how SR-IOV overcomes the problem I described of it “being difficult to save the state of a piece of hardware inside a running VM, and subsequently restore it to a running state on a different platform”, while still achieving all these goals. After all, a VF is true hardware, and it is running in the VM.

As simple as it is, the answer probably isn’t immediately obvious: we don’t save the hardware state at all, and don’t even attempt to tackle the problem. And yet we are able to migrate to a platform which could have a completely different physical NIC, the same type of NIC at a different firmware release level, or no SR-IOV support at all, and through all of these scenarios networking in the VM stays fully functional. Confused yet?

One more minor backtrack. You may have noticed I’ve said a couple of times that a VF “backs” a software based network adapter. By this I mean that the VM always has a software based network adapter, but when a VF is available, we “failover” automatically to the hardware path for I/O. The software path is always present, only the VF is transient. So now it should be a little more obvious. Whenever we go through a state transition that would require hardware state to be saved, we remove any VFs from the VM beforehand, falling back to software based networking. (I say VFs in plural as a VM can have multiple software network adapters, up to eight, hence up to eight VFs assigned.) Once the VF is removed, we can perform any operation necessary on the VM as it is a complete software based container at that point. Once the operation has been completed, assuming hardware resources are available and other dependencies met, we will give a VF back to the VM. This completely solves the problem.
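The state model described above can be sketched in a few lines of Python. This is a minimal, hypothetical model and not Hyper-V's implementation: the names `SoftwareAdapter`, `remove_vfs`, `assign_vfs` and `live_migrate`, and the idea of a per-host pool of free VFs, are assumptions made for illustration. It captures only the ordering described in the text: the software adapter always exists, every VF is removed before the state transition, and a VF is handed back afterwards only if the host still has one free. Packet handling during the switch-over and the non-cooperative guest case are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# From the text: a VM can have up to eight software network adapters,
# hence up to eight VFs assigned.
MAX_ADAPTERS_PER_VM = 8


@dataclass
class SoftwareAdapter:
    """The software network adapter; always present in the VM."""
    name: str
    vf: Optional[str] = None  # hypothetical VF identifier while one backs this adapter

    @property
    def data_path(self) -> str:
        # The VF is used when present; otherwise I/O goes over the software path.
        return "vf" if self.vf is not None else "software"


@dataclass
class VM:
    name: str
    adapters: List[SoftwareAdapter] = field(default_factory=list)

    def add_adapter(self, adapter: SoftwareAdapter) -> None:
        if len(self.adapters) >= MAX_ADAPTERS_PER_VM:
            raise ValueError("at most eight network adapters per VM")
        self.adapters.append(adapter)


def remove_vfs(vm: VM, host_pool: List[str]) -> None:
    """Before a state transition: return every VF to the host's pool.

    Afterwards the VM is a purely software container, so no hardware
    state has to be saved.
    """
    for adapter in vm.adapters:
        if adapter.vf is not None:
            host_pool.append(adapter.vf)
            adapter.vf = None


def assign_vfs(vm: VM, host_pool: List[str]) -> None:
    """After the operation: give VFs back while the host still has free ones."""
    for adapter in vm.adapters:
        if adapter.vf is None and host_pool:
            adapter.vf = host_pool.pop()


def live_migrate(vm: VM, source_pool: List[str], target_pool: List[str]) -> None:
    remove_vfs(vm, source_pool)  # fall back to software networking on the source
    # ... the (now all-software) VM state would be transferred here ...
    assign_vfs(vm, target_pool)  # stays on software if the target has no free VFs


if __name__ == "__main__":
    source_pool = ["source-vf0"]
    target_pool: List[str] = []  # e.g. a target host without SR-IOV support

    vm = VM("vm1")
    vm.add_adapter(SoftwareAdapter("Network Adapter"))
    assign_vfs(vm, source_pool)
    print(vm.adapters[0].data_path)  # vf

    live_migrate(vm, source_pool, target_pool)
    print(vm.adapters[0].data_path)  # software
```

Because the software adapter never goes away, a target with no free VFs simply leaves the adapter on the software path; nothing has to fail for the migration to succeed.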

Those paying attention may have noticed I said that “we remove any VFs from the VM beforehand” and are wondering whether there are implications if the guest operating system isn’t co-operative. The short answer is no; our state model covers this scenario, although it is certainly easier when the guest operating system co-operates in VF deallocation.

Note also that I used the word “failover” in quotation marks. I could have said “team”, but that implies a little more functionality than “failover”. Truthfully, we haven’t come up with a good term yet, but the point is that this “failover” has nothing to do with NIC teaming, which is also now native in Windows Server “8”. It simply means that we use a VF automatically if it is present, or software based networking if it is not, and there is no loss of network packets during the transition in either direction. As we will see shortly, NIC teaming and SR-IOV can also co-exist in a virtual machine.
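Because the VF only backs the software network adapter, inside the guest the two appear as separate devices presenting the same network endpoint, with the same MAC address, while IP configuration belongs on the software adapter alone. The hypothetical sketch below pairs each software adapter with the adapter currently backing it by matching MAC addresses. It assumes the name, description and MAC address of each adapter have already been collected (for example from the `Name`, `InterfaceDescription` and `MacAddress` properties that `Get-NetAdapter` reports in the guest), and it assumes the software adapter can be recognised by a description beginning with "Microsoft Hyper-V Network Adapter". The sample values and that matching rule are illustrative assumptions, not a documented procedure.

```python
from typing import Dict, List, NamedTuple, Optional

# Assumption: the software adapter's description starts with this text
# (it may carry a suffix such as " #2").
SOFTWARE_ADAPTER_PREFIX = "Microsoft Hyper-V Network Adapter"


class Adapter(NamedTuple):
    name: str
    description: str
    mac: str


def normalize_mac(mac: str) -> str:
    # Accept both "00-15-5D-..." and "00:15:5d:..." forms.
    return mac.replace(":", "-").upper()


def is_software_adapter(adapter: Adapter) -> bool:
    return adapter.description.startswith(SOFTWARE_ADAPTER_PREFIX)


def backing_vfs(adapters: List[Adapter]) -> Dict[str, Optional[str]]:
    """Map each software adapter's name to the adapter sharing its MAC.

    None means no VF is currently backing it, so I/O uses the software path.
    """
    hardware = [a for a in adapters if not is_software_adapter(a)]
    result: Dict[str, Optional[str]] = {}
    for sw in filter(is_software_adapter, adapters):
        match = next(
            (hw for hw in hardware if normalize_mac(hw.mac) == normalize_mac(sw.mac)),
            None,
        )
        result[sw.name] = match.name if match is not None else None
    return result


if __name__ == "__main__":
    # Hypothetical values shaped like Get-NetAdapter output in a guest.
    adapters = [
        Adapter("Ethernet", "Microsoft Hyper-V Network Adapter", "00-15-5D-00-00-01"),
        Adapter("Ethernet 2", "Broadcom BCM57810 NetXtreme II", "00-15-5D-00-00-01"),
    ]
    print(backing_vfs(adapters))  # {'Ethernet': 'Ethernet 2'}
```

A software adapter mapped to `None` is simply running on the software path, which is a normal state rather than a fault.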

At this point, the adage about a picture being worth a thousand words is apt. Rather than attempt a series of badly drawn block diagrams, here’s a video showing SR-IOV configuration and Live Migration. So many other new Hyper-V features that aren’t immediately obvious also appear in this video that I’m struggling to contain myself, but I do manage to limit myself to talking just about SR-IOV in the recording. Maybe you will notice the use of an SMB file share for the VM, the use of VHDX, Live Migration without a cluster, and of course PowerShell support.

Windows Server “8” Beta. Demonstration of SR-IOV in Hyper-V and Live Migration

More to follow in the next part.

Cheers,
John.

-

Comments

  • Anonymous
    October 13, 2012
    I don't see the picture, only an 'x'.

  • Anonymous
    November 20, 2013
    Is there any way to check the communication between the PF and VF? Also, will we be able to see the VF in Device Manager or in the network properties of the guest OS? Thanks,

  • Anonymous
    November 20, 2013
    Check in what way? The communication (predominantly VF to PF rather than PF to VF, BTW) is for control and is up to the NIC vendor, who can implement it in one of two ways. The non-recommended way is through a "doorbell" hardware mechanism. We don't recommend that, as we (Hyper-V) cannot fuzz that interface in our logo tests. Fortunately it is little used. The second is through the use of VMBus using published NDIS APIs. The VMBus channel is present; there's really nothing to check. So the shorter answer is no, not really. It's vendor specific. What was the purpose of the question, though? For your second question, yes, you can see the VF in Device Manager. It will not show up in the network control panel though; just the software network adapter will show. Examples are scattered in a few screenshots over this post series.

  • Anonymous
    November 23, 2013
    Hi John, Thanks for your response. The purpose of the question is how we are going to ensure that communication works as expected after configuring SR-IOV. We are a bit confused about the way it establishes communication, as it bypasses the virtual switch, and we are also configuring it for the first time. As you said, I am able to see the VF in Device Manager. Right now the issue is that we are not getting an IP address for the VF. When we use the Get-NetAdapter command, it lists two interfaces:

    1. Broadcom BCM57810 NetXtreme II
    2. Microsoft Hyper-V Network Adapter

    If we take a look at the MAC addresses assigned to the interfaces, both are the same. I am not sure how the same MAC address is assigned. I have cross-checked: SR-IOV status says OK (SR-IOV active), and the drivers are installed properly. Could you help us figure this out and ensure that we get an IP address for the VF? Do we need to assign a static IP for this? Thanks,

  • Anonymous
    November 23, 2013
    Arun - you have it working just fine. Think of it as the VF "backing" the software network adapter, specifically for the data path; that's why they correctly have the same MAC, as both act as the same network endpoint. When a VF is assigned, the data path simply switches seamlessly to the VF. The only adapter you configure with an IP address is the software network adapter, the only adapter you'll see in the network control panel. That IP will be used for the data path whether it's using indirect I/O or is backed by the VF.

  • Anonymous
    November 23, 2013
    John, Thanks for your help in clearing up our confusion. We are now able to understand the concept with much more clarity. As you already said, there is no way to trace or monitor the SR-IOV communication after configuring it. Is it possible to use any third-party tools to monitor this? The reason is that, as we did this for the first time, we need to be in a position to explain the concept to the team with appropriate information. Please correct me if my understanding is wrong: during communication between the host and virtual machines, it bypasses the virtual switch, so the data path used here would be based on the IP address of the software network adapter. When a VF is assigned, traffic flow to the VMs will be like PF to VF communication, similar to host-to-host communication based on IP. When we are able to communicate with the internal and external network without any issues, SR-IOV is configured. Also, could you please confirm whether we can do this on Server 2008 R2 on similar hardware, an HP ProLiant Gen8? Host and guest: Server 2008 R2. Thanks,

  • Anonymous
    November 24, 2013
    Hi John, My comments are not getting published here, I have tried multiple times.

  • Anonymous
    November 25, 2013
    Comments are moderated; it depends on when I have time to go through them. I'd go back to why you want to monitor it. The fact that the VF is assigned indicates that the device is present and communicating correctly. Go back to the diagram at the top of part 5 in this series, which shows what flows where. As I said before, and as you can see in that diagram, the I/O data path is between the VF and the hardware NIC/wire, bypassing the virtual switch. The data flow is NOT PF to VF. It's wire to VF. SR-IOV is only supported on Windows 8 x64/Server 2012 and later guest operating systems. The parent must be Server 2012 or later. Most if not all G8s will support SR-IOV with up-to-date firmware and appropriate NICs. However, that is a question for HP.

  • Anonymous
    August 03, 2014
    Hi John,

    Thanks for the great explanation. I am just wondering how network policies can be kept consistent between the VF path and the software path, and also how an NVGRE overlay network keeps working during the VF <-> software path transition?

    Thanks,
    Jun