Understanding High-End Video Performance Issues with Hyper-V

A while ago I wrote a relatively short blog post highlighting the fact that there are performance issues with Hyper-V when used with a high-end graphics adapter.  Since then I have been inundated with people asking questions and trying to get their heads around this issue.  Today I would like to take the chance to drill into it:

What is the cause of the problem?

Okay – let’s grab the pertinent text from the original KB article:

This issue occurs when a device driver or other kernel mode component makes frequent memory allocations by using the PAGE_WRITECOMBINE protection flag set while the hypervisor is running. When the kernel memory manager allocates memory by using the WRITECOMBINE attribute, the kernel memory manager must flush the Translation Lookaside Buffer (TLB) and the cache for the specific page. However, when the Hyper-V role is enabled, the TLB is virtualized by the hypervisor. Therefore, every TLB flush sends an intercept into the hypervisor. This intercept instructs the hypervisor to flush the virtual TLB. This is an expensive operation that introduces a fixed overhead cost to virtualization. Usually, this is an infrequent event in supported virtualization scenarios. However, some video graphics drivers may cause this operation to occur very frequently during certain operations. This significantly magnifies the overhead in the hypervisor.

Usually when I talk to people about this – their eyes start to glaze over – so let’s dig in a little here.  With the help of Wikipedia we can get some definitions:

  • Write-combining (https://en.wikipedia.org/wiki/Write-combining):

    Write combining (WC) is a computer bus technique for allowing data to be combined and temporarily stored in a buffer -- the write combine buffer (WCB) -- to be released together later in burst mode instead of writing (immediately) as single bits or small chunks.

    Write combining cannot be used for general memory access (data or code regions) due to the 'weak ordering'. Write-combining does not guarantee that the combination of writes and reads is done in the correct order. For example, a Write/Read/Write combination to a specific address would lead to the write combining order of Read/Write/Write which can lead to obtaining wrong values with the first read (which potentially relies on the write before).

    In order to avoid the problem of read/write order described above, the write buffer can be treated as a fully-associative cache and added into the memory hierarchy of the device in which it is implemented. Adding complexity slows down the memory hierarchy so this technique is often only used for memory which does not need 'strong ordering' (always correct) like the frame buffers of video cards.

    In summary, write-combining is a method of accessing memory that is typically only used by video cards.

  • Translation Lookaside Buffer (TLB) (https://en.wikipedia.org/wiki/Translation_Lookaside_Buffer)

    A Translation lookaside buffer (TLB) is a CPU cache that memory management hardware uses to improve virtual address translation speed. It was the first cache introduced in processors. All current desktop and server processors (such as x86) use a TLB. A TLB has a fixed number of slots that contain page table entries, which map virtual addresses to physical addresses. It is typically a content-addressable memory (CAM), in which the search key is the virtual address and the search result is a physical address. If the requested address is present in the TLB, the CAM search yields a match quickly, after which the physical address can be used to access memory. This is called a TLB hit. If the requested address is not in the TLB, the translation proceeds by looking up the page table in a process called a page walk. The page walk is a high latency process, as it involves reading the contents of multiple memory locations and using them to compute the physical address. Furthermore, the page walk takes significantly longer if the translation tables are swapped out into secondary storage, which a few systems allow. After the physical address is determined, the virtual address to physical address mapping and the protection bits are entered in the TLB.

    So the TLB is a CPU cache that helps with translation between virtual address spaces and physical addresses.  Note that these virtual address spaces have nothing to do with virtual machines – they are used to allow multiple applications on an operating system to be isolated from each other.

Summarizing all of this – video card drivers tend to use memory access methods that force Hyper-V to flush the CPU’s cache of page table mappings (the TLB) very frequently.  This is an expensive thing to do in Hyper-V at the best of times.  In fact – the above TLB article on Wikipedia even has a section on the problems of virtualization and the TLB.
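To make the cost of a flush concrete, here is a minimal Python sketch of a toy TLB. It is not how Hyper-V or any real CPU is implemented: the class `ToyTLB`, the helper `count_page_walks`, the slot count and the page table contents are all made up for illustration. It only shows that every flush throws away cached translations, so the next accesses have to fall back to the slow page walk. Under a hypervisor without hardware help, each flush additionally costs an intercept, which this sketch does not model.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (a common x86 page size)


class ToyTLB:
    """Toy model of a TLB: a fixed number of slots caching page translations."""

    def __init__(self, page_table, slots=4):
        self.page_table = page_table   # virtual page number -> physical page number
        self.slots = slots
        self.entries = OrderedDict()   # cached translations, oldest first
        self.hits = 0
        self.page_walks = 0

    def translate(self, virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1             # TLB hit: answer comes straight from the cache
        else:
            self.page_walks += 1       # TLB miss: slow lookup in the page table
            if len(self.entries) >= self.slots:
                self.entries.popitem(last=False)  # evict the oldest entry
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset

    def flush(self):
        """Drop every cached translation."""
        self.entries.clear()


def count_page_walks(flush_every):
    """Touch the same four pages 100 times, flushing every `flush_every`
    accesses (0 means never flush), and return the number of page walks."""
    tlb = ToyTLB(page_table={0: 7, 1: 3, 2: 9, 3: 5})
    for i in range(100):
        if flush_every and i % flush_every == 0:
            tlb.flush()
        tlb.translate((i % 4) * PAGE_SIZE + 16)
    return tlb.page_walks


if __name__ == "__main__":
    for interval in (0, 20, 4):
        print(f"flush every {interval or 'never'}: {count_page_walks(interval)} page walks")
```

With no flushing, only the first access to each page needs a page walk. The more often the toy TLB is flushed, the more of the 100 accesses turn into page walks.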

Now that we have the ground rules in place – let’s head on to some of the other questions.

How could you possibly ship Hyper-V with this issue? Did you not test this product?

To answer the second question first – I actually was the first person (in the world) to hit this issue.  Early on in development I tried to use Hyper-V as my desktop OS on my home system with a GeForce 8800 video card.  Everything seemed to work okay (though some things were oddly sluggish) until I tried to play Age of Empires III.  I had never played this game before, and the first time I tried to play it was on top of Hyper-V.  In short, it sucked.  Unfortunately I spent most of the weekend trying to tweak my rig and looking for patches to Age of Empires III before I thought to try disabling Hyper-V.

As soon as I realized what was happening I filed a bug and the issue was investigated.

When the issue was determined to be a specific result of the combination of the Hyper-V hypervisor and the Nvidia driver – we decided to leave things as they were for a couple of reasons:

  • Windows Server does not include any video drivers other than the SVGA driver by default
  • Windows Server will not install a high-end video driver automatically at any stage – you need to manually install the Windows 7 drivers

Also, Hyper-V was being developed solely for server virtualization and:

  • We have always recommended that nothing be run in the management operating system, other than basic management tools
  • No server workload that we tested generated anywhere near the rate of TLB flushing that these video drivers cause

Finally, this is a really hard issue to address.  In fact, there are no hypervisor-based virtualization platforms that address this issue today – and while there are several under development I suspect that they will either have specific hardware requirements (I will get to this later) or will have simplifications / limitations to help them mitigate this issue (like only having one virtual machine).

Why does this affect Hyper-V and not Virtual PC?

Here we are seeing the difference between a hypervisor and a hosted VMM solution.  With a hypervisor-based platform (like Hyper-V) everything runs on top of the hypervisor – even the management operating system.  With a hosted VMM platform (like Virtual PC), by contrast, the host operating system still has direct access to the hardware.  To explain this better – here is a diagram:

[Diagram: a hypervisor-based platform (Hyper-V) compared with a hosted VMM platform (Virtual PC)]

Hopefully you can see the difference here.  It should also be noted that all desktop virtualization products available today use an architecture similar to that of Virtual PC.

How do I know if this is affecting my computer?

To check if this is affecting your system – what you need to do is open Performance Monitor (you can do this by running “perfmon” from the start menu).  Select the Performance Monitor node and click on the plus symbol to add a new counter.  Then find the Hyper-V Hypervisor Root Partition entry, expand it, select Virtual TLB Flush Entries/sec and add the Root counter.  This will allow you to keep an eye on the rate of TLB flushing in the management operating system:

[Screenshots: adding the counter in Performance Monitor]

So what do you look for now?  On my system – the only time I see a significant rate of TLB flushing (>10) is when I start a virtual machine.  A system that has this problem will either generate a continuous rate of TLB flushing above 100 or will generate spikes in the thousands.
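If you log the counter values rather than watching the graph, the rule of thumb above can be sketched as a small Python check. This is only an illustration: the function name `classify_flush_rates` is made up, the thresholds of 100 and 1000 come from the numbers in the previous paragraph, and the `sustained_fraction` value (how much of the sample window must be above 100 to count as "continuous") is an assumption.

```python
def classify_flush_rates(samples, sustained_threshold=100, spike_threshold=1000,
                         sustained_fraction=0.8):
    """Classify a list of TLB flush rate readings (flushes per second).

    sustained_threshold and spike_threshold follow the rule of thumb in the
    text; sustained_fraction is an assumed cut-off for "continuous".
    """
    if not samples:
        raise ValueError("need at least one sample")

    # Spikes in the thousands are a sign of the problem on their own.
    if max(samples) >= spike_threshold:
        return "affected: spikes in the thousands"

    # Otherwise look for a rate that stays above 100 for most of the window.
    above = sum(1 for rate in samples if rate > sustained_threshold)
    if above / len(samples) >= sustained_fraction:
        return "affected: continuously above 100"

    return "probably fine"


if __name__ == "__main__":
    print(classify_flush_rates([0, 2, 5, 12, 1]))
    print(classify_flush_rates([150, 220, 130, 180, 140]))
    print(classify_flush_rates([3, 0, 4200, 1, 2]))
```

Take the samples while no virtual machine is starting, since a burst of flushing at virtual machine start is expected and would otherwise skew the result.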

What can I do to stop this / work around it?

There are a couple of options here:

  1. Use the default video driver (SVGA).

    Yes, I know it is not sexy or fun – but if you are planning to just use Hyper-V as a server virtualization platform this is your easiest and simplest option.  It is the way we intended Hyper-V to be used, and it will always give the best performance.

  2. Tone down the use of 3D graphics.

    Some video cards (like the Nvidia Quadro FX 1700M) seem to work fine as long as Aero is not enabled and no 3D applications are running.  If I enable Aero I start to see a fairly frequent rate of spikes in my TLB flush count (which causes annoying lurches in the window animation).  Running a 3D game (like Halo 2) is just terrible.

    This means that for those of you who do not want the high-end driver for 3D graphics, but instead need it for multi-monitor support or for the ability to connect a projector to your laptop (like me) this may work.

  3. Choose your video card carefully.

    As a general rule of thumb – the less capable the video card, the less likely this is to be an issue.  My previous laptop had an integrated graphics controller – which was terrible for gaming – but worked great for Hyper-V.  When I wanted to get my new laptop and found that there was no Intel option – I tracked down a coworker with a similar graphics card in their laptop and tried out Hyper-V on it before going ahead and buying it.

  4. Get a system with Second Level Address Translation (SLAT).

    SLAT is a technology that goes by different names depending on whether you get Intel (where it is called “Extended Page Tables” (EPT)) or AMD (where it is called “Nested Page Tables” (NPT) or “Rapid Virtualization Indexing” (RVI)).  These technologies are an extension to the traditional TLB that allow us to use the hardware to handle multiple TLBs – one for each virtual machine.  We added support for this hardware in Windows Server 2008 R2.  If you run Windows Server 2008 R2 on a system with SLAT capabilities – you will not have any problems running 3D graphics at all.

    Intel started shipping this technology in the Nehalem (or Core i7) processor line.  AMD has been shipping this for a while now – ever since generation 3 of the AMD Quad-core family.  Unfortunately neither has shipped this technology in laptop processors yet – though Intel has indicated that they are planning to soon.
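As a rough illustration of why SLAT helps, here is a simplified Python sketch of the two-stage address translation involved. It is a toy model under stated assumptions, not Hyper-V's actual design: the class `ToyVM`, its fields and the page numbers are invented for the example. Without SLAT, the hypervisor keeps a software-maintained combined mapping (a stand-in for the virtual TLB) that it has to throw away on every intercepted flush. With SLAT, the hardware walks both tables itself and a flush needs no intercept.

```python
class ToyVM:
    """Toy model of a partition's address translation, with or without SLAT.

    guest_table: guest virtual page  -> guest physical page (owned by the guest OS)
    host_table:  guest physical page -> system physical page (owned by the hypervisor)
    """

    def __init__(self, guest_table, host_table, slat):
        self.guest_table = guest_table
        self.host_table = host_table
        self.slat = slat
        self.shadow = {}       # software-maintained combined mapping (no SLAT only)
        self.intercepts = 0    # how many times the hypervisor had to step in
        self.shadow_fills = 0  # how many combined entries had to be recomputed

    def translate(self, guest_virtual_page):
        if self.slat:
            # Hardware walks both tables; the hypervisor is not involved.
            return self.host_table[self.guest_table[guest_virtual_page]]
        if guest_virtual_page not in self.shadow:
            # Hypervisor recomputes the combined entry in software.
            self.shadow_fills += 1
            guest_physical = self.guest_table[guest_virtual_page]
            self.shadow[guest_virtual_page] = self.host_table[guest_physical]
        return self.shadow[guest_virtual_page]

    def guest_flush_tlb(self):
        if self.slat:
            return             # handled by hardware, no intercept
        self.intercepts += 1   # the flush traps into the hypervisor
        self.shadow.clear()    # combined mappings are discarded


if __name__ == "__main__":
    guest = {0: 10, 1: 11}
    host = {10: 100, 11: 101}
    for slat in (False, True):
        vm = ToyVM(guest, host, slat=slat)
        for _ in range(1000):
            vm.guest_flush_tlb()
            vm.translate(1)
        print(f"SLAT={slat}: intercepts={vm.intercepts}, shadow fills={vm.shadow_fills}")
```

Both variants return the same translation. The difference is how much work falls on the hypervisor when a driver flushes the TLB over and over.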

Hopefully this has answered all of your questions satisfactorily.  If you have any further questions – please feel free to ask away.  I would also encourage you, if you have a video card that appears to work well with Hyper-V and 3D graphics, to post the details in the comments so that others can benefit from your good fortune!

Cheers,
Ben

Comments

  • Anonymous
    November 16, 2009
    We discovered this issue on a tester's workstation. They had been given a machine kitted out with enough CPU and RAM to run a couple of VMs in Hyper-V. The management operating system wasn't going to be used for anything hard - just a web browser, mail client, etc. The whole system ran so sluggishly when Hyper-V was enabled that we tried going back to the SVGA driver. Unfortunately this meant that the tester could only use one monitor, which was ridiculous. So we had to disable Hyper-V and move his VMs to a separate server machine. It's great to hear that it's fixed on newer processors. Is this something which we need to check the CPU's specs for, or can we safely assume that all Core i* CPUs have it, for example?

  • Anonymous
    November 16, 2009
    Hi Ben, Thanks for this post. It can be a struggle to convey all of this to developers that are relying on Hyper-V for SharePoint 2010 development (the only other x64 solution would be VMware Workstation). Do you think we might ever see multi-monitor support in the SVGA driver? For some reason I think that won't be possible but it would be a huge help if it could happen. Cheers, Tristan

  • Anonymous
    November 16, 2009
    The comment has been removed

  • Anonymous
    November 16, 2009
    Not seeing this issue on a Dell Latitude E6500 running 2k8r2 x64 Datacenter, hyper-v enabled.  It's got an Nvidia Quadro NVS 160M, running driver version 186.21 - TLB flushing is spikey at VM boot, but then settles down to 0 pretty quickly.  FYI...

  • Anonymous
    November 17, 2009
    Rik Hemsley - Unfortunately, Intel's documentation is rather vague here.  What I have heard is that it has to be Core i7 (not i5) but it might be worthwhile to ask on their support forums for clarification. Tristan Watkins - Good question - I will see if I can find an answer. John Sinclair - Whoops!  I meant to mention this in the post - seeing a spike of TLB flushing during virtual machine start is completely normal. Cheers, Ben

  • Anonymous
    November 17, 2009
    Is a spike during "Ctrl+Alt+Del" normal? I have it on a Dell D630 with Mobile Intel(R) 965 Express Chipset Family graphics, still on 2008 R1. Also I think I noticed that until "C:\Windows\System32\net.exe start hvboot" is started it seems not to spike at all. During some periods I have longer periods of loads between 20 and 100+. I'm using the newest Intel Graphics driver. For some reason I cannot run Aero anymore (using Vista Basic). Also I noticed disabling Aero enhances the performance a lot. Also I use only 16 bit color depth which also I think improves performance a bit for me.

  • Anonymous
    November 18, 2009
    Given the mess that Intel has made of VT in their product matrix (and now of SLAT), looks like I'm sticking with AMD chips.

  • Anonymous
    November 18, 2009
    Hi Ben! Great post! I have this issue too on my WS2008 R1. Especially there is a big spike (it is constantly over 90 for seconds) after Nod32 updates its database, and it lasts for about 10 seconds, while Windows is very, very slow :( I don't think this NOD32 issue is connected to the video card driver, it must be some other problem. But I've realized this by using perfmon in the way you wrote down. What do you think, should I write a mail to Eset support, or is it a Windows issue? Thanks!

  • Anonymous
    November 19, 2009
    @ Tom be careful with AMD, there's a lot of virtualization issues with certain BIOSes, mostly affecting the client side, laptops with AMD CPUs are especially affected.

  • Anonymous
    December 02, 2009
    I just bought a Core i7 laptop with a GeForce 230M; HP Pavilion DV7-3050ec. Installed Windows 2008 R2 x64 and installed Win7 drivers. All working perfectly, even when Virtualization is enabled in BIOS. Add the Hyper-V role and I get a blue screen in the Nvidia driver on boot. I tried versions 186.44; 186.81 and even beta 195.62 with no luck. This is not a performance issue, I only wish it was that, but I cannot even boot the system with Hyper-V. If I revert to standard VGA, it's all fine, but then I would rather sell this laptop than look at a VGA screen. I am looking for anything to try, please any ideas so I don't have to go back to RDP to my server or go back to VMware, ouch. I will post back results if anything works.   By the way I don't buy the argument this is for servers, because virtualization belongs to workstations as well. Thanks

  • Anonymous
    December 02, 2009
    Windows 7 Ultimate, 8 gig of memory, Nvidia GTX 285 video card. In XP Mode the best color I can get is medium 16-bit color. All my graphics are washed out; I need 32-bit. Any ideas? I have tried changing the video card in the virtual XP session and the S3 driver still comes back.

  • Anonymous
    December 10, 2009
    Biztalker, Any luck?    I have the same scenario w/ two different Core i7 laptops..  one ATI the other Nvidia graphics.   Whenever the Hyper-V role is enabled w/ graphics driver installed; blue screen. I do have another mobile server laptop w/ W5590 Xeon and a NVidia GTX 260M and I have Hyper-V role and Aero running like a champ.  Curious why that one is working?

  • Anonymous
    December 10, 2009
    No luck or answer yet I will keep on waiting

  • Anonymous
    December 11, 2009
    @sintak/biztalk Hi, I have the same problem w/ my Core i7 laptop (720QM / HD4670).   Hyper-V + GFX Driver = BSoD. I don't really need 3D, I just must have the full screen resolution (1920x1080). Tried ATI Catalyst 9.5 up to 9.11 (Desktop modified) / Win7 Standard-VGA-Driver. No luck. Do the Core i7 Mobile have S.L.A.T.? Can you enable/disable?

  • Anonymous
    December 14, 2009
    @sintak, biztalk, resil If you want support for HyperV/WinVPC, go to the technet forums. http://social.technet.microsoft.com/Forums/en-US/w7itprovirt/threads http://social.technet.microsoft.com/forums/en-US/winserverhyperv/threads/ WinVPC only (always) emulates the S3 Trio, you can get 32bit color by disabling IC, or 24bit in seamless mode. http://smudj.wordpress.com/2009/10/08/xpmode-and-24bit-color/

  • Anonymous
    December 14, 2009
    The comment has been removed

  • Anonymous
    December 15, 2009
    We aren't discarding.   We are trying to find a solution.   Problem is that VMware Workstation is a user process that only supports dual cores.   A Hyper-V service would be much more stable, responsive, and convenient to demo and serve from.

  • Anonymous
    December 15, 2009
    I'm just beside myself: I can't get a Core i7 laptop running Hyper-V w/ the native 3D graphics.

  • Anonymous
    January 08, 2010
    The comment has been removed

  • Anonymous
    January 08, 2010
    Note that the Core i7 processors in question have SLAT, so if the crash issues are solved, there should be no performance issues.

  • Anonymous
    January 09, 2010
    The comment has been removed

  • Anonymous
    August 03, 2010
    Hi Ben, I think these problems are mostly gone for me on the Windows Server 2008 R2 SP1 Beta but there are some quirks that confuse me, for instance the CTRL+ALT+DEL redraw and full screen YouTube are still slow but Windows key + Arrow is very fast now, and generally everything seems improved. Could you shed any light on whether these changes are expected and why it seems to be better now? By the way, I have an SMP processor with an NVIDIA GeForce 8400GS graphics card. It's a Dell XPS M1330. Cheers, Tristan

  • Anonymous
    August 31, 2010
    Hi Ben.  Thanks for the article.  I know I'm coming to the party a little late but .. I have a new Dell Studio 14 with an i5-520M cpu.  The system came with Intel's HD graphics.  If I enable Hyper-V before I install the HD graphics driver then life is good.  However once I install the driver I get the dreaded stop 119 code. I could live with the SVGA support on the host machine but for the fact that I can't get my laptop to project onto an external monitor until I install the dreaded HD driver.  Sadly I can't live without that! Any suggestions? Thanks again.

  • Anonymous
    August 31, 2010
    For everyone who has been seeing problems with core i7 laptops - please try out the Windows Server 2008 R2 SP1 beta - as this should address the problem there. Cheers, Ben

  • Anonymous
    October 11, 2010
    Uninstalling and/or disabling Hyper-V does not get rid of the slow RDP. I don't need Hyper-V anymore, any ideas how to get rid of this problem without SVGA and/or reinstalling?

  • Anonymous
    October 11, 2010
    Jasper - If Hyper-V is uninstalled, then something else must be causing the problems you are seeing. Cheers, Ben

  • Anonymous
    October 28, 2010
    In my Toshiba A505-S6033 (with an Nvidia GeForce 310M and i7 processor) and with Hyper-V installed, everything seems to be working fine (I did have to install KB975530 because I had some BSOD problems at first, but after that, performance is as good as when the laptop had Windows 7). Is there an explanation for this?  Why is my laptop not affected by KB961661? Is it because of this "SLAT" thing? Oh, and BTW, I do use Aero on Windows 2008 R2, and it works perfectly (and it feels very fast).

  • Anonymous
    October 28, 2010
    I got a Gateway DX4840-02m that came with an Intel video card, but needed more power so I bought a MSI Nvidia GT 240, that worked perfectly with Windows 7 x64, but after I installed Windows 2008 R2, windows would not start: I would get a BSOD saying STOP 0x00000116, and that the problem was at nvlddmkm.sys. Tried everything, older drivers, newest driver, full OS reinstall, nothing worked... and then I decided to try the SP1 RC... and... it worked! I would love to know which feature or hotfix in the SP1 fixed the BSOD. Maybe what fixed it is RemoteFX ?

  • Anonymous
    November 11, 2011
    I have a Dell T110 server running hyper-V that I need to set up dual monitors on.  What PCI-E video card do you recommend?

  • Anonymous
    December 25, 2011
    Sir, I have a Dell N5010 laptop. I installed Windows Server 2008 R2 and Hyper-V on it. Can anyone tell me which drivers I can get for Windows Server 2008 R2? Please help me, friends.

  • Anonymous
    October 06, 2015
    The comment has been removed

  • Anonymous
    October 10, 2015
    Gigaplex - We considered not making this a hard block - but when we tested it we found that the performance was unacceptable for pretty much all video adapters.  So even if we had enabled this - you would have likely ended up dual booting on a system without SLAT. Cheers, Ben

    • Anonymous
      April 13, 2016
      Ben, was this a simple conditional check, or did you actually drop code from Hyper-V? What if a hacker finds the point(s) where you are making the checks, then uninstalls the video driver and replaces it with the standard VGA driver - would the client successfully run a VM? It's very very very frustrating to dual-boot to Server 2012 R2 evaluation edition just to update the VHDX installation of Windows 10 insider builds... and VMware does not understand VHDX files... and I need VHDX for TRIM support on SSD...
      • Anonymous
        April 13, 2016
        The comment has been removed
  • Anonymous
    May 22, 2017
    The comment has been removed