Jaa


Drilling into ‘reasons for not switching to Hyper-V’

Information week published an article last week “9 Reasons why enterprises shouldn’t switch to hyper-v”. The Author is Elias Khnaser, this is his website and this is the company he works for.  A few people have taken him to task over it, including Aidan . I’ve covered all the points he made, most of which seem to have come the VMwares bumper book of FUD, but I wanted to start with one point which I hadn’t seen before. 

Live migration. Elias talked of “an infrastructure that would cause me to spend more time in front of my management console waiting for live migration to migrate 40 VMs from one host to another, ONE AT A TIME.” and claimed it “would take an administrator double or triple the time it would an ESX admin just to move VMs from host to host”.   Posting a comment to the original piece he went off the deep end replying to Justin’s comments , saying “Live Migration you can migrate 40 VMs if nothing is happening? Listen, I really have no time to sit here trying to educate you as a reply like this on the live migration is just a mockery. Son, Hyper-v supports 1 live VM migration at a time.” . Now this does at least start with a fact : Hyper-V only allows one VM to be in flight on a given node at any moment: but you can issue one command and it moves all the hyper-v VMs between nodes. Here’s the PowerShell command that does it.
Get-ClusterNode -Name grommit-r2 | Get-ClusterGroup |
  where-object { Get-ClusterResource -input $_ | where {$_.resourcetype -like "Virtual Machine*"}} |
     Move-ClusterVirtualMachineRole -Node wallace-r2             
The video shows it in action with 2 VMs but it could just as easily be 200.  The only people who would “spend more time in front of [a] management console” are those who are not up to speed with Windows Clustering. System Center will sequence moves for you as well. But… does it matter if the VMs are migrated in series or in parallel ?  If you have a mesh of Network connections between cluster nodes you could be copying to 2 nodes of two networks with the parallel method, but if you don’t (and most clusters don’t) then n copies will go at 1/n the speed of a single copy. Surely if you have 40VMs an they take a minute to move it takes 40 minutes either way…  right ? Well no... Let’s use some rounded numbers for illustration only: say 55 seconds of the minute is doing the initial copy of memory, 4 seconds doing the second pass copy of memory pages which changed in that 55 seconds, and 1 second doing the 3rd pass copy and handshaking. Then Hyper-V moves onto the next VM and the process repeats 40 times. What happens with 40 copies in parallel ? Somewhere in 37th minute the first pass copies complete - none of the VMs have moved to their new node yet. Now: if 4 seconds worth changed in 55 seconds – that’s about 7% of all the pages - what percentage will have changed in 36 minutes ?  Some won’t change from hour to hour and others change from second to second – how many actually change in 55 seconds or  36 minutes or any other length of time depends on the work being done at that point, and the memory size and will be enormously variable. However the extreme points are clear (a) In the very best case no memory changes and the parallel copy takes as long as the sequential. In all other cases it takes longer (b) In the worst case scenario the second pass has to copy everything – when that happens the migration will never complete.  

Breadth of OS support. In Microsoft-speak “supported”  means a support incident can go to the point of issuing a hot-fix if need be. Not supported doesn’t mean non-cooperation if you need help – but the support people can’t make the same guarantee of a resolution. By that definition, we don’t “support” any other companies’ software – they provide hot-fixes, not us - but we do have arrangements with some vendors so a customer can open a support case and have it handed on to Microsoft or handed on by Microsoft as a single incident. We have those arrangements with Novell for Suse Linux and Red Hat for RHEL, and it’s reasonable to think we are negotiating arrangements for more platforms: those who know what is likely to be announced in future won’t identify which platforms to avoid prejudicing the process. In VMware-speak “supported”, has a different meaning. In their terms NT4 is “Supported”. NT4 works on HyperV but without hot-fixes for NT4 it’s not “Supported”. If NT4 is supported on VMware and not on Hyper-V exactly how is a customer better off ? Comparisons using different definitions of “support” are meaningless.   “Such and Such an OS works on ESX / Vsphere but fails on Hyper-V” or “Vendor X works with VMware but not with Microsoft” allows the customer can say “so what” or “That’s a deal-breaker”.

Security.  Was it hyper-v that had the vulnerability which let VMs break out of into the host partition ? No that was VMware. Elias commented that "You had some time to patch before the exploit hit all your servers" which makes me worry about his understanding of network worms. He also brings up the discredited disk footprint argument; that is based on the fallacy that every Megabyte of  code is equally prone to vulnerabilities, Jeff sank that one months ago and pretty comprehensively – the patch record  shows a little code from VMware has more flaws than a lot of code of Microsoft’s.

Memory over-commit. Vmware's advice is don't do it. Deceiving a virtualized OS about the amount of memory at its disposal means it makes bad decisions about what to bring into memory - with the virtualization layer paging blindly - not knowing what needs to be in memory and what doesn’t. That means you must size your hardware for more disk operations, and still accept worse performance. Elias writes about using oversubscription, “to power-on VMs when a host experiences hardware failure”. In other words the VMs fail over to another host which is already at capacity and oversubscription magically makes the extra capacity you need. We’d design things with a node’s worth of unused memory (and CPU , Network, and Disk IOps ) in the other node[s] of the cluster. VMware will cite their ability to share memory pages, but this doesn’t scale well to very large memory systems (more pages to compare), and to work you must not have [1] large amounts of data in memory in the VMs (the data will be different in each), or [2]  OSes which support entry point randomization (Vista, Win7, Server 2008/2008-R2) or [3] heterogeneous operating systems. Back in March 2008 I showed how a Hyper-v solution was more cost effective if you spent some of the extra cost of buying VMware on memory – in fact I showed the maths underneath it and how under limited circumstances VMware could come out better. Advocates for VMware [Elias included] say buying VMware buys greater VM density: the same amount spent on RAM buys even-greater density. The VMware case is always based on a fixed amount of memory in the server: as I said back then, either you want to run [a number of] VMs on the box, or the budget per box is [a number] Who ever yelled "Screw the budget, Screw the workload. Keep the memory constant !" ? The flaw in that argument is more pronounced now than it was when I first pointed it out as the amount of RAM you get for the price of VMware has increased.

Hot add memory. Hyper-v only does hot-add of disk, not memory. Some guest OSes won’t support it at all. Is it an operation which justifies the extra cost of VMware ? . 

Priority restart - Elias describes a situation where all the domain controllers / DNS servers on are one host. In my days in Microsoft Consulting Services reviewing designs customers had in front of them, I would have condemned a design which did that, and asked some tough questions of whoever proposed it.  It takes scripting (or very conservative start-up timeouts) in Hyper-V to manage this. I don’t know enough of the feature in VMware to know how sequences things not based on the OS running but all the services being ready to respond

Fault tolerance. VMware can offer parallel running - with serious restrictions. Hyper-v needs 3rd party products (Marathon) to match that.  What this saves is the downtime to restart the VM after an unforeseen hardware failure. It’s no help with software failures if the app crashes, or the OS in the VM crashes, then both instances crash identically. Clustering at the application level is the only way to guarantee high levels of service: how else do you cope with patching the OS in the VM or the application itself ?      click for larger version

Maturity: If you have a new competitor show up in your market, you tell people how long you have been around. But what is the advantage in VMware’s case ? Shouldn’t age give rise to wisdom, the kind of wisdom which stops you shipping Updates which cause High Availability VMs to unexpectedly reboot, or shipping beta time-bomb code in a release product. It’s an interesting debating point whether VMware had that Wisdom and lost it – if so they have passed through maturity and reached senility.

 Third Party vendor support. Here’s a photo. At a meet-the-suppliers event one of our customers put on, they had us next to VMware. Notice we’ve got System Center Virtual Machine manager on our stand, running in VM, managing two other hyper-V hosts which happen to be clustered, but the lack of traffic at the VMware stand allows us to see they weren’t showing any software – a full demo of our latest and greatest needs 3 laptops, and theirs ? Well the choice of hardware is a bit limiting. There is a huge range of management products to augment Windows – indeed the whole reason for bring System Center in is that it manages hardware, Virtualization (including VMware) and Virtualized workloads. When Elias talks of 3rd party vendors I think he means people like him – and that would mean he’s saying you should buy VMware because that’s what he sells.

Comments

  • Anonymous
    January 01, 2003
    @Matt - it was just a proof that we could stick our solution on the hardware we had in the cupboard - nothing more. @Paul.  A parallel transfer is not automatically quicker than a serial one. Think of ordinary file copies, if you are copying 10 files in paralell you get closer to using the full network bandwidth : but you get the same effect with a multi-treaded copy moving each file in series. If the content is changing then ... see above.
    @Scott - OK another blog post coming up.

  • Anonymous
    January 01, 2003
    The comment has been removed

  • Anonymous
    January 01, 2003
    @anonymous. YES. Do you think researchers have not tried to find a guest escape vulnerability. Am I happy to take Microsoft's Patch record over the life of Hyper-V and go up against VMware. You bet I am. [Microsoft's patch record c. 2002 ? Not-so-much] @Ryan, actually  I agree with you. HyperV itself is free, you have to pay for VMware: so anyone doing due diligence should look at what VMware offers and ask if it is worth the money (or if you prefer, look at the free product and ask if it is up to the Job). I'm happy with the proportion who will say "no it isn't" but only the customer can say if they are in the group who say "Yes it is, for us" @Kent In the section "Memory performance Best practices" at the top of Page 6 it says clearly "Make sure the host has more physical memory than the total amount of memory that will be used by ESX plus the sum of the working set sizes that will be used by all the virtual machines running at any one time." I don't think we ever said Quick Migration was as good as Live Migration. If you can plan the move you can schedule a couple of minutes of downtime - and when we lacked live migration we had to keep pointing that out. Customers told us they saw the logic of that but they still felt a lot happier knowing they could live migrate.

  • Anonymous
    January 01, 2003
    The comment has been removed

  • Anonymous
    December 21, 2009
    Nice follow up James!  And thanks for the link :)

  • Anonymous
    December 21, 2009
    So are you making a challenge to all of the researchers out there to see if they can find a guest escape vulnerability in Hyper-V?  Really?  Do you want to go there?   Help me out with the patch record thing too.  Do you (MS) really want to go to the patch record argument?  Seriously?  

  • Anonymous
    December 21, 2009
    The comment has been removed

  • Anonymous
    December 21, 2009
    Is it just my command of the English language, but "Memory over-commit. Vmware's advice is don't do it." I don't read that at all in the document, and you stating it simply looks like jading the truth and works against credibility.  Be honest with the reader.  Otherwise, it looks just like another Quick Migration is as good as Live Migration argument.

  • Anonymous
    December 22, 2009
    James, there is no product called 'VMware', as you well know -- would you say 'Microsoft' when referring to a product? VMware, the company, published a suite of products, of which some, including VMware Server but most importantly ESXi, are in fact free [zero-cost]. vSphere is decidedly not free, but just like Windows 2008 R2, comes in a variety of levels and price points which offer various feature sets. No reasonable person would ever conclude that Hyper-V's OS support comes even remotely close to that of vSPhere/ESXi. For instance, our organization has numerous production VMs running FreeBSD and Solaris/OpenSolaris, with VMware Tools running happily in those VMs. We couldn't do the same with Hyper-V. Memory overcommit is a real, useful feature -- in a limited set of circumstances (for us, mostly desktop virtualization). There are certainly cons to using it, but it's nice to have the option. Although Live Migration has mostly caught up with vMotion (and doesn't come with the big price tag that vMotion does), Hyper-V is again playing catch-up because there's no equivalent feature to Storage vMotion. We use that regularly to move VMs among different back-end datastores for reasons of policy, performance tiering, maintenance and the like. Hyper-V's storage options are also more limited. While vSphere/ESXi can use just about any NFSv3 filer, and also just about any iSCSI or FiberChannel SAN, Hyper-V doesn't support any NAS storage (even CIFS/SMB2) as a clustered back end, and requires an iSCSI implementation that supports persistent reservations. This can affect costs for clustering, since very free free/low-cost iSCSI implementations support the required features. OpenSolaris does, one of FreeBSD's iSCSI targets does, and a few others, but OpenFiler and the like do not [yet]. Likewise, customers may already have a fast, robust NAS infrastructure in place, but they can't use it for Hyper-V. Contrary to FUD I've seen elsewhere, NFS is indistinguishable in terms of speed from iSCSI in almost all cases (and faster in some); our organization is about to switch to 10GbE NFS for even faster storage. None of this implies that Hyper-V isn't a decent product with some compelling features, and the initial price tag is almost always (but not universally) less than vSphere, depending on size and shape of the deployment and features. But it's very silly, for proponents of either  technology, to ignore the real pros and cons on each. Actually, it's even worse to actually be a proponent of either; it's like advocating a green hammer vs. a blue mallet. Both offer roughly analogous features, but they are not the same, and arguments devolve to discussing the color anyway ;)

  • Anonymous
    December 23, 2009
    The comment has been removed

  • Anonymous
    December 23, 2009
    James, I work for VMware and am one of the people responsible for our performance white papers. You incorrectly state that VMware recommends against memory over-commit.  It is foolish for you to make this statement, supported by unknown text in our performance white paper, when so much of our literature demonstrates the phenomenal value of this wonderful feature. If you think any of the language in this document supports your position, please quote the specific text.  I urge you to put that comment in a blog entry of its own.  I am sure that your interpretation of the text will receive comments from many of your customers. I will say again: we absolutely do not recommend against memory over-commit.  We love it, our customers love it, and you guys will, too, once you have provided similar functionality in Hyper-V. Thank you, Scott

  • Anonymous
    December 23, 2009
    The comment has been removed

  • Anonymous
    December 27, 2009
    The comment has been removed