Server Virtualisation - Live Migration vs. Quick Migration.
I'd really like to get to the bottom of this.
In the event of a hardware failure, both VMware and Microsoft's Server Virtualisation solutions are equal - the guest machines will be restarted on another physical node.
In the event of some planned downtime, we both have the ability to move the guest machines to another physical node - but we do it at different speeds (less than a second vs. a few seconds).
Here's my question (and I do agree that it's nice to have less than one second - that's why we will have that feature at some time in the future):
If you're planning on moving a running server (guest), do you do it now, or do you wait until the workload is small (the users have all gone home)?
i.e. do you NEED less than a second to fail over (or will a couple of seconds do)?
Please post your answer (and be honest - the question is "do you NEED less than a second") - I'll count up the replies...
Thanks,
Dave
Comments
Anonymous
January 01, 2003
Hyper-V in Windows Server 2008 Enterprise and Datacenter Edition offers the ability to make virtual machines
Anonymous
January 01, 2003
I am going to build a Hyper-V cluster in my Hyper-V session at the IMTC on the 4th April. I'm not going to be covering System Center though. P.S. The agenda I'd seen for the IMTC was roughly a 50/50 split between Developer & IT Pro content. Dave
Anonymous
January 01, 2003
The comment has been removed
Anonymous
January 01, 2003
Looks like Microsoft Corp have decided to run with the answer to this: http://blogs.technet.com/virtualization/archive/2008/04/09/hyper-v-quick-migration-vmware-live-migration-part-1.aspx Even I'm looking forward to Part 2 of their blog. Dave.
Anonymous
January 01, 2003
Guys, keep the comments coming - this is a great discussion. I'm on holidays now for the next week and a bit. Back again on the 1st April. Dave.
Anonymous
January 01, 2003
The comment has been removed
Anonymous
January 01, 2003
Yep - I've seen that. A Hyper-V console session would be lost (you're connected to the cluster node - not the VM). A Terminal Services session to the VM would not be lost. Not quite sure how they got the file copy to fail - a file copy would continue if any old file server failed over on a cluster. Not quite sure how they got the database client to fail either - any old database client would continue if the database failed over on a cluster. Come on guys - we've been using Failover Clustering for all our mission critical stuff for over ten years (since NT4 and Wolfpack). Now we're saying that it doesn't work? That for ten years we've been restarting file copies and database and email clients? Sorry, but I still think I'm right - you don't NEED less than a second to fail over. You haven't NEEDED it for the last ten years & you don't NEED it now! Granted - it's nice to have (and we'll have it one day)... Dave.
Anonymous
March 03, 2008
It's more than one second. And as a migration is a process that should be planned, the downtime will usually be after office hours. If it's a 24/7 workload, it's good to have less than one second, but a few seconds is definitely already acceptable.
Anonymous
March 03, 2008
No, but most of my customers don't know that it is capable of clustering or soft failover. The biggest blocker for this existing feature is customers having a large enough budget... you need 2 Windows Enterprise servers plus a SAN/iSCSI storage that supports it. However, with Windows 2003 R2 Enterprise... you do get 4 free Windows 2003 Server OS as guests to help offset the costs.
Anonymous
March 03, 2008
Well, I don't know if it's only VMware marketing stuff, but the point here is not "less than a second vs. a few seconds" but it's "All DB Current Sessions will remain connected vs. all DB Current Sessions will be lost", at least for our company. Thanks Dave for your great work.
Anonymous
March 03, 2008
The comment has been removed
Anonymous
March 12, 2008
So Dave, what do MS have in relation to combining VMotion with DRS so that VMs move between cluster nodes depending on utilisation requirements?
Anonymous
March 19, 2008
Dave, I quite liked your Demo for Clustering, would you consider doing a Virtualisation one? Showing how you would set up and fail over one or more VMs using System Center and/or any other Hyper-V features you quite like?! Or will all this be covered in the IMTC 08 Conference, and if so will you be posting it on TechNet after? The conference itself seems to be very geared towards Developers, which is fine, but for the Non Dev types, it's only really Day 3, Track 4 that has much appeal, well to me anyway. :)
Anonymous
March 21, 2008
There is a demo of Quick Migration on YouTube, available here: http://vmblog.com/archive/2008/03/20/video-hyper-v-quick-migration-breaks-network-connections.aspx Dave, perhaps you would like to comment?
Anonymous
March 21, 2008
Found this video on YouTube: http://youtube.com/watch?v=nmZ-6rB2l9s Why does the same scenario work fine with VMware's VMotion but break in Microsoft's? Just curious... NateLee
Anonymous
March 24, 2008
A reader's 2 cents... I am actively managing an ESX cluster in a lab with 7 hosts made up of a mix of 2-socket dual- and quad-core CPUs, 128 GB of RAM, and 135 VMs. DRS is a key technology for maintaining environment efficiency and improving resource utilization. When we turn it to manual, things can get ugly quick. Does Microsoft have a comparable feature? I have watched a VM move 3+ times within an hour under certain workloads. I can imagine the dropped sessions and database connections would be unacceptable even in our lab.
Anonymous
March 25, 2008
No, less than one second is not needed. This rapid need to move seems to defy proper IT planning and management. In the banking industry we can't afford to make implementation mistakes that lead to the need to rapidly move in one second.
Anonymous
March 26, 2008
A few seconds to move a VM using Quick Migration - ha, I'd like to see that! As soon as you get some RAM in those VMs you'll soon see it slow to a few minutes! Also, the question of breaking client connections for copies and DB connections is all based on the type of connection, the timeout values on the MSCS cluster and application, and the time taken to resume the VM on the other node. If it's going to take more than 30 seconds, Explorer times out. I'm not sure of the timeout for a SQL query, but I'm sure once you get past 60 seconds you'll have issues. As far as capacity planning/proper IT planning and management, ITIL provides for pre-approved changes. Business wants dynamic IT environments that meet the needs of the business within its timeframes, and at a reasonable cost. If the process of planning to the nth degree costs more than the business benefit for doing it in the first place, then what's the point? Virtualisation is a technology that will allow us to decouple the hardware from the OS, providing the ability to be flexible, agile, provide higher availability and negate downtime (planned or urgent). I'm sure the banking industry needs that too!
Anonymous
March 27, 2008
Tim, with all due respect, you're kinda missing the point with your statement on proper planning mitigating the need for live migration. The vision of virtualisation is to create a grid or cluster of x86 standard servers which can be utilised as one large system where capacity can be added on demand and workloads balanced depending on requirements. Think about how application requirements can change over a 24 hr period in a typical datacentre and how dynamic movement of workloads can improve efficiency and you will start to understand why the rules of "proper planning" have just changed. VMware IMHO are the current leaders in production capable dynamic infrastructure with technologies like DRS/VMotion; however, I don't doubt MS's ability to catch up, although they certainly have their work cut out.
Anonymous
March 31, 2008
Actually I stand by my statement. In the banking industry we constantly model and use stress and other application load testing techniques, to stay within the "proper planning" mentality. We don't want to be caught conducting "dynamic movements" and affect customer experiences and accesses. Your methods may work for some industries, but not ours. And our success in uptime and other methods, which has allowed us to become the largest credit card processor in the world, with over 750 million active accounts on file, ensures that proper planning, and not dynamic load management, is the standard, and guess what, I am not a mainframe guy! lol Cheers.
Anonymous
April 04, 2008
Interesting comments all around. I just did two posts on my blog. The first is why client connections break with Quick Migration and not with VMotion or live migration. The second is on the costs associated with each. Part I: http://mikedatl.typepad.com/mikedvirtualization/2008/04/part-i-quick-mi.html Part II: http://mikedatl.typepad.com/mikedvirtualization/2008/04/part-ii-quick-m.html Tim Kelly, you may want to read the second one since it has a customer in your same industry and explains why they move stuff around and why they use virtualization. The top 18 financial institutions in the world run production, customer facing apps on VMware today. With VMotion and DRS you get a more flexible environment - a sort of dynamic datacenter. Proper planning is fine but what happens when you unexpectedly add a few thousand users one day? I know where you're coming from though with proper planning since I used to work in the financial industry myself.
Anonymous
April 15, 2008
The comment has been removed