Stretched SharePoint Farms vs. Disaster Recovery SharePoint Farms
Something I want to clarify is the difference between two kinds of “high-availability” SharePoint setups: stretched farms and disaster-recovery farms. The difference is actually fairly simple, but people often confuse the two, so I thought I’d write something quick on the subject. The two designs are deliberately very different.
In short, a stretched farm makes a single SharePoint farm more resilient to server and network failures. A Disaster Recovery (DR) farm, on the other hand, is an independent copy of the primary farm that can take over if something goes wrong on the primary site that affects users in any way. Because the DR site isn’t part of the same farm as the primary, a breakage on the primary site shouldn’t affect the DR site, by design, all being well.
Stretched farms are about high availability, while DR farms are, well, about disaster recovery. The two aren’t the same thing, even if the goal is the same: to keep SharePoint users happily using SharePoint even when something fails fatally in the SharePoint farm.
Stretched SharePoint Farms
A stretched farm is basically a single farm that exists in two separate locations. It’s much like a multi-subnet farm, but presumably with service-application redundancy built into each subnet so that either subnet can function on its own. This implies a multi-subnet SQL cluster of some kind too (or at least SQL mirroring); otherwise there wouldn’t be much point in stretching your SharePoint farm across several subnets. This diagram shows why:
In short, taking out one subnet won’t take down the farm if you’ve spread enough redundancy across both. It is still one logical farm, though, with a single configuration database, so all the servers in both sites make up the one SPFarm.
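To make “either subnet can function on its own” concrete, here’s a minimal Python sketch that checks a stretched-farm topology for sites whose loss would take the whole farm down. It’s a toy model under stated assumptions: the server names, the site names and the three roles (web, app, sql) are all hypothetical, and the sql role simply stands in for whatever multi-subnet SQL setup you actually use. A real farm has far more services to account for.

```python
# Hypothetical minimum roles a site needs to keep serving users on its own.
REQUIRED_ROLES = {"web", "app", "sql"}

# One logical farm (one config database): every server belongs to it,
# whichever site it sits in. Server names, sites and roles are made up.
FARM = {
    "WFE1": ("SiteA", {"web"}), "APP1": ("SiteA", {"app"}), "SQL1": ("SiteA", {"sql"}),
    "WFE2": ("SiteB", {"web"}), "APP2": ("SiteB", {"app"}), "SQL2": ("SiteB", {"sql"}),
}

def survives_loss_of(farm: dict, lost_site: str) -> bool:
    """True if the servers outside lost_site still cover every required role."""
    remaining = set()
    for site, roles in farm.values():
        if site != lost_site:
            remaining |= roles
    return REQUIRED_ROLES <= remaining

def single_points_of_failure(farm: dict) -> list:
    """Sites whose loss would take the whole farm down."""
    sites = sorted({site for site, _ in farm.values()})
    return [s for s in sites if not survives_loss_of(farm, s)]

print(single_points_of_failure(FARM))  # [] : either site can run the farm alone

# Leave the app role only in SiteA and SiteA becomes a single point of failure.
lopsided = {name: info for name, info in FARM.items() if name != "APP2"}
print(single_points_of_failure(lopsided))  # ['SiteA']
```

The point of the second case is that stretching servers across sites only helps if every role is duplicated; one missing role on the far side quietly turns the stretch back into a single-site dependency.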
That said, because we’re still talking about a single farm, if a service application dies we’re basically out of luck, just as we would be on any single farm, stretched or otherwise:
Stretched farms are still just a single farm. That’s why, although stretched farms are great to have where possible, in my opinion the real investment value comes from having an entirely separate farm to fail over to, running alongside your primary.
Disaster Recovery SharePoint Farms
DR farms, on the other hand, differ in both design and purpose. First of all, they’re logical copies of a primary farm (not literal copies or backups of it): they run their own copies of the same service applications as the primary site, but serve content from backups of the content databases that are regularly shipped over from the primary site. That means if a bad upgrade kills the search application on the primary site, we can just fail over to the DR site, because the DR site has its own search application with completely separate, independent databases. This is a key reason to have a DR site; a stretched farm wouldn’t help you at all here.
Here’s how a DR + primary farm looks:
Here we see everything working as intended: content updates arrive at the DR site, but everything else is a separate instance. For how to set this up, see my earlier post, linked at the end of this article.
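The “content updates arriving at the DR site” part can be sketched as a toy Python model, assuming each DR content database is fed by an ordered chain of transaction-log backups from the primary. SQL Server log shipping is one common way to do this, though not necessarily the mechanism the linked post uses, and the class and database names here are hypothetical. The sketch shows two properties: backups must be applied strictly in sequence, and on failover anything written on the primary after the last shipped backup is missing at the DR site.

```python
from dataclasses import dataclass

@dataclass
class DrContentDatabase:
    """Toy model of the DR-site copy of one content database (hypothetical)."""
    name: str
    last_applied_seq: int = 0      # 0 = only the initial full backup restored
    last_applied_at: float = 0.0   # primary-side time of that backup, in seconds
    online: bool = False           # not writable until we fail over

    def apply(self, seq: int, taken_at: float) -> None:
        # Once failed over, the copy has diverged from the primary's chain.
        if self.online:
            raise RuntimeError(f"{self.name} is already online; no more restores")
        # Log backups must be restored in order; a gap means a backup went
        # missing and the chain would need re-seeding from a full backup.
        expected = self.last_applied_seq + 1
        if seq != expected:
            raise ValueError(f"{self.name}: expected backup {expected}, got {seq}")
        self.last_applied_seq = seq
        self.last_applied_at = taken_at

    def fail_over(self, now: float) -> float:
        # Bring the copy online and return how many seconds of primary-side
        # changes may be missing: everything after the last shipped backup.
        self.online = True
        return now - self.last_applied_at

db = DrContentDatabase("WSS_Content_Example")  # hypothetical database name
for seq, taken_at in [(1, 0.0), (2, 900.0), (3, 1800.0)]:
    db.apply(seq, taken_at)
print(db.fail_over(now=2400.0))  # 600.0: up to 10 minutes of edits not shipped yet
```

The shipping interval is the main knob here: shipping more often shrinks the window of content that may be missing after failover, at the cost of more backup and copy traffic.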
Any failure on the primary site can be mitigated by failing over to the secondary site:
This gives you a lot more breathing room to work out what’s going on with the primary site if there are any issues. It only works because we’re not mirroring everything between the two sites: the only thing copied across is content, and otherwise each site runs its own farm. Sure, if a serious problem hits a content database we’re just as stuck as before, but we’ve at least massively reduced the points of failure for everything else. I covered setting up a SharePoint DR farm at https://blogs.msdn.com/b/sambetts/archive/2013/10/11/hot-standby-disaster-recovery-sharepoint-farms-basic-setup-amp-failover-high-availability-sharepoint.aspx if you’re interested in how to do it.
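The trade-offs above can be summed up in a small Python sketch that asks, for each failure scenario, whether users can keep working under each design. It’s a deliberate simplification and the scenario and design labels are hypothetical: it assumes the stretched farm has full redundancy in both sites, and it treats a content-database problem as something that reaches the second copy too (synchronously in the stretched farm, via shipped backups in the DR pair).

```python
# Hypothetical failure scenarios, mapped to the layer each one damages.
SCENARIOS = {
    "lose one site (power/network)": "site",
    "bad upgrade breaks search service app": "service_app",
    "content database corrupted": "content",
}

# How each design holds each layer across its two locations:
#   "per_site" - each location has its own copy, so damage stays local
#   "single"   - one logical copy spans both locations, so damage hits both
#   "shipped"  - copied from primary to DR, so damage is carried across too
DESIGNS = {
    "stretched farm": {"site": "per_site", "service_app": "single", "content": "single"},
    "primary + DR farm": {"site": "per_site", "service_app": "per_site", "content": "shipped"},
}

def users_keep_working(design: str, scenario: str) -> bool:
    """Only damage confined to one location can be routed around."""
    layer = SCENARIOS[scenario]
    return DESIGNS[design][layer] == "per_site"

for scenario in SCENARIOS:
    row = {design: users_keep_working(design, scenario) for design in DESIGNS}
    print(f"{scenario:40} {row}")
```

Running it shows that the two designs only differ on the service-application row, which is exactly the case a separate DR farm is there to cover.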
And that’s it! I hope that’s helped someone at least clarify the ups & downs of each strategy.
Cheers,
// Sam Betts
Comments
Anonymous
April 17, 2014
Hi Sam, great post; I am glad to see one breaking this down. Support for stretched farms is an issue, as most companies do not meet the SQL latency requirement of 1ms one way or 2ms bi-directional. Consider also that SharePoint decides internally which server handles a given workload, which can mean an inconsistent response for the end user. Basically, if a workload is passed off to a server in the stretched location, the user experience will be impacted. I completed some internal tests which showed quite a remarkable impact on SharePoint when the latency numbers were not adhered to, so stretched farms should be considered very carefully. The "rules" for support on stretched farms must be reviewed and factored into the design requirements. technet.microsoft.com/.../cc748824(v=office.15).aspx
Anonymous
October 20, 2014
Thanks for the comments! Yep, that's important info, although if I'm not mistaken this requirement has since been relaxed slightly. The point is that if there were performance problems overall and the network wasn't up to speed, we couldn't guarantee it would be fixable.
Anonymous
November 08, 2015
Thanks Samuel. Nice Article.