Archiving Questions: Do Tiered Storage and Stubbing make sense?

This blog business has been turning out to be more fun than I expected - maybe I should be nicer and less sarcastic when the marketing team suggests these ideas. 

This time, answering the questions from my archiving blog and video has led into some areas which are a bit more controversial. There are two key issues that we talk about in my latest video:

1)  Tiered storage: is it a great way of reducing costs?
2)  Stubbing approaches to archives:  the world’s most elegant architecture or what?

Tiered storage is a conversation I have been having in a variety of contexts with numerous storage experts for the past decade.  In short, the idea that you can lower your costs by separating your data into different tiers and putting each tier on a different type of hardware is true only under very restricted conditions.  Given the mass storage hardware available for the past decade, however, those conditions simply do not occur for content that needs to be, in any meaningful way, 'online'.  So, unless one of your tiers is a bunch of data stuffed onto tapes that are squirreled away in an offsite vault, of which you expect to access only a tiny fraction, it is pretty clear that the lowest cost solution is to put all of your data, hot and cold together, onto the biggest, cheapest drives you can find.  End of story.  I have been nibbling away at this theme a bit in previous posts, but this is an attempt to make it as black and white as possible:  tiered solutions for online data will increase your costs.  They will make you less green, they will cost you more $$, they will make you less efficient and they will reduce the productivity of your users.
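One rough way to test this claim against your own environment is to count the drives each layout needs, since a drive pool has to satisfy both a capacity requirement and an IOPS requirement. The sketch below is only an illustration under stated assumptions: the `DriveType` class, the helper functions, and every drive spec, price and workload figure in it are hypothetical placeholders rather than numbers from this post, and the model ignores redundancy, controllers, power and administration costs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveType:
    """Hypothetical drive model; all figures are placeholders."""
    name: str
    capacity_gb: int
    iops: int
    cost: int


def drives_needed(drive: DriveType, data_gb: int, iops_required: int) -> int:
    """A pool must be big enough for the data AND fast enough for the load."""
    by_capacity = math.ceil(data_gb / drive.capacity_gb)
    by_iops = math.ceil(iops_required / drive.iops)
    return max(by_capacity, by_iops, 1)


def pool_cost(drive: DriveType, data_gb: int, iops_required: int) -> int:
    return drives_needed(drive, data_gb, iops_required) * drive.cost


# Assumed drive types (not real products or prices).
FAST_SMALL = DriveType("fast small", capacity_gb=300, iops=180, cost=400)
BIG_CHEAP = DriveType("big cheap", capacity_gb=2000, iops=75, cost=150)


def tiered_cost(hot_gb: int, hot_iops: int, cold_gb: int, cold_iops: int) -> int:
    """Hot data on fast drives, cold data on big cheap drives."""
    return (pool_cost(FAST_SMALL, hot_gb, hot_iops)
            + pool_cost(BIG_CHEAP, cold_gb, cold_iops))


def single_tier_cost(hot_gb: int, hot_iops: int, cold_gb: int, cold_iops: int) -> int:
    """Everything, hot and cold together, on big cheap drives."""
    return pool_cost(BIG_CHEAP, hot_gb + cold_gb, hot_iops + cold_iops)


if __name__ == "__main__":
    # Assumed workload: 2 TB hot at 1500 IOPS, 18 TB cold at 150 IOPS.
    workload = dict(hot_gb=2000, hot_iops=1500, cold_gb=18000, cold_iops=150)
    print("tiered:     ", tiered_cost(**workload))
    print("single tier:", single_tier_cost(**workload))
```

With these made-up inputs the single pool comes out cheaper, because the spindles bought for capacity also serve the hot workload's I/O. Different drive specs or workloads can change the result, so plug in your own numbers before drawing conclusions.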

The other issue raised in response to the archiving video was stubbing: moving the bulk of the message out of Exchange and leaving only a link or ‘stub’ behind. I consider the stubbing approach to be one of those kluges that occur regularly in the software industry and are done out of necessity.  In Exchange 2003 and earlier, stubs were about the only thing many customers could do to avoid the tyranny of tape back-up solutions for an online Exchange store.  Even so, only a tiny fraction of customers ever deployed solutions based on this approach.  Those that did were not very happy but realistically didn't have any alternatives.  Now that there are real alternatives, the complexity, fragility, incompleteness and expense of stub solutions should make anyone thinking about deploying them pause and think for a very long time.

Perry

Comments

  • Anonymous
    January 01, 2003
    @koolhand  The key point to recognize in replacing a backup strategy with a replication strategy is that recovery time stops being proportional to the size of the storage and becomes constant.  It takes the same 30 seconds or so to activate your replica whether there is 2GB or 2TB of data in the database.  Partitioning your data simply does not recover your data faster.  Check out my blog post on Exchange Data Protection - the video covers this point specifically starting at 2:50. Perry

  • Anonymous
    January 01, 2003
    The comment has been removed

  • Anonymous
    January 01, 2003
    @Paul  Yes, backup strategies are expensive.  Don't do it.  That is my whole point.  Microsoft has not run backups on any of our internal email for years now because we have bet on our replication strategy and found that it provides more protection for a lot less money.  The overall implementation covers the cases of:  data protection in the face of multiple server and infrastructure failures; of users and administrators accidentally deleting data; of records retention for legal requirements (I doubt it comes as much of a surprise that our legal department has its fair share of discovery and legal hold activities) and of physical and logical corruption.  I think my previous blog posts, “Exchange Data Protection” and “More Exchange Data Protection – Beyond Replication”, cover most of those cases in more detail. Perry

  • Anonymous
    August 17, 2010
    Interesting points, but doesn't address backup, which represents the majority of the lifetime cost associated with storage.  If I can eliminate 90% of the data from my backups, then that represents a real cost savings in media and backup window.  It also drastically improves achievable restore SLAs.

  • Anonymous
    August 18, 2010
    The comment has been removed

  • Anonymous
    August 21, 2010
    The other thing missed by MS in this case is that I may want more copies (greater redundancy/HA) of my primary storage and fewer of my archive. Their cost saving argument assumes I have the same number of copies of my archive databases/mailboxes.

  • Anonymous
    August 23, 2010
    I totally agree with Perry's comments about stubbing.  We started going down that route with a 3rd party product and quickly realized what a mess it creates, so we are going to deploy Exchange 2010 and use tiered archive storage once SP1 comes out. I do disagree with his premise that the expensive tier 1 storage would be wasted.  Although in larger environments SAN based RAID groups might be dedicated to particular applications, we often have different apps sharing RAID groups as long as it does not create a performance issue.  We have very little waste across our SAN. I also agree with others' comments about the SLAs for recovery.  Although we will use DAGs with one passive copy, recovery time is still a big issue, not only for disaster recovery where we want the Tier 1 data back fast but also for things like restores to RSGs.  If we have a true disaster which takes out our one and only data center, then we will be recovering only the T1 data first and going after the archive stores once all other critical systems are restored.

  • Anonymous
    August 24, 2010
    If stubbing were part of Exchange's built-in archiving, then client add-ons would be unnecessary, as either the CAS or the Mailbox roles could still present the entire message to the client; it'd be transparent to the user. The index and the "data" are already separate... only in one monolithic container (the Jet database).  Since most of the IO is due to the indexes and headers, if you were able to put them on separate storage, since they’re so small, you could put them on SSD (500GB of SSD in a SAN isn’t that expensive) and put the message bodies on FATA/SATA and you’d have great performance and really cheap storage. Sure, other vendors have implemented stubbing poorly (cough cough Symantec cough), but that hardly means that it cannot be done right (especially if it can be integrated into Exchange).

  • Anonymous
    August 25, 2010
    If you don't have access to another site or data centre to replicate to, then you are still stuck with shipping tapes offsite periodically.  I think it's those folks who need to worry about recovery time in the event of a disaster.  (Of course they are probably also the ones who could be considering moving it all to the cloud...)   They'd want to be able to restore tier 1 data before the old archive stuff.  But that doesn't mean that the data needs to be on different types of disks.  The other thing to remember is that Exchange 2010 doesn't need very fast disks to begin with... slap some big RAM in your servers and chances are that the disks you were considering for your "tier 2 cheapo" storage are probably good enough for the whole thing.  You can play with the storage calculator to see different combos.

  • Anonymous
    August 31, 2010
    What I do miss is Tiering beyond the current common SAN and DAS technology; there is no mention of SSD (see comment Jason). Plus, getting really small mailboxes for your live data makes it possible to have less powerful mailbox servers. What I miss is any numbers on how the usage pattern of a dedicated set of archive servers and dedicated live mailbox servers can affect the number of mailboxes (for live data as they get smaller) or archives (as the performance hit is lower) per database. Then tiering does make sense, because you might be able to sell different RTOs (Koolhand comment) and buy hardware more specific to the requirements. And there's nothing wrong with aggressive mandatory archiving rules, if it beats having costly quotas and .pst sprawl on fileservers (and backup) or data-loss.

  • Anonymous
    October 05, 2010
    Not all 3rd party Archiving solutions require software to be installed for the stubbing functionality to work.  Also, most 3rd party Archiving solutions offer stubbing as an option, it is not mandatory.

  • Anonymous
    February 23, 2011
    For some of us, cost is not the issue.  Our end users' expectations are such that archiving without stubbing is a non-starter, particularly when indexing has not occurred.  MS appears to be trying to eat their gold partners by introducing Archiving 1.0 in Exchange 2010, but those "in the know" realize that, true to MS form, it won't be ready for production until Service Pack 3. That said, for a green field implementation, it is probably OK to start here.  For those that have stubbing solutions already deployed, it is a non-issue.