Windows Server 2012 – About Clustered Storage Spaces Issue
In this post I’ll examine a specific issue that occurs in Storage Spaces when enclosure awareness is used.
Storage Space Setup with Enclosure awareness:
Note: There are only 2 enclosures for the disks.
The Scenario for the Issue:
1. Create a storage pool in a cluster.
2. Use the IsEnclosureAware flag to make the storage space resilient to an enclosure failure.
3. The connection to the storage pool fails when either of the enclosures (e.g., Enclosure 2 here) is switched off.
The Cause for the Issue:
With Windows Server 2012, a majority of drives needs to be available for the pool to be online. If you have two enclosures with an equal number of drives in each, losing one enclosure leaves exactly half of the drives online. Half is not a majority, so the pool goes offline.
A Solution for the Issue:
Using three enclosures will address this since a majority of drives will be online even in the event an enclosure goes offline.
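The majority arithmetic behind the cause and the solution can be sketched as follows. This is a simplified model, not how Storage Spaces is implemented: it assumes the pool stays online only when strictly more than half of its drives are reachable. The function name and the count of 12 drives per enclosure are hypothetical, chosen only for illustration.

```python
def pool_has_majority(drives_per_enclosure, offline_enclosures):
    """Return True if strictly more than half of the pool's drives stay online.

    drives_per_enclosure: drive counts, one entry per enclosure (hypothetical).
    offline_enclosures: set of enclosure indexes that are switched off.
    """
    total = sum(drives_per_enclosure)
    online = sum(
        count
        for index, count in enumerate(drives_per_enclosure)
        if index not in offline_enclosures
    )
    # Exactly half of the drives is not a majority.
    return online * 2 > total


# Two enclosures, 12 drives each, Enclosure 2 switched off: 12 of 24 online.
print(pool_has_majority([12, 12], offline_enclosures={1}))      # False

# Three enclosures, 12 drives each, one switched off: 24 of 36 online.
print(pool_has_majority([12, 12, 12], offline_enclosures={0}))  # True
```

With two equally sized enclosures the surviving half can never exceed 50 percent of the drives, while with three equally sized enclosures any two of them hold two thirds.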
References:
· Enclosure Model Used: https://www.aicipc.com/ProductDetail.aspx?ref=XJ3000-2242S
· SAS Controller Model: https://www.lsi.com/products/storagecomponents/Pages/LSISAS3801E.aspx
Comments
Anonymous
June 05, 2013
I would have thought that since half of the mirror stays up that Storage Spaces would be smart enough to keep the pool up as all the data is still online and available...
Anonymous
September 19, 2013
This explanation is not very clear at all. Why is a majority of drives necessary for a two-way mirror when it's a 2:1 ratio? If that's the case, this suggests that if you simply put some random extra disk in the second controller, it would stay online. If that's true, the requirement seems totally arbitrary. Could you please explain why this is a requirement? It would help people understand the technology better. It's so promising, but because we're left on our own as to the actual hardware implementation, one typically encounters these types of situations by accident (this is not in all the documentation). At that point, the "inexpensive commodity hardware" suddenly starts becoming a lot more expensive. It's often not possible to budget for those types of mistakes.
Anonymous
May 28, 2014
Explanation is incorrect, at least based on what I read from official Microsoft documentation. The necessity for the tertiary enclosure is to host the quorum disk of the Scale-Out File Server Cluster. The reasoning is that, in certain situations (very rare), split-brain syndrome could result otherwise. For instance, assume the following:
a) Two-node SOFS cluster. Both nodes are attached to two (2) separate JBOD enclosures.
b) The quorum disk is stored on a 2-way mirrored space using enclosure awareness (half the mirror on one enclosure, and half on the other for fail-over purposes).
c) There is a communication failure between cluster hosts (e.g. I stupidly simultaneously unplug SAS cables from SOFS1 to the 2nd enclosure, and unplug SAS cables from SOFS2 to the 1st enclosure).
d) Both servers would have quorum, because half the mirror resides on each enclosure, resulting in the quorum disk being available on both SOFS servers, making both active (each thinking they control the JBOD storage space CSV and quorum disk), resulting in potential corruption / storage failure.
Now, the example above is not "likely" to happen, but this is why it is unsupported and will not work. A third enclosure ensures that the quorum vote is correct. In the example I highlighted above, assuming 3rd-enclosure availability, only one node would control / own the quorum disk, and only one node would be permitted to "own" the CSV1 on the attached enclosure. The other node will then use "redirection" mode temporarily to write to the owner-controlled CSV, until all connectivity has been restored.