Unexpected Double Network Traffic on Writes in a 2-Node S2D Cluster with Nested Mirror-Accelerated Parity

Alex Bykovskyi
2024-11-14

Hi all,

 

I work at StarWind, and I'm currently exploring the I/O data path in Storage Spaces Direct for my blog posts.

I’ve encountered odd behavior: doubled network traffic on write operations in a 2-node S2D cluster configured with Nested Mirror-Accelerated Parity.

 

During write tests, something unexpected happened: while writing at 1 GiB/s, network traffic to the partner node was constantly at 2 GiB/s instead of the expected 1 GiB/s.

 

Could this be because S2D configures the mirror storage tier with four data copies (NumberOfDataCopies = 4), writing two copies on the local node and the other two on the partner node?
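If that is what's happening, the numbers would line up. Here's the back-of-the-envelope arithmetic behind my hypothesis (illustrative PowerShell only; the 2 local / 2 remote split is exactly the assumption I'm asking about):

# Hypothesis: 4 data copies, split as 2 local + 2 remote (assumption, not confirmed)
$guestWriteRate = 1    # GiB/s of writes inside the VM
$remoteCopies   = 2    # copies assumed to land on the partner node
$expectedPartnerTraffic = $guestWriteRate * $remoteCopies
"Expected traffic to partner node: $expectedPartnerTraffic GiB/s"   # matches the 2 GiB/s I observe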

 

Setup details:

The environment is a 2-node S2D cluster running Windows Server 2022 Datacenter 21H2 (OS build 20348.2527). I followed the nested resiliency options documented by Microsoft here: https://learn.microsoft.com/en-us/azure-stack/hci/concepts/nested-resiliency#resiliency-options
and created a nested mirror-accelerated parity volume with the following commands:

New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedPerformance -ResiliencySettingName Mirror -MediaType SSD -NumberOfDataCopies 4

New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedCapacity -ResiliencySettingName Parity -MediaType SSD -NumberOfDataCopies 2 -PhysicalDiskRedundancy 1 -NumberOfGroups 1 -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk -NumberOfColumns 4

New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume01 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB
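For reference, this is how I check what the tiers and the resulting volume actually report after creation (a sketch, assuming the names above; the per-volume tiers typically show up with the volume name prefixed):

# Copy count and fault domain awareness of the nested mirror tier
Get-StorageTier | Where-Object FriendlyName -Like "*NestedPerformance*" | Select-Object FriendlyName, ResiliencySettingName, NumberOfDataCopies, FaultDomainAwareness

# FootprintOnPool vs. Size shows the combined overhead of the nested mirror and nested parity tiers
Get-VirtualDisk -FriendlyName Volume01 | Select-Object FriendlyName, Size, FootprintOnPool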

A test VM was created on this volume and deliberately hosted on the node that owns the volume, to avoid any I/O redirection (ReFS CSVs operate in File System Redirected Mode when accessed from a non-owner node).
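To double-check placement, I look at CSV ownership and the redirected-access state (a sketch using the FailoverClusters module cmdlets, output trimmed to the relevant columns):

# Which node owns (coordinates) the CSV that backs Volume01
Get-ClusterSharedVolume | Select-Object Name, OwnerNode

# Per-node access state: Direct vs. FileSystemRedirected, and why
Get-ClusterSharedVolume | Get-ClusterSharedVolumeState | Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason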

Testing approach:

Inside the VM, I ran tests with 1 MiB read and 1 MiB write patterns, capping throughput at 1 GiB/s and constraining cluster traffic to a single cluster network, so that network interface utilization could be monitored cleanly.
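The write pattern is roughly equivalent to a DISKSPD run like the one below (a sketch, not my exact command line; the test file path and size are placeholders, and -g throttles throughput in bytes per millisecond per thread, if I recall its semantics correctly):

# 1 MiB sequential writes, write-only, caching disabled, throttled to roughly 1 GiB/s
.\diskspd.exe -c64G -b1M -d120 -o8 -t1 -w100 -Sh -g1048576 -L D:\io\test.dat

# Same pattern for the read test: -w0 instead of -w100
.\diskspd.exe -b1M -d120 -o8 -t1 -w0 -Sh -g1048576 -L D:\io\test.dat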

During read tests, the network interfaces stayed quiet, confirming that reads were handled locally.

During the write tests, however, the same thing happened: while writing at 1 GiB/s, network traffic to the partner node consistently reached 2 GiB/s instead of the anticipated 1 GiB/s.
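This is roughly how I watch the counters during a run (a sketch; counter instance names are environment-specific):

# Compare host-side write throughput against what goes out on the wire
Get-Counter -SampleInterval 1 -Continuous -Counter @(
    '\PhysicalDisk(_Total)\Disk Write Bytes/sec',
    '\Network Interface(*)\Bytes Sent/sec'
)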

 Any ideas on why this doubled traffic is occurring on write workloads?

 Would greatly appreciate any insights!

For more background, here’s a link to my blog article with a full breakdown: https://www.starwindsoftware.com/blog/microsoft-s2d-data-locality
