Service Bus Relay Load Balancing–The Missing Feature (But Not For Much Longer!)

Load Balancing on the Service Bus Relay is by far our #1 most requested feature now that we’ve got Queues and Topics finally in production. It’s reasonable expectation for us deliver that capability in one of the next production updates and the good news is that we will. I’m not going to promise any concrete ship dates here, but it’d be sorely disappointing if that wouldn’t happen while the calendar still says 2011.

I just completed writing the functional spec for the feature and it’s worth communicating how the feature will show up, since there is a tiny chance that the behavioral change may affect implementations that rely on a particular exception to drive the strategy of how to perform failover.

The gist of the Load Balancing spec is that the required changes in your code and config to get load balancing are zero. With either the NetTcpRelayBinding or any of the HTTP bindings (WebHttpRelayBinding, etc) as well as the underlying transport binding elements, you’ll just open up a second (and third and fourth … up to 25) listener on the same name and instead of getting an AddressAlreadyInUseException as you get today, you’ll just get automatic load balancing. When a request for your endpoints shows up at Service Bus, the system will roll the dice on which of the connected listeners to route the request or connection/session to and perform the necessary handshake to make that happen.

The bottom line is that we’re effectively making the AddressAlreadyInUseException go away for the most part. It’ll still be thrown when the listener’s policy settings don’t match up, i.e. when one listener wants to have Access Control enabled and the other one doesn’t, but otherwise you’ll just won’t see it anymore.

The only way this way of just lighting up the feature may get anyone in trouble is if your application were to rely on that exception in a situation where you’ve got an active listener on Service Bus on one node and a ‘standby’ listener on another node that keeps trying to open up a listener into the same address to create a hot/warm cluster failover scheme and if the two nodes were tripping over each other if they were getting traffic concurrently. That doesn’t seem too likely. If you have questions about this drop me a line here in the comments, by email to clemensv at microsoft.com or on Twitter @clemensv.