Load Balancing at the Routing Service
This post subtitled “I hate to tell you this but….”
“Can the Routing Service do load balancing?” is one of the most
common questions I’m asked. The immediate answer I give is almost
universally disappointing: No. It’s actually hard to say that Load
Balancing (as most people mean it) should be done at the Routing
Service. But why? It seems like since I have this intermediary that
load balancing is one of the obvious applications. Here’s why it
doesn’t work as well or as easily as it should.
How would I
build a filter that allowed me to do something simple like Round Robin
load balancing? Well, it would seem that I would have a filter, one for
each destination that I wanted to send to, and as each message came
through, only one of the selected filters would respond that it
matched. So a setup like this:
RoundRobinFilter1 –>
Endpoint1
Round RobinFilter2 –> Endpoint2
RoundRobinFilter3
–> Endpoint3
Which we then fed 3 messages would result in
message 1 going to Endpoint1, message 2 to Endpoint 2, etc.
But
building this is hard :) Why? You immediately run into two issues:
- How do these MessageFilters exchange information? Basically
how does each filter know that this time it’s the one that needs to
match (and the other two know that they’re not)? - How does
this information stay around between messages? Remember that
MessageFilters are designed to be stateless, meaning that they don’t
normally have any information and would always match a given message the
same way.
We actually have a sample out that shows how
this would work, but it relies on a “creative” use of the MessageFilter
API. The hint is that MessageFilters can be a used as a factory to
create the MessageFilterTable that they belong to. Another key point is
that MessageFilterTables can contain child MessageFilterTables (usually
of a specific type). An example of this in the framework today is the
XPathMessageFilter. When you add a XPathMessageFilter to the Routing
Service’s MessageFilterTable, that’s actually not what happens.
Instead, this is the structure you end up with:
Where did this middle
XPathMessageFilterTable come from? Reflector gives us the answer.
When the XPathMessageFilter is added to the parent MessageFilterTable
(red), the parent MFT actually calls CreateFilterTable on the
XPathMessageFilter, at which point the XPathMessageFilter creates a new
XPathMessageFilterTable and returns it to the parent MessageFilterTable
to use.
When the parent
MessageFilterTable is asked by the RoutingService to return a list of
endpoints that match a particular message, the parent MessageFilterTable
in turn delegates this message to its own internal MessageFilterTables
and MessageFilters, building the full result set and returning it to the
Routing Service.
The sample RoundRobinMessageFilter inside the
Routing Service’s Advanced Filters sample (over
here) does much the same thing as the XPathMessageFilter. Now that
we have a custom MessageFilterTable in the mix, we can use that to
store state and coordinate the responses from the individual
MessageFilters that ultimately gets returned to the parent
MessageFilterTable. So can you build things this way? Absolutely. Is
it the best answer? Probably not. At the end of the day you’re still
stuffing state into the MessageFilter model, where it doesn’t belong.
Furthermore, implementing custom MessageFilterTables is a pain because
it requires you to stub in a lot of method calls that won’t ever be
used.
Another answer is a different custom MessageFilter, which
wouldn’t do RoundRobin specifically, but would send messages to a
particular endpoint 1/n of the time, on average, where n is the number
of possible destinations. This is also buildable, and could even be
stateless, but could also be tricky because you’d have to figure out how
to gaurentee that only one filter matched any given message. Just
having say four filters that all compute a rand() between 0 and 1 and
then determine if it’s in a particular range isn’t going to cut it,
since multiple filters could independently report that they matched. I
leave this one as an exercise to the reader, but one solution is to
predefine your ranges and use some piece of message data (such as say,
the time the message was sent, some random int that the client injected,
or a hash of some other piece of data) to pick which filter matches.
You could also get really creative with your use of the DynamicUpdate
functionality to periodically switch out which endpoints the Routing
Service was pointed to, but this seems a kludge.
The final
solution, and the one I’m going to recommend, is to use a different
extensibility point within the Routing Service. Note that the
MessageFilterTable that we use is a MessageFilterTable<IEnumerable<ServiceEndpoint>>
(something I alluded
to a while ago). Why not write your own custom
IEnumerable<ServiceEndpoint> and plug that into the right hand
side of the MessageFilterTable? That way your filter selection could be
simple (you could even use just a MatchAll if you didn’t need any other
semantics). Here’s what that would probably look like:
class
RoundRobinEndpointList :
IEnumerable<ServiceEndpoint>
{
ServiceEndpoint[] endpoints;
int head;
public
RoundRobinEndpointList(params ServiceEndpoint[] endpoints)
{
if (endpoints == null
|| endpoints.Length == 0)
{
throw new ArgumentException("One or more ServiceEndpoints must be
provided", "endpoints");
}
//Copy the list to ensure it doesn't change on us
(and so we can lock() on our private copy)
this.endpoints
= endpoints.ToArray();
}
public IEnumerator<ServiceEndpoint>
GetEnumerator()
{
int
currentHead;
lock (this.endpoints)
{
currentHead = this.head++;
if (this.head == this.endpoints.Length)
{
//Wrap back to the start
this.head = 0;
}
}
//If
one just wanted to return 1 endpoint as opposed to a list with
//backup endpoints this 'yield' is all that would
be needed.
//yield return
this.endpoints[currentHead];
//return
results [current] ... [last]
for
(int i = currentHead; i < this.endpoints.Length; i++)
{
yield return this.endpoints[i];
}
//return wrap-around (if any) [0] ... [current-1]
for (int i
= 0; i < currentHead; i++)
{
yield return this.endpoints[i];
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Overall, this is a much more pleasing
implementation of RoundRobin, and could easily be extended to other load
balancing patterns as well. It has the distinction of being 1) quite
small, 2) easy to understand (especially compared to the other
implementations) 3) thread safe, and 4) performant (since we only need
to lock on a copy of a list instead of the actual list that other
MessageFilters might need to be using).
Comments
Anonymous
April 22, 2010
I'm using your RoundRobinEndpointList code in a prototype. I'm hosting the RoutingService in a Windows service. I have 2 WCF service endpoints (hosted in Windows services) that I'm routing requests to using the IRequestReplyRouter contract. I have a client that creates 4 threads that each send "ping" messages to the router as quickly as possible. By "ping" message I mean that the WCF service endpoints are just sending a response back immediately. Also, I'm using the NetTcpBinding throughout. When both WCF service endpoints are running, I'm seeing each request take ~6.6 ms. Not bad. If I stop one of the Windows services hosting a WCF service endpoint, performance drops to 500 ms per request. I think I understand what's going on. The RoundRobinEndpointList returns the down WCF service endpoint as the primary endpoint every other request. So the RoutingService tries to send the request to the primary endpoint first, finds that it is down and then routes it to the backup endpoint. It looks like it takes around 1 sec for the RoutingService to decide that the primary endpoint is down. Is there a way to improve performance when endpoints are unavailable? Thanks, Mark IrwinAnonymous
May 04, 2010
Hi Mark, In general, this issue is caused by the timeouts on your bindings, and isn't Routing Service specific. Try reduing the timeouts (to lower numbers so that the connections fail faster), in order to get the Routing Service to give up and move on. In genreal, you should also make sure that your inbound timeouts are ~= (send side timeout)*(number of failover endpoints). This way the client connection or request doesn't time out while the Routing Service is attempting to contact alternatives on its behalf.Anonymous
June 30, 2010
Hi I implemented round robin pattern mentioned in AdvancedFilter sample, its working fine rightnow. May this pattern cause any issue in future? Please guide me whether should I replace this round robin pattern with something else? Regards Jaimin Gandhi.Anonymous
February 01, 2011
How to use this class? I am new to using RoutingService.. I have my endpoint Urls coming from database, so i might need to add endpoints dynamically and also while geting the next from the roundrobin I might need to check how many requests being executing on the clinet to send the request to it or move next..Anonymous
February 09, 2011
Hi Matt, I want to use this custom class as its seems simpler than the MSDN sample of the same. What I did is I have list of services running on 4 different machines and all four are different in configuration thus I was thinking to use the class like create endpoint list for highest configuration machine ServiceEndpoints []machine1EPs RoundRobinEndpointsList list = new RoundRobinEndpointsList(machine1EPs); rc.FilterTable.Add(new MatchAllMessageFilter(), list,0); and do similar for other three machines with increating the priority by one thus to achieve if all the services on highest configuration machines are full (I had put a check in the class above in the enumerator that my one endpoint can not take more than 20 request else it gives me next from the list). If I implement it like above adding four entries in the FilterTable, when all services on one machines are full with priority 0, will my router move to iterate through the endpoints in the second group (priority 1) ? Also for a single machine is the code I wrote above is correct (usage of the class) ? Thanks in advance..Anonymous
February 20, 2011
Hi, Finally was able to achieve what I wanted by adding few lines of code, Thanks for the idea, it helped a lot.Anonymous
July 28, 2012
I am using WCF Router Round Rabin Load balancing method between two Servers (same service). When I distribute requests through this Router to Wcf Service taking long time mean at least 3 seconds and response is within milliseconds only. Could you please guide me about defect of this…