LitwareHR on SSDS - Part V - Searching across Containers

In SQL Server Data Services, the scope of a query is bound to a single Container. In LitwareHR, however, we had a requirement to search entities across multiple tenants, and in our implementation each tenant gets its own Container. So we had to create a way of running a query (the same query, to be more precise) across multiple Containers.

Remember, LitwareHR actually "owns" all the Containers. LitwareHR's metadata (which is simply another Container, as explained in a previous post) contains information about each tenant, including a pointer to the tenant's Container.

The trivial way of implementing a cross-Container search would be to iterate over all the tenant Containers and query each of them in turn. Something like this:

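// Sequential version: query each tenant Container in turn and merge the results.
List<Entity> combinedResults = new List<Entity>();
foreach (Container c in TenantContainerCollection)
{
    IEnumerable<Entity> result = proxy.Query(CreateScope(c.ContainerId), query);
    combinedResults.AddRange(result);
}
return combinedResults;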

 

Naturally, this doesn't scale very well: the total time grows with the number of Containers, and it takes no advantage of the fact that this is a highly parallel problem. In fact, each query sent to SSDS will most probably end up on a different node of the SSDS fabric, minimizing the chances of server-side contention.

So we created a helper class, CrossTenantSearch, that essentially takes an array of TenantIds and a query statement (SLINQ) and returns the combined result set.

Internally, it launches several concurrent searches asynchronously, each one on a different ThreadPool thread. It then waits for all of them to complete, merges the partial results into a single collection and returns it.
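The fan-out, wait and merge steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual CrossTenantSearch code: the Entity class and the queryContainer delegate are hypothetical stand-ins for the SSDS entity type and for a call such as proxy.Query(CreateScope(containerId), query), and Task.Run is used here simply as a convenient way of queuing work to the thread pool.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for an SSDS entity; not the real SSDS type.
public sealed class Entity
{
    public string ContainerId { get; set; }
    public string Id { get; set; }
}

public static class CrossTenantSearchSketch
{
    // queryContainer is an assumed delegate (containerId, query) => entities,
    // standing in for the real per-Container SSDS query call.
    public static List<Entity> Search(
        IEnumerable<string> containerIds,
        string query,
        Func<string, string, IEnumerable<Entity>> queryContainer)
    {
        // Fan out: queue one thread-pool task per Container.
        // Each task materializes its own partial result list.
        Task<List<Entity>>[] searches = containerIds
            .Select(id => Task.Run(() => queryContainer(id, query).ToList()))
            .ToArray();

        // Wait: block the caller until every search has completed.
        // If a search failed, its exception is rethrown here.
        List<Entity>[] partials = Task.WhenAll(searches).GetAwaiter().GetResult();

        // Merge: done on the calling thread, so no locking is required.
        return partials.SelectMany(p => p).ToList();
    }
}
```

Because the merge only happens after all the tasks have finished, the partial lists are never shared between threads. The combined list keeps the order of the Container ids passed in, regardless of which query finished first.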

The following diagram illustrates an example of running 3 concurrent queries (with different colors to identify each thread). Notice how the red thread returns even before the 3rd thread is launched, the green thread is longer (potentially returning a lot of results) and the yellow is very short.

 

[Diagram: three concurrent queries, shown as red, green and yellow threads]

After all of them return, the results are combined into a single result set and sent back to the requestor. With this approach, the total time for a cross-Container query is approximately that of the slowest individual query (e.g. the green thread in the diagram above). Caveat: this might not hold when you are searching a very large number of Containers, because you might get contention on the client side, and merging all the results might itself take a long time.

There is of course room for improvement: paging, returning partial results to the client and letting the client drive further fetches, etc. None of these have been implemented in LitwareHR; they are left as an exercise for the reader.

Side note: when developing this feature we were puzzled by an exception related to thread apartments. It turns out that, by default, the Visual Studio test runner runs its tests in an STA. There's a somewhat obscure configuration parameter that you need to add to the testrunconfig file manually, as it is not exposed in the configuration window:

 <ExecutionThread apartmentState="MTA"/>

See here for more info.
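If you want to confirm which apartment your test code is actually running in, before and after changing that setting, a small helper along these lines might do. The ApartmentCheck class is a hypothetical name used only for illustration; Thread.CurrentThread.GetApartmentState() is the standard .NET call.

```csharp
using System;
using System.Threading;

// Hypothetical helper, for illustration only.
public static class ApartmentCheck
{
    // Returns the apartment state of the calling thread: STA, MTA or Unknown.
    public static ApartmentState Current()
    {
        return Thread.CurrentThread.GetApartmentState();
    }

    // True when the calling thread is in a single-threaded apartment.
    public static bool IsSta()
    {
        return Current() == ApartmentState.STA;
    }
}
```

Calling ApartmentCheck.Current() at the start of a test and writing the value to the test output shows whether the setting has taken effect.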

 

Comments

  • Anonymous
    April 14, 2008
    This technique is actually a bit more scalable than you might think. You need to keep in mind that each of these queries will hit a different machine, thus executing in less time than if you had executed it against a single container. You could make an argument about the latency (and you would have a point here), but longer term I think this will get better as other offerings emerge to help deal with the latency issue.

  • Anonymous
    April 15, 2008
    This is a perfect scenario for the CCR (in Microsoft Robotics Developer Studio) in which you could use an Arbiter.MultipleItemReceive combined with an Arbiter.Interleave. Handle faults and/or timeouts easily too.
    Arvindra

  • Anonymous
    April 16, 2008
    Eugenio Pace in his 5th article in a series on SSDS gives a pattern for cross container queries. The

  • Anonymous
    May 03, 2008
    Hi Eugenio, any chance that we could see this pattern expanded to include paging within a container as well, before doing the union? Currently there are limitations on how large the result sets from a single container can be (500 rows?), and as far as I can see this pattern would not address that. Cheers, Tim J.

  • Anonymous
    May 05, 2008
    Tim Jarvis raised a good point in my post on cross-container queries which is: how to handle paging?

  • Anonymous
    May 06, 2008
    More seriously... Eugenio just announced and released on Codeplex the latest drop of LitwareHR. Although

  • Anonymous
    May 06, 2008
    Microsoft Architecture Strategy Team has just shipped a new version of their LitwareHR Sample Application

  • Anonymous
    June 16, 2008
    As many of you know, Microsoft has been working on a new version of LitwareHR that uses SQL Server Data