I'm Putting On My Top Hat, Tying Up My White Tie, Brushing Out My Tails -- In That Order
I thought I might blog briefly on an interesting algorithm which can be implemented very elegantly in JScript, but a lot of scripters don't know about.
Most scripters know about the sort method on the Array prototype -- you give it an array and (optionally) a comparison function, and it sorts the array so that every element is less than or equal to the following element. However, as Raymond pointed out a while back, the standard sort algorithm only works if your comparison function can impose a total order. What if you only care about a partial order? That is, you know that some things must come before others, but there are some items that can sensibly come in any order? You can't use the built-in sort method for that, and Raymond never did say how to do it in his post.
Let me give you a concrete example of what I mean -- about twice a year I end up going to some event that requires me to wear my formal evening clothes. (Who do I think I'm fooling here? I live in Seattle, the least formally dressed city in America! I should say, I go to some event where I can get away with wearing formal evening wear, and I'll take every chance I can get to amortize the cost of that tailcoat!) Of course, this presents a problem -- what goes on first? You don't want to put on the vest before the shirt, but it doesn't really matter whether the top hat goes on before or after the gloves. What we need is a partial ordering.
The sort algorithm that produces partial orders is called topological sort, and it is very easy to implement in JScript. First, let's start with a dependency list that defines what comes before what:
var deps =
{
tophat : [ ],
bowtie : ["shirt"],
socks : [ ],
pocketwatch : ["vest"],
vest : ["shirt"],
shirt : [ ],
shoes : ["trousers", "socks"],
cufflinks : ["shirt"],
gloves : [ ],
tailcoat : ["vest"],
underpants : [ ],
trousers : ["underpants"]
};
As you can see, I'm using JScript's object literal and array literal syntaxes to define a data structure where each top-level member maps to a list of the members that we know must come before it. Now all we need is to figure out a partial ordering that meets every one of these conditions.
Here's what we're going to do: we're going to take every item and examine its list of “dependent children“. We'll then examine their dependencies, and so on until eventually we find one that has no dependencies. For example, suppose we pick "shoes". It depends on "trousers", which depends on "underpants", which depends on nothing. OK, great, so we've reached the bottom -- so what?
This is very helpful because clearly the ones "at the bottom" are the ones you must do first! And of all the items at the bottom, you can do them in any order. That's the crux of the algorithm: if you then remove the current bottom item from consideration and find any other bottom, and keep doing that until everything is gone, you've got yourself a correct partial ordering.
We can do this very efficiently -- the key is that if we run backwards up that descent I just described, we end up with a good partial ordering: "underpants", "trousers", "shoes". By diving down until we can go no deeper, we guarantee that we'll never list a later node before one that must come earlier. We just repeat this process recursively for every dependency of every node, and we're done.
Of course, we'll have to make sure that we don't accidentally hit the same item twice. Both “bowtie” and “vest” depend on “shirt”. Once we've hit bottom on “shirt” once we want to never do so again, so every time we've examined any item once we'll mark it as "dead".
How are we going to run back up the list? We'll just write a recursive function to do the descent; as we return up the stack we'll stick the current item in the stack frame onto our list.
It'll make sense when you look at the code, honest.
function topoSort(dependencies)
{
var dead = [];
var list = [];
Step one: everything is alive:
for (var dependency in dependencies)
dead[dependency] = false;
Step two: visit every dependency on the list -- we don't want to miss any!
for (var dependency in dependencies)
visit(dependencies, dependency, list, dead);
return list;
}
function visit(dependencies, dependency, list, dead)
{
Step three: If we're visiting some place we've already been, then we need go no further.
if (dead[dependency])
return;
Step four: We're about to explore this entire dependency, so mark it as dead right away, and then recursively explore all this dependency's children, looking for the bottom.
dead[dependency] = true;
for (var child in dependencies[dependency])
visit(dependencies, dependencies[dependency][child], list, dead);
Step five: As we return back up the stack, we know that all of this dependency's children are taken care of. Therefore, this is a “bottom“. Therefore, put it on the list.
list.push(dependency);
}
If we actually run this thing:
print(topoSort(deps));
We end up with a correct partial order:
tophat,shirt,bowtie,socks,vest,pocketwatch,underpants,trousers,shoes,cufflinks,gloves,tailcoat
Pretty neat, eh? Not, I confess, the order in which I'd normally dress, but it would work.
One down side to the algorithm as I've implemented it here is that dependency lists with cycles are not detected. Obviously there can be no partial ordering if foo comes before bar AND bar comes before foo, but this algorithm imposes one anyway rather than producing an error. That's a subject for another entry though.
Comments
Anonymous
March 16, 2004
Hi, Eric.
I understand the code you're setting up well enough, and I agree that it's a quite elegant solution.
But (and there always seems to be one) I think the example makes me think that 80% of the hard work has to be done by the programmer in creating the dependency graph. Because of the example chosen, all it looks like on first inspection is that you are merely serializing a data structure. And you know what? Maybe that's the point - but it seems to me that a trick that requires 80% effort to set up isn't a real productivity enhancer.
So while typing this, I realized that there was another example that people may relate to that may show the value of this useful function.
Consider a software application that is extensible with plug-ins. If some of the plug-ins are dependent on others, then a partial sort is required to determine the order in which they are loaded so that errors can be prevented. Further, since this application has an active plugin developer community, the list of plug-ins is always growing.
The advantages of using a partial sort algorithm as described becomes clear: the dependency graph is much easier to maintain over time than a compiled list -- especially once the number of plugins reaches a large number (50-100). Finally, if someone only wanted to install a subset of plug-ins, they could query the dependency graph with that subset to realize the ordering of their subset.
This example also highlights why you need to be careful of cycles, so I'm looking forward to seeing what you'll say about cycle detection. All I know is that it's a hard problem (if it wasn't, then Go would be a lot easier to write AI opponents for).Anonymous
March 17, 2004
Indeed, a nice sample. Although your implementation obviously works, the dead variable should be initialized to {}, not to []. This is because it's used as a set, not as an array. Also, in real life (as opposed to an online sample) I would define the visit() function inside topSort() so that it does not pollute the global scope. Done this way, you also avoid the need to pass list and dead as parameters because they are in the closure - less typing and also slightly better performance I believe.Anonymous
March 18, 2004
The comment has been removedAnonymous
March 18, 2004
http://blogs.xtras.net/mikes/PermaLink,guid,57000f55-832e-47e7-9895-2658d4e65d52.aspxAnonymous
September 06, 2007
I’ve mentioned recursive programming techniques several times over the years in this blog ( TopologicalAnonymous
October 11, 2007
A Joel On Software reader asked the other day for examples of recursive functions other than old chestnutsAnonymous
June 14, 2008
PingBack from http://mikeschinkel.com/blog/optimaldressingorachallengetoericlippert/Anonymous
November 29, 2010
Under step four, the call to visit, you have this: visit(dependencies, dependencies[dependency][child], list, dead); Should the second argument be simply child, instead of dependencies[dependency][child]? The latter doesn't make any sense to me, and the former seems to work. Matthew The former does not work; did you actually try it? It gives the output "tophat, 0, bowtie, socks, pocketwatch, vest, shirt, 1, shoes, cufflinks, gloves, tailcoat, underpants, trousers", which first, has you putting on your vest before your shirt, and second, has mysteriously gained items "0" and "1" for some strange reason. That is completely and utterly broken. What you have forgotten is that the "for-in" loop does not give you the members of a collection (as the foreach loop does in C#). It gives you the indicies of the members of the collection. "child" is an index into an array; zero or one, in this case.
- Eric