Finding matches in unsorted collections (or: how to find pairs of socks efficiently)
I was sorting through my clothes after taking them out of the drier today, and I realized that over time I've naturally come up with a strategy to optimize my work.
The problem is as follows: given an unordered collection of elements (socks), find the pairs matched by a key (sock color / model). The first phase is similar to hashing: as I take an element (sock) out of the initial collection (basket), I move it to a smaller heap (in my case: black dress socks, brown dress socks, and sports socks). Determining which heap (pile) a sock belongs to is very cheap, and breaks down a big problem into smaller problems.
This is important because, unfortunately, the next phase of the algorithm grows pretty darn fast as the size of the elements grow. Because the elements don't have an order (although they can be compared for equality), I'm forced to scan through each heap.
The algorithm is now: pick the first element (closest sock from where I am), scan through all the elements in that heap (a visual scan over the pile usually does the trick), then remove the matching elements from the heap when found. Like other 'scan unsorted stuff' algorithms, looking for an element that is not there (e.g. because the sock is tangled inside some shirt) is expensive (and annoying).
So remember - if you can't sort, try to group by part of the equality comparison key! And of course, measure, measure, measure.
Comments
- Anonymous
July 03, 2005
Ah, but does your algorithm deal properly with the inevitable odd number of socks? - Anonymous
July 03, 2005
I think you have more socks than me!
The thing I have found is that it takes more time to find socks if there are more socks of that colour (because it takes longer to compare them).
To optimise for this I start with socks of the rarer colours - reducing the size of the set before performing the more expensive comparisons.
Of course, I am aided in this by my parallel processing capability, but it might be a useful algorithm in the case where some comparisons are much more expensive than others.
Without the parallel processing capability you'd need the expensive comparisons to cost more than an entire set of cheap comparisons. - Anonymous
July 04, 2005
Well, you could look for an odd number of elements, but more commonly you will have an even number of unmatched socks (Murphy's Law).
I have to admit, mismatched socks are always a problem... I'm still fine-tuning this. One friend of mine recommends buying all your socks of a single color / model, and then everything matches - and you don't have to do anything at all, just put everything in a drawer and pick any two socks. But that's a bit extreme for me. :D
(hey, good name for game - eXtreme Sock Sorting!)