Efficient selection and partial sorting based on quicksort
Most people who have studied algorithms remember quicksort, the ubiquitous sorting algorithm available in the standard library of nearly every programming language implementation in existence, including C, C++, Java, and the .NET Framework. Quicksort is a compelling example of how algorithms with poor worst-case behavior but good average-case or expected behavior can be highly practical. Much less well-known, however, are the simple variations on quicksort that can improve stack space usage and solve other important problems efficiently.
Let's briefly review quicksort. Quicksort is fundamentally based on the efficient partition operation, a linear-time operation which divides a list into two parts: the elements less than a value, and the elements greater than or equal to a value. The value is called the pivot value, and is usually chosen to be one of the values in the list. In efficient implementations of quicksort, partitioning is usually implemented as an in-place operation that just shuffles elements of the array around as it scans over it. Quicksort operates by simply choosing a pivot value, running partition on the array, and then recursively running quicksort on the two partitions:
// Sort the elements of a between indexes left and right inclusive
void quicksort(a, left, right) {
if (right > left) {
Choose a pivot element a[pivotIdx] with left <= pivotIdx <= right
// Here, partition() returns the new index of the pivot element
newPivotIdx = partition(a, left, right, pivotIdx);
quicksort(a, left, newPivotIdx-1);
quicksort(a, newPivotIdx+1, right);
}
}
Much research has gone into the problem of choosing a good pivot element, but simply using a typical pseudorandom number generator to select an array index is usually sufficient to get efficient performance on average. A common and effective optimization is to switch to insertion sort for small subarrays, for which insertion sort is considerably faster than quicksort (and pretty much every sorting algorithm). A useful variation on quicksort is the partially tail-recursive variant, where we only make a recursive call for the smaller of the two sublists, and use a loop to deal with the other one:
// Sort the elements of a between indexes left and right inclusive
void quicksort(a, left, right) {
while (right > left) {
Choose a pivot element a[pivotIdx] with left <= pivotIdx <= right
newPivotIdx = partition(a, left, right, pivotIdx);
if (newPivotIdx - left > right - newPivotIdx) {
quicksort(a, newPivotIdx+1, right);
right = newPivotIdx - 1;
} else {
quicksort(a, left, newPivotIdx-1);
left = newPivotIdx + 1;
}
}
}
This variant has worst-case logarithmic space usage, and in practice makes about half as many calls, making it more competitive with in-place sorting algorithms like heapsort. It's especially useful on memory-limited systems, but since stack space often comes at a premium, it's also useful on desktops.
Now let's turn to how we can solve different problems using variants on this idea. Related to sorting is the selection problem, where we are asked to find the kth smallest element in a list. A common special case of this is finding the median. Obviously, we can just sort the list and index, but this isn't terribly efficient - this is only a good idea if we're going to be selecting many different elements and the list is static. Instead, notice this important property of the partition function: the pivot element is always placed in its final sorted position, while all elements which come before it in sorted order are before it, and all those that come after it in sorted order are after it. Consequently, if we know the sorted position of the element we're looking for, we already know which partition it's in - just check k against the index of the pivot after the partition operation:
// Get the value of the kth smallest element in the list
object quickselect(a, k) {
int left=0, right = a.Length - 1;
while (right > left) {
Choose a pivot element a[pivotIdx] with left <= pivotIdx <= right
newPivotIdx = partition(a, left, right, pivotIdx);
if (k < newPivotIdx) {
right = newPivotIdx - 1;
} else {
left = newPivotIdx + 1;
}
}
return a[k];
}
This is nearly the same as the second quicksort example above, but we no longer need to make any recursive calls - this algorithm is in-place. Moreover, because the sizes of the lists we visit shrink geometrically on average, the total expected time is only O(n), or linear time. The algorithm is also highly efficient in practice, and is used in all implementations of C++'s nth_element
call. There is a variation on this scheme with worst-case linear time that I won't discuss here, because it's not as efficient in practice, but it was one of the most important early theoretical results in algorithms.
A related problem is finding the k smallest or largest values in a list, or the "top k". This is important, for example, in cases where you want to display just the top results from a search based on some numeric metric of relevancy (or if you want to start your own pop music countdown). The simplest algorithm, running the first k iterations of selection sort, requires O(kn) time, which is quite expensive. Instead, we can exploit the properties of partition to avoid many of the recursive calls made by a complete quicksort:
// Rearrange the elements so that the k smallest elements appear at the beginning
void quicksortSmallestK(a, left, right, k) {
while (right > left) {
Choose a pivot element a[pivotIdx] with left <= pivotIdx <= right
newPivotIdx = partition(a, left, right, pivotIdx);
if (k <= newPivotIdx) {
right = newPivotIdx - 1;
} else if (newPivotIdx - left > right - newPivotIdx) {
quicksort(a, newPivotIdx+1, right);
right = newPivotIdx - 1;
} else {
quicksort(a, left, newPivotIdx-1);
left = newPivotIdx + 1;
}
}
}
This algorithm uses O(n + klog k) expected time and O(log k) space, and again is quite efficient in practice. It can also be generalized to compute an arbitrary subarray of the sorted array (the jth through kth smallest values in the list in order).
When it comes to solving ordering problems, partitioning and quicksort-based algorithms are really your swiss army knife. Keep them in mind next time you encounter a problem like this.
Comments
Anonymous
January 19, 2006
Derrick
Your posts make my nose bleed. Keep it up :-)
cheers
MikeAnonymous
February 01, 2006
Good stuff - well explained.Anonymous
May 10, 2006
OMG, I didn't realize people used this in real life. I thought it was just something I had to learn in my data structures class. Cool.Anonymous
September 05, 2006
Hi Derrick. Nice posts, I registered to your feed two weeks ago and most of the things I received were highly interesting.
One thing you could add (but maybe is it too much trouble?) is external links to your sources. On the other hand, if I stop being lazy and search for them (or if anybody else reading this blog does), maybe we can post them as comments, to share?Anonymous
May 21, 2007
in worst case, if we always select the worst pivot, then (k <= newPivotIdx) becomes true n-k times, and quicksortSmallestK becomes O(n*n)... it is the very worst case.Anonymous
April 04, 2008
Thank you so much for this wonderful post! Just exactly what I need!