Udostępnij za pośrednictwem


Spot the defect: Bad comparisons, part four

One more easy one. I want to "sort" a list into a random, shuffled order. I can do that by simply randomizing whether any two elements are greater than, less than, or equal to each other:

myList.Sort((x, y) => (new Random()).Next(-1, 2));

That generates a random -1, 0 or 1 for every comparison, right? So it will sort the list into random order, right?

.

.

.

.

.

.

.

There are multiple defects here. First off, clearly this violates all our rules for comparison functions. It does not produce a total ordering, and in fact it can tell you that two particular elements are equal and then later tell you that they have become unequal. The sort algorithm is allowed to go into infinite loops or crash horribly when given such an ill-behaved comparison function. And in fact, some implementations of sorting algorithms attempt to detect this error and will throw an exception if they determine that the comparison is inconsistent over time.

Second, every time this thing is called it creates a new Random class seeded to the current time, and therefore it produces the same result over and over again; hardly random.

Shuffling is not sorting; it is the opposite of sorting, so don't use a sort algorithm to shuffle. There are lots of efficient shuffle algorithms that are easy to implement.

Comments

  • Anonymous
    January 30, 2011
    Plus, it's unlikely this would produce a uniform distribution or any other well-defined distribution (assuming sort() even completes without an exception).

  • Anonymous
    January 30, 2011
    I mean, even if you fix the random seeding defect.

  • Anonymous
    January 31, 2011
    Long ago, when I worked as a developer in the Excel group, we used Excel for everything (I assume they still do :) ). Including shuffling. By sorting. :) The technique was simple: in a worksheet, one column of the data you want to sort (this was often a list of the programmers, for morale-related lotteries :) ). In the adjacent column, a fill-enter of "=rand()". Then, copy/paste special/values to lock the random numbers in place, finish up with a data/sort… to rank the rows by the random number. You could do the same exact thing with a deck of cards. Or whatever. So, you see: shuffling IS sorting. At least, some of the time. :)

  • Anonymous
    January 31, 2011
    A classic sort-based shuffle is to assign a random number to each piece of data and then sort them on that number.  I am not a fan of that algorithm.  My favorite shuffling algorithm remains Fisher Yates shuffle (aka Knuth Shuffle): en.wikipedia.org/.../Fisher–Yates_shuffle .

  • Anonymous
    January 31, 2011
    If you really wanted to use sort for shuffling, would there be harm in producing a tuple or a key/value pair, sort those, then split them up again for results?

  • Anonymous
    January 31, 2011
    @Joe: Sure it would pe possible, but here are two problems. First, you must ensure that you don't assign the same random number twice, as your sort algorithm might be stable (i.e. leave input in the smae original order if they have the same random number). Secondly, sorting has O(n log(n)) complexity, while shuffling can be done in O(n). In other words, for sorting to work, you need to shuffle nu numbers 1 to n. But if you can shuffle those, why not shuffle what you need to shuffle in the first place? Secondly, the sorting is likely to be slower than just plain shuffling. Brian is right, have a look at Fisher-Yates shuffling.

  • Anonymous
    January 31, 2011
    Even if the randomization worked as expected, wouldn't this maintain some of the original ordering?  With this approach you effectively have 3 queues (-1, 0, 1) so the first item would almost never be the last except on the instance when it's the only item in the +1 queue.  So, on average, the first item would come from the first 3 items, the next from the second 3, etc.

  • Anonymous
    January 31, 2011
    It seems like you <em>could</em> get at least a stable result by using the following for comparison: list.Sort((x, y) => new Random(x).Next().CompareTo(new Random(y).Next())); That is, compare elements based on the first value produced by a Random object seeded using each element. This would at least give you consistency between comparisons of the same numbers. Of course, it would also create an excessive number of superfluous Random objects. I also have no idea how statistically "good" it would be (I'm guessing not very). Still, it seems like it's at least one <em>possible</em> way to "shuffle by sorting."

  • Anonymous
    January 31, 2011
    @Dan: Comparisons also have to be transitive: if a < b and b < c then a < c.

  • Anonymous
    January 31, 2011
    An interesting read on how non-uniform random distribution caused problems for the Microsoft default-browser form can be found at: www.robweir.com/.../microsoft-random-browser-ballot.html

  • Anonymous
    January 31, 2011
    @Simon Buchan: Thinking out loud, new Random(n).Next() will produce the same value each time it's called.  So really all that's happening is 1..x..y..n is getting mapped to R(1)...R(x)...R(y)...R(n) which will have a different order, but the transitive rule will still apply for this new order because of the consistent result.

  • Anonymous
    January 31, 2011
    The comment has been removed

  • Anonymous
    January 31, 2011
    "…but the transitive rule will still apply for this new order because of the consistent result." I agree, but it also means that it's not really a random ordering. You will always get the same new ordering every time you apply that algorithm. It's dependent only on the specific RNG being used. You might as well just shift all the bits some fixed amount, or multiply by some large prime, or some other kind of "mix-up" transformation and compare that result instead.

  • Anonymous
    January 31, 2011
    "The sort algorithm is allowed to go into infinite loops ...   " No, Young master. proof infinite of random very hard it is.

  • Anonymous
    February 01, 2011
    The comment has been removed

  • Anonymous
    February 02, 2011
    @Dan: Is that guaranteed to work, or might lazy evaluation bite you? Is .OrderBy guaranteed to enumerate its input only once? Since I don't know that guarantee, I'd be inclined to throw a .ToList() on the end of the first .Select(...) just to be safe.