Поделиться через


Spot the defect: Bad comparisons, part two

Suppose I want to sort a bunch of strings into order first by length, and then, once they are sorted by length, sort each group that is the same length by some other comparison. We can easily build such a device with higher-order programming:

static Comparison<string> FirstByLength(Comparison<string> thenBy)
{
return (string x, string y) =>
{
// Null strings are sorted before zero-length strings; remember, we need to provide a total ordering.
if (x == null && y == null)
return 0;
if (x == null)
return -1;
if (y == null)
return 1;
if (x.Length > y.Length)
return 1;
if (x.Length < y.Length)
return -1;
// They are the same length; sort on some other criterion.
return thenBy(x, y);
};
}

Super. This idea of composing new comparison functions out of old ones is pretty neat. We can even built a reversal device:

static Comparison<string> Reverse(Comparison<string> comparison)
{
return (string x, string y) => -comparison(x, y);
}

Something is subtly wrong in at least one of these comparison functions. Where's the defect?

.

.

.

.

.

.

.

.

Let's restate that contract again. The comparison function returns a negative integer if the first argument is smaller than the second, a positive integer if the first is greater than the second, and zero if they are equal. Any negative integer will do, and in particular, Int32.MinValue is a negative integer. Suppose we have a bizarre comparison function that returns Int32.MinValue instead of -1:

Comparison<string> bizarre = whatever;

and we compose it:

Comparison<string> reverseFirstByLength = Reverse(FirstByLength(bizarre));

Suppose two strings are equal in length and bizarre returns Int32.MinValue for those strings. Reverse should return a positive number, but -Int32.MinValue either throws an exception (in a checked context) or returns Int32.MinValue right back (in an unchecked context). Remember, there are more negative numbers that fit into an integer than positive numbers, by one.

The right implementation of Reverse is either to spell it out:

static Comparison<string> Reverse(Comparison<string> comparison)
{
return (string x, string y) =>
{
int result = comparison(x, y);
if (result > 0) return -1;
if (result < 0) return 1;
return 0;
};
}

Or, as some commenters noted, to swap left and right:

static Comparison<string> Reverse(Comparison<string> comparison)
{
return (string x, string y) => comparison(y, x);
}

Next time: One more defect to spot.

Comments

  • Anonymous
    January 23, 2011
    Couldn't you implement reverse as "return (string x, string y) => comparison(y,x);" instead?

  • Anonymous
    January 23, 2011
    Why don't you change the implementation to: return (string x, string y) => comparison(y, x); ?

  • Anonymous
    January 23, 2011
    @Brian and @Patrick: You can. It's just that that doesn't appear immediately to be any better than returning the result of negating the original call. You basically need to remember the int.MinValue problem. Once you know it, it's easy to remember it... but I suspect that relatively few people who hadn't already come across it would pick that up in code review.

  • Anonymous
    January 23, 2011
    (Oops, hadn't spotted the longer implementation at the end of the post.) I would suggest that the "comparison(y, x)" implementation works in any sane implementation of IComparer<T>. No doubt you could produce weird implementations (e.g. ones that favoured the first argument somehow) but I'd argue that at that point all bets are off anyway. I can't think of any guarantees provided by Sort in terms of which value will be used for the first parameter and which for the second.

  • Anonymous
    January 23, 2011
    If we generalized Comparison to something that took two different data types then I could see why you'd need the longer version.  E.g. something like Comparison <int,string> where you compare the value of the int to the length of the string.  But I can't think of a great example where such a comparison would be useful.

  • Anonymous
    January 24, 2011
    I wonder why you introduced FirstByLength anyway...? It seems to me that Reverse(bizarre) will have the same unwanted behaviour - or am I missing something?

  • Anonymous
    January 24, 2011
    I'm assuming the next defect to spot isn't the missing '};' in the last code sample. According to Raymond's article that Eric referenced on Thursday: (blogs.msdn.com/.../55408.aspx) "...your comparison function needs to follow these rules: ... •Anti-Symmetry: Compare(a, b) has the opposite sign of Compare(b, a), where 0 is considered to be its own opposite. " So if we can make that assumption about the comparison function, we should be able to reverse it by reversing arguments. If we can't make that assumption, I agree with Jon, "all bets are off anyway." This is of course ignoring the fact that Eric's implementation discards any information in the return value of the inner comparison other than the sign. I'll leave it to others to decide whether that's a good thing or a bad thing.

  • Anonymous
    January 24, 2011
    @Tim Goodman: A comparison on two different types is useful when you want to update one sorted list using another sorted list of updates.  You need to compare elements from the two lists to do a merge, but update objects won't necessarily be of the same type as the objects being updated.

  • Anonymous
    January 24, 2011
    "I would suggest that the "comparison(y, x)" implementation works in any sane implementation of IComparer<T>" I suspect that a not-uncommon way to implement the Comparison<T> method is, when given input that is either numerical or easily mapped to something numerical, to just return the result of subtracting the second input from the first. While I would just delegate to the numerical type's own CompareTo() method, a) that might not always be possible (for custom numerical types that are not as fully implemented as the built-in types), and b) I can't say it's technically insane to use subtraction instead (in fact, in a micro-optimization sort of way it could be more efficient in some cases, trading the math for a method call). And subtraction makes even more sense if the difference calculation somehow just falls out of the data type naturally without an actual project to a numerical data type (not that I can think of any valid examples of that at the moment…but I don't feel comfortable ruling them out at the moment :) ). At least, this is what I came up with when I starting thinking about why the Comparison<T> (and related comparison APIs) are defined this way, rather than just requiring -1, 0, and 1 as return values in the first place.

  • Anonymous
    January 24, 2011
    @pete.d, consider subtraction and these results. Comparison<int> comparer = (a, b) => a - b; int x = comparer(3, 4); int y = comparer(int.MinValue, int.MaxValue); What's x? What's y?

  • Anonymous
    January 24, 2011
    The comment has been removed

  • Anonymous
    January 24, 2011
    I think using extension methods on the Comparison<T> type would probably be a cleaner way to compose, and would more closely match the LINQ syntax people are used to: var comparison = CompareByLength.Reverse().ThenBy(CompareAlphabetic);

  • Anonymous
    January 24, 2011
    @Anthony P: "int y = comparer(int.MinValue, int.MaxValue); What's x? What's y?" You are missing the point. I'm not talking about a general purpose comparison function. I'm talking about a scenario where the inputs are known to work fine for subtraction.

  • Anonymous
    January 24, 2011
    Now, wait a sec: "once they are sorted by length (!!!), sort each group that is the same length by some other comparison," isn't that what we ar supposed to do? If so, then why are we sorting by "some other comparison" within the same function that sorts by length, which is clearly before they are sorted by length??? Maybe my math is bad; maybe the result will be the same... but my gut feeling, plus a bit of military background that says "do exactly what you are told to do", are against this approach. Sorry if it does not sound very convincing...

  • Anonymous
    January 24, 2011
    The comment has been removed

  • Anonymous
    January 25, 2011
    Just in case anyone was thinking this kind of mistake would never make its way into production code, it has. The three implementations of LINQ to Objects which I suspect are most widely used (the one in .NET, Mono's implementation, and LinqBridge) all have bugs around OrderByDescending due to this. Joy.

  • Anonymous
    January 25, 2011
    @Jon, haha, wow. Yeah, I reproduced it. For those playing at home, use the overload that accepts an IComparer<T> implementation. Have it return int.MinValue where you would normally return -1, and experience the joy.

  • Anonymous
    January 25, 2011
    Do any of the VS project templates enable checked code in the Debug build? (C# Build->Advanced->"Check for arithmetic overflow/underflow")  I feel that they should to help catch such errors.  Hmm does VB enable 'checked' by default in all builds? (Compile->Advanced->"Remove integer overflow checks") I deal with bit parsing etc a lot so have learned to add checked and unchecked blocks/operators to my code but I guess they don't occur to most people.

  • Anonymous
    January 26, 2011
    I don't know if it's relevant, but it's the only documentation I found on MSDN regarding constraints on comparison functions: """Notes to Implementers Comparing null with any reference type is allowed and does not generate an exception. A null reference is considered to be less than any reference that is not null. """ (Regarding Comparer<T>.Compare). The Reverse method doesn't preserve this property (although I can't imagine where it would matter).

  • Anonymous
    January 26, 2011
    "The Reverse method doesn't preserve this property (although I can't imagine where it would matter)." I'm not sure what you mean. It is true that after reversing a comparison that meets that criteria, the new comparison no longer does. But that seems "by design" to me. After all, if null objects remained at the beginning of the sort order after reversing, it seems to me you haven't truly reversed the original sort order completely. I can see good arguments both ways, but at the end of the day it seems to me that the most common usage for a Reverse() composition is when you literally want the entire original sort order reversed. It won't comply strictly to the interface docs, but I think that's more because of an oversight in the docs, than because Reverse() is broken.

  • Anonymous
    January 26, 2011
    You're right. The problem is probably in this criterion, since it doesn't allow you to reverse lists that contain non null references.