Unsigned considered harmful

(or is "xxx considered harmful" completely worn out as a meme?) 

 

I believe that, in general, people should avoid unsigned variables, even when dealing with quantities that should never be negative. I have three major problems with unsigned variables:

 

Subtraction doesn't always make sense

Unsigned numbers are used to model non-negative integers, and non-negative integers aren't closed under subtraction. While subtracting two integers always gives another integer, subtracting two non-negative integers doesn't always give a non-negative integer. Integer subtraction is much more useful than non-negative integer subtraction!

 

Of course, with wraparound the subtraction does give an unsigned integer, but the result doesn't make arithmetic sense. Signed integers have the same problem, but only at the extremes where wraparound occurs. Even with idealized infinite-bit-length numbers, subtraction of unsigned integers is problematic.
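As a small illustration of the wraparound, here is a sketch in C. The helper names diff_unsigned and diff_signed are made up for this example, and fixed-width types are used so the numbers don't depend on the platform.

```c
#include <stdint.h>

// Hypothetical helper: the difference as unsigned arithmetic computes it.
uint32_t diff_unsigned(uint32_t a, uint32_t b) {
    return a - b;   // wraps modulo 2^32 whenever b > a
}

// Hypothetical helper: the same difference computed in a wider signed type.
int64_t diff_signed(uint32_t a, uint32_t b) {
    return (int64_t)a - (int64_t)b;
}
```

With these definitions, diff_unsigned(10, 20) yields 4294967286 while diff_signed(10, 20) yields -10. Only the second is an answer to the question "how much bigger is a than b?".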

 

Lack of error checking on underflow

If a subtraction does underflow, most systems won't generate a runtime exception (which would be useful); instead the result silently wraps around (which is not useful). In the end, unsigned variables simply alias invalid negative values to valid-looking positive ones.
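Since the language won't do the check, code that sticks with unsigned types has to do it by hand. The following is only a sketch of what that might look like; checked_sub_u32 is a hypothetical helper, not a standard library function.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: computes a - b, reporting underflow instead of wrapping.
// Returns false (and leaves *out untouched) when the true result is negative.
bool checked_sub_u32(uint32_t a, uint32_t b, uint32_t *out) {
    if (b > a) {
        return false;
    }
    *out = a - b;
    return true;
}
```

Every subtraction now needs a call and a branch on the result, which is exactly the bookkeeping a signed type would make unnecessary.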

 

Lack of invalid values

It is extremely useful when a type has support for a clearly invalid/unused value. This can be used both for error-checking and loop termination. Take a loop that iterates through a string backwards:

 

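#include <string.h>

// Returns the index of the last comma in sz, or -1 if there is none.
int IchLastCommaInString(const char * const sz) {
    int cch = (int)strlen(sz);
    // i becomes -1 after index 0 has been checked, which ends the loop.
    for(int i = cch-1; i >= 0; --i) {
        if(',' == sz[i]) {
            return i;
        }
    }
    return -1;
}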

 

If we change the code to use an unsigned variable ("Array indexes can never be negative!") then we lose the ability to easily detect the termination condition:

 

int IchLastCommaInString(const char * const sz) {
    int cch = (int)strlen(sz);
    for(unsigned int i = cch-1; i >= 0; --i) {  // BAD! unsigned ints are always >= 0
        if(',' == sz[i]) {
            return i;
        }
    }
    return -1;
}

 

To use an unsigned variable we often end up on the dark and dismal path of "if(x > BIGNUM)" where BIGNUM is chosen to be a value so large that we 'know' that x is really a negative number in disguise.
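Here is a sketch of where that path leads for the loop above. The function name IchLastCommaUnsigned is invented for this example, and BIGNUM is played by the string length itself: the loop relies on the index wrapping around to SIZE_MAX once it drops below zero.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical unsigned rewrite of the backwards search.
int IchLastCommaUnsigned(const char *sz) {
    size_t cch = strlen(sz);
    // "i < cch" is a disguised "i >= 0": decrementing i past 0 wraps it to
    // SIZE_MAX, which is never less than cch, so the loop stops there.
    // For an empty string, cch - 1 wraps immediately and the body never runs.
    for (size_t i = cch - 1; i < cch; --i) {
        if (',' == sz[i]) {
            return (int)i;
        }
    }
    return -1;
}
```

It works, but the termination test only makes sense to a reader who already knows the wraparound trick.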

 

Just use signed variables.

Comments

  • Anonymous
    March 26, 2009
    Or just lose the FUD, and perform the equally-easy termination check:

        for(unsigned int i = cch; i > 0; --i) {
            if(',' == sz[i-1]) {
                return i;
            }
        }

  • Anonymous
    March 26, 2009
    The code above is another example of what I dislike about unsigned integers. In an attempt to avoid the underflow problem while still using an unsigned integer, the code was made less clear and a bug was introduced (the index that is returned is off by one)! Even when the code is right, unsigned integers can create a maintenance booby trap. It isn't clear why everything in the loop has to be offset by one, so a future change might use the (natural looking) 'i >= 0' idiom, which will crash when a comma isn't present.

  • Anonymous
    March 26, 2009
    The comment has been removed

  • Anonymous
    March 27, 2009
    The comment has been removed

  • Anonymous
    March 28, 2009
    Stephen's point about underflow errors is a good one. The problem is that in C/C++ there isn't underflow checking, and in C# checking is off by default (for performance reasons). Overflow is a lot harder to get to than underflow, so it worries me less. Unless the memory situation is desperate, use a 64-bit number instead of squeezing out that last bit from a 32-bit number. That makes it a lot easier to get the math right.

    The point about APIs is true, but using unsigned for file sizes creates a new problem: what is the difference in size between two files? Take code that looks like this:

        fileInfo1.Length - fileInfo2.Length

    Although file sizes are always >= 0, the difference in size between two files can be negative. In a checked world this code will generate an exception, and in an unchecked world (the default) it will produce a huge number. Take this C# code:

        uint size1 = 10;
        uint size2 = 20;
        Console.WriteLine("The difference is {0}", size1 - size2);

    At any rate, there are a lot of arguments for unsigned as well as against. The point about the iterators is a good one too (we should be raising our level of abstraction). It is possible for a smart, experienced programmer to get unsigned code right. In my experience, though, using unsigned numbers does more harm than good, and I now prefer to use the next larger signed type (e.g. int64) instead of an unsigned one.