Udostępnij za pośrednictwem


Subtle Bugs

It wouldn't surprise me for people to wonder how it’s possible for reasonably competent programmers to introduce bugs by making seemingly innocuous changes to some program’s code. The short answer is that we’re often too clever for our own good. We’ll employ little tricks that subtly change the semantics of the data that we’re manipulating. These small changes in semantics can result in weird bugs.

There’s a good example of this in the sample code that Apple distributes with the developer tools for Mac OS X. One bit of sample code implements a rudimentary framework for Carbon Human Interface objects. Among this code is the implementation of a wrapper C++ class for Carbon Events. The following is a method implemented in this class. It extracts a Boolean parameter from a Carbon Event, and returns it in a pointer to a bool.

//-----------------------------------------------------------------------------------

//     GetParameter

//-----------------------------------------------------------------------------------

//

OSStatus

TCarbonEvent::GetParameter(

    EventParamName  inName,

    bool*           outValue )

{

    return GetParameter<Boolean>( inName, sizeof( Boolean ), (Boolean*) outValue );

}

For my non-programming readers, both a Boolean and a bool represent values in Boolean algebra. Boolean algebra is the algebra of logic, and Boolean variables take on only one of two values: true or false. In computing practice, and in the C++ Language, false is equivalent to 0, and true is any non-zero value.

The problem, however, is that Boolean and bool are different language types. Boolean represents the traditional Macintosh implementation of the concept, while bool is the concept as specified in the C++ Language standard. Boolean is type-equivalent to an unsignedchar—a single-byte quantity, while the underlying implementation of bool can change from one compiler to the next. In Metrowerks, for example, bool is usually type equivalent to an unsignedint—a four-byte quantity.

And therein lies the problem. The code above blandly takes a pointer to a four-byte quantity and passes it to a piece of code that expects a pointer to a single-byte quantity. This would not be a problem on Intel processors, but the PowerPC is big endian. On a big endian system, when you take the address of a four-byte quantity the resulting pointer actually points to the highest-order byte (i.e. the “big” end of the four-byte quantity) rather than the lowest-order byte (i.e. the “little” end of the four-byte quantity).

Suppose the called code uses the number 1 as a constant value representing true. If the requested parameter of the wrapped Carbon event is true, then this implementation of GetParameter will put the number 1 in the high-order byte of outValue. Instead of getting the number 1, then, outValue actually gets the number 16777216 (or 0x1000000 hexadecimal).

Now, in terms of language semantics, there’s nothing specifically wrong with this notion. Both numbers, 1 and 16777216, being non-zero values are valid representations for true. However, most compilers favor using the number 1 to represent the actual value for the constant true, and “clever” programmers often take advantage of this fact, and will use the result of a Boolean expression in arithmetic expecting a true result to be the same as the number 1.

If some other programmer has written a piece of code that takes advantage of this tendency to use 1 to represent true, the implementation above breaks that code. This is how subtle bugs enter programs: two programmers being “clever” in mutually exclusive ways.

The immediate problem with the above code is that it doesn’t make its cleverness explicit. One way to do this is to add a CompileAssert:

#define CompileAssert(_expr) extern int _compile_assert_failed[!!(_expr)]

//-----------------------------------------------------------------------------------

//     GetParameter

//-----------------------------------------------------------------------------------

//
OSStatus

TCarbonEvent::GetParameter(

    EventParamName  inName,

    bool*           outValue )

{

    CompileAssert( sizeof(bool) >= sizeof(Boolean) );

    return GetParameter<Boolean>( inName, sizeof( Boolean ), (Boolean*) outValue );

}

Adding the CompileAssert not only makes the assumption explicit, it will generate a compiler error should we ever move this code to a system where the assumption is no longer valid. But that doesn’t really resolve the underlying problem.

Another attempt to resolve the problem might to replace the C-style cast with a C++ explicit static_cast:

//-----------------------------------------------------------------------------------

//     GetParameter

//-----------------------------------------------------------------------------------

//
OSStatus

TCarbonEvent::GetParameter(

    EventParamName  inName,

    bool*           outValue )

{

    return GetParameter<Boolean>( inName,

                                  sizeof( Boolean ),

                                  static_cast<Boolean*> (outValue) );

}

This actually works provided that our particular compiler defines bool in a manner that’s type equivalent to Boolean. It has the added benefit that it will generate a compiler error should the underlying type-equivalence ever change. Unfortunately, if we’re already working with a compiler where Boolean and bool are not type equivalent, then our code no longer compiles. Metrowerks, for example, generates a compiler error with the message, illegal explicit type cast from bool* to Boolean*.

For my money, the best solution is the pedantically correct solution:

//-----------------------------------------------------------------------------------

//     GetParameter

//-----------------------------------------------------------------------------------

//
OSStatus

TCarbonEvent::GetParameter(

    EventParamName  inName,

    bool*           outValue )

{

    OSStatus result;

    Boolean value;

    result = GetParameter<Boolean>( inName, sizeof( Boolean ), &value );

    if (noErr == result)

       *outValue = value != 0;

    return result;

}

Note that this solution doesn’t simply assign value to the location in memory pointed to by outValue. Rather it assigns the result of the Boolean expression value != 0 to value.

When you work on a large software project with a number of other programmers, being so pedantically correct as to use real Boolean expressions for Boolean results will often mean the difference between code that works and plays well with others and code that introduces subtle bugs that can take you hours to track down, or worse, months before users actually discover that the bugs exist.

 

Rick

Currently playing in iTunes: Rhyme & Reason by Dave Matthews Band

Comments

  • Anonymous
    April 10, 2005
    I've seen similar bugs twice since I started my current job. The thing that helped me find and fix it was using the Hex view in Metrowerks' debugger. It shows you exactly how big a variable is. I watched the variable mangling happen every time an informal C cast was attempted.

    This happened in our cross platform code (part of our engine that is used by both Windows and Macintosh) and was only a problem for the Macintosh side because on Windows the two types are the same size and on Macintosh they are different.
  • Anonymous
    April 10, 2005
    And, on Intel CPUs, the size difference doesn't matter, because Intel processors are little-endian. Cast a long * to a short *, and the short * still points to the low-order two bytes of the original long in memory. You got doubly-bit by informal, C-style casts.
  • Anonymous
    April 24, 2005
    The comment has been removed
  • Anonymous
    June 18, 2009
    PingBack from http://gardenstatuesgalore.info/story.php?id=1587