Working with Signed Non-Decimal and Bitwise Values [Ron Petrusha]

Recently, a number of questions have surfaced about the accuracy of the .NET Framework when working with the binary representation of numbers. (For example, see https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=295117.) The issue surfaces most clearly when we convert the hexadecimal or octal string representation of a numeric value that should be out of range of its target data type to that data type. For example, in the following code we would expect that an OverflowException would be thrown when we increment the upper range of a signed integer value by one, call the Convert.ToString method to convert this integer value to its hexadecimal string representation, and then call the Convert.ToInt32 method to convert the string back to an integer. Here is the C# code:

 const int HEXADECIMAL = 16;
// Increment a number so that it is out of range of the Integer type.
long number = (long)int.MaxValue + 1;
// Convert the number to its hexadecimal string equivalent.
string numericString = Convert.ToString(number, HEXADECIMAL);
// Convert the number back to an integer.
// We expect that this will throw an OverflowException, but it doesn't.
try {
    int targetNumber = Convert.ToInt32(numericString, HEXADECIMAL);
    Console.WriteLine("0x{0} is equivalent to {1}.",
                      numericString, targetNumber);
}
catch (OverflowException) {
    Console.WriteLine("0x{0} is out of the range of the Int32 data type.",
                      numericString);
}  

And here is the equivalent Visual Basic code:

 Const HEXADECIMAL As Integer = 16

' Increment a number so that it is out of range of the Integer type.
Dim number As Long = CLng(Integer.MaxValue) + 1
' Convert the number to its hexadecimal string equivalent.
Dim numericString As String = Convert.ToString(number, HEXADECIMAL)
' Convert the number back to an integer.
' We expect that this will throw an OverflowException, but it doesn't.
Try
    Dim targetNumber As Integer = Convert.ToInt32(numericString, HEXADECIMAL)
    Console.WriteLine("0x{0} is equivalent to {1}.", _
                      numericString, targetNumber)
Catch e As OverflowException
    Console.WriteLine("0x{0} is out of the range of the Int32 data type.", _
                      numericString)
End Try

Instead of the expected OverflowException, this code produces what is apparently an erroneous result:

 0x80000000 is equivalent to -2147483648

If we look at the binary rather than the decimal and hexadecimal representations of this numeric operation, the source of the problem becomes readily apparent. We began with Int32.MaxValue:

 Bit #:  3         2         1
       10987654321098765432109876543210

       01111111111111111111111111111111

For Int32.MaxValue, each bit except the highest order bit of the 32-bit value is set. This represents the maximum value of a signed integer because the single unset bit is the sign bit in position 31. Because this bit is unset, it indicates that the value is positive. We then increment Int32.MaxValue by 1. Note that the variable to which we assign the new value is an Int64; we cannot assign the value to an Int32 without exceeding the bounds of the Int32 data type and causing an OverflowException to be thrown. The new bit pattern of the resulting value is:

 Bit #:    6         5         4         3         2         1
       3210987654321098765432109876543210987654321098765432109876543210

       0000000000000000000000000000000010000000000000000000000000000000

So incrementing Int32.MaxValue by one sets bit 31 and clears bits 0 through 30. Bits 32 through 62 remain unset and the sign bit in position 63 is set to 0, which indicates that the resulting value is positive.

Because leading zeroes are always dropped from the non-decimal string representations of numeric values, the call to Convert.ToString(value, toBase) produces the hexadecimal string 80000000, whose binary equivalent is only 32 digits long:

 Bit #:  3         2         1
       10987654321098765432109876543210

       10000000000000000000000000000000

This suggests that the unexpected output produced by our code is the result of two different programming errors. First, we’ve inadvertently allowed the string representation of a 64-bit signed integer value to be interpreted as the string representation of a 32-bit signed integer value.  Second, by ignoring how signed and unsigned integers are represented, we’ve allowed a positive integer to be misinterpreted as a signed negative integer. Let’s look at each of these issues in some detail.
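
The dropped leading zeroes can be observed directly by asking for the base-2 string instead of the base-16 string. The following is a minimal C# sketch; the class name BitPatternDemo and the helper ToBinary are illustrative names, not part of the original example.

```csharp
using System;

public static class BitPatternDemo
{
    // Returns the base-2 string that Convert.ToString produces for a 64-bit value.
    public static string ToBinary(long value)
    {
        return Convert.ToString(value, 2);
    }

    public static void Main()
    {
        long number = (long)int.MaxValue + 1;
        string bits = ToBinary(number);

        // The 64-bit value is rendered with 32 digits; the 32 leading zeroes are gone.
        Console.WriteLine("{0} ({1} digits)", bits, bits.Length);

        // Read back as an Int32, the leading 1 lands in the sign bit (bit 31).
        Console.WriteLine(Convert.ToInt32(bits, 2));

        // Read back as an Int64, the same digits give the original positive value.
        Console.WriteLine(Convert.ToInt64(bits, 2));
    }
}
```

The same string therefore yields -2147483648 or 2147483648 depending only on the width of the type it is parsed into.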

Accidental Change of Type

Ordinarily, the C# compiler enforces type safety by prohibiting implicit narrowing conversions, and the Visual Basic compiler can be configured to prohibit implicit narrowing conversions by setting Option Strict on. This constraint means that, in order to successfully compile code that performs a narrowing conversion, the developer must explicitly use a C# casting operator or a Visual Basic conversion function. This, of course, requires that the developer be aware of the narrowing conversion. In other words, handling a narrowing conversion is the responsibility of the developer.

For example, if the previous code is rewritten so that it does not have to parse the string representation of a numeric value, we must deal with the fact that an Int64 cannot be safely converted to an Int32. The resulting C# code is:

 // Increment a number so that it is out of range of the Integer type.
long number = (long)int.MaxValue + 1;
// Convert the number back to an integer.
// This will throw an OverflowException if the code is compiled 
// with the /checked switch.
try {
    int targetNumber = (int)number;
    Console.WriteLine("Converted {0} to a 32-bit integer.", targetNumber);
}
catch (OverflowException) {
    Console.WriteLine("{0} is out of the range of the Int32 data type.",
                      number);
}  

If Option Strict is set on, the resulting Visual Basic code is:

 ' Increment a number so that it is out of range of the Integer type.
Dim number As Long = CLng(Integer.MaxValue) + 1
' Convert the number back to an integer.
' This will throw an OverflowException.
Try
    Dim targetNumber As Integer = CInt(number)
    Console.WriteLine("Converted {0} to a 32-bit integer.", targetNumber)
Catch e As OverflowException
    Console.WriteLine("{0} is out of the range of the Int32 data type.", _
                      number)
End Try

Conversions can still produce overflows at run time, but at least the compiler alerts the developer that an overflow is possible and should be handled. However, because our original example converted a numeric value to its string representation and then converted it back to a numeric value, we’ve bypassed the safeguards that the compiler implements to alert us to the possibility of data loss in a narrowing conversion. To put it another way, the developer is solely responsible for ensuring type safety and for handling conversions when converting between numbers and their string representations. Had our code enforced type safety, it would have converted the string representation of Int32.MaxValue + 1 to an Int64 value rather than an Int32 value, as the following C# code shows:

 const int HEXADECIMAL = 16;

// Increment a number so that it is out of range of the Integer type.
long number = (long)int.MaxValue + 1;
// Convert the number to its hexadecimal string equivalent.
string numericString = Convert.ToString(number, HEXADECIMAL);
// Convert the number back to a long integer.
long targetNumber = Convert.ToInt64(numericString, HEXADECIMAL);
Console.WriteLine("0x{0} is equivalent to {1}.",
                  numericString, targetNumber);

The equivalent Visual Basic code is:

 Const HEXADECIMAL As Integer = 16

' Increment a number so that it is out of range of the Integer type.
Dim number As Long = CLng(Integer.MaxValue) + 1
' Convert the number to its hexadecimal string equivalent.
Dim numericString As String = Convert.ToString(number, HEXADECIMAL)
' Convert the number back to a long integer.
Dim targetNumber As Long = Convert.ToInt64(numericString, HEXADECIMAL)
Console.WriteLine("0x{0} is equivalent to {1}.", _
                     numericString, targetNumber)
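
If an Int32 is what the caller ultimately needs, one possible approach is to parse into the wider type first and then narrow with an explicit range check. The sketch below assumes that the string was produced from an Int64 value, as in the example above; ParseHexAsInt32 and HexParsing are hypothetical names. It is not suitable for strings produced from a negative Int32, because those are only eight digits long and would be read as large positive Int64 values.

```csharp
using System;

public static class HexParsing
{
    // Hypothetical helper: parse the hexadecimal string as a 64-bit value,
    // then narrow it to 32 bits only if it fits.
    public static int ParseHexAsInt32(string hex)
    {
        long wide = Convert.ToInt64(hex, 16);
        if (wide < int.MinValue || wide > int.MaxValue)
            throw new OverflowException(
                String.Format("0x{0} is out of the range of the Int32 data type.", hex));
        return (int)wide;
    }

    public static void Main()
    {
        // In range: the hexadecimal form of Int32.MaxValue.
        Console.WriteLine(ParseHexAsInt32(Convert.ToString((long)int.MaxValue, 16)));

        // Negative Int64 values keep all 16 digits, so the sign survives.
        Console.WriteLine(ParseHexAsInt32(Convert.ToString(-16L, 16)));

        // Out of range: Int32.MaxValue + 1.
        try
        {
            ParseHexAsInt32(Convert.ToString((long)int.MaxValue + 1, 16));
        }
        catch (OverflowException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```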

Working with Numeric Representations

A second serious source of error in our initial example is that we’ve failed to consider numeric representations and their effect on our conversion operation. This is a common source of errors in programs. While the compiler provides some safeguards against data loss in narrowing conversions, it provides no safeguards when the developer chooses to work with binary data directly. In these cases, ensuring that the representation of a number is appropriate for the operation being performed is always the responsibility of the developer. This is true whenever the developer works with binary (or octal or hexadecimal) data directly, whether as a sequence of bits (for example, in a byte array or when performing bitwise operations on two values) or as the non-decimal string representation of a numeric value. Moreover, this is true of any platform and is not limited to Microsoft Windows or the .NET Framework. In particular:

  • When performing bitwise operations, such as a bitwise And, the developer must make sure that both operands share the same binary representation. If they do not, the result of the bitwise operation is invalid.
  • When converting the string representation of a number to its numeric equivalent, the developer must make sure that the numeric string representation is of the type expected by the conversion method or operator.

Our initial example produced unexpected results because we passed the string representation of what turned out to be an unsigned 32-bit integer to a conversion method, Convert.ToInt32(value, fromBase), that expected the value parameter to be the string representation of a signed 32-bit integer. Note that the actual result of this conversion depends on the particular magnitude of the 32-bit unsigned integer, as the following table illustrates.

Unsigned Integer Range                               Result
0 - 2,147,483,647 (or Int32.MaxValue)                Successful conversion (no loss of data).
2,147,483,648 - 4,294,967,295 (or UInt32.MaxValue)   Value misinterpreted as a negative number.
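
The two ranges can be checked by parsing the same hexadecimal strings once as unsigned and once as signed 32-bit values. This is a small sketch; the class RangeDemo and its helper methods are illustrative names only.

```csharp
using System;

public static class RangeDemo
{
    // Interprets the hexadecimal digits as an unsigned 32-bit value.
    public static uint AsUnsigned(string hex)
    {
        return Convert.ToUInt32(hex, 16);
    }

    // Interprets the same digits as a signed 32-bit value.
    public static int AsSigned(string hex)
    {
        return Convert.ToInt32(hex, 16);
    }

    public static void Main()
    {
        // Two values from each range in the table.
        string[] samples = { "0", "7fffffff", "80000000", "ffffffff" };
        foreach (string hex in samples)
        {
            Console.WriteLine("0x{0}: unsigned {1}, signed {2}",
                              hex, AsUnsigned(hex), AsSigned(hex));
        }
    }
}
```

For the first two strings the signed and unsigned results agree. For 80000000 and ffffffff the unsigned results are 2147483648 and 4294967295, while the signed results are -2147483648 and -1.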

A clearer illustration of the problems that result from working with binary values that have different numeric representations arises when we perform a bitwise operation on integers with different signs. For example, the Visual Basic code

 Console.WriteLine(16 And -3)

produces a rather unexpected result of 16 when run under the common language runtime. This result reflects the fact that the runtime uses two’s complement representation for negative integers and absolute magnitude representation for positive integers. The following example illustrates why the result of this bitwise And operation is 16:

    00000000000000000000000000010000
And 11111111111111111111111111111101
    --------------------------------
    00000000000000000000000000010000
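
The bit patterns in this illustration can be reproduced in C# by padding the base-2 string of each operand to 32 digits. This is a minimal sketch; AndDemo and ToBits are illustrative names.

```csharp
using System;

public static class AndDemo
{
    // Formats a 32-bit value as exactly 32 binary digits.
    public static string ToBits(int value)
    {
        return Convert.ToString(value, 2).PadLeft(32, '0');
    }

    public static void Main()
    {
        int left = 16;
        int right = -3;

        Console.WriteLine("    " + ToBits(left));
        Console.WriteLine("And " + ToBits(right));
        Console.WriteLine("    " + new string('-', 32));
        // Only bit 4 is set in both operands, so the result is 16.
        Console.WriteLine("    " + ToBits(left & right));
    }
}
```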

Although the .NET Framework uses two’s complement representation for signed integers, one’s complement representation is also in use on some platforms. We can determine the method of representation with the two utility functions shown in the following C# and Visual Basic code:

 // C#
public class BinaryUtil
{
   public static bool IsTwosComplement()
   {
      return Convert.ToSByte("FF", 16) == -1;
   }

   public static bool IsOnesComplement()
   {
      return Convert.ToSByte("FE", 16) == -1;
   }
}

' Visual Basic
Public Class BinaryUtil
    Public Shared Function IsTwosComplement() As Boolean
        Return Convert.ToSByte("FF", 16) = -1
    End Function

    Public Shared Function IsOnesComplement() As Boolean
        Return Convert.ToSByte("FE", 16) = -1
    End Function
End Class

Performing the And operation with integers that have different signs then requires that we use a common method to represent their values. The most common method is a sign and magnitude representation, which uses a variable to store a number’s absolute value and a separate Boolean variable to store its sign. Using this method of representation, we can define the And operation as follows:

 // C#
public static int PerformBitwiseAnd(int operand1, int operand2)
{
    // Set flag if a parameter is negative.
    bool sign1 = Math.Sign(operand1) == -1;
    bool sign2 = Math.Sign(operand2) == -1;

    // Convert two's complement to its absolute magnitude.
    if (sign1)
        operand1 = ~operand1 + 1;
    if (sign2)
        operand2 = ~operand2 + 1; 

    if (sign1 & sign2) 
        return -1 * (operand1 & operand2);
    else
        return operand1 & operand2;
}

' Visual Basic
Public Function PerformBitwiseAnd(ByVal operand1 As Integer, ByVal operand2 As Integer) As Integer
    ' Set flag if a parameter is negative.
    Dim sign1 As Boolean = (Math.Sign(operand1) = -1)
    Dim sign2 As Boolean = (Math.Sign(operand2) = -1)

    ' Convert two's complement to its absolute magnitude.
    If sign1 Then operand1 = (Not operand1) + 1
    If sign2 Then operand2 = (Not operand2) + 1

    If sign1 And sign2 Then
        Return -1 * (operand1 And operand2)
    Else
        Return operand1 And operand2
    End If
End Function
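
To see how this definition differs from the built-in operator, the sketch below compares the two on a few operand pairs. The method body follows the same logic as PerformBitwiseAnd above and is repeated only so that the sketch compiles on its own; the class name SignMagnitudeDemo is an illustrative choice.

```csharp
using System;

public static class SignMagnitudeDemo
{
    // Same logic as PerformBitwiseAnd in the article, repeated for self-containment.
    public static int PerformBitwiseAnd(int operand1, int operand2)
    {
        bool sign1 = operand1 < 0;
        bool sign2 = operand2 < 0;

        // Convert two's complement to absolute magnitude.
        if (sign1) operand1 = ~operand1 + 1;
        if (sign2) operand2 = ~operand2 + 1;

        int magnitude = operand1 & operand2;
        // The result is negative only when both operands were negative.
        return (sign1 && sign2) ? -magnitude : magnitude;
    }

    public static void Main()
    {
        // Built-in two's complement And versus the sign-and-magnitude version.
        Console.WriteLine("{0} vs {1}", 16 & -3, PerformBitwiseAnd(16, -3));   // 16 vs 0
        Console.WriteLine("{0} vs {1}", -6 & -3, PerformBitwiseAnd(-6, -3));   // -8 vs -2
        Console.WriteLine("{0} vs {1}", 12 & 10, PerformBitwiseAnd(12, 10));   // 8 vs 8
    }
}
```

The two operations agree when both operands are non-negative and can differ as soon as either operand is negative.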

String Representations, Conversions, and Signs

While converting binary values to sign and magnitude representation solves the problem of performing binary operations on non-decimal numbers, it does not address either of the issues raised when converting the string representation of a non-decimal number to a numeric value. When performing such string-to-numeric conversions, the root of the problem lies in the fact that at the time it is created, the string representation of a number is effectively disassociated from its underlying numeric value. This can make it impossible to determine the sign of that numeric string representation when it is converted back to a number.

However, we can solve the problem of restoring a non-decimal value from its string representation by defining a structure that includes a field to indicate the sign of the underlying numeric value. For example, the following structure includes a Boolean field, Negative, that is set to true when the numeric value from which a non-decimal string representation is derived is negative. It also includes a Value field that stores the non-decimal string representation of a number.

 // C# 
struct NumericString {
   public bool Negative;
   public string Value;
}

' Visual Basic
Public Structure NumericString
    Public Negative As Boolean
    Public Value As String
End Structure

Storing a sign flag together with the string representation of a non-decimal number preserves the tight coupling between the string representation of a number and its sign. This in turn allows us to examine its sign field and to make sure that the appropriate conversion or action is taken when the string is converted back to a numeric value. For example, the following code defines a static (or Shared in Visual Basic) method named ConvertToSignedInteger that takes a single parameter (an instance of the NumericString structure defined previously) and returns an integer. The method throws an OverflowException if the string’s numeric value overflows the range of the Int32 data type. It also throws an OverflowException if the NumericString.Negative field is false, indicating that the numeric value is positive, but the sign bit is set in the numeric value represented by the NumericString.Value field. In that case the original value is positive but lies in the range from Int32.MaxValue + 1 to UInt32.MaxValue, which is entirely outside the range of the Int32 data type.

 // C#
class ConversionLibrary
{
   public static int ConvertToSignedInteger(NumericString stringValue)
   {
      // Convert the string to an Int32.
      try
      {
         int number = Convert.ToInt32(stringValue.Value, 16);
         // Throw if sign flag is positive but number is interpreted as negative.
         if ((! stringValue.Negative) && ((number & 0x80000000) == 0x80000000))
            throw new OverflowException(String.Format("0x{0} cannot be converted to an Int32.", 
                                        stringValue.Value));
         else
            return number;
      }
      // Handle legitimate overflow exceptions.
      catch (OverflowException e)
      {    
         throw new OverflowException(String.Format("0x{0} cannot be converted to an Int32.", 
                                     stringValue.Value), e);
      }
   }
}

' Visual Basic
Public Class ConversionLibrary
    Public Shared Function ConvertToSignedInteger(ByVal stringValue As NumericString) As Integer
        ' Convert the string to an Int32.
        Try
            Dim number As Integer = Convert.ToInt32(stringValue.Value, 16)
            ' Throw if sign flag is positive but number is interpreted as negative.
            If (Not stringValue.Negative) And ((number And &H80000000) = &H80000000) Then
                Throw New OverflowException(String.Format("0x{0} cannot be converted to an Int32.", _
                                            stringValue.Value))
            Else
                Return number
            End If
            ' Handle legitimate overflow exceptions.
        Catch e As OverflowException
            Throw New OverflowException(String.Format("0x{0} cannot be converted to an Int32.", _
                                        stringValue.Value), e)
        End Try
    End Function
End Class

Our initial code example returned an erroneous result when we incremented Int32.MaxValue by 1, converted it to a hexadecimal string, and then converted the string back to an integer value. When we perform the same basic set of operations using the NumericString structure and the ConvertToSignedInteger method, the result is an OverflowException. This is shown in the following code:

 // C#
public class Executable
{
   public static void Main()
   {
      // Define a number.
      Int64 number = (long)Int32.MaxValue + 1;
      // Define its hexadecimal string representation.
      NumericString stringValue;
      stringValue.Value = Convert.ToString(number, 16);
      stringValue.Negative = (Math.Sign(number) < 0);
      ShowConversionResult(stringValue);
      
      NumericString stringValue2;
      stringValue2.Value = Convert.ToString(Int32.MaxValue, 16);
      stringValue2.Negative = Math.Sign(Int32.MaxValue) < 0;
      ShowConversionResult(stringValue2);
      
      NumericString stringValue3; 
      stringValue3.Value = Convert.ToString(-16, 16);
      stringValue3.Negative = Math.Sign(-16) < 0;
      ShowConversionResult(stringValue3);
   }
   
   private static void ShowConversionResult(NumericString stringValue)
   {   
      try {
         Console.WriteLine(ConversionLibrary.ConvertToSignedInteger(stringValue).ToString("N0"));
      }
      catch (OverflowException e) {
         Console.WriteLine("{0}: {1}", e.GetType().Name, e.Message);
      }
   }
}

' Visual Basic
Module Executable
    Public Sub Main()
        ' Define a number.
        Dim number As Int64 = CLng(Int32.MaxValue) + 1
        ' Define its hexadecimal string representation.
        Dim stringValue As NumericString
        stringValue.Value = Convert.ToString(number, 16)
        stringValue.Negative = (Math.Sign(number) < 0)
        ShowConversionResult(stringValue)

        Dim stringValue2 As NumericString
        stringValue2.Value = Convert.ToString(Int32.MaxValue, 16)
        stringValue2.Negative = Math.Sign(Int32.MaxValue) < 0
        ShowConversionResult(stringValue2)

        Dim stringValue3 As NumericString
        stringValue3.Value = Convert.ToString(-16, 16)
        stringValue3.Negative = Math.Sign(-16) < 0
        ShowConversionResult(stringValue3)
    End Sub

    Private Sub ShowConversionResult(ByVal stringValue As NumericString)
        Try
            Console.WriteLine(ConversionLibrary.ConvertToSignedInteger(stringValue).ToString("N0"))
        Catch e As OverflowException
            Console.WriteLine("{0}: {1}", e.GetType().Name, e.Message)
        End Try
    End Sub
End Module

When this code is executed, it displays the following output to the console:

 OverflowException: 0x80000000 cannot be converted to an Int32.
2,147,483,647
-16

Comments

  • Anonymous
    April 09, 2008
    This is an excellent examination of the topic. However, the conclusion that the .NET Framework is handling this situation correctly is flawed, for one simple reason: "Because leading zeroes are always dropped from the non-decimal string representations of numeric values..." This is wrong!! When representing negative values using binary two's complement, leading zeros are significant, and cannot be dropped. Dropping them changes the value of the number, and therefore the behavior of the Convert.ToString method is wrong. Simply adding a comment to the documentation saying "its wrong on purpose" is not sufficient; the buggy behavior needs to be fixed.

  • Anonymous
    April 09, 2008
    I’m not sure the previous post is correct. If the leading bits are zeros and they are dropped, then when the number is converted back from binary it will just be padded back out to 32 bits with zeros. This would put bit 31 back as a zero and indicate that the number is positive. If the original value was negative, then the bit would be a 1 and would not have been trimmed. So, this is int.MaxValue:
    01111111111111111111111111111111
    This is int.MaxValue with the leading zero missing:
     1111111111111111111111111111111
    And then this is what would happen when it is converted by Convert.ToInt32():
    01111111111111111111111111111111
    The zero would be “assumed” by the conversion process. Thus the sign would end up correct.

  • Anonymous
    April 09, 2008
    @Aaron, You are assuming that the string representation will always be converted back into a number with the same storage size. But this article is about what happens when the binary representation is converted into a number with a different storage size. Why make the developer jump through all these hoops to correctly convert the binary representation from one size to another, when all you have to do is treat at least one leading zero as significant? Then the binary representation can be counted on to be accurate regardless of how long it is or what the size was of the location where the value was originally stored. In the example from the article, the result of Convert.ToString(number, HEXADECIMAL) would be a string 33 digits long, which would result in an OverflowException, which is exactly what the developer wants.

  • Anonymous
    April 10, 2008
    In ConversionLibrary.ConvertToSignedInteger, wouldn't this comparison make more sense:
    // Throw if sign flag is positive but number is interpreted as negative.
    if ((!stringValue.Negative) && (number < 0))
       throw new OverflowException(String.Format("0x{0} cannot be converted to an Int32.", stringValue.Value));
    This way you're not relying on the bit pattern (which actually will cause an implicit widening anyway).
    @David, The only way that preserving the leading digit will help you is if you have a rule to automatically sign-extend that leading digit whenever converting away from a string. And I guarantee you that doing that will break almost every app in the world, since strings frequently come from user input or external data sources that won't contain a leading sign digit anyway. The problem mentioned in this article only exists if you do have a mismatch in sizes between your string generator and consumer; if due care is taken to use the same types then there isn't a problem. Or at least there wouldn't be a problem if VB joined the Real World and got itself some unsigned types.

  • Anonymous
    April 11, 2008
    @Miral, If there is no leading sign digit, then it isn't a signed number, and a conversion to a signed type is likely to fail anyway. I'm not concerned with that scenario. I am talking about precisely representing a signed number as a string, which can easily be done by treating a leading 0 as significant. If you truncate all of the leading 0s, then you have to carry the original binary length of the number around with the string representation in order to be able to convert it back to a numeric type. Why force developers to take that extra step, when you could just leave a leading 0 and trust that the string representation is always exact, no matter what length it is?

  • Anonymous
    April 11, 2008
    I don't think I understand why the Convert.ToInt32 operation is working correctly when the example bit pattern (from the paragraph before the "Accidental Change of Type" section) implies the value -0.
     Bit #:  3         2         1
           10987654321098765432109876543210
           10000000000000000000000000000000
    It seems like this specific value is clearly the result of an overflow. (Is -0 legal? If so, why?) However, it would be impossible to say that (int.MaxValue + 2) is or isn't -1 during the conversion, so maybe it makes sense that checking for the specific -0 case is a waste of time.