The case of the inconsistent right shift results…

One of our testers just filed a bug against something I’m working on.  They reported that if they compiled code which calculated: 1130149156 >> –05701653 it generated different results on 32bit and 64bit operating systems.  On 32bit machines it reported 0 but on 64bit machines, it reported 0x21a.

I realized that I could produce a simple reproduction for the scenario to dig into it a bit deeper:

 int _tmain(int argc, _TCHAR* argv[])
{
    __int64 shift = 0x435cb524;
    __int64 amount = 0x55;
    __int64 result = shift >> amount;
    std::cout << shift << " >> " << amount << " = " << result << std::endl;
    return 0;
}

That’s pretty straightforward and it *does* reproduce the behavior.  On x86 it reports 0 and on x64 it reports 0x21a.  I can understand the x86 result (you’re shifting right more than the processor size, it shifts off the end and you get 0) but not the x64. What’s going on?

Well, for starters I asked our C language folks.  I know I’m shifting by more than the processor word size (85), but the results should be the same, right?

Well no.  The immediate answer I got was:

From C++ 03, 5.8/1: The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

Ok.  It’s undefined behavior.  But that doesn’t really explain the difference.  When in doubt, let’s go to the assembly….

 000000013F5215D3  mov         rax,qword ptr [amount]  
000000013F5215D8  movzx       ecx,al  
000000013F5215DB  mov         rax,qword ptr [shift]  
000000013F5215E0  sar         rax,cl  
000000013F5215E3  mov         qword ptr [result],rax  
000000013F5215E8  mov         rdx,qword ptr [shift] 

The relevant instruction is highlighted.  It’s doing a shift arithmetic right of “shift” by “amount”.

What about the x86 version?

 00CC14CA  mov         ecx,dword ptr [amount]  
00CC14CD  mov         eax,dword ptr [shift]  
00CC14D0  mov         edx,dword ptr [ebp-8]  
00CC14D3  call        @ILT+85(__allshr) (0CC105Ah)  
00CC14D8  mov         dword ptr [result],eax  
00CC14DB  mov         dword ptr [ebp-28h],edx  

Now that’s interesting.  The x64 version is using a processor shift function but on 32bit machines, it’s using a C runtime library function (__allshr).  And the one that’s weird is the x64 version.

While I don’t have an x64 processor manual, I *do* have a 286 processor manual from back in the day (I have all sorts of stuff in my office).  And in my 80286 manual, I found:

“If a shift count greater than 31 is attempted, only the bottom five bits of the shift count are used. (the iAPX 86 uses all eight bits of the shift count.)”

A co-worker gave me the current text:

The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.

So the mystery is now solved.  The shift of 0x55 only considers the low 6 bits.  The low 6 bits of 0x55 is 0x15 or 21.  0x435cb524 >> 21 is 0x21a.

One could argue that this is a bug in the __allshr function on x86 but you really can’t argue with “the behavior is undefined”.  Both scenarios are doing the “right thing”.  That’s the beauty of the “behavior is undefined” wording.  The compiler would be perfectly within spec if it decided to reformat my hard drive when it encountered this (although I’m happy it doesn’t Smile).

Now our feature crew just needs to figure out how best to resolve the bug.

Comments

  • Anonymous
    February 11, 2011
    I remember that change from way back when. The 286 (actually I think it was the 186 that started it) used this as an optimization because shifts/rotates of more than 16 zeroed out/wrapped around any operand. However, I would have thought that the 386 would have made this 6 bits and the x64, 7 bits! Is this a bug in the spec, or is there some other reason for this behaviour that I've missed?

  • Anonymous
    February 11, 2011
    @ErikF: 5 bits is 31, 6 bits is 63.  Shift right 63 bits doesn't make sense on a 32 bit platform.

  • Anonymous
    February 11, 2011
    Ahhh, my brain wasn't doing math today. Never mind....

  • Anonymous
    February 11, 2011
    I'm not too familiar with x86 instructions, but I'm guessing the reason the x86 version uses a C runtime function is because x86 has no instruction for right-shifting a 64 bit integer? That the function doesn't match the behaviour of the x64 instruction is unfortunate, but as you say, it's hard to argue with undefined behaviour. :)

  • Anonymous
    February 11, 2011
    @Sven: Yup - the 32bit x86 processor doesn't have instructions to shift 64bit values (most 32bit machines have extremely limited support for 64bit instructions).

  • Anonymous
    February 11, 2011
    Presumably the spec is the way it is for performance reasons, but it's a shame. It means that a>>x and a>>(y+z) are not equivalent even when x==y+z. I am surprised the compiler doesn't emit a warning. (I checked with VS2008, in debug and release, warning level 4.) I don't expect the compiler to warn in all cases (the amount value could be the result of a complex calculation, sub-routines, user-input, etc.) but in this case the compiler can easily know the value of amount; it's set once and never changed. You do get a compiler warning if you make amount const. Before now I had assumed making a variable like that const was more to avoid programmer error (i.e. tell you if you then modify it) and that the compiler could work it out itself, but now I'll use it even more often. I guess/hope the optimizer works things like that out at a later stage. Maybe that assumption is wrong, too. I could swear there are cases where the compiler has warned me about things like this without the variable being const. Maybe I'm imagining things!

  • Anonymous
    February 11, 2011
    You can see exactly the same problem using a 32-bit program. If you compile this in release it'll tell you the answer is zero, and compiled in debug it'll tell you it's 538. This is because in release the compiler calculates the answer. int _tmain(int argc, _TCHAR* argv[]) {    int shift = 0x435cb524;    int amount = 0x55;    int result = shift >> amount;    std::cout << shift << " >> " << amount << " = " << result << std::endl;    return 0; }

  • Anonymous
    February 11, 2011
    The comment has been removed

  • Anonymous
    February 12, 2011
    @pgliznie: Actually the compiler will warn you if it can determine the result of the operation.  If I had 0x435cb524 >> 0x55, the compiler prints out a warning.  

  • Anonymous
    February 14, 2011
    I hit this on SafeInt, and it is documented on my blog somewhere. Basically, while the standard says this is undefined, what the Microsoft compilers will do is to shift by bits % (sizeof(void*)*8) If you'd been using SafeInt, it would have thrown an assert if you did this....

  • Anonymous
    February 17, 2011
    @David: I think the other comments have established that what the Microsoft compilers do is a mixture of (a) shift by the requested amount at compile time, leaving 0 (b) leaving it up to the processor (which reduces the shift amount modulo) and (c) call a library function, depending on the type of the argument and the optimization level.

  • Anonymous
    February 17, 2011
    @Larry > @Sven: Yup - the 32bit x86 processor doesn't have instructions to shift 64bit values (most 32bit machines have extremely limited support for 64bit instructions). It has SHLD and SHRD for 64-bit shifts. However, they are so slow that calling dedicated function could actually be faster.

  • Anonymous
    February 18, 2011
    The comment has been removed

  • Anonymous
    February 18, 2011
    The comment has been removed

  • Anonymous
    February 20, 2011
    Thank you Larry, very interesting!

  • Anonymous
    April 27, 2011
    [quote]They reported that if they compiled code which calculated: 1130149156 >> –05701653 it generated different results on 32bit and 64bit operating systems.[/quote] Hopefully someone also fixed the program that used this calculation. On a higher abstraction level, this computation does not make sense. Counter-examples welcome. :-)

  • Anonymous
    April 28, 2011
    Actually they didn't - it was a test program that used monte-carlo testing.  They just adopted the code which checked for results to deal with the issue.