Another of my favorite tricks - reducing the number of bits in a number by 1
One of the classic (and thus no longer asked) Microsoft interview questions is "How quickly can you count the bits in a 16 (or 32) bit integer?".
You get a varied number of responses to this one, from brute force to lookup tables.
One of my favorite tricks for this is:
x = x & (x - 1)
This reduces the number of bits in a number by one. I'm not sure exactly why it works, and it only works for 2s complement numbers, but it DOES work.
So using this trick, the bitcount question is:
{
bitcount = 0;
while (value != 0)
{
bitcount += 1;
value = value & (value - 1);
}
return bitcount;
}
It's a silly trick, but clever.
Btw, being geeks, a number of people over here came up with the fastest known version (in software). It involves adding the bits in parallel - you can count the bits in 32bit integer in a constant 34 cycles on a 386 machine.
To do that, you first split the number into odd and even bits thus creating 32 1 bit numbers.
Next, you shift the even bits right one bit and add the values together to get a collection of 16 2 bit numbers.
Now split the number into groups of 2 bits by again masking by 0b0011001100110011 and 0b1100110011001100, shift the second value right by two bits and add the values to get a collection of 8 4 bit numbers.
Now split the number into groups of 4 bits by again masking by 0b0000111100001111 and 0b1111000011110000, shift the second value right by four bits and add the values to get a collection of 4 8 bit numbers.
Now split the number into groups of 8 bits by again masking by 0b0000000011111111 and 0b1111111100000000, shift the second value right by eight bits and add the values to get a collection of 2 16 bit numbers.
Finally, split the number into groups of 16 bits by taking the low sixteen bits and adding them to the high 16 bits, you now have one 32 bit integer that contains the number of bits that were on in the original value.
Comments
Anonymous
October 13, 2005
How useful do you feel interview questions like the one mentioned above are? Why?Anonymous
October 13, 2005
Interesting observation. So I decided to think about why x & (x - 1) has one fewer bit set if x != 0. Here's why:
x != 0 so x has at least one bit set.
From right-to-left, x has some number of 0 bits at the end (0-31), a 1 bit, and the rest of the bits. Call the rest of the bits a.
So, x looks like a10...0.
x - 1 is going to flip all the 0 bits on the right until it can borrow the rightmost 1 bit.
So, x - 1 looks like a01...1.
a10...0 & a01...1 = a00...0, which is the same as x except we killed the rightmost 1 bit. Thus, x & (x - 1) has one fewer set bit than x.Anonymous
October 13, 2005
Bitwise operations are fun.
What you're doing, (x & (x - 1)), removes the lowest bit.
My favorite one is (x & -x) which isolates the bottom bit.
My most useful operation with this? ((x & -x) == x)... returns true if x is a power of 2 (or equal to 0) -- very useful for quick assertions that the world is sane.Anonymous
October 13, 2005
The comment has been removedAnonymous
October 13, 2005
Interesting post. Thanks Larry.Anonymous
October 13, 2005
A better interview question would be: Why does x = x & (x - 1) reduce the number of bits by one? Test it on yourself. :)Anonymous
October 13, 2005
Regarding the first trick:
> It's a silly trick, but clever.
Yes and yes. But if the numeric values are distributed uniformly, then each bit will be as likely to be a 1 as a 0, so that loop will still iterate 16 times on average. Of course there are reasons why the numeric values usually aren't distributed uniformly and usually less than half the bits are 1s, but your loop might still iterate more times than you were thinking. A lookup table on bytes will still beat it.
I wouldn't have thought of the trick that your colleagues came up with though. Its cleverness outranks its silliness and that looks like amazing code.Anonymous
October 13, 2005
Wouldn't this also work?
{
bitcount = 0;
while (value != 0)
{
value >>= 1;
bitcount++;
}
return bitcount;
}Anonymous
October 13, 2005
OK - I'll be the dummy :-)
Why on earth would you want to count the number of bits in a number? Is this something you would do in a real situation, or just in Microsoft interviews?Anonymous
October 14, 2005
vcv: that gets the highest bit set.Anonymous
October 14, 2005
vcv wrote "Wouldn't this also work?"
No. It returns the position of the most significant 1 bit.
The best solution to Larry's question "How quickly can you count the bits in a 16 (or 32) bit integer?" is:
int CountBits(int value)
{
return sizeof(value) * 8;
}Anonymous
October 14, 2005
Does the "fastest known version", beat all lookup-based solutions? 4GB might be excessive overhead just for bit counting, but byte-based tables/switches could be reasonable, and presumably faster (or am I wrong about that?).Anonymous
October 14, 2005
>>Wouldn't this also work?
wouldn't that return always the same value?Anonymous
October 14, 2005
To the original question, I think a sharp interview candidate could have quickly answered immediately 16 (or 32). Because you didn't ask him how to count how many bits are set in the integer. Simply count the bits. (grin).
"How quickly can you count the bits in a 16 (or 32) bit integer?"Anonymous
October 14, 2005
The comment has been removedAnonymous
October 14, 2005
Rough draft
Assumes Win32 (32-bit ints), fails on 64-bit machines
I use leading underscores to preserve indenting (sorry, but this blog software eats leading spaces)
// bits.h
int bitsinchar(unsigned char);
int bitsinint(unsigned int);
// bits.c
int bitsinint(unsigned int i)
{
// count char-by-char... assumes four-byte int
// if this is a 64-bit int then extend the pattern accordingly
return
____bitsinchar((char)(i & 0xff)) +
____bitsinchar((char)(i >> 8 & 0xff)) +
____bitsinchar((char)(i >> 16 & 0xff)) +
____bitsinchar((char)(i >> 24 & 0xff))
;
}
// c must be an eight-bit char... no wchar's here
int bitsinchar(unsigned char c)
{
__// 256-entry lookup table
select (c)
{
// eight bits
case 0xff:
__return 8;
// seven bits
case 0xfe:
case 0xfd:
// ... eight cases...
case 0x7f:
__return 7;
// six bits... (eight-choose-two-cases)
// five bits... (eight-choose-three-cases)
// four bits... (eight-choose-four-cases)
// three bits... (eight-choose-three-cases)
// two bits... (eight-choose-two-cases)
____// one bit...
____case 0x80:
case 0x40:
// more cases...
____case 0x02:
____case 0x01:
return 1;
// zero bits
____case 0: return 0;
}
}Anonymous
October 14, 2005
The comment has been removedAnonymous
October 14, 2005
typedef char nybble;
int bitsinnybble(nybble n)
{
// 8-entry lookup table
select (n)
{
____// four bits
____case 0xf:
__return 4;
// three bits
____case 0xe:
____case 0xd:
____case 0xb:
____case 0x7:
__return 3;
// two bits
____case 0xc:
____case 0xa:
____case 0x9:
____case 0x6:
____case 0x5:
____case 0x3:
__return 3;
// one bit
____case 0x8:
____case 0x4:
____case 0x2:
____case 0x1:
__return 1;
// zero bits
____case 0:
______return 0;
}
int bitsinchar(unsigned char c)
{
return
____bitsinnybble((nybble)(c & 0xf)) +
____bitsinnybble((nybble)(c >> 4 & 0xf))
;
}Anonymous
October 14, 2005
Er, I mean, a 16-entry lookup table (not 8-entry)
Also, the two-bit cases should return 2, not 3Anonymous
October 14, 2005
The comment has been removedAnonymous
October 14, 2005
I am interested to know the intent of asking questions like this in an interview situation. If the intent is to make the candidate sweat and see how she reacts that is probably fine. If the intent is to see how well they problem solve it might not work. Prior knowledge of similar problems or problem patterns might affect the performance of a candidate. I try to stay away from specific questions as a gauge for talent when I conduct interviews.
Also fastest solution does not always mean fewest lines of code. Sometimes fastest solution means tweaking the implementation to a point that it could not be well understood. Well, as an application developer I do not encourage such practices until there is a critical need for such implementation. To me maintainability and code readability are more important than raw performance numbers.
Mark.Anonymous
October 14, 2005
Didn't get a chance to test, but this should do the trick:
input/output: eax
mov ebx,55555555h
mov edx,eax
and edx,ebx
shr eax,1
and eax,ebx
add eax,edx
mov ebx,33333333h
mov edx,eax
and edx,ebx
shr eax,2
and eax,ebx
add eax,edx
mov ebx,0f0f0f0fh
mov edx,eax
and edx,ebx
shr eax,4
and eax,ebx
add eax,edx
mov ebx,00ff00ffh
mov edx,eax
and edx,ebx
shr eax,8
and eax,ebx
add eax,edx
mov ebx,0000ffffh
mov edx,eax
and edx,ebx
shr eax,16
and eax,ebx
add eax,edx
anyone got a faster implementation?Anonymous
October 14, 2005
Woohoo! Check this out:
in/out: eax
mov edx,eax
shr edx,1
and edx,55555555h
sub eax,edx
mov ebx,33333333h
mov edx,eax
shr edx,2
and edx,ebx
and eax,ebx
add eax,edx
mov edx,eax
shr edx,4
add eax,edx
and eax,0f0f0f0fh
mov ebx,01010101h
mul ebx
shr eax,24Anonymous
October 14, 2005
Somebody, Roozbeh put a version of the parallel algorithm in the comments. It's possible that a lookup algorithm would be faster, but...
When we did the numbers (about 10 years ago, on 386 hardware), the parallel algorithm blew the lookup algorithm away completely.
With more modern processors, your results might be different.Anonymous
October 14, 2005
Check the MIT HAKMEM count http://blogs.msdn.com/jeuge/archive/2005/06/08/HAKMEM_Bit_Count.aspxAnonymous
October 15, 2005
The comment has been removedAnonymous
October 16, 2005
The comment has been removedAnonymous
October 16, 2005
The comment has been removedAnonymous
October 17, 2005
I generally prefer interview questions that prove some basic understanding of math as well as sound programming skills.
My favorite today is the not so complex ...
"Write a function for calculating the prime factors of any given integer."
Not too complex, but it shows how the person thinks and approaches code very easily.
- MichaelAnonymous
October 17, 2005
That was amazing Jeu! In assembly it's even shorter than the one that uses multiplication. I'm not sure about today's machines, but in the old days I remember DIV being much slower than MUL. That might compensate for the few extra instructions, but who cares! ;)Anonymous
October 17, 2005
According to this 2002 comparison of several bitcounting algorithms...
http://www-db.stanford.edu/~manku/bitcount/bitcount.html
A 256-entry lookup table is the fastest.Anonymous
October 17, 2005
Actually the results of the comparison are that a 65Kb lookup table is the fastest.
I presume that the initialization of the lookup table is outweighed by the number of bits counted, though.Anonymous
October 28, 2005
Counting "1" bits is probably not all that common, but checking whether a number is a power of 2 does occur with some frequency (in verifying a valid "sectors per cluster" value for example). Checking whether a number is a power of 2 is the same as checking whether it has exactly 1 "1" bit.
Is there a better way to check that a number has exactly 1 "1" bit than using an approach
derived from your algorithm?Anonymous
December 16, 2005
When you say "I'm not sure exactly why it works, and it only works for 2s complement numbers, but it DOES work."
Umm, you mean it only works when negative numbers are stored as the twos complement of the same positive number? It sounds like you're talkking about a set of "2s complement numbers" that maybe not all numbers fall into.
Also, somewhere in a related note, someone said that a lookup table would work, except if you're adding the bits that are "on" in a 32-bit fullword, you'd need a huge lookup table. I would simply think of the 32-bit word as eight 4-bit chunks, and use a 16-entry lookup table for each of the 4-bit chunks. Eight lookups and eight additions, plus some looping overhead.
In fact, I solved the same problem that way in production IBM mainframe code many years ago. (This was for something that was infrequently used... In fact, it was used once per customer to count something, and then a flag was set to never call the routine again.)
David W.Anonymous
December 16, 2005
Edit -- it wasn't someone somewhere, it was you, Larry, in a comment in this blog. Duh. I thought that's where I read it, then I missed it when I looked, oh well.
Spending scads of time optimizing a function like this might not be a good use of time -- depending on the use of course. We decided that our 4-bit-at-a-time lookup function was the perfect balance between being simple to write and (importantly) to understand, and being plenty fast.Anonymous
June 02, 2009
PingBack from http://patiochairsite.info/story.php?id=29037Anonymous
June 17, 2009
PingBack from http://patiosetsite.info/story.php?id=160Anonymous
June 17, 2009
PingBack from http://patioumbrellasource.info/story.php?id=1777Anonymous
June 19, 2009
PingBack from http://debtsolutionsnow.info/story.php?id=1308