Bit Fiddling - 4
Pre-Computing
Pre-computing is a technique normally used to speed up algorithms: you compute the results of a smaller problem ahead of time and use interpolation or other techniques to solve a larger problem. You trade memory and/or accuracy for speed here, since you need to load the pre-computed solutions into memory and/or use interpolation to reach the desired level of accuracy.
A similar approach can be used to count the number of set bits in an integer. Basically, you keep a pre-computed array of the bit counts of all 4-, 8- or 16-bit numbers and use it to find the number of set bits in a full integer. You could also pre-compute for 32-bit sizes, but you would need 2^32 bytes to hold that data :)
static int PreCompute8 [256] =
{
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
};
int BitCountPreCompute(unsigned int u)
{
    // Valid for 32-bit ints only: sum the table entries
    // for each of the four bytes of u.
    return PreCompute8[u & 0xffu]
         + PreCompute8[(u >> 8) & 0xffu]
         + PreCompute8[(u >> 16) & 0xffu]
         + PreCompute8[(u >> 24) & 0xffu];
}
PreCompute8 holds the number of set bits for every 8-bit number. To count the bits of a given integer, mask out each group of 8 bits, index into the PreCompute8 array with it, and sum the results.
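Rather than typing the 256 entries by hand, the table can also be built once at startup. A possible sketch (the names PreCompute8Dyn and BuildPreCompute8 are mine, not from the post), using the recurrence that a number's bit count equals the bit count of the number without its lowest bit, plus that lowest bit:

```c
static int PreCompute8Dyn[256];

/* Fill the 8-bit popcount table at startup using the recurrence
   bits(i) = bits(i >> 1) + (i & 1). */
void BuildPreCompute8(void)
{
    PreCompute8Dyn[0] = 0;
    for (int i = 1; i < 256; i++)
        PreCompute8Dyn[i] = PreCompute8Dyn[i >> 1] + (i & 1);
}
```

This avoids transcription errors in the literal table at the cost of a one-time initialization call.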
Comments
- Anonymous
August 01, 2005
Jeu, I think your algorithm is inefficient for modern processors.
Today, more computation within a small memory footprint results in better performance. It should be measured before making firm claims, but I believe a simple loop of shifts will be faster than pre-computing. - Anonymous
August 01, 2005
It seems I should give more arguments. The problem is the processor cache. Main memory is slow, and the processor has a cache to mitigate that slowness. When you hit a small area of memory, the cache speeds up all your memory accesses. If you hit large, random-like memory structures, you lose that speed.
Even worse: over-using the cache disturbs any other code. Your lookup table pushes other important data out of the cache. - Anonymous
August 01, 2005
mihailik, along with modern processors, processor caches and system memory have been growing in size, so I don't think that a memory footprint as small as this will really create efficiency problems. I will profile and time the different ways of bit counting that I have mentioned in my next blog. My guess is that pre-computing should be the fastest, followed by the HAKMEM parallel counting, followed by regular iteration methods.
Look forward to my next blog on this. - Anonymous
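[Editor's note: for reference, the table-free "parallel" counting mentioned above looks roughly like this. This is the widely used SWAR variant rather than HAKMEM's original octal trick, and it assumes 32-bit unsigned integers; it does all its work in registers, so it avoids the cache concerns raised in the earlier comment.]

```c
#include <stdint.h>

/* Count set bits without a lookup table: sum adjacent pairs of bits,
   then nibbles, then bytes, all in parallel within one register. */
int bit_count_parallel(uint32_t u)
{
    u = u - ((u >> 1) & 0x55555555u);                 /* 2-bit sums  */
    u = (u & 0x33333333u) + ((u >> 2) & 0x33333333u); /* 4-bit sums  */
    u = (u + (u >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (int)((u * 0x01010101u) >> 24);            /* total count */
}
```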
August 01, 2005
The comment has been removed