Compartilhar via


Bit Fiddling - 4

Pre-Computing

 

Pre Computing is a technique normally used to speed up algorithms and by actually computing results of a smaller problem and using interpolation or other techniques to solve a larger problem. You trade-off memory/accuracy for speed here, as you will need to load the pre-computed solutions into memory and/or use interpolation to find solutions to a desired level of accuracy.

 

A similar approach can be used to count number of set bits in an integer. Basically, you have a pre-computed array of the number of bits in 4,8 or 16 bit size numbers and use this to find the number of bits in an integer. You could also pre-compute for 32 bit sizes but you will need 234 bytes to hold this data J

 

 

static int PreCompute8 [256] =

 {

    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,

      1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,

      1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,

      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

      1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,

      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

      3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,

      1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,

      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

      3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,

      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

      3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,

      3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,

      4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8

 };

int BitCountPreCompute (unsigned int u)

{

    // Valid for 32 bit ints only

    return PreCompute8[u&0xffu] + PreCompute8[(u>>8)&0xffu] + PreCompute8[(u>>16)&0xffu] + PreCompute8[(u>>24)&0xffu] ;

}

 

 

PreCompute8 holds the number of set bits in for all 8 bit numbers. Using this result, mask out sets of 8 bits in the given integer and index into the PreCompute8 array and compute the count.

 

Technorati Profile

Comments

  • Anonymous
    August 01, 2005
    Jeu, I think your algorithm is inefecient for moder processors.

    Today more computations within small memory footprint results in better performance. It should be measured for clear talking, but I believe simple cycle of shifts will be faster than precomputing.
  • Anonymous
    August 01, 2005
    It seems I should place more arguments. The problem is processor cache. Main memory is slow and processor have a cache to mitigate that slowness. When you hit small area of memory, processor cache prediction speed up all your memory access. If you hit large random-like memory structures, you looze the speed.

    Even worse: cache over-using disturb any other code. Your lookup table pushes other important data out from cache.
  • Anonymous
    August 01, 2005
    mihailik, along with modern processors, the processor caches, and system memory have been growing in sizes, so i dont think that using memory footprints as small as this will really create efficiency problems. I will profile and time different ways that I have mentioned about bit counting in my next block. My guess is that pre-computing should be the fastest, followed by the Hakmem parrallel counting, following by regular iteration methods.

    Look forward for my next blog on this.
  • Anonymous
    August 01, 2005
    The comment has been removed