SYSK 168: Why Developers Should Know About Memory Bandwidth?
Much like network bandwidth, memory bandwidth is the amount of data per second that a processor can read from or store into semiconductor memory. It is usually expressed in a multiple of bytes/second (for example, MB/s or GB/s). There is an excellent explanation of what memory bandwidth and latency are at http://www.devhardware.com/c/a/Memory/Memory-Bandwidth-And-Timings/.
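To get a rough feel for the numbers, here is a minimal Python sketch that times a large in-memory copy and reports the result in MB/s. The function name copy_bandwidth_mb_per_s, the 64 MB default buffer and the repeat count are assumptions made for illustration. The figure also includes allocation and interpreter overhead, so treat it as a ballpark for effective copy throughput rather than a hardware specification.

```python
import time


def copy_bandwidth_mb_per_s(size_bytes=64 * 1024 * 1024, repeats=5):
    """Estimate effective copy throughput in MB/s (hypothetical helper)."""
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # reads size_bytes and writes size_bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    best = max(best, 1e-9)  # guard against a zero reading on coarse timers
    # A copy moves the data twice: once as a read, once as a write.
    return (2 * size_bytes) / best / 1e6


if __name__ == "__main__":
    print(f"~{copy_bandwidth_mb_per_s():.0f} MB/s effective copy throughput")
```

Use a buffer much larger than the CPU caches; otherwise the copy is served from cache and the number says little about main memory.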
Performance is limited by the weakest link in the chain: a bottleneck. Buying faster CPUs will not bring a performance gain if the limiting factor is memory bandwidth. So, if you’re writing a multi-threaded application that is memory I/O intensive and it has to run well on an SMP (shared-memory multiprocessor) system, make sure to allocate extra time for application profiling and optimization. It’s even possible that, as you add more processors, a highly parallel application will run slower, because all of the processors compete for the same memory bandwidth.
If you find that your application is memory bandwidth bound, here are some things (read: just a few ideas, not intended to be a complete list) you could do to reduce the memory bandwidth requirements and, possibly, improve performance:
- Consider using on-the-fly data compression
- Use zone rendering in graphics intensive applications
- Use cache blocking in applications with large arrays of data processed in nested loops (keep blocks of data in the cache for as long as possible to maximize cache reuse and minimize memory traffic). To compute the optimum block size, the following formula is used: BLOCKSIZE = (CACHESIZE / (ARRAYCOUNT * DATASIZE))^(1/DIMENSIONS). Check out http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/implementation/170317.htm?page=9 for a C# example.
- Use loop unrolling (repeat the statements inside the loop and adjust the counter correspondingly). Note: if you use the -o option during compilation, the compiler will try to do some optimizations, including inlining, loop unrolling, etc.
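The block-size formula, cache blocking and loop unrolling from the list above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the linked article: the function names block_size, transpose_blocked and sum_unrolled4 are hypothetical, and the 256 KB cache size in the usage example is an assumed value. In Python the interpreter overhead hides most of the benefit, so read it as a picture of the loop structure you would write in a compiled language.

```python
def block_size(cache_bytes, array_count, element_bytes, dimensions):
    """BLOCKSIZE = (CACHESIZE / (ARRAYCOUNT * DATASIZE)) ** (1 / DIMENSIONS)."""
    budget = cache_bytes // (array_count * element_bytes)  # elements per array
    b = round(budget ** (1.0 / dimensions))
    # Correct floating-point rounding so that b ** dimensions fits the budget.
    while b > 1 and b ** dimensions > budget:
        b -= 1
    while (b + 1) ** dimensions <= budget:
        b += 1
    return max(1, b)


def transpose_blocked(src, n, block):
    """Transpose an n x n row-major matrix (flat list) one tile at a time."""
    dst = [0] * (n * n)
    for ii in range(0, n, block):          # tile row
        for jj in range(0, n, block):      # tile column
            # Work inside one block x block tile so its data stays in cache.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    dst[j * n + i] = src[i * n + j]
    return dst


def sum_unrolled4(values):
    """Sum a sequence with the loop body repeated four times per iteration."""
    total = 0
    n = len(values)
    limit = n - n % 4
    for i in range(0, limit, 4):           # counter advances by 4, not 1
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    for i in range(limit, n):              # leftover elements
        total += values[i]
    return total


if __name__ == "__main__":
    # Assumed: 256 KB cache, 2 arrays (source and destination), 8-byte elements, 2-D.
    b = block_size(256 * 1024, 2, 8, 2)
    print("block size:", b)
    print(transpose_blocked(list(range(9)), 3, 2))
    print(sum_unrolled4(list(range(10))))
```

With the assumed values, the formula gives (262144 / (2 * 8))^(1/2) = 128, so each tile is 128 x 128 elements.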