Lock-free programming: used now in games!

Lock-free programming is a performant multi-threaded synchronization technique that allows you to safely manage in-memory data in a controlled manner. In its essence, the idea is to write your multi-threaded programs without using locks. Why is this better? For one thing, using OS constructs like mutexes will sometime induce an additional performance penalty.

For example, the occasional user-mode/kernel-mode transition for acquiring or releasing a lock induces an additional, unpredictable delay in the course of things. This additional delay causes other threads (waiting for the resource) to be delayed as well, causing a cascade effect which in the end slows down the whole program. This special situation is called lock convoy

The name lock convoy comes from the bottleneck that appears on a highway when a car performs a full stop on a lane. All the cars in the back will stop as well, creating a larger and larger bottleneck. The interesting thing is that the bottleneck continues to stay there for hours even if the original car is long gone.

Lock convoys can be tricky to detect and eliminate in performant code, but lock-free programming is one of the techniques that can be used to eliminate these unpredictable delays. For lock-freep programming, InterlockedCompareExchange( ) is your friend :-)

Are these lock-free techniques used in real life? Yes - mostly in the high-performant, CPU-intensive OS-level code, but in applications as well.

Valve plans to incorporate lock-free programming techniques in the latest Steam engine. These techniques allows you to maximize the CPU usage in multi-core configurations, by avoiding

The approach that Valve finally chose was a combination of the coarse and fine-grained, with some extra enhancements thrown in. Some systems were split on multiple cores using coarse threading. Other tasks, such as VVIS (the calculations of what objects are visible to the player from their point of view) were split up using fine-grained threading. Lastly, whenever part of a core is idle, work that can be precalculated without lagging or adversely affecting the game experience (such as AI calculations or pathfinding) was queued up to be delivered to the game engine later.

Valve's approach was the most difficult of all possible methods for utilizing multiple cores, but if they could pull it off, it would deliver the maximum possible benefits on systems like Intel's new quad-core Kentsfield chips.

[...]

The end results of Valve's efforts were even better than they had initially hoped. Not only was the speedup on the four-core Kentsfield chips nearly linear in most cases (an average of 3.2 times over a single core) but having four CPUs instead of two made it possible to do things that simply couldn't be done with fewer cores.

These new opportunities include greatly increased particle effects. Instead of smoke and explosions being static overlays on top of the game action, programmers can now develop interactive particles, such as smoke that the player can push with his hands, or burning embers that can singe the player if he gets too close. In addition, extra cores could be used for dramatically better AI, or to apply artificial intelligence techniques (like pathfinding) to huge numbers of enemies. Imagine, for example, a Lord of the Rings-type battle, with all the enemies acting independently and intelligently. With enough cores, this sort of thing could be put into a game and take place in real-time.

Comments

  • Anonymous
    November 08, 2006
    Is InterlockedCompareExchange really a big improvement over taking a leaf-level lock (lock over a small region which doesn't block on anything else), which may be built on operations similar to InterlockedCompareExchange?   Switching to lockless data structures seems like it may be a premature optimization.

  • Anonymous
    November 09, 2006
    I agree that in many cases you don't need lock-free code. Lock-free programming techniques are very advanced techniques, not for the average programmer, and are notoriously hard to debug...