Partilhar via


Randomness bugs

Some of the hardest bugs to discover are those involved in features that contain intentionally random behavior.

Sudoku was originally written using System.Random as its source of randomness. As is described in the article about its development, this randomness is used during any brute-force steps taken by the puzzle solver, and as the solver is at the heart of the puzzle generator, randomness is a key player in puzzle generation.

The solver uses randomness when it can no longer make moves through logical deductions based on the supplied elimination techniques. At that point, it looks at all of the cells in the grid to determine which of those cells have the fewest number of values as viable possibilities. If more than one cell is found to contain the same minimal number of possible values, one of them is chosen at random to set (backtracking is used if it turns out that the cell was set incorrectly). The first step in my puzzle generation algorithm is solving an empty grid. With an empty 9x9 grid, all cells have 9 possible values, and the solver will thus choose at random any of the 81 cells to fill. As soon as a number is filled into a cell, most of the cells still have 9 possible values, but some of them, due to eliminating the set number, now have 8. So the solver will choose to fill in one that has only 8, again choosing from those at random. And so on.

I finished implementing Sudoku and was pleased with the variety of puzzles it was producing. However, as I mentioned, it can be difficult to detect bugs based on systems that are supposed to be random; the best way I’ve found is to write tools that look to see if the results aren’t random enough.

In the case of Sudoku, I ended up writing a short tool that consumed the Sudoku.exe application as a library and generated hundreds of thousands of puzzles. A string representation of each of these puzzles was stored into a hashtable, allowing me to easily discover any duplicate puzzles generated. After all, as shown at https://www.afjarvis.staff.shef.ac.uk/sudoku/, there are 6,670,903,752,021,072,936,960 valid 9x9 Sudoku grids, so the chances of randomly generating the same puzzle multiple times should be extremely close to 0.

I was able to successfully run my test with hundreds of thousands of puzzles. But that was just the first step. I was testing the starting puzzles, the puzzles that were actually presented to the user. What about the final solutions? The puzzle generator first solves an empty board, resulting in a final solution with all 81 cells filled. It then proceeds to remove cells, and the board with a bunch of cells removed is what’s eventually shown to the user as the starting state. I also needed to make sure that those final solutions were all unique.

So I modified my test to generate hundreds of thousands of these solved boards rather than of the starting boards, and I reran it. After approximately 40 boards, a duplicate was found. And then another. And then another. After a few hundred boards, almost every board generated was a duplicate.

I remember sighing audibly when I saw the output, as I immediately knew what was the bug.

Every time I needed a random number, I was instantiating a new System.Random instance. System.Random’s parameterless constructor initializes the instance with the output of Environment.TickCount. TickCount returns the number of milliseconds that have passed since the computer was last booted, however as described in the documentation, the resolution of the TickCount property is limited by the resolution of the system timer. This means that if you instantiate multiple System.Random instances within the same system time quantum, they’ll all be initialized with the same tick count, and thus they’ll all produce the same sequence of random numbers. Since I’m constructing a new Random each time I need a random number, this means all of my “random” numbers constructed in the same system time quantum will be the *same* “random” number.

As an experiment, try running the following code:

     public static void Main()
    {
        int last = 0, count = 0;
        while(true)
        {
            int next = new Random().Next();
            if (next == last)
            {
                ++count;
            }
            else
            {
                Console.WriteLine(count);
                count = 0;
                last = next;
            }
        }
    }

In an infinite loop, it creates random numbers using a new instance of Random each time, and it counts how many of the same number it created in a row. On my laptop, I consistently get numbers in the thousands!

What sort of effect did this have on my Sudoku puzzle generation? Effectively, this meant that every random number retrieved during the solving of the empty board during a single puzzle’s generation was the same. Since I was working with these numbers modulo 81, in the end there were only 81 different boards being generated. Every once in a while a board’s generation would span two different TickCount values, and I would end up with a different board, but it’s occurrence was relatively rare.

Kind of a cool bug, reducing the number of puzzles that could be generated by some serious orders of magnitude. To fix it, I simply had to change my random number generator, relying on the cryptographically strong System.Security.Cryptography.RNGCryptoServiceProvider rather than on System.Random.

With the changes in place, I started my test running again and went to lunch. Upon return, I found that it had generated a million puzzles with nary a duplicate, and it was still going.

Comments

  • Anonymous
    April 07, 2006
    Why couldn't you just re-use the same Random instance?
  • Anonymous
    April 07, 2006
    With a different design, I could have.  But in the implementation I arrived at, the code to do the random number generation was buried levels deep in a static method, and without some serious refactoring, it made the most sense to either create a new instance each time, or create one static instance and lock access to it (it's possible multiple puzzles would be generated concurrently, and thus access to the non-thread-safe random number generator would need to be serialized).  With Random, I chose the former.  With RNGCryptoServiceProvider, I chose the latter.
  • Anonymous
    April 07, 2006
    To extend Steve.

    Why not use a static Random field somewhere?
  • Anonymous
    April 08, 2006
    That's exactly what I do (take a look at the code and you'll see), except with a RNGCryptoServiceProvider instead of a Random (the former provides much better pseudo-random numbers than the latter, so it was better for my purposes anyway).  As I mentioned in my previous reply, I changed from creating a new Random instance each time to creating one RNGCryptoServiceProvider, storing it in a static, and using that one instance.  The issue then is that with concurrent puzzle generation, multiple threads could be accessing that one instance at the same time, and RNGCryptoServiceProvider is not thread-safe (nor is Random).  There are then several easy solutions.  One is to synchronize access to the instance.  Another is to mark the static member as ThreadStatic, so that each thread gets its own instance.  I chose the former.