Selecting Random Seeds
A few months ago I ran across some test code that used GetTickCount() + timeGetTime() as the seed for a random number generator. Unfortunately, this was a poor choice for two reasons. First, GetTickCount() returns the number of milliseconds since the system was started, and the systems running the test code were rebooted frequently, so it returned similar values across a large set of test runs. Second, timeGetTime() is based on the exact same counter as GetTickCount(), so adding the two together really doesn't buy you very much.
This of course raises the question, "how do you pick a good seed?" Taking a step back, it is also worth asking, "how random/secure/performant does your data need to be?" For the sake of this post, we will assume that rand() is good enough, but that a truly random seed is desired. I should also make the distinction that this is for test code, not code that needs to be cryptographically secure, so while we want to reduce collisions, it isn't catastrophic if they do happen.
The goal of the original code was to inject a certain amount of randomness into the testing such that, over several months of automated testing on daily builds, there would be an increased level of coverage (think fuzz testing). Unfortunately, there just wasn't enough variance built into the seed, so over an extended period of time the tests would end up reusing the same values.
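To make the lack of variance concrete, here is a small simulation, written in Python for portability rather than against the Win32 API. It assumes, purely for illustration, that each test run starts somewhere between 60 and 65 seconds after a reboot and that both timers report the same millisecond uptime value. The function names and the timing window are hypothetical, not taken from the original test code.

```python
import random


def simulated_seed(uptime_ms: int) -> int:
    """Model GetTickCount() + timeGetTime() when both read the same uptime counter."""
    tick_count = uptime_ms      # stand-in for GetTickCount()
    time_get_time = uptime_ms   # stand-in for timeGetTime() (assumed identical)
    return (tick_count + time_get_time) & 0xFFFFFFFF  # keep it to 32 bits


def count_distinct_seeds(runs: int, min_uptime_ms: int, max_uptime_ms: int) -> int:
    """Count how many different seeds appear across many simulated test runs."""
    sim = random.Random(12345)  # fixed so the simulation itself is repeatable
    seeds = {
        simulated_seed(sim.randint(min_uptime_ms, max_uptime_ms))
        for _ in range(runs)
    }
    return len(seeds)


if __name__ == "__main__":
    runs = 100_000
    distinct = count_distinct_seeds(runs, 60_000, 65_000)
    print(f"{runs} runs produced only {distinct} distinct seeds")
```

Under these assumptions the number of distinct seeds can never exceed the 5,001 possible uptime values, no matter how many runs are made, and adding the second timer only doubles each value without adding any new ones.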
So, in answer to the first question, it is fairly common practice to take a handful of diverse data sources and munge them together using a hashing function (MD5, SHA-1, etc.). This hash value then becomes the seed. The following list gives some examples of possible inputs to the hash. It is obviously neither required nor practical to use all of them, so pick the ones that make the most sense for your application.
CryptGenRandom()
If you have access to this API, it is a great way to get some random data on Windows and CE systems, and it may be all you need for the seed (without having to bother with hashing).

XNetRandom()
This is the Xbox's version of CryptGenRandom().

Registry: HKLM\SOFTWARE\Microsoft\Cryptography\RNG
The "seed" value in this registry node is periodically updated with a new value.

GetTickCount() or QueryPerformanceCounter()
Throwing in the current CPU counter can be a good way to perturb values across a single test run if you don't have access to something like CryptGenRandom().

Date
Assuming the current date/time is "real" (i.e. not being reset), this will change things on a daily basis.

MAC Address
Providing a unique value about the computer/system you are running on will help prevent box A from running the same cases as box B.

Build Number
The build number will help prevent today's tests from being the same as tomorrow's.

Provides a small chance of getting a different number.

Network Latency
If you have a network-based application to begin with, it should be fairly simple to capture some transaction timings.
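As a rough sketch of the "hash several sources together" idea, the following Python snippet mixes a few of the inputs listed above into one seed. It is only an illustration under some assumptions: os.urandom() stands in for CryptGenRandom(), SHA-256 is swapped in for the MD5/SHA-1 mentioned earlier, the result is truncated to 32 bits to mimic what a C-style srand() would accept, and make_seed() and its build_number parameter are hypothetical names.

```python
import hashlib
import os
import random
import time
import uuid


def make_seed(build_number: str = "") -> int:
    """Hash a handful of diverse inputs into a 32-bit seed (illustrative only)."""
    h = hashlib.sha256()
    h.update(os.urandom(16))                             # OS-provided random bytes
    h.update(time.time_ns().to_bytes(8, "big"))          # current date/time
    h.update(time.perf_counter_ns().to_bytes(8, "big"))  # high-resolution counter
    # uuid.getnode() returns a 48-bit hardware address, or a random
    # 48-bit value if no MAC address can be determined.
    h.update(uuid.getnode().to_bytes(6, "big"))
    h.update(build_number.encode("utf-8"))               # e.g. today's build label
    # Keep the first 4 bytes of the digest as an unsigned 32-bit seed.
    return int.from_bytes(h.digest()[:4], "big")


if __name__ == "__main__":
    seed = make_seed("build-1234")  # hypothetical build label
    print(f"seed = {seed}")
    random.seed(seed)
    print(random.randint(0, 99))
```

Because the inputs are hashed rather than added, a change in any one of them changes the whole seed, so there is no harm in including a source that only varies occasionally.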
Now, if you want to get really fancy, it wouldn't be all that hard to set up a web service that just dishes out hashes or values from CryptGenRandom (or XNetRandom if you happen to have a spare Xbox devkit lying around). Your tests could just grab a new seed value each time they run, ensuring a good starting point from which to crank through those test cases.
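For what it's worth, a seed service along these lines can be very small. The sketch below uses only the Python standard library and is an assumption-laden toy rather than a recommended deployment: os.urandom() again stands in for CryptGenRandom(), and the SeedHandler class, the /seed path, and the fetch_seed() and demo() helpers are all made up for illustration.

```python
import http.client
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class SeedHandler(BaseHTTPRequestHandler):
    """Answer every GET with a fresh 32-bit seed as plain text."""

    def do_GET(self):
        seed = int.from_bytes(os.urandom(4), "big")
        body = str(seed).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep test output quiet


def fetch_seed(host: str, port: int) -> int:
    """What a test would call at startup to get its seed."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/seed")
        return int(conn.getresponse().read().decode("ascii"))
    finally:
        conn.close()


def demo() -> int:
    """Start the service on a free local port, fetch one seed, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), SeedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_seed("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(f"seed from service = {demo()}")
```

In a real lab the server would run on a shared machine and each test would call something like fetch_seed() once at startup.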
Comments
Anonymous
March 04, 2008
I don't understand why you would ever want truly random data injected into a test. What if some aspect of the test fails - how will you ever reproduce it? Isn't it better to always start with a known seed so that the test can be precisely reproduced later if necessary? I do use a lot of random number generators in my test code, but I always start with a known, fixed seed value.
Anonymous
March 04, 2008
You absolutely want to log the seed value and make sure your tests can run with a specified seed value; this way you can reproduce an observed failure. But using the same seed value every time isn't necessarily the best use of your test's time. If you have automated tests that run every day, you might not want to run the exact same test over and over again (especially when the underlying system hasn't changed). If you have time slotted to run, you might as well be running your random tests with different inputs each time. In the case of fuzz testing, this is absolutely necessary. To help ensure the best security coverage, you want to run hundreds of thousands of iterations. More than likely you can't do this all at once, so it must be done over a period of time, and to do so you need good random data.