Automation Foibles Unveiled: Saving random data

Now, many of you probably know that I am a big fan of computer-generated random test data that represents a reasonable sample of the total population of possible test data. (I refer to this as probabilistic stochastic test data.) So, why would I argue against preserving randomly generated test data?

I just returned from STAREast, where for the second time in a month I heard someone suggest storing randomly generated test data in a file. Many people cite the inability to recreate random test data as a drawback of using it in a test. So, the reason these people suggested storing the random data in a file is so they can easily repeat a test with the same data should some randomly generated test data expose an anomaly. I absolutely concur that if we generate random test data, and that test data exposes a problem, we need a way to recreate the data. But, isn't there a better way than to save random test data in a file?

Saving randomly generated test data to a file creates a test artifact. Depending on how much data is generated, this file could become quite large. Also, saving data to a file impacts the performance of an automated test and certainly slows down manual execution of tests. Then consider that tests that generate random test data are executed numerous times throughout the lifecycle, and it doesn't take long until we have countless test artifacts simply storing more static test data that quickly loses its value (especially if no problems were detected). Of course, we can easily delete the files after the test if no anomaly was detected, but I suspect that most testers will delete those files upon the completion of the test if no problems were detected.

So, the question is how can we reproduce computer generated probabilistic stochastic test data if we don't save that randomly generated data to a file?

Planting Seeds

In computing, a seed is simply an integer value that a random number generator uses as its starting value. If we pass the same seed value as an argument to a given random generator, then we will consistently get the same sequence of random values each and every time. Essentially, a seed allows us to replicate computer-generated probabilistic stochastic test data anytime, as long as we use the same seed and the same random generator algorithm. So, instead of saving each and every piece of randomly generated test data used in any given test, we can simply log the seed value used by that test in the test results log file.
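
As a minimal sketch of that idea, assuming .NET's System.Random class (the same class used in the examples later in this post), two generators created from the same seed produce the same sequence of values. The class name SeedDemo and the method FirstValues are illustrative names only.

```csharp
using System;
using System.Linq;

public static class SeedDemo
{
    // Returns the first 'count' values produced by a Random created from 'seed'.
    public static int[] FirstValues(int seed, int count)
    {
        Random generator = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = generator.Next();
        }
        return values;
    }

    public static void Main()
    {
        int[] firstRun = FirstValues(12345, 5);
        int[] secondRun = FirstValues(12345, 5);
        // Same seed and same algorithm, so the two sequences match
        Console.WriteLine(firstRun.SequenceEqual(secondRun)); // prints True
    }
}
```

Note that the "same random generator algorithm" condition matters in practice: Microsoft's documentation for System.Random states that its implementation is not guaranteed to stay the same across major versions of .NET, so it may be worth logging the runtime version alongside the seed.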

But, if we use the same seed all the time, then we are simply generating the same data over and over again. And, manually inputting a seed for each test that generates probabilistic stochastic test data is not an ideal situation, especially for automated tests. So, to solve that problem we can randomly generate a seed value that is then passed to the random generator algorithm!  Again, logging the randomly generated seed allows us to accurately reproduce the probabilistic stochastic test data at any later time.

The example below illustrates a simple method in C# that will either generate a random seed or return a user specified seed value.

        // Requires: using System;
        public static int GetSeedValue(string seedValue)
        {
            // Check whether a user specified seed value was passed as an
            // argument to the seedValue parameter. A null or empty string
            // means no seed was supplied.
            if (string.IsNullOrEmpty(seedValue))
            {
                // Create a new random object
                Random randomObject = new Random();
                // Generate a random non-negative integer value that is
                // less than 2,147,483,647 (int.MaxValue)
                return randomObject.Next();
            }
            else
            {
                // Convert the seedValue to an integer value
                // NOTE: This example method does not include exception handling
                return int.Parse(seedValue);
            }
        }

The following example illustrates how to use this method to get a random seed value to generate random strings and numbers that increase the breadth of test data coverage in each subsequent iteration of a test.

        static void Main(string[] args)
        {
            // These variables declare the range of characters used for the
            // string test data. In this case the strings are composed of upper
            // case ASCII characters 'A' through 'Z'
            char minChar = '\u0041';
            char maxChar = '\u005A';
            
            // This reads the user specified seed value from the console window.
            // If no seed value is specified, an empty string is passed to the
            // GetSeedValue method, which will cause it to generate a random
            // seed value.
            string mySeed = Console.ReadLine();
            
            // Declare a seed variable and initialize it to either the user
            // specified seed or to a computer generated random seed value
            int seed = GetSeedValue(mySeed);

            // The seed value should be permanently recorded in the logged
            // results for this test
            Console.WriteLine("The seed value for this test is {0}\n", seed);

            // Create a new random object based on the seed
            Random randomGeneratorObject = new Random(seed);

            // Generate 10 random strings
            for (int count = 0; count < 10; count++)
            {
                // Declare and initialize a string variable for our test data
                string testString = string.Empty;
                // Pick the string length (1 to 10 characters) once; calling
                // Next() in the loop condition would re-roll it on every pass
                int targetLength = randomGeneratorObject.Next(1, 11);
                for (int length = 0; length < targetLength; length++)
                {
                    // Generate a random character within the defined range and
                    // concatenate it to the testString variable until the 
                    // random string length has been reached
                    testString += Convert.ToChar(randomGeneratorObject.Next(
                        minChar, maxChar + 1)).ToString();
                }
                // Write the test string to the console window
                Console.WriteLine("Test String {0}: {1}", count + 1, testString);
            }

            Console.WriteLine("\nRandom numbers");
            // Generate 5 random numbers
            for (int numberCount = 0; numberCount < 5; numberCount++)
            {    
                Console.WriteLine("{0} ", randomGeneratorObject.Next());
            }
        }

Running the program and entering an integer value between 0 and 2,147,483,647 at the console will generate 10 random-length strings composed of random upper case characters between 'A' and 'Z', followed by 5 random numbers. If no seed is entered (an empty line), the code calls the GetSeedValue method with an empty string and generates a random seed value for use in the test. Of course, entering the same integer value will produce the same strings and numbers each and every time.
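
One way to make replay convenient is to move the string generation into a method that takes the seed as a parameter, so that a failing run can be regenerated from the logged seed alone. The following is only a sketch under that assumption: TestDataReplay, GenerateTestStrings, and the seed value shown are hypothetical and not part of the example above. It chooses each string's length once, before the inner loop starts.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TestDataReplay
{
    // Builds 'count' upper case ASCII strings of 1 to 10 characters from 'seed'.
    public static List<string> GenerateTestStrings(int seed, int count)
    {
        Random generator = new Random(seed);
        List<string> results = new List<string>();
        for (int i = 0; i < count; i++)
        {
            // The upper bound of Next(min, max) is exclusive
            int length = generator.Next(1, 11);
            StringBuilder builder = new StringBuilder(length);
            for (int j = 0; j < length; j++)
            {
                builder.Append((char)generator.Next('A', 'Z' + 1));
            }
            results.Add(builder.ToString());
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical seed copied from the log of a failed test run
        int loggedSeed = 20080513;
        foreach (string testString in GenerateTestStrings(loggedSeed, 10))
        {
            Console.WriteLine(testString);
        }
    }
}
```

Calling GenerateTestStrings twice with the same seed and count returns the same list, which is what lets a tester reproduce a failure without a saved data file.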

Using probabilistic stochastic test data is valuable because it efficiently increases the breadth of data coverage, and significantly augments 'typical' static test data, user-generated test data, or static test data derived from historical failure indicators. But, instead of storing randomly generated test data in a file, it is a best practice to simply record the seed value of each test. With a seed value we can easily recreate the computer-generated random test data should any of the random data used in a test expose an anomaly.
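
Recording the seed can be as small as one line per test in the results log. The sketch below assumes a made-up log line format ("TestName seed=12345"); SeedLog, FormatEntry, TryReadSeed, and the test name are hypothetical, and a real test harness would likely have its own logging conventions.

```csharp
using System;
using System.Globalization;

public static class SeedLog
{
    private const string Marker = "seed=";

    // Formats one log line, for example "StringInputTest seed=12345".
    public static string FormatEntry(string testName, int seed)
    {
        return testName + " " + Marker + seed.ToString(CultureInfo.InvariantCulture);
    }

    // Extracts the seed from a line written by FormatEntry; returns false
    // if the line does not contain a parsable seed.
    public static bool TryReadSeed(string logLine, out int seed)
    {
        seed = 0;
        int index = logLine.LastIndexOf(Marker, StringComparison.Ordinal);
        if (index < 0)
        {
            return false;
        }
        string text = logLine.Substring(index + Marker.Length);
        return int.TryParse(text, NumberStyles.Integer,
            CultureInfo.InvariantCulture, out seed);
    }

    public static void Main()
    {
        // Generate a fresh seed and record it with the test name
        int seed = new Random().Next();
        string entry = FormatEntry("StringInputTest", seed);
        Console.WriteLine(entry);

        // Later, recover the seed from the log line and rebuild the generator
        int recovered;
        if (TryReadSeed(entry, out recovered))
        {
            Random replay = new Random(recovered);
            Console.WriteLine("Replaying with seed {0}; first value {1}",
                recovered, replay.Next());
        }
    }
}
```

A single short line per test run replaces what would otherwise be a data file of arbitrary size.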

Comments

  • Anonymous
    May 13, 2008
    Super true. I have a "strong" similar experience that is compelling enough to share. We run an online tool platform for software testers named TestersDesk.com, which is in BETA stage now and growing. We have intentionally designed our features in such a way that any random results are random each time. We have a Random Combinations generator that a user used to test their web service. It seems one test failed with the combinations we gave, but they did not store the results they obtained from our platform. Here is the catch: we have a feature called "Repeat History" that promises to do an auto form-submit from the user's history. They convinced themselves that they could re-obtain the data using it, but by design even a history re-submit will return new random results (as we save only the request parameters and not the results themselves). The result: they wrote an email asking if we have any logs or something where they can get the same data. To avoid this "design" becoming a "trouble" we decided to output and store the "seeds" used during user requests so that, should a user want the exact same random data again, they can select from the previously used seeds. This is going to be a new feature in our release at the end of this month. I don't know if I was clear in expression, but the above post appeared to be a direct match to what we are currently doing on the 'Random' part of our features in TestersDesk.com.

  • Anonymous
    May 20, 2008
    Just checked your tool Babel - makes perfect sense to me. Our seed functionality is very similar, with just one addition: we remember the user history as well so that the user can do automated form submits. What I liked the most in your tool is the support for Unicode; while we are using UTF-8 in major parts of the new build that will be released at the end of this month, a few features still use the old style (Basic Latin, ASCII).