What do you want from your test automation?

Part I - What's in a pass rate?

The Team Foundation Server test team is busy these days preparing our test automation for handoff to our servicing team.  This means, among other things, making darn sure our tests always run correctly and provide expected results.

"Expected results" might mean different things to different people, depending on your philosophy of test automation, your infrastructure, or the state of your product when you shipped it.

In our case, we want our tests to pass at 100% when they're run in regression test runs in servicing.  This is important because we want to make sure any hotfixes we take don't introduce any regressions from the behavior of the product when it was shipped.

Note - this is very different from what we wanted during the product development cycle.  As a test manager, I want to know the quality of my product as frequently as possible and with as much confidence as possible (watch for more on my thoughts on the confidence aspect soon).  So I want my automation runs to fail when they find product bugs.  If my product quality is only 85% of what I would consider perfect, then that's useful information to pass along to my product development team.  Passing at 100% in this phase would tell me nothing.

Of course, there are always a few random false negatives (bogus test failures) in every test run.  You try to keep these to a minimum throughout your product cycle, but when you're automating a moving target, they're bound to happen from time to time.  You just learn to live with it and accept the day-to-day recurring cost of failure analysis and test automation maintenance.

So preparing for servicing handoff involves both fixing the rest of the random failures (because woohoo - the product is stable!) and updating tests to pass even when they hit known bugs we couldn't fix in the v1.0 product cycle.  But what happens when we want to run these tests against v.next?  We'll want them to measure day-to-day product quality again.  We'll be fixing a bunch of those postponed bugs, so the tests will need to switch back to expecting the right behavior, not the "we couldn't fix it so we shipped it broken in v1.0" behavior.

To solve this, we're including a behavior switch that will let us run the tests in servicing mode or development mode.  In servicing mode, the tests will pass if they hit a postponed bug.  In development mode, they'll fail if they don't match the expected behavior specified in the product design requirements.
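To make the idea concrete, here's a minimal sketch of what such a switch could look like.  This is not our actual test harness: the TEST_MODE environment variable, the check() helper, and the shipped_behavior parameter are all names I'm making up for illustration.

```python
import os

# Hypothetical names throughout: TEST_MODE, current_mode(), and check()
# are illustrative assumptions, not part of any real test framework.
SERVICING = "servicing"
DEVELOPMENT = "development"


def current_mode(env=None):
    """Read the run mode from an assumed TEST_MODE environment variable."""
    env = os.environ if env is None else env
    mode = env.get("TEST_MODE", DEVELOPMENT).lower()
    if mode not in (SERVICING, DEVELOPMENT):
        raise ValueError(f"unknown test mode: {mode!r}")
    return mode


def check(actual, expected, shipped_behavior=None, mode=DEVELOPMENT):
    """Return True if the test should pass in the given mode.

    expected:         the behavior from the product design requirements.
    shipped_behavior: the known v1.0 behavior of a postponed bug, if any.
    """
    if actual == expected:
        return True
    # Only servicing mode tolerates the known, shipped behavior of a
    # postponed bug.  Any other mismatch is still a failure.
    if mode == SERVICING and shipped_behavior is not None:
        return actual == shipped_behavior
    return False
```

With something like this, a test that hits a postponed bug passes under check(4, 5, shipped_behavior=4, mode=SERVICING) and fails under the same call in development mode.  Note that even in servicing mode, a result that matches neither the spec nor the shipped behavior still fails, which is exactly the regression we want to catch.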

A final thought on this topic:  Be wary of 100% passing in day-to-day test runs.  Jonathan Kohl points out an example of where a development team coded their unit tests to pass 100% regardless of the actual behavior exhibited by the product!  You can probably easily imagine a process where developers aren't allowed to check in code until all their unit tests are passing.  Well, problem solved - just make 'em all pass anyway!  (yeesh)

 

Part II - Code Coverage

If you think "snake oil" when you hear the term "code coverage", you're not alone.  Coverage is one of the most misused and misunderstood quality metrics in use.  But despite the stigma one may associate with it, coverage data can still be quite valuable in the testing process.

As testers, it's our responsibility to make sure we're getting the right code coverage in our testing.  This doesn't necessarily mean 100%.  In fact, I'd be extremely wary of very high code coverage percentages like this - it often means shortcuts were taken in testing or prioritizations were misguided (i.e. if you had time to analyze and cover every last line of code, why didn't you do something more productive?).

Most often, when someone says "code coverage", they're probably actually talking about "block coverage".  This is a measure of how many individual blocks of code (straight-line runs of instructions with no branches in the middle) were hit during testing.  Block coverage data alone tell you what code was hit, but not what values were passed through it.

Another interesting metric is arc coverage.  There are actually a few variations on the exact definition of arc coverage, but basically it's the percentage of possible transitions into and out of a particular block that get covered.  This gives you a little more data on how many of the possible behavior combinations of your code were hit, but it's still not a silver bullet.
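Here's a small sketch of how the two numbers can differ.  It assumes one particular definition of arc coverage (the fraction of block-to-block edges in the control-flow graph that were traversed), and the graph and trace below are made up: they model a function with a single "if" and no "else".

```python
# Hypothetical control-flow graph: each block maps to its successor blocks.
# It models a function with one "if" and no "else" branch.
CFG = {
    "entry": ["check"],
    "check": ["then", "exit"],  # the "if" is either taken or skipped
    "then": ["exit"],
    "exit": [],
}


def block_coverage(cfg, traces):
    """Fraction of blocks that appear in at least one execution trace."""
    hit = {block for trace in traces for block in trace}
    return len(hit & set(cfg)) / len(cfg)


def arc_coverage(cfg, traces):
    """Fraction of block-to-block transitions seen in at least one trace."""
    all_arcs = {(src, dst) for src, dsts in cfg.items() for dst in dsts}
    hit = {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}
    return len(hit & all_arcs) / len(all_arcs)


# A single test that takes the "if" branch touches every block...
traces = [["entry", "check", "then", "exit"]]
print(block_coverage(CFG, traces))  # 1.0
# ...but never exercises the path that skips the "if".
print(arc_coverage(CFG, traces))    # 0.75
```

One test gets you 100% block coverage here, yet the case where the condition is false was never run.  Arc coverage exposes that gap, though it still says nothing about which values flowed through the code.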

In fact, nothing is a silver bullet.  If your development team is pressuring you to give goals around coverage, make sure they understand the true meaning and value of the data.  In my opinion, code coverage data is most valuable in telling you what your tests don't cover - not what they do cover.  In terms of confidence, blocks with zero coverage give you high confidence that you have a test hole, but blocks with coverage just mean something was tested.  Attaining any amount of coverage doesn't give you conclusive evidence that the testing was correct or that behavior was actually validated, so it just gives you a little confidence in your overall picture of quality.
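A toy example of that asymmetry, with everything in it invented for illustration: the function below has a deliberate bug, and the "coverage" is recorded by hand in a set rather than by a real coverage tool.

```python
# Hypothetical, hand-instrumented function.  The block names and the
# "hits" set stand in for what a real coverage tool would record.
ALL_BLOCKS = {"entry", "negative", "non_negative"}


def absolute_value(x, hits):
    hits.add("entry")
    if x < 0:
        hits.add("negative")
        return x  # deliberate bug: this should return -x
    hits.add("non_negative")
    return x


def run_without_asserting():
    """A 'test' that executes every block but validates nothing."""
    hits = set()
    absolute_value(-3, hits)
    absolute_value(3, hits)
    return hits


def uncovered(hits):
    """Blocks no test reached: these are definite test holes."""
    return ALL_BLOCKS - hits


hits = run_without_asserting()
print(uncovered(hits))               # set(): every block was covered
print(absolute_value(-3, set()))     # -3: the bug is still there
```

Full coverage, and the bug sails right through because nothing checked the result.  On the other hand, if uncovered() had reported "negative", you'd know for certain that no test ever exercised that path.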

That's all for now.  I'll discuss more about deriving confidence in quality assessment in an upcoming post.  Enjoy your weekend!
