

Code Coverage, It’s Exciting!

In the recent Star Trek movie, following an on-bridge brawl between Kirk and Spock, new arrival Scotty announces, “I like this ship, it’s exciting!” Now replace “this ship” with “debates about code coverage” and you’ll understand the tone of this blog post. As someone new to blogging, I feel I can learn a lot from the experts who are already prolifically posting about the creation of quality software. Code coverage is an interesting topic in itself, but then I came across the series of articles I will present to you, and I saw how some of the better-known bloggers not only post but also comment and write their own follow-up posts. I thought that was worth sharing. Although it starts out a bit contentious, I am most impressed at how the debate among these experienced testers results in a consensus we can all benefit from.

Herein I will play the role of humble reporter. I will add some of my own perspective to the discussion, but my goal is to present these pieces to you so you can read them, without recreating all their work and debate. I endeavor to give full credit where it is due; I am merely the presenter here…let me know if I forget a critical reference.

For those pressed for time or of a minimalist bent, simply read the three links in boxes below in order.

Good Intentions

Our story starts with a brief piece in a recent-ish issue of Software Test & Performance Magazine by Chris McMahon and Matt Heusser:

Considering Code Coverage

Go ahead and read it (the links in boxes are pretty much must-reads for this blog post to make sense). My take on what the authors are saying here is:

  • Code Coverage is not a be-all and end-all, nor a panacea.
  • Even with high Code Coverage metrics you still may be missing many vital test cases and scenarios.
  • The authors make the distinction between
    • “as programmed”, which means ensuring the code did as they asked it to do – a developer-centric view
    • “fit for use”, which means is the software fit for use by a customer? – a tester-centric view
  • …and state that high code coverage metrics can tell you about the former, but not about the latter. For that you need “testers to step in and do their thing.”
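
To make that distinction concrete, here is a tiny sketch of my own (a hypothetical function and test, not from the article): a single test executes every line, so a coverage tool reports 100%, yet the “fit for use” question is never asked.

```python
# Hypothetical example (mine, not the article's): 100% statement coverage,
# yet an obvious "fit for use" failure goes undetected.

def format_price(cents: int) -> str:
    dollars = cents / 100
    return f"${dollars}"

def test_format_price():
    # This one test executes every line of format_price -- 100% statement
    # coverage -- and confirms the code does what it was asked ("as programmed").
    assert format_price(150) == "$1.5"

# A customer, however, expects "$1.50"; the output is not "fit for use",
# and negative or very large amounts were never considered at all.
# The coverage number says nothing about any of this.
```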

So far, so good. Interesting stuff and probably good advice, but maybe not the excitement I promised. Well….

Don’t Make Me Angry

How about an Angry Test Architect from Microsoft? Bj Rollison responded to McMahon and Heusser’s piece on his own blog, where he says:

“I read a lot of articles, white papers, and books. I like most of what I read, even if I disagree with some of the points being made. I can’t remember ever reading an article on software testing that ever made me angry. I was not angry because of the message of the article. In fact, I think the point the authors are trying to make is valid and I agree with them on their fundamental point. Unfortunately, the article is filled with technical inaccuracies [and] the end message was almost lost.”

Read Bj’s critique here: 

Reconsidering Code Coverage

Indeed, true to his word, Rollison seems to agree with the original article, saying, “there is no correlation between code coverage and quality, and code coverage measures don’t tell us ‘how well’ the code was tested”. He then goes on to restate the original piece’s enumeration of the different types of code coverage, but each with his own take on the definition. I am not sure if he is taking issue with the original authors’ definitions or simply clarifying them for his own purposes. My primary takeaways from Rollison’s piece are:

  1. The original piece gave an example of Path coverage that actually illustrated Decision coverage. More specifically, Path coverage should treat a compound predicate such as “number(sid) <= 1000000 or number(sid) > 600000” as two paths, not one (see the sketch after this list).
  2. The original article is trying to drive the conclusion that “structural testing misses other problems”; however, in Rollison’s estimation the authors provide very poor examples of this. For example, the original piece cites resizing the window as an area left untested even with 100% code coverage, but Rollison states that this has nothing to do with structural control flow and is therefore irrelevant.
    • Seth’s comment: I liked the examples the original authors gave of where code coverage fails to tell the whole story. Perhaps they could have framed them a bit differently to avoid this issue.
  3. Finally, Rollison disputes the authors’ conclusion that code coverage can tell you “how well the developers have tested their code”, saying instead that it tells us what code remains untested and therefore where we may need to focus our investigation for additional testing.
    • Seth’s comment: The actual quote from the original piece was “how well the developers have tested their code, to make sure it’s possible the code can work under certain conditions,” which I think carries a different meaning in context than the partial quote alone. In the end, I think McMahon and Heusser (based on their article) would agree about the additional-testing focus, as they say, “it’s time for testers to step in and do their thing.”
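
To make point 1 concrete, here is a minimal sketch using a simplified predicate of my own (not the article’s exact expression): two tests satisfy decision coverage, yet the short-circuiting “or” hides a third path.

```python
# Hypothetical sketch: one decision built from two conditions. Decision
# coverage only needs the whole predicate to evaluate True and False once,
# but the short-circuiting "or" creates three distinct paths.

def accept(sid: int) -> bool:
    if sid <= 100 or sid % 2 == 0:
        return True   # path A: 1st condition True (2nd never evaluated)
                      # path B: 1st condition False, 2nd condition True
    return False      # path C: both conditions False

# Two tests already give 100% decision (branch) coverage:
assert accept(50) is True     # predicate True  (path A)
assert accept(101) is False   # predicate False (path C)

# ...but path B -- the second condition rescuing the predicate -- was
# never exercised until this third test:
assert accept(102) is True    # path B
```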

note: There is no intent here to equate Bj with the Hulk. The one time I met him (Bj, not the Hulk) he was quite pleasant.

Cage Match!

Continuing the critique of McMahon and Heusser’s original article, Alan Page took his argument to Twitter, where “Alan and Matt [Heusser] were debating about the mileage testers can get out of coverage metrics for testing purposes” [ref]. One interesting thing to note is that Page and Rollison both work for the Test Excellence group at Microsoft and, along with Ken Johnston, co-authored the book How We Test Software at Microsoft.

I could not find the original Twitter exchange, but both Page and Heusser agreed (thanks to Marlena Compton) to a public debate in the following article:

Heusser v. Page: Code Coverage Cage Match!

Page states his concerns as two-fold:

  1. “The first was minor to me (but less minor to Bj), in that the overview of coverage types seemed a bit confusing”
    • Seth’s comment: Ah…so Bj was taking them to task over their enumeration and definitions of the different types of code coverage.
  2. “…what I’d like to continue to discuss is the conclusion of the article where I felt you sort of took a right turn and said that coverage is mostly for developers, but it doesn’t say anything about quality.…I think there’s a wealth of information for testers in looking at coverage data − not in increasing the number, but in understanding more about what code is covered and uncovered.”

Heusser replies by addressing Rollison’s point #2 above regarding the examples provided, noting that it was exactly his intention to “point out all the kinds of defects that code coverage can miss”. He attributes the whole misunderstanding to what he defines as Symbol Failure, which he explains as follows:

“(The classic example of symbol Failure is ‘Andy eats shoots and leaves’ − is Andy [a] Cowboy or a Panda Bear?) I think the risks of symbol failure increase as the background of the audience and author get more diverse”

Ultimately, both Page and Heusser converge on agreement: while the original article tried to make the point that code coverage numbers do not mean a lot to testers and quality, looking at what is and is not covered in the actual code can provide immense benefit to the software quality professional. Yes, this cage match to the death ended in general, amicable consensus among all (even as I read Bj’s comments on the Cage Match article, it seems he and the authors agree on more than they disagree). This is a good thing: it’s exciting to watch the push and pull of an academic argument, but if the point is to remove misunderstanding and find some agreement on the best courses of action, then I think this final article does a good job of that.
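
To make that final point concrete, here is a minimal sketch of mining coverage data for what is not covered, using Python’s coverage.py purely as an illustration (neither article prescribes a tool, and run_test_suite below is a hypothetical stand-in):

```python
# A minimal sketch, assuming Python and coverage.py (pip install coverage):
# the tester's payoff is the list of missed lines, not the headline number.
import coverage

cov = coverage.Coverage()
cov.start()
run_test_suite()   # hypothetical stand-in for whatever executes your tests
cov.stop()
cov.save()

# show_missing=True adds a "Missing" column of uncovered line numbers per
# file -- say, an error handler no test ever reached -- which is exactly
# where a tester should "step in and do their thing".
cov.report(show_missing=True)
```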

note: There is no intent to equate Alan or Matt with pugilists. I’ve met Alan on several occasions and he has never so much as feinted a punch toward me.

Other Resources

Comments

  • Anonymous
    January 06, 2010
    Seth, quite the interesting and daring post, pulling all the threads of these other bloggers together. It is interesting to observe that within our little community of QA bloggers there are, shall we say, some interesting tangos? Anyhow, I know you and I have talked code coverage, experimentation, exposure control... and other topics before, but if you hadn't shown me all these blogs I wouldn't have seen the interesting tapestry. BTW, I'm looking forward to your talk at the Better Software conference this June (http://www.sqe.com/BetterSoftwareConf/). Keep up the blogging, and I'll keep reading. KJ

  • Anonymous
    January 11, 2010
    Seth, I remembered one of my favorite white papers on code coverage and the correlation to defect density, and I wanted to post the link for others. The authors of the paper analyze two significant software projects and draw several observations. Like most of the blogs you cite, they don’t claim that code coverage by itself is a predictor of better quality. There are two points I wanted to pull out from the paper’s summary, but mostly, if you are reading this blog, you should read the full 11-page white paper. The first point is around effectiveness: “Despite dramatic differences between the two industrial projects under study we found that code coverage was associated with fewer field failures and a lower probability of field defects when adjusted for the number of pre-release changes. This strongly suggests that code coverage is a sensible and practical measure of test effectiveness.” Another conclusion from the paper is, “What appears to be even more disappointing, is the finding that additional increases in coverage come with exponentially increasing effort. Therefore, for many projects it may be impractical to achieve complete coverage.” It is a great paper to refer to for empirical analysis of the role and use of code coverage. It’s also a good place for the CC slugfest crowd to use and reference when next the sparring begins. http://mockus.us/papers/coverage.pdf

  • Anonymous
    January 11, 2010
    Seth, I followed Alan Page's tweet to your blog and I'm glad I did. A few quick comments:

  1. This is a great article.
  2. Our interests are very similar. SaaS? Check. Efficient and effective methods of test case design? Check. Understanding what people are saying and where agreement is / is not? Check. Design of Experiments? Huge check. "I'll be watching you." :)
  3. You might find the following post of mine interesting: Kohavi's superb presentation on EXP (and the advantages of getting objective data about what users are actually doing vs. acting on what the highest-paid person in the room may feel users might do) is what finally got me off my duff to start writing blog posts: http://hexawise.wordpress.com/2009/08/18/learning-using-controlled-experiments-for-software-solutions/ Justin Hunter (founder of Hexawise)

  • Anonymous
    January 11, 2010
    Justin, it's interesting that you bring up EXP. Seth works for Ron. The team site for the MS EXP team is http://exp-platform.com/default.aspx. EXP is a great approach to testing. Thanks for the link to Ron's webinar on the topic. I knew he'd done one but hadn't seen a link.

  • Anonymous
    January 11, 2010
    Thanks, Justin. You are right: I was very interested in your referenced blog post. If you want much of the same content as the webinar, but in other forms, check out the paper and PPT here: http://exp-platform.com/expMicrosoft.aspx