the Zune issue

As you can imagine, there is a pretty lively debate going on over the Zune date math issue here in the hallways and on our internal mailing lists. There are plenty of places one can find analyses of the bug itself, like here, but I am more interested in the testing implications.

One take: this is a small bug, a simple comparator that was ‘greater than’ but should have been ‘greater than or equal to.’ It is a classic off-by-one bug, easily found by code review and easily fixed, then forgotten. Moreover, it wasn’t a very important bug because its lifespan was only one day every leap year and it only affected the oldest model in our product line. In fact, it wasn’t even our bug; it was in reused code. Testing for such proverbial needles is an endless proposition; blame it on the devs and ask them not to do it again. (Don’t get your knickers in a twist; surely you can detect the sarcasm.)
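For reference, here is a Python transliteration of the loop as it was widely reported (the original is C). It assumes, as that code does, that `days` counts from January 1, 1980, with that date being day 1. The `max_iterations` guard is a hypothetical addition, not part of the original, so that a test can observe the hang instead of freezing.

```python
import calendar

ORIGIN_YEAR = 1980  # assumption: day 1 is January 1, 1980, as in the reported code


def convert_days_buggy(days: int, max_iterations: int = 1000) -> tuple:
    """Transliteration of the reported loop; returns (year, day_of_year).

    max_iterations is a hypothetical guard, not part of the original code,
    so a test can observe the hang instead of freezing.
    """
    year = ORIGIN_YEAR
    iterations = 0
    while days > 365:
        iterations += 1
        if iterations > max_iterations:
            raise RuntimeError(f"loop stuck at year={year}, days={days}")
        if calendar.isleap(year):
            if days > 366:
                days -= 366
                year += 1
            # When days == 366 in a leap year, neither days nor year changes,
            # so the loop condition stays true forever.
        else:
            days -= 365
            year += 1
    return year, days
```

On December 31, 2008 (day 10593 under this numbering), the loop reaches year 2008 with `days == 366`: `days > 365` is true but `days > 366` is false, so nothing changes and only the guard stops the loop. December 31, 2007 (day 10227) converts normally.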

Another take: this is a big bug. It was in the startup script for the device and thereby affected every user. Moreover, its effect is nothing short of bricking the device, even if only for a day (and, as it turns out, music is actually a big deal on that specific day: New Year’s Eve). This is a pri-1, sev-1, run-down-the-halls-screaming-about-it kind of bug.

As a tester, can I take any view but the latter? But the bug happened. Now we need to ask: what can we learn from it?

Clearly, the code review that occurred on this particular snippet is suspect. In every code review I have ever been part of, checking every single loop termination condition has been a top priority, particularly in code that runs at startup. This matters because loop termination bugs are not easily found in testing. They require a “coming together” of inputs, state and environment conditions that a tester is unlikely to pull out of a hat or to cobble together with unthinking automation.
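As a review checklist item, the question for any loop is: does every pass either exit or make measurable progress toward exiting? Below is one way to restructure the day-counting loop so the answer is visibly yes. It is a Python sketch that again assumes day 1 is January 1, 1980; it is a rewrite for illustration, not the patch that actually shipped.

```python
import calendar

ORIGIN_YEAR = 1980  # assumption: day 1 is January 1, 1980


def convert_days(days: int) -> tuple:
    """Return (year, day_of_year) for a 1-based day count starting at ORIGIN_YEAR."""
    if days < 1:
        raise ValueError("days must be >= 1")
    year = ORIGIN_YEAR
    while True:
        year_length = 366 if calendar.isleap(year) else 365
        # Exit test: any day that fits in the current year ends the loop,
        # including day 366 of a leap year.
        if days <= year_length:
            return year, days
        # Progress: we only get here when days > year_length >= 365, so days
        # shrinks by at least 365 and stays >= 1; the loop must terminate.
        days -= year_length
        year += 1
```

The design choice is that the exit test and the subtraction use the same `year_length`, so there is no combination of inputs where the loop neither returns nor subtracts.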

This brings me to my first point. We testers don’t do a good job of checking the quality of the code reviews and unit testing, where this bug could have been found more easily. If I were still a professor, I would give someone a PhD for figuring out how to normalize code review results, unit test cases and system test cases (manual and automated). If we could aggregate these results, we could focus system testing away from the parts of the system already covered by upstream ‘testing.’ Testers would, for once, be taking credit for work done by devs, as long as we could trust it.

The reason system testing has so much trouble with this bug is that the tester would have to recognize that the clock is an input (this seems obvious to many, but I don’t think it is a given), devise a way to modify the clock (manually or as part of their automation) and then create the conditions of the last day of a year that contains 366 days. I don’t think that’s a natural scenario to gravitate toward, even if you are specifically testing date math. I can imagine a tester thinking about February 29, March 1 and the old and new daylight saving time transition days in both fall and spring. But what would make you think to treat Dec 31, 2008 as any different from Dec 31, 2007? Y2K seems an obvious year to choose, and so might 2017, 2035, 2999 and a bunch of others, but 2008?
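Treating the clock as an input usually means passing it in rather than reading the system time deep inside the code. Here is a minimal Python sketch around a hypothetical function under test, `days_left_in_year`; because the clock is a parameter, a test can pin “today” to Dec 31, 2008 without touching the real clock.

```python
import calendar
import datetime
from typing import Callable

Clock = Callable[[], datetime.date]


def days_left_in_year(clock: Clock = datetime.date.today) -> int:
    """Days remaining after today in the current year (hypothetical code under test).

    The clock is a parameter rather than a hidden call to the system time,
    so a test can set "today" to any date it likes.
    """
    today = clock()
    year_length = 366 if calendar.isleap(today.year) else 365
    return year_length - today.timetuple().tm_yday


# Dates a tester might reach for, plus the one that actually mattered.
CANDIDATE_DATES = [
    datetime.date(2008, 2, 29),   # leap day
    datetime.date(2008, 3, 1),    # day after the leap day
    datetime.date(2007, 12, 31),  # last day of a 365-day year
    datetime.date(2008, 12, 31),  # last day of a 366-day year (the Zune date)
]

if __name__ == "__main__":
    for d in CANDIDATE_DATES:
        # Bind d as a default argument so each lambda returns its own date.
        print(d, days_left_in_year(clock=lambda d=d: d))
```

Production code calls `days_left_in_year()` and gets the system date; tests pass a lambda returning a fixed date.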

This brings me to my second point. During the discussions about this bug on various internal forums, no fewer than a dozen people had ideas about testing for date-related problems that no one else in the discussions had thought of. I was struck by a hallway debate between two colleagues who were discussing how they would have found the bug and what other test cases needed to be run for date math issues. Two wicked smart testers who clearly understood the problems date math poses, but had almost orthogonal approaches to testing it!

The problem with arcane testing knowledge (security, Y2K and localization all come to mind) is that we share it by discussing it and explaining to other testers how to do something. “You need to test leap year boundaries” is an ineffective way of communicating. But it is exactly how we are communicating. What we should be doing is sharing our knowledge by passing test libraries back and forth. I wish the conversation had been: “You need to test leap year boundaries, and here’s my library of test cases that do it.” Or: “Counting days is a dangerous way to implement date math; when you find your devs using that technique, run these specific test cases to ensure they did it right.”
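A sketch of what “here’s my library of test cases” could look like for day-counting code, in Python. It assumes a hypothetical interface in which the code under test maps a 1-based day number (day 1 = January 1, 1980, mirroring the Zune snippet) to a `(year, day_of_year)` pair. The standard `datetime` module serves as the oracle, so whoever runs the cases does not need to understand how the implementation counts time. `naive_convert` is a hypothetical, deliberately wrong implementation included only to show a failure.

```python
import calendar
import datetime
from typing import Callable, List, Tuple

EPOCH = datetime.date(1980, 1, 1)  # assumption: day number 1 == EPOCH
Converter = Callable[[int], Tuple[int, int]]  # day number -> (year, day_of_year)
Case = Tuple[int, Tuple[int, int]]


def boundary_dates(year: int) -> List[datetime.date]:
    """Dates around the leap day and the year boundary for one year."""
    dates = [datetime.date(year, 1, 1), datetime.date(year, 2, 28),
             datetime.date(year, 3, 1), datetime.date(year, 12, 30),
             datetime.date(year, 12, 31)]
    if calendar.isleap(year):
        dates.append(datetime.date(year, 2, 29))
    return sorted(dates)


def leap_year_cases(first_year: int, last_year: int) -> List[Case]:
    """(day_number, expected) pairs; expected values come from datetime, not the code under test."""
    cases = []
    for year in range(first_year, last_year + 1):
        for d in boundary_dates(year):
            day_number = (d - EPOCH).days + 1
            cases.append((day_number, (d.year, d.timetuple().tm_yday)))
    return cases


def run_cases(convert: Converter, cases: List[Case]) -> list:
    """Return (day_number, expected, actual) for every case the converter gets wrong."""
    failures = []
    for day_number, expected in cases:
        actual = convert(day_number)
        if actual != expected:
            failures.append((day_number, expected, actual))
    return failures


def naive_convert(day_number: int) -> Tuple[int, int]:
    """Hypothetical, deliberately wrong converter that ignores leap years."""
    return 1980 + (day_number - 1) // 365, (day_number - 1) % 365 + 1
```

`run_cases(naive_convert, leap_year_cases(1980, 2010))` returns a non-empty list whose first entry is day 366 (December 31, 1980). One caveat: a converter that never returns, like the Zune loop, will hang `run_cases` as well, so in practice the cases would be run under a timeout.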

The testing knowledge it took to completely cover the domain of this specific date math issue was larger than the set of folks discussing it. The discussion, while educational and stimulating, isn’t particularly transportable to the test lab. Test cases (or models/abstractions thereof) are transportable, and they are a better way to encapsulate testing knowledge. If we communicated in terms of test cases, we could actually accumulate knowledge and spread it to all corners of the company (we have a lot of apps and devices that do date math) much faster than by sitting around explaining the vagaries of counting time. Someone who doesn’t understand the algorithms for counting time could still test those algorithms using the test assets of someone else who did understand them.

Test cases, reusable and reloadable, are the basis for accumulated knowledge in software testing. Testing knowledge is simply far too distributed across various experts’ heads for any other sharing mechanism to work.

Comments

  • Anonymous
    January 06, 2009
    PingBack from http://blog.a-foton.ru/index.php/2009/01/07/the-zune-issue/

  • Anonymous
    January 06, 2009
    I was hoping you would post your opinion of the Zune defect.   I agree with you.  It's always much easier to get meaning from code like that in a test library than from a subjective discussion.  I plan to circulate your post in my own group at work which is oh-so-painfully inching towards automated unit test/system test.  Should make for a great discussion.  Thanks!

  • Anonymous
    January 07, 2009
    "If we communicated in terms of test cases, we could actually accumulate knowledge and spread it to all corners of the company (we have a lot of apps and devices that do date math) much faster than sitting around explaining the vagaries of counting time." Wow!  I guess I couldn't disagree more.   Testers using libraries of Test Cases created by unknown others for unknown contexts doesn't seem like a very good solution to me. Test Cases aren't "knowledge" any more than Times Tables are mathematics! Without understanding the vagaries of counting time, the professional developer or tester simply cannot be effective.

  • Anonymous
    January 07, 2009
    "Test Cases aren't "knowledge" any more than Times Tables are mathematics!" Hmm, I have to agree with Mr. JW here.  Times tables, once agreed to be correct, can be used by someone who doesn't know (or in some cases doesn't need to know) multiplication to verify that the output of a multiplication function is correct without a complete understanding of the mechanics. As a junior tester, I simply don't understand every aspect of the software I'm testing to the degree that a senior tester might.  If there are well thought out test cases - agreed to be correct - that pertain to my code, it would be foolish for me to think that I understand the problem better than someone who's seen this type of problem before. That's not to say that taking a blind approach is correct either.  Any tester worth their salary had better make every effort to understand the problems and the patterns that those cases are trying to catch and know enough about their code to identify discrepancies, special cases and gaps in coverage. My 2 cents

  • Anonymous
    January 08, 2009
    @kylereed, Multiplication Tables are great - unless you need to perform subtraction, or if you need to multiply 6-digit numbers. But if you knew and understood mathematics (or even calculators), you would almost certainly never use such Tables again, right? "Well thought out test cases - agreed to be correct" are interesting, if they are designed for your particular code.  If not, then all the agreement of correctness in the world may have very little value.

  • Anonymous
    January 08, 2009
    I think we can agree that the math analogy falls a little short here. Let's say that a tester is working on a caching product with a real timeline so they don't have unlimited time to spend learning about the fringe problems (like the Zune leap year problem) associated with caching.  In this case, wouldn't it be better to have test cases associated with the caching pattern that the tester can review?  Sure you can argue that it would be better to teach them about the ins and outs of caching and I would agree, but that would take longer and assumes that the tester could still come up with the same caliber of problems that an expert in caching could. I doubt that JW was implying that you could give someone a magic set of test cases that would work for any piece of code.  I also assume that he was talking about more abstract test cases that apply to a pattern rather than about concrete cases that have been designed to hit every code branch.  With that, I feel there is a lot of benefit that can be had from architecting test cases that others - less familiar with the problem - can use as a guide to help ensure completeness in their testing.

  • Anonymous
    January 09, 2009
    YAGNI is a strongly minimalist approach. How does it relate to security? Aren't my requirements (indirectly) something superfluous? Something that adds unnecessary complexity to the problem? The short answer: NO. And now a somewhat longer version.

  • Anonymous
    January 09, 2009
    "I am curious to know why in a mp3 device (Zune) there shud be a "blocking" code at startup with respect to date & time?" I'll give you a 3 letter hint as to why (I assume) the zune calculates the date on startup. DRM

  • Anonymous
    January 12, 2009
    Brian Harry on TFS Installation troubleshooting guide James Whittaker on the Zune issue Jim Lamb on TFS

  • Anonymous
    January 12, 2009
    @Team System News: "http://teamsystemrocks.com/blogs/team_system_news/archive/2009/01/12/vsts-links-11-12-2009.aspx" VSTS Links - 11/12/2009 ??? Is this another one of those messages from the future? If so, who wins the Super Bowl this year?

  • Anonymous
    February 03, 2009
    [Nacsa Sándor, January 13 – February 3, 2009] The subject of quality assurance is barely known

  • Anonymous
    February 05, 2009
    [Nacsa Sándor, February 6, 2009] This Team System edition is for testing web applications and web services

  • Anonymous
    February 27, 2009
    Interesting discussion of why product testing might not be able to find the bug.  I did not realize this code only ran during startup.  Like you suggested, unit testing is the place to find this bug.  I used test-driven bug fixing to isolate the bug.  You, I and most people that look at the publicized snippet of code got the bug wrong.  Here is the real story demonstrated with automated unit tests. http://www.renaissancesoftware.net/blog/archives/38