Watson

People who don’t build software for a living have a quite understandable attitude that you should write your program, fix all the bugs, then ship it. Why would you ever ship a product that has bugs in it? Surely that is a sign that you don’t care about quality, right?

Those of us who work on non-trivial software projects naturally see this a little differently. There are many factors that determine quality of software, and some of them are counter-intuitive. For example, the first thing to realize is that you are not trying to get rid of every bug. What you are trying to do is maximize the known quality of the product at the time you ship it. If the last bug you have in the product is that a toolbar button sometimes doesn’t display correctly, and you fix it, then you ship the product, how embarrassed will you be a little later when you find out that the code fix you took to get the toolbar to display causes your application to crash when used on machines with certain video cards? Was that the right decision? Fix every bug?

The discipline of software testing is one that is rarely taught in university, and when it is taught, it is taught in a highly theoretical way, not in the way that it needs to be done to successfully produce high quality products on a schedule demanded by commercial software.

I often interview university students applying for jobs at Microsoft. They're usually excellent people and very talented, although we have to turn away most of them. One thing that always intrigues me is when I am interviewing people who want to be a developer or tester. If I ask them to write a little code, most of them can. Then I ask them to prove to me that their code works. The responses I get to this vary so widely it is remarkable. Usually people run the example values I gave them through their algorithm and if it comes out correctly, they say it works. Then I tell them, "OK, let's say I told you this code is for use in a firewall product that is going to be shipped to millions of customers, and if any hackers get through, you get fired". Then they start to realize that they need to do a little more. So they try a few more numbers and some even try some special numbers called "boundary conditions" - the numbers on either side of a limit. They try a few more ideas. At some point they either say they're now sure, or they get stuck. So I then ask them if they think it is perfect now. Well, how do you know?
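As a rough illustration of what trying "boundary conditions" looks like, here is a minimal Python sketch. The function `is_allowed_port` and its port range are made up for this example (they are not from any real firewall product); the point is only that the test values sit on either side of each limit.

```python
def is_allowed_port(port: int) -> bool:
    """Hypothetical rule: allow only ports in the inclusive range 1024..65535."""
    return 1024 <= port <= 65535


# Boundary conditions: the values on either side of each limit.
boundary_cases = {
    1023: False,   # just below the lower limit
    1024: True,    # the lower limit itself
    65535: True,   # the upper limit itself
    65536: False,  # just above the upper limit
}

for port, expected in boundary_cases.items():
    assert is_allowed_port(port) == expected, f"unexpected result for port {port}"
```

Passing these four checks only shows that no problem was found at the edges that were tried. It says nothing about inputs nobody thought of, which is exactly the "how do you know?" question.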

This is a central problem in large software projects. Once you get beyond about 100 lines of code, it becomes essentially impossible to prove that code is absolutely correct.

Another interesting detail here is that people constantly confuse the fact that they can’t find a problem with there being no more problems. Let's say you have a test team of one person, and a development team of 10 (at Microsoft, we have a 1:1 ratio of testers to devs, so we'd have 10 testers on that team). But what if our poor lone hypothetical tester goes on vacation for 2 weeks, and while he is away the developers fix the few bugs he has found. Is the program now bug free, just because there are no known bugs in it? It is "bug free" after all, by the definition many people would use. In fact a good friend of mine who runs his own one-man software business told me once his software is in fact bug free, since whenever any customer tells him about a bug, he fixes it. I suppose his reality is subjective. But you see the point - whether you know about the bugs or not, they're in there.

So how do you deal with them? Of course, you test, test, test. You use automated tests, unit tests that test code modules independently of others, integration tests to verify code works when modules are put together, genetic test algorithms that try to evolve tests that cause problems, code reviews, automated code checkers that look for known bad coding practices, etc. But all you can really say is that after a while you are finding it hard to find bugs (or at least serious ones), which must mean you're getting it close to being acceptable to ship. Right?

One of my favorite developments in product quality that is sweeping Microsoft and now other companies as well is an effort started by a couple of colleagues of mine in Office, called simply "Watson". The idea is simple. A few hundred (or even a few thousand) professional testers at Microsoft, even aided by their expert knowledge, automated tools, and crack development and program management teams, cannot hope to cover the stunningly diverse set of environments and activities that our actual customers have. And we hear a lot about people who have this or that weird behavior or crash on their machine that "happens all the time". Believe me, we don’t see this on our own machines. Or rather, we do at first, then we fix everything we see. In fact, the product always seems disturbingly rock solid when we ship - otherwise we wouldn’t ship it. But would every one of the 400 million Office users out there agree? Everybody has an anecdote about problems, but what are anecdotes worth? What is the true scale of the problem? Is everything random, or are there real problems shared by many people? Watson to the rescue.

As an aside, I suspect that some people will read this and say that open source magic solves this, since, as the liturgy goes, so many eyes are looking at the code that all the bugs are found and fixed. What you need to realize is that there are very few lawyers who can fix code bugs. And hardly any artists. Not so many high school teachers, and maybe just a handful of administrative assistants. Cathedral, bazaar, whatever. The fact is that the real user base is not part of the development loop in any significant way. With 10^8 diverse users, a base of people reporting bugs on the order of 10^4 or even 10^5, especially one with a developer user profile, cannot come close to discovering the issues that the set of non-computer people have.

The Watson approach was simply to say: let's measure it. We'll match that 10^8 number 1 for 1. In fact, we'll go beyond measuring it, we'll categorize every crash our users have, and with their permission, collect details of the crash environment and upload those to our servers. You've probably seen that dialog that comes up asking you to report the details of your crash to Microsoft. When you report the crash, if that is a crash that someone else has already had, we increment the count on that "bucket". After a while, we'll start to get a "crash curve" histogram. On the left will be the bucket with the most "hits". On the far right will be a long list of "buckets" so rare that only one person in all those millions had that particular crash and cared to report it. This curve will then give you a "top N" for crashes. You can literally count what percentage of people would be happier if we fixed just the top 10 crashes.
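The bucket counting and "top N" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bucket names and counts below are made up, each report is assumed to arrive already labeled with a bucket identifier (how Watson actually assigns buckets is not described here), and the helper names `crash_curve` and `top_n_share` are hypothetical.

```python
from collections import Counter


def crash_curve(reports):
    """Return (bucket, hits) pairs sorted from most hits to fewest."""
    return Counter(reports).most_common()


def top_n_share(reports, n):
    """Fraction of all crash reports that fall in the n largest buckets."""
    curve = crash_curve(reports)
    total = sum(hits for _, hits in curve)
    if total == 0:
        return 0.0
    return sum(hits for _, hits in curve[:n]) / total


# Made-up data: one bucket identifier per submitted crash report.
reports = (
    ["grammar_checker"] * 30
    + ["third_party_addin"] * 12
    + ["toolbar_paint"] * 5
    + ["rare_a", "rare_b", "rare_c"]  # buckets seen exactly once
)

curve = crash_curve(reports)
print(curve[0])                 # the bucket with the most hits
print(top_n_share(reports, 2))  # share of reports covered by the top 2 buckets
```

Note that this counts reports rather than distinct people, so it approximates "what percentage of people would be happier" only if each user reports roughly the same number of crashes.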

When we started doing Watson, it was near the end of Office XP. We collected data internally for a few months, and started collecting externally for the last beta. We quickly learned that the crash curve was remarkably skewed. We saw a single bug that accounted for 27% of the crashes in one application. We also saw that many people install extra software on their machine that interferes with Office or other applications, and causes them to crash. We also saw that things we thought were fairly innocuous, such as grammar checkers we licensed from vendors for some European languages, were causing an unbelievable number of crashes in markets like Italy or the Netherlands. So we fixed what we could, and as a result Office XP was pretty stable at launch. For the first time, that was not a "gut feel" - we could measure the actual stability in the wild. For follow-on service packs for Office XP, we just attacked that crash curve and removed all the common or even mildly common crashes.

For Office 2003 (and OneNote 2003), the Watson team kicked into high gear with more sophisticated debugging tools so we could more often turn those crash reports into fixes. They also started collecting more types of data (such as "hangs", when an application either goes into an infinite loop, or is taking so long to do something that it might as well be in one). We fixed so many crashing bugs and hangs that real people reported during the betas that we were way down "in the weeds", in the areas where crashes were reported only a handful of times - hard to say if that is real or just a tester trying to repro a bug they found. So again we know how stable the 2003 generation of Office applications is - we can measure it directly. What a difference from the guesswork that used to go on. In fact we know for a fact that Office 2003 at launch is already more stable than any previous release of Office ever, even after all service packs are applied, since Windows now collects Watson data for all applications.

Microsoft offers this data freely for anyone writing applications on Windows. You can find out if your shareware application is crashing or its process is being killed by your users often - sure signs of bugs you didn’t know you had. The Windows team will also give you crash dumps you can load into your debugger to see exactly what the call stack was that took you down, so you can easily fix your bugs.

This is an example of what BillG calls "Trustworthy Computing". You may think that's a bunch of hoopla, but in many ways it is quite real - things like Watson are used to directly improve the quality of our products.

Comments

  • Anonymous
    February 03, 2004
    Is there a webservice for it?

  • Anonymous
    February 03, 2004
    Watson is great for saving time. The problem is, how does a developer get access to those buckets? It's not exactly open to developers, is it?

    It would also be nice to have the RAID database update the occurrence count from Watson hits. This would help prioritize issues.

    Watson is limited to internal and limited external use. Say Joe Bloggs wants to write an application: he cannot get access to the Watson buckets, so in effect the technology is pointless outside Microsoft's use.

  • Anonymous
    February 03, 2004
    It would be fantastic to have an application block which offered this functionality - imagine that every bug in every piece of software you write is instantly known and reported back. So, how about it, a .NET application block which uses a web service to report crashes in .NET applications...

  • Anonymous
    February 03, 2004
    I just tried to add to the blog comments using Post reply in RSS Bandit; it doesn't seem to work.

    Anyway, I was saying that my favorite automated test is to generate the data from a template with a bit of randomness in that template (I use an XML template with placeholders for literal values, generated data and random data). It's very effective in some problem domains, as I have demonstrated to some people. I think I successfully pulverised a product and have proved that EVERY similar product is vulnerable to the same technique.

  • Anonymous
    February 03, 2004
    Give all your testers at least 2-way machines for concurrency, or at least give them access to such configurations.

    What strikes me as odd is that you develop on Dell machines yet sell Compaq configurations, all because of contract obligations. I guess that's management for you, cranial rectosis syndrome.

  • Anonymous
    February 03, 2004
    You spout on about "Trustworthy computing" yet your employees cannot even be trusted (usually contractors).

    Take for example...

    http://www.internalmemos.com/memos/memodetails.php?memo_id=1664

    http://www.internalmemos.com/memos/memodetails.php?memo_id=52

    http://www.internalmemos.com/memos/memodetails.php?memo_id=65

    http://www.internalmemos.com/memos/memodetails.php?memo_id=92

    http://www.internalmemos.com/memos/index.php?search=Microsoft&x=0&y=0

    That's just a few.

    Not including the easter eggs, back doors left in by developers etc. I've seen this for myself so I don't buy your FUD.

  • Anonymous
    February 03, 2004
    Scott, I'm doing something like that. Pretty simple, it's just a webservice that's called anytime an exception is thrown, passing up the message and call stack. Just be sure to put a try block in that exception handler...

  • Anonymous
    February 04, 2004
    You wrote "Microsoft offers this data freely for anyone writing applications on Windows. You can find out if your shareware application is crashing or its process is being killed by your users often." Could you please post something on how external devs get access to that data? I'm not finding the right Google keywords to pull it up.

    thanks in advance,
    -Don

  • Anonymous
    February 04, 2004
    Would be nice if we could configure Watson to send our application dumps to a server of OUR CHOOSING. Maybe that's why there are other solutions out there that do allow this.

  • Anonymous
    February 04, 2004
    In case anyone else is interested in signing up for Watson reporting info, go to https://winqual.microsoft.com/info/default.aspx and click on the "Windows Error Reporting" links.

    -Don

  • Anonymous
    February 04, 2004
    Good, but why can I not configure Watson to send my application to a server of my choosing?

  • Anonymous
    February 04, 2004
    The last bug isn't fixed until the last user is dead.

  • Anonymous
    February 04, 2004
    Chris Pratley has a gem of a post today discussing how Microsoft is dealing with bugs. His post is an excellent foray into why bugs are so hard to track down and the steps they are taking to resolve them....

  • Anonymous
    February 04, 2004
    moo,

    You can configure Watson to report where you want to via Corporate Error Reporting: http://www.microsoft.com/resources/satech/cer/

    Also, just FYI, writing easter eggs and/or backdoors is well known by all Microsoft developers to be a "termination offense". If you find an easter egg in any product released in the last two years and report it to Microsoft, I am positive the code will be removed (if at all possible) and the developer(s) that checked the code in will have a very stern talking to (which may include being shown to the door).

  • Anonymous
    February 04, 2004
    I'm not denying the usefulness of Watson. I have used it a lot and know the huge gains of doing so; I was just unsure whether I could use it myself for small projects for free.

  • Anonymous
    February 04, 2004
    The reference to "Trustworthy computing" was because it's a case of "Do as we say, not as we do". Regularly we have leaks of internal memos, and not just from contractors but apparently also from FTEs. It's a question of trust, and yes, there is better control on the content of the products' code and fewer backdoors and easter eggs, but even recently there have been cases (ICSA being the most recent I can recall - and those are just the ones we know about).

    Then, having seen a lot of code, the quality leaves much to be desired: lack of comments, deliberate obfuscation to hinder manageability and readability by some developers (I've seen this and they admitted doing it intentionally).

    Which part of this should we trust?

    Actions speak louder than words.

  • Anonymous
    February 04, 2004
    What about the Word thesaurus with racist word suggestions, and the icons with other symbols? Did the developer who added those have prior permission to do so?

    Trust is something that is earned, and so far it's not a good track record. It's not about the bugs and exploits, that's a fact of life in software; it's about intent.

  • Anonymous
    February 05, 2004
    Hey Moo,

    Why don't you keep the conversation to the topic at hand? Chris is writing a helpful article and you are off in the weeds criticizing something 100% unrelated. I am pretty sure Chris isn't in charge of all employees at Microsoft. He isn't in a position to code review every product. He can't control rogue employees/contractors who get some perverse sense of pride in forwarding confidential information. I realize you must like hearing the sound of your own voice, but please at least warble on topic.

  • Anonymous
    February 05, 2004
    Goodness, I wasn't at all aware of Watson/WER. This is a fine service to offer! Now, is there something similar for managed applications? I see that ReportFault() takes LPEXCEPTION_POINTERS, which isn't generally the kind of thing I have access to in managed apps....

    Nice writing, I hope you keep it up.

    Curt

  • Anonymous
    February 05, 2004
    Chris Pratley writes about bug fixing within Microsoft and how Watson, a software reporting tool, has changed their debugging process. He follows up with a second post about attention to detail and how hardware obscurities sometimes lead to buggy software....

  • Anonymous
    February 05, 2004
    Actually it was on topic, he started to talk about "Trustworthy Computing" if you read. And I quote.

    "This is an example of what BillG calls "Trustworthy Computing". You may think that's a bunch of hoopla, but in many ways it is quite real - things like Watson are used to directly improve the quality of our products."

    Which part of this is NOT on topic?

  • Anonymous
    February 06, 2004
    Nice stuff :-)
    Not really sure what random points Moo is making though. From what I understand any errors in dictionaries, easter eggs, etc. are ALL fixed once they come to light - that seems pretty responsible to me.

  • Anonymous
    February 06, 2004
    The issue is "trust". What happened to the check-in process? Aren't they approved? What is the team lead doing? Is the developer not having the code read over by somebody on his team (another dev, tester or team lead) before check-in?

    It's about trust and accountability; that's all part and parcel of "Trustworthy computing".

    It seems there is a breakdown of that process. I know I sure do a scan for these kinds of things.

  • Anonymous
    February 06, 2004
    Every issue in RAID has to be assigned from a tester to a team lead who then approves and assigns it to the developer.

    Where in that RAID work item, issue etc. does it say "Include back door or easter egg"?

    I would certainly highlight these issues at a war team if ever found, and make sure everybody knows who was responsible (or rather irresponsible) for what I would basically regard as sabotage code.

  • Anonymous
    February 07, 2004
    Moo, our development teams do all the things you describe to try to reduce errors. However, absolute perfection is hard to achieve. With thesauri and spellers, those tools usually come from other companies who often produce a print version of a thesaurus or dictionary, and have decided to make an electronic version and license it to us. They are contractually obligated to verify that the tools do not have any issues. Of course Microsoft employees who are lexicographers also check the content, but there will be an occasional error. As SpiderJerusalem noted, we fix these aggressively when the errors are noticed. BTW, many of these "errors" exist in the print versions of the same thesauri and dictionaries but for some reason they are only noticed when it becomes part of a Microsoft tool. This is probably because they get used more, Microsoft tools are considered by many to be the "standard", and it is entertaining to hassle us - you would agree with that part, right? :-)

  • Anonymous
    February 07, 2004
    Even more bugs: if I DOUBLE RIGHT CLICK on an IE window control in ANY application, it again opens IE in my face.

    What genius dreamt up this?

  • Anonymous
    February 07, 2004
    User feedback from Watson (non UE dumps).

    Is it possible for a user to just launch Watson and click "submit", so that it dumps the application and they can add a comment about an annoyance or bug, or maybe a suggestion? I can do this with the Mozilla Feedback bug trapper that's similar to Watson.

  • Anonymous
    February 08, 2004
    Well, usually the people that can find the option for such features are capable of understanding the area; that's the way it is for FullCircle's Talkback reporter that ships with the Mozilla browser.

    It's not very visible in the browser. Actually it's not in the browser at all, it's a separate application, "Talkback.exe", so just finding it requires knowledge that Joe Bloggs won't have.

  • Anonymous
    February 08, 2004
    If it is very hard to find, then you are only getting feedback from expert or technical users - not from the people who really need help with the software. Managing the bias introduced by the way you collect the data is another aspect of the problem.

  • Anonymous
    February 08, 2004
    That's why we have "Remote Assistance" on XP and Longhorn, right? They can go via their IT department. As far as the non-computer-junkie feedback goes, that's why you have usability labs, right? But seeing stuff in XP, I seriously doubt their usefulness of late.

  • Anonymous
    February 08, 2004
    Crash reporting is an old idea and something akin to Watson has existed for years on other platforms. The Gnome desktop for Linux and Unix has had this tool for at least the last five years. Every application crash can report back related details plus user comments.

    You're almost claiming in your post that Microsoft invented the idea. Hardly. I, and everyone on my development team at a major financial firm, have been putting bug tracking features directly into our applications for at least 6 years. The simplest example is every time an error is raised the user can press one button to automatically e-mail the team all related info.

    Just because Microsoft is finally starting to do something sensible to raise quality doesn't mean it should be hailed as a brilliant step forward. You're just barely starting to catch up with the rest of the world. If you ever finally surpass everyone in quality assurance methods I'll be the first to congratulate you.

  • Anonymous
    February 09, 2004
    Matthew, thank you for putting the "almost" in your second paragraph. I didn't mean to imply that no one has had similar ideas. Nothing is ever truly original anyway if you look hard enough - any idea came from another idea, or was "independently invented" in an environment that was conducive to it being conceived. I simply wanted to write about Watson and how it improves our software - and that we provide it as a service for others too. Depending on how widely your app is used, managing the reports that come back requires some significant backend system after all. One thing that makes Watson a little different from other systems I have heard about (and I have not especially researched this area), is that it collects all feedback as long as a user lets us. Other systems tend to require that you proactively report a problem, so that you can miss segments of users who are not aware of how to take that step or are not comfortable with it.
    BTW, have you done a rigorous comparison of QA methodology in Microsoft and the "rest of the world" that backs up your assertion that Microsoft is behind everyone else in quality? We have volumes of data that indicate that our software performs very well compared to the majority of software out there (from looking at the Watson data in aggregate), as well as customer satisfaction surveys etc. What similar statistically valid data are you using? I am not being flip about this - I care deeply about software quality, but it is quite easy to toss off remarks not backed by data, so I would love to hear any data if you have it.
    Most of the time comments about poor quality in Microsoft products measure the subjective frequency of comments from others around them, and do not take into account the rate of usage of our products. If 100 people use a Microsoft product and 10 are unhappy, that sounds worse than 1 person complaining about a product used by 5 people (10:1 ratio of complaint frequency), yet the complaint ratio for that product (1 in 5 or 20%) is twice the Microsoft ratio (10 in 100 or 10%).

  • Anonymous
    February 09, 2004
    Chris,
    I'd love to sign up for WER, but the $400 price tag (for a Verisign ID) is a bit steep for tools that sell under $20 (or free!). While I understand why you would want to restrict this information to the author(s) of the application, I'm not sure why a Verisign ID is required -- you're trying to establish that the publisher of an app holds a secret that the requestor of WER info holds... not who in particular it is.

    OneNote is impressively stable. I'd love to make all software (even and especially the $10 variety) this solid.

  • Anonymous
    February 09, 2004
    Philip,
    Naturally, I am not asking you for $400. Sorry, I have no idea why the WER team requires this. (I am not Microsoft, as you know :-))

  • Anonymous
    February 11, 2004
    moo, I don't understand you. A Verisign ID costs MONEY, you are paying for that ID. Verisign seems to be kind of an internet standard for validity of a site or service or even a Wireless LAN, so now you are bashing Microsoft for USING a standard to properly verify things? I know people like you, and you will find fault with anything Microsoft does, regardless of whether what they are doing is a good thing or not.

  • Anonymous
    February 12, 2004
    I'm pretty sure that legally, Microsoft will only be able to show you Watson data that relates to your company's apps. VeriSign provides a universally acceptable and manageable way of validating your credentials and making sure that you only see the data you are entitled to.

  • Anonymous
    February 17, 2004
    A simple login ID and password would suffice.

  • Anonymous
    February 23, 2004
    Not really, a login ID and password are not nearly enough.
    I develop secure (online) systems for (payment) transactions and I find myself endlessly perfecting my user-verification method, since I always find a way to get around it.

    Of course, this also means I get better at hacking, but that's the whole idea of me working with security: finding holes and patching them.

    VeriSign seems a good system, although for my applications it's not the right one.
    Too bad, because for once I would like to put more effort in the rest of the application instead of the security issue (not that the rest suffers from my focus on security though).

    In short: simple login ID and password wouldn't suffice.

  • Anonymous
    February 26, 2004
    Let me see a show of hands. Who here would love to work with moo? You won't see me raising my hand anytime soon. He talks about the arrogant coder, but should probably be looking in the mirror. I bet everyone has to rub Preparation H on their palm just to shake his hand.

  • Anonymous
    February 27, 2004
    While this is a noble idea, there are some very serious privacy issues with it.

    I've seen the dialog, I've looked at the data and there IS personal information in there.



    You explicitly state that watson information is freely available, is this just aggregate data or is access to dumps provided? If these dumps are persisted and available, can you explain to me why I would ever do anything except click "Don't Send"???

    As a developer I'm acutely aware of how available information is and I work very hard to protect mine. It's a constant battle and unfortunately one we are all destined to lose. I fear I'm putting off the inevitable...

    -JBC

  • Anonymous
    April 27, 2004
    To moo: It's not like Watson is the only choice for crash data collection on the Windows platform. You can trap all exceptions with SetUnhandledExceptionFilter and record whatever data you want & send it to a server of your choosing (using tcp/ip yourself). Of course, you'll have to do a little more yourself this way than when using Watson, but I suppose the ease of use is what you're paying for.

  • Anonymous
    May 24, 2006
    Have you ever seen a crash dialog while using a beta product (and maybe even some released products)? ...
