It's on the whiteboard
Way back when, when we were first shipping NT 3.1, checking files into the source tree was pretty easy. You made your changes and checked them in. Not a big deal, since there were only 20 or so people working on the code base - the chances of collision were relatively small, and the code base was pretty manageable. There was a small team of people who had the job of doing nightly builds; it was their responsibility to ensure that a build was done every day, and that the build worked and passed BVTs (the team was something like 5 people, if I recall).
At some point, a number of groups joined the core NT team, and the NT team grew to a couple of hundred developers. Not surprisingly, the system that had worked for the 20 or so people didn't scale to the hundreds of people who were now using the system. It got so bad that we often went for days at a time without being able to have a good build.
We tried community shame (if I had a scanner here at work, I'd scan in the picture of me from back in those days wearing goat horns), but it didn't work (I have no shame). We tried staged checkins (each team gets a dedicated hour to check in). We tried darned near everything, but the problem was that our old system simply didn't scale to the size of the group.
Eventually things got so bad that Dave Cutler ended up moving into the NT build lab to directly supervise the builds. It was a variant of the "community shame" solution, but instead of being forced to wear a silly costume, you had to explain why you screwed up to Dave directly, and it was FAR more effective (there's nothing like being grilled by Dave Cutler to instill fear into a developer).
In order to manage the volume of changes, Dave instituted the "Whiteboard". Basically he got Microsoft to buy a 4 foot by 12 foot whiteboard, and had them mount it vertically in the build lab across from where he sat. When you had a change ready to check in, you went to the whiteboard and wrote your name, the bug #, the module being changed, and a contact number. Dave would then periodically run down the board and call individuals to get them to check in their changes. The cool thing about this mechanism was that Dave could control the build process - he could do sanity builds after individuals (like me) who had a propensity for breaking the build, he could batch changes from the same group together, etc.
It also provided a dramatic visual representation of the state of NT - when the whiteboard was full, the product had lots of bugs, when it was clear, we were close to being done. And when it was empty, we had shipped the product.
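For readers who think better in code, here is a rough sketch of the whiteboard as a queue. It is purely illustrative and is not anything the NT team actually ran: the `Entry` fields mirror what got written on the board, but the `group` field, the `risky` set, and the `plan_checkins` function are all names I'm making up for the example. It batches entries from the same group together (in the order each group first shows up on the board) and schedules a sanity build after anyone known for breaking the build.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    developer: str  # name written on the board
    bug: int        # bug number
    module: str     # module being changed
    contact: str    # contact number
    group: str      # hypothetical: the team the developer belongs to


def plan_checkins(board, risky):
    """Return ordered steps: ("checkin", entry) or ("sanity_build", None)."""
    # Batch entries by group; a dict keeps groups in first-appearance order
    # and entries within a group in whiteboard order.
    batches = {}
    for entry in board:
        batches.setdefault(entry.group, []).append(entry)

    steps = []
    for batch in batches.values():
        for entry in batch:
            steps.append(("checkin", entry))
            # Hypothetical policy: sanity build right after a risky developer.
            if entry.developer in risky:
                steps.append(("sanity_build", None))
    return steps
```

Calling `plan_checkins(board, risky={"dev_a"})` on a board where `dev_a` is listed first would put a sanity build immediately after that check-in, before anyone else is called.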
Of course, the whiteboard didn't really scale, even to a project the size of NT 3.1. And today, Vista is vastly more complicated - there are several thousand developers contributing code into a bazillion binaries composed of a gajillion source files (I don't know how many, but there's a lot of them). There's no way that the whiteboard could conceivably scale today. Instead, we have a main build lab (which produces the final bits of the product) and a series of "virtual build labs", each of which is responsible for aggregating changes from a set of Windows developers. It's far more scalable than the old system, and significantly more flexible (at a minimum, it doesn't require that a Senior Distinguished Engineer spend all his time making sure that the build completes successfully).
Comments
- Anonymous
October 12, 2005
I met Dave Cutler at the NT announcement event in San Francisco in the Summer of 1992. What stood out for me based on lobby conversations and chatting was how he showed a very distinctive trait of system architects. He was very clear on the invariants that he would hold onto no matter what, confident that anything else would be fixable.
It is no surprise that, as the development process moved toward shipping, Dave would be involved in the operational end, finding the key thing to keep tied down.
That's a great story. Thanks.
- Anonymous
October 12, 2005
> when the whiteboard was full, the product
> had lots of bugs, when it was clear, we were
> close to being done. And when it was empty,
> we had shipped the product.
When the whiteboard was full, you were far from shipping. When it was clear, you were close to shipping. And when it was empty, you had shipped the product. The number of bugs fluctuates independently of that status.
- Anonymous
October 12, 2005
Norman, every software product ever shipped has shipped with known bugs.
Every single one of them. The only question is if the bugs are sufficiently bad to justify holding up the shipment.
So as long as there are bugs bad enough to justify holding the product up, there will be fixes for those bugs (as we find the bugs, we fix them).
- Anonymous
October 12, 2005
Nice posting, very informative. Please blog more of these stories.
- Anonymous
October 12, 2005
hmm, well I guess I should be paying attention in /Software Engineering/ then... However much I happen to hate it.
- Anonymous
October 13, 2005
The comment has been removed
- Anonymous
October 13, 2005
The comment has been removed
- Anonymous
October 13, 2005
I have to sit there and laugh; actually, it is amazing you could do that with 20 people, let alone thousands. We still have problems sometimes with a 5-man team tripping over each other. As for shipping software with bugs: yep, I have had to do the same myself. Sometimes it has to happen.
- Anonymous
October 13, 2005
I'd love to hear how you got past the whiteboard and scaled up to hundreds of developers. In other words, it sounds like at one time it was very dependent on the Dave Cutler personality, so who was instrumental in moving beyond that to a more scalable build arrangement? Great story, thanks.
- Anonymous
October 13, 2005
The vbl system isn't all skittles and beer... The multiple layers of branch hierarchy and then bureaucracy involved in getting fixes integrated between them mean it can literally take months for a fix to propagate through the system. Even so, it's better than what we had before, where you counted yourself lucky to get a set of source that didn't have a build break.
- Anonymous
October 13, 2005
Thursday, October 13, 2005 1:30 AM by LarryOsterman
> Norman, every software product ever shipped
> has shipped with known bugs.
There are a few that ship only with unknown bugs. I think we can agree by deleting one word from your sentence: every software product ever shipped has shipped with bugs.
> The only question is if the bugs are
> sufficiently bad to justify holding up the
> shipment.
Sure, but we have pretty big differences of opinion over what kind of bug is sufficiently bad. I think that a bug which destroys all files in a disk partition is an example of sufficiently bad.
But get this: even when Windows 95 did that to me, even after I got it tracked down, I felt somewhat understanding because I know that every product ships with bugs. What made the difference was your company's reneging on warranties, refusal to allow bug reports to be submitted without the victim paying a fee, and then when your company accidentally allowed a contact which led to discussion of this bug, your company told lies denying it, and then switched to lies saying that it wasn't serious because the number of victims was low and the product was old. (The number of victims was not low, only the number of victims who understood it was low. And Windows 95 was still being shipped to corporate customers.)
So we agree on the fact of bugs, but we have widely differing opinions on what kind of bug is serious and whether fixes should be delivered to customers.
- Anonymous
October 13, 2005
Perhaps the whiteboard could be scaled using a "draft" repository and a "real" one. Checking a change into the former queues up a whiteboard-style entry on an integration list; the integration team pulls in the changes at their discretion, as they did with the whiteboard, and anything that breaks the build gets dumped back in the lap of the engineer who did the check-in, at which point he has to provide a replacement or followup.
I've never worked with that large an organization, so this is purely speculation.
- Anonymous
October 13, 2005
The comment has been removed
- Anonymous
October 13, 2005
The comment has been removed