Venting steam
Ok, today I'm going to vent a bit...
This has been an extraordinarily frustrating week (which is a large part of why I've had virtually no technical content this week). Think of this one as a peek behind the curtain at a bit of what goes on around here.
The week started off great: on Tuesday, we had a meeting that put the final pieces together on a month-long, multi-group design effort that I've been driving (over the course of the month, the effort has wandered through the core Windows team, the security team, the terminal services team, the multimedia team, and I don't know how many other teams). For me, it's been a truly challenging effort, and I was really happy to see it finally come to a conclusion. In the meantime, I've been implementing the non-controversial pieces of the work, and that has been going pretty well.
On Wednesday, I started trying to test the next set of changes I've made.
I dropped a version of win32k.sys that I'd built (my feature involves some minor changes to win32k.sys) onto my test machine and rebooted. Kaboom. The system failed to boot. It turns out that you can't drop a checked version of win32k.sys onto a retail build (yeah, I test on a retail OS). This isn't totally surprising; if I'd thought about it, I'd have realized that it wouldn't work.
But it's not the end of the world; I rebooted my test machine back to the safe build. You always have to keep a safe build around if you're doing OS development: if the test OS crashes irretrievably (and that does happen on test OSes), you need to be able to recover your system.
Unfortunately, one of the security changes in Longhorn meant that I was unable to put the working version of win32k.sys back on my machine while running my safe build. Not a huge deal, and if I'd been thinking about it, I could probably have used the recovery console to repair the system.
Instead, I decided to try to install the checked build on my test machine (that way I'd be able to just copy my checked binary over).
One of the tools we have internally automates the installation of a new OS. Since we do this regularly, it's an invaluable tool. Essentially, after installing it on our test machines, we can click a couple of buttons and have the latest build installed cleanly (or we can click a different set of buttons and have a build upgraded, etc.). It's extraordinarily useful because it pretty much guarantees that we don't have to waste time chasing down a debugger and installing it, enabling the kernel debugger, etc. It's a highly specialized tool, totally unsuitable for general distribution, but boy is it useful if you're installing a new build once a week or so.
I installed the checked build, and my test machine went to work copying the binaries and running setup. A while later, it had rebooted.
It turns out that the driver for the network card in my test machine isn't in the current Longhorn build - this is temporary, but... No big deal: I have a copy of the network card's driver saved on the test machine's hard disk.
The thing is, the auto-install tool is temperamental. It can be extremely sensitive to failure scenarios (one of the domain controllers being unavailable, bad sectors on the disk, etc.), and this week the tool was particularly temperamental. And it turns out that not having a network card is one of the situations that makes the tool temperamental. If you don't get things just right, the script can get "stuck" (that's the problem with an automated solution - it's automated, and if something goes wrong, it gets upset).
And that's what happened. My test machine got stuck somewhere in the middle of running the scripts. I'm not even sure where in the scripts it got stuck, since the tool doesn't report progress (it's intended for unattended use, so that normally isn't necessary).
Sigh. Well, it's time to reinstall. And reinstall. And reinstall. The stupid tool got stuck three different times. All at the same place. It's quite frustrating. I'm skipping a bunch of stuff that went on here as I tried to make progress, but you get the picture. I think I did this about 4 times yesterday alone.
And of course the team expert for this tool is on vacation, so...
This morning, I'm trying one more time.
** Flashes to an image of someone banging their head against the wall exclaiming that they're hoping it will stop hurting soon **
I just want to get to testing my code - I've got a bunch of work to do on this silly feature and the stupid tool is getting in my way. Aargh.
Oh, and one of the program managers on the team that's asking for my new feature just added a new requirement to the feature. That's going to involve even more cross-group discussions and coordination of work items.
Oh well. On the other hand, I've made some decent progress documenting the new feature in its feature spec, and I've been to some really quite interesting meetings about the process for our annual review cycle (which runs through this month).
Edit: One of the testers in my group came by and helped me get the machine unstuck. Yay.
Comments
Anonymous
June 10, 2005
Boy, do I know that feeling... when it almost seems that something is determined not to let you even test your program, it's infuriating!
I'm not sure I feel comforted by the fact that you sometimes have the same problem at MSFT...
Anonymous
June 10, 2005
Why don't you have a dedicated Virtual PC machine? That way you can trash your machine, delete the drive image, and just copy/paste back over the top from a backup.
Anonymous
June 10, 2005
Manip,
Because a dedicated VirtualPC machine is great if I'm not updating the OS.
But I'm installing a brand new OS.
The tool does everything you're describing for me, but it runs on the machine itself (which means I don't get the performance hit of VirtualPC).
Anonymous
June 10, 2005
I was thinking the same thing..use virtual pc, but when doing os development, it's better to rely on physical devices and not emulated stuff. What if there's a bug in virtual pc...?
Anonymous
June 10, 2005
Even automated tools should produce output (to be redirected to a log file or whatever) so one can see what went wrong when something does. Especially tools that are used internally and don't undergo the testing a released product goes through. And yes, I've learned this the hard way :)
Anonymous
June 10, 2005
"which means I don't get the performance hit of VirtualPC"
And that is why you need the quad XEON machine with 4GB of RAM ;-)
Anonymous
June 10, 2005
"I was thinking the same thing..use virtual pc, but when doing os development, it's better to rely on physical devices and not emulated stuff. What if there's a bug in virtual pc...? "
The opposite may also be true: your hardware contains a bug which the virtualization environment does not contain.
Think about developing an OS for an embedded system, where you're developing the hardware at the same time as the software... and you may not have the luxury of having any real hardware finished for you to test on. Emulation comes to the rescue!
Virtualization is very good when you can use it. Would be nice if MS Virtual Server supported custom virtual hardware. Like a "Virtual Server DDK" :)
Btw, USB support is no. 3 on the "Most wanted features" list at www.virtualization.info.
Anonymous
June 10, 2005
"And it turns out that not having a network card is one of the situations that makes the tool temperamental. If you don't get things just right, the script can get "stuck" (that's the problem with an automated solution - it's automated, and if something goes wrong, it gets upset)."
Glad I'm not the only person who has tripped across that problem. I worked as an SOE Build guy for a large company. Our NT4 scripted build would fail when it couldn't find a network card (go figure!). The solution? Detect if a card was missing, and then install the loopback adapter.
Anonymous
June 10, 2005
Andreas Haeber wrote:
"Virtualization is very good when you can use it."
And it is ideal for application packaging. You save a bunch of time by not having to re-image a physical PC.
Anonymous
June 11, 2005
D, absolutely. VirtualPC is a developer's dream for a certain class of developers.
Unfortunately, I'm not a member of that class in my current job. In previous jobs it would have been quite nice, but...
Anonymous
June 12, 2005
Mike,
Remote audio works just fine today in Windows XP. I'm not 100% sure about DSound, but I believe it works too.
I don't know what "activex audio" is. There are two ways of playing audio on Windows: the MME APIs (PlaySound, waveOutXxx) and DSound (DShow uses DSound).
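For anyone who hasn't used the MME path, the simplest case really is a single call. A minimal sketch (the .wav path here is just a placeholder):

    #include <windows.h>
    #include <mmsystem.h>              // PlaySound and the SND_* flags
    #pragma comment(lib, "winmm.lib")  // the MME APIs live in winmm.dll

    int main(void)
    {
        // SND_FILENAME: the first argument is a path to a .wav file.
        // SND_SYNC: don't return until the sound finishes playing.
        if (!PlaySound(TEXT("C:\\test\\chimes.wav"), NULL,
                       SND_FILENAME | SND_SYNC)) {
            return 1;  // missing file, bad format, no audio device, ...
        }
        return 0;
    }

DSound is the path to use when you need mixing or low latency; PlaySound is the "just play this chime" path.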
I can't speak about D3D remoting; I'm not on the remote team.
And venting to Bill wouldn't help. This was just a stupid tool issue. And I've complained to the right people.
My customers won't see this, ever.
Anonymous
June 12, 2005
> Unfortunately, one of the security changes
> in Longhorn meant that I was unable to put
> the working version of win32k.sys back on my
> machine when running my safe build.
I don't know enough about the security changes between current systems and Longhorn, but with current systems this seems pretty trivial. The OS that you're debugging is installed in some partition, say E. The safe OS that you use for recovery is in some other partition, say D. You boot the one in D, look at E:\Windows\System32, and assign that folder's ownership to the Administrators group so you can copy your safe win32k.sys back to that directory. Will the Longhorn on partition E refuse to boot itself when it detects that its System32 directory has been modified that way?
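(For what it's worth, that "assign ownership" step doesn't have to go through the Explorer security UI. Here's a rough C sketch of the same operation using SetNamedSecurityInfo - it assumes you're running as an administrator on the safe OS, and the E:\Windows\System32 path is just the example from above:)

    #include <windows.h>
    #include <aclapi.h>                   // SetNamedSecurityInfo
    #include <stdio.h>
    #pragma comment(lib, "advapi32.lib")  // the security APIs

    /* Taking ownership of an object you can't otherwise access requires
       SeTakeOwnershipPrivilege, which administrators hold but which is
       disabled by default - so enable it first. */
    static BOOL EnablePrivilege(LPCTSTR name)
    {
        HANDLE token;
        TOKEN_PRIVILEGES tp;

        if (!OpenProcessToken(GetCurrentProcess(),
                              TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
            return FALSE;
        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        if (!LookupPrivilegeValue(NULL, name, &tp.Privileges[0].Luid)) {
            CloseHandle(token);
            return FALSE;
        }
        AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL);
        CloseHandle(token);
        return GetLastError() == ERROR_SUCCESS;
    }

    int main(void)
    {
        // Build the well-known SID for BUILTIN\Administrators.
        SID_IDENTIFIER_AUTHORITY ntAuth = SECURITY_NT_AUTHORITY;
        PSID adminSid = NULL;
        DWORD err;

        if (!AllocateAndInitializeSid(&ntAuth, 2,
                SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_ADMINS,
                0, 0, 0, 0, 0, 0, &adminSid))
            return 1;

        EnablePrivilege(SE_TAKE_OWNERSHIP_NAME);

        // Make Administrators the owner of the debug OS's System32.
        err = SetNamedSecurityInfo(TEXT("E:\\Windows\\System32"),
                                   SE_FILE_OBJECT,
                                   OWNER_SECURITY_INFORMATION,
                                   adminSid, NULL, NULL, NULL);
        if (err != ERROR_SUCCESS)
            printf("SetNamedSecurityInfo failed: %lu\n", err);

        FreeSid(adminSid);
        return err == ERROR_SUCCESS ? 0 : 1;
    }

(Changing the owner doesn't by itself grant write access, but once Administrators own the folder they can grant themselves whatever access they need.)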
Regarding parallel installations for this kind of recovery, some Knowledge Base articles even used to recommend it back in the days of NT4, but now Microsoft says anyone doing this has to pay for multiple licences for their one machine. Don't tell anyone, but before I noticed that about licences, on one machine I activated Windows XP installations on both partitions D and F. I've only needed to boot that F version around 5 times though.
Hmm, wait a minute: on one friend's machine where I couldn't log in through the recovery console, I put a parallel installation on partition E even though his real one was on D. After that I could repair his D, so he didn't lose any data. I don't remember if I activated the one on his E. (Actually, I had told him to put all his data files on E so that if his installation on D died we could wipe D and reinstall, but he didn't understand, and he still had a bunch of stuff in "My Documents".)
Anonymous
June 13, 2005
Joku: You can do the same on Virtual PC/Server too. You can also use a hard disk directly, except for the disk the host OS runs on, AFAIK. But there doesn't seem to be any interface available for adding more hardware types, at least not from the SDK.