Critical Driver or Cargo Cult Programming?

I've been self-hosting Vista on my laptop since sometime in January.  Every Monday morning, without fail, I installed the latest build available from the "main" Windows branch and tried it.

There have been good builds and bad builds: the first few were pretty painful, but everything since sometime in March has been wonderfully smooth.

But sometime late in May, things changed for the worse.  Weekly builds installed just fine on my main development machine, but my laptop would get about 3/4ths of the way through the install and stop after a reboot complaining about a problem with the critical system driver <driver>.sys.

Of course, I filed a bug on the problem and moved on - every week I'd update my laptop and it'd fail.  While I was away on vacation, the guys looking into the bug finally figured out what was happening. 

The first part of the problem was easy - something was causing <driver>.sys to fail to load (we don't know what).  But that didn't explain  the unbootable system.

Well, the <driver>.sys driver is the modem driver for my laptop.  Eventually one of the setup devs figured out the root cause.  For some totally unknown reason, their inf has the following lines:

[DDInstall.Services]
AddService=<driver>_Service_Inst

[<driver>_Service_Inst]
StartType=0

If you go to msdn and look up DDInstall.Services, you get this page.

If you follow the documentation a bit you find the documentation for the service install section which describes the StartType key - it's the same as the start type for Windows services.

In particular, you find:

StartType= start-code
Specifies when to start the driver as one of the following numerical values, expressed either in decimal or, as shown here, in hexadecimal notation.

0x0 (SERVICE_BOOT_START)
Indicates a driver started by the operating system loader. This value must be used for drivers of devices required for loading the operating system.

0x1 (SERVICE_SYSTEM_START)
Indicates a driver started during operating system initialization.
This value should be used by PnP drivers that do device detection during initialization but are not required to load the system.
For example, a PnP driver that also can detect a legacy device should specify this value in its INF so that its DriverEntry routine will be called to find the legacy device, even if that device cannot be enumerated by the PnP manager.

0x2 (SERVICE_AUTO_START)
Indicates a driver started by the service control manager during system startup. This value should never be used in the INF files for WDM or PnP device drivers.

0x3 (SERVICE_DEMAND_START)
Indicates a driver started on demand, either by the PnP manager when the corresponding device is enumerated or possibly by the service control manager in response to an explicit user demand for a non-PnP device.
This value should be used in the INF files for all WDM drivers of devices that are not required to load the system and for all PnP device drivers that are neither required to load the system nor engaged in device detection.

0x4 (SERVICE_DISABLED)
Indicates a driver that cannot be started.
This value can be used to temporarily disable the driver services for a device, but a device/driver cannot be installed if this value is specified in the service-install section of its INF file.

So in this case, the authors of the modem driver decided that their driver was a boot time critical driver - which, as the documentation clearly states is only intended for drivers required to load the operating system.
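
To make the start codes concrete, here is a minimal sketch of a check that would flag an INF fragment like the one above. It is only an illustration: the function names are made up for this example, and it leans on Python's configparser, which handles a simple fragment like this one but is not a real INF parser. It also does not follow the AddService line to its section; it just reports any section that carries a StartType key.

```python
import configparser
from enum import IntEnum


class StartType(IntEnum):
    """Start codes as listed in the documentation quoted above."""
    SERVICE_BOOT_START = 0
    SERVICE_SYSTEM_START = 1
    SERVICE_AUTO_START = 2
    SERVICE_DEMAND_START = 3
    SERVICE_DISABLED = 4


def service_start_types(inf_text):
    """Return {section name: StartType} for every section with a StartType key.

    Hypothetical helper: configparser is used as an approximation of INF
    syntax, which is good enough for a small fragment.
    """
    parser = configparser.ConfigParser(
        strict=False,                   # tolerate repeated sections or keys
        interpolation=None,             # INF files use % for their own purposes
        allow_no_value=True,            # tolerate bare lines without '='
        inline_comment_prefixes=(";",),
    )
    parser.read_string(inf_text)
    found = {}
    for section in parser.sections():
        if parser.has_option(section, "StartType"):
            raw = parser.get(section, "StartType").strip()
            # Base 0 accepts both decimal ("3") and hexadecimal ("0x3").
            found[section] = StartType(int(raw, 0))
    return found


def boot_start_sections(inf_text):
    """List the sections that ask to be loaded by the OS loader."""
    return [name for name, start in service_start_types(inf_text).items()
            if start == StartType.SERVICE_BOOT_START]


sample = """\
[DDInstall.Services]
AddService=<driver>_Service_Inst

[<driver>_Service_Inst]
StartType=0
"""

for name in boot_start_sections(sample):
    print(f"[{name}] is marked boot-start; is this device needed to load the OS?")
```

For a modem, the documentation quoted above points at 0x3 (SERVICE_DEMAND_START), so a check like this would at least prompt someone to ask why the value is 0.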

So I'll leave it up to you to decide - is this an example of cargo cult programming, or did the authors of this modem driver REALLY think that the driver is a critical system driver?

What makes things worse is that this is a 3rd party driver.  We believe that their INF is in error, but we can't fix it because it's owned by the 3rd party.  Our only choice is to baddriver it and prevent Vista from loading that particular driver.  The modem chip in question hasn't been made for many, many years, and the vendor for that chip has absolutely no interest in supporting it on Vista, so we can't get it fixed.  The laptop is also old enough that it's out of OEM support, so there's no joy from that corner either: nobody wants to support this hardware anymore.

Please note: This is NOT an invitation for a "If only the drivers were open source, then you could just fix it" discussion in the comments thread.  The vendor for the modem driver owns the rights to their driver, they get to choose whether or not they want to support it, not Microsoft.

Comments

  • Anonymous
    June 14, 2006
    The comment has been removed
  • Anonymous
    June 14, 2006
    The comment has been removed
  • Anonymous
    June 14, 2006
    Why not support driver compatibility workarounds like you do for applications?  Just have a compatibility setting that essentially states "for driver X, override setting Y in the INF."

    So, in the case of this modem driver, disallow the use of "StartType=0".

    I don't think you will violate any copyright laws by changing the way you interpret a data file. Otherwise every new version of a compiler could technically be illegal.

    Oh, and please tell me this sort of thing would not happen today, and would have been detected by the WHQL process.  I'd like to think that certification does serve a purpose.
  • Anonymous
    June 14, 2006
    Unfortunately this is a 32bit platform and the problem isn't that the driver isn't signed.  I wish it was, that would make it "easy".

    We don't know what's wrong with the driver, and making it a critical boot driver makes it essentially undebuggable (the kernel debugger doesn't work on critical boot driver errors).
  • Anonymous
    June 14, 2006
    The comment has been removed
  • Anonymous
    June 14, 2006
    > If you go to msdn and look up DDInstall.Services, you get
    > this page.

    You linked to
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/devinst_r/hh/DevInst_r/inf-format_d402e9dc-1a6f-423c-b80e-43dd5779b4cc.xml.asp

    You need to link to
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/DevInst_r/hh/DevInst_r/inf-format_10bcb43e-0799-4dff-981f-2d8c4bf8f835.xml.asp

    And this was a lucky day because "sync toc" worked.

    If you agree with customers that MSDN's table of contents could be improved, perhaps you could suggest that to the maker?

    Meanwhile I'll bet you could fix the inf file yourself.  If the driver works when loading on demand, then you'll still be able to use it.  Even cargo cult inf file editing persons have been known to accomplish such feats.
  • Anonymous
    June 14, 2006
    Running on a chk build may give you clues about why the driver isn't loading.

    I have no idea why it would be boot start.

    I thought all boot start drivers needed signatures regardless of architecture. Maybe the design has changed (again), though.

    And I realized that I was spreading a piece of bad info in my last comment. I forgot that CI doesn't actually use the cert stores. And they may be planning to remove CI's approving of test-signed code by RTM. I have no idea where the final decision on that fell.
  • Anonymous
    June 14, 2006
    The comment has been removed
  • Anonymous
    June 14, 2006
    Why can't the kernel debugger work?  I haven't debugged critical boot time drivers, as I work on drivers that load a lot later.  Of course... you could have always changed the .inf to have the driver start later, rebuilt the install image, and debugged the driver that way.  Note, I'm not saying to release it that way.  Of course, that debugging would just be for your information, but it might be interesting and point to a bug in windows.
  • Anonymous
    June 14, 2006
    Um, not to ask the obvious question, but why don't you change the start type?
  • Anonymous
    June 14, 2006
    Doesn't Windows allow you to patch stuff on load? Thus, couldn't you patch the INF file as it is read without modifying the original file?

    [Note: I don't actually know anything about how Windows Patches files for compatibility]
  • Anonymous
    June 14, 2006
    He could change it for his single install, but that's not the point.

    The point is that the OEM provided INF sets the start type incorrectly, and Microsoft cannot change it, because they don't own the driver.  The problem comes in including this driver with Vista, as it will cause this same problem if you happen to own that particular device.
  • Anonymous
    June 14, 2006
    Baljemmet, something changed, we don't know what.  The start types haven't changed since long before NT 3.1 shipped.

    Jeff: Why can't it be debugged? Because the kernel debugger is loaded after the critical drivers are loaded, it can't be used to debug them (at least I can't, others might be able to) :(

    Dispensa, we can't change the start type because it's not our driver.  Maybe it really DOES need to be a critical driver for some reason we're not aware of.  And I can't change the start type after the fact because I can't boot the OS to change the start type because a critical driver isn't loading (Catch 22).

    And Manip, we don't do that appcompat stuff for drivers (as far as I know, I'm not a driver guru).  And the IHV owns their INF file, we don't.  What happens if we decide to unilaterally change their INF file and we break stuff by doing it?  The IHV has told us that they're not supporting this device any more, and the OEM has explicitly removed my laptop from their list of supported machines (for ANY operating system), so there's not much we can do about it at this point.
  • Anonymous
    June 14, 2006
    The solution for the catch-22 is, of course, to boot another OS and change the setting from outside. Having just installed Vista, I noticed you can boot from the CD, choose recovery something and get a command prompt. From there, you could mount HKLM of the borked OS, and change the start type in the registry.

    Not a long-term solution though..
  • Anonymous
    June 14, 2006
    Just wondering - how old is the laptop? And could you please at least tell us who the vendor is so the rest of us can try to use an alternate vendor if we are planning on buying kit that we intend to use for more than X years? (Where X is the age of your laptop)

    It's not libel if it's true.
  • Anonymous
    June 15, 2006
    The comment has been removed
  • Anonymous
    June 15, 2006
    The comment has been removed
  • Anonymous
    June 15, 2006
    Jeff, the OS doesn't get that far in booting - this failure occurs BEFORE ntoskrnl.exe is loaded.

    Jonathan, if I had access to a Vista DVD, I think I could fix it with emergency repair mode, but unfortunately, I don't.

    Adam, I can't tell you which laptop it is, because I don't know if the place where I got the information from is company confidential (that's why I obscured all the info in the post).

    What I will say is that it's a 3 year old laptop, it's long out of warranty, and it technically doesn't even come close to meeting Vista's hardware requirements (although it runs Vista just fine for me).

  • Anonymous
    June 15, 2006
    My guess is cargo cult programming. A system which required the modem driver before the kernel loaded would be a pretty messed up design, IMO.
  • Anonymous
    June 15, 2006
    Then Vista's "Requirements" really aren't requirements, are they?  Maybe the requirements need to be made more realistic.
  • Anonymous
    June 15, 2006
    The comment has been removed
  • Anonymous
    June 15, 2006
    Perhaps talk with the driver owner and ask them if they wish to include the driver in the default driver set of Vista? That way people don't need to install the driver from disk, and the fixed default version will "just work".

    Honestly, if I owned a piece of old hardware on the system I'm installing Vista on, and found that Vista automatically recognized it, I wouldn't bother to reinstall the driver from the old driver disk, especially when the old driver is not "made for Vista".
  • Anonymous
    June 15, 2006
    From casual reading of the NTDEV mailing list at osronline (where many driver-writers seem to go), it seems like the whole field of creating/editing INF files is just cargo-cult scripting.  There was even talk of replacing INF with some XML-based configuration system, but driver-writers have been tweaking these files for so long that they just don't want to throw them away.

    Based on that list, it seems like a lot of kernel-mode driver programming involves taking a sample and just modding it to work with one's device.  The world will be a better place when everyone's writing user-mode drivers and MSFT is the only one mucking about in the kernel.
  • Anonymous
    June 15, 2006
    cough Dell Latitude C610 cough ;-)
  • Anonymous
    June 15, 2006
    Heh - this is, of course, part of the reason sensible Linux users avoid closed-source drivers (the other reasons being an unwillingness to maintain a stable ABI and a semi-religious devotion to OSS). Unfortunately, were Linux to catch on in any big way, you'd probably end up with users installing badly-written binary-only drivers left, right and center, just like they do with unsigned Windows drivers today, and with exactly the same results.

    To be honest, I don't think there's any real solution to the problem of bad drivers. It's slightly odd, though, that Windows driver development would suffer so much from cargo cult development and Linux driver development doesn't seem to that often (even though the documentation is often sparse and driver writers have full access to the source of the kernel and lots of other drivers). Perhaps it's just less noticeable because they use more suitable drivers as a base...
  • Anonymous
    June 16, 2006
    Your operating system is only as strong as its weakest link.  In this case it's an outside vendor.  Having your OS dependent on outside forces, especially ones that are probably going for quick-and-easy rather than correct can always cause problems, no matter what the development model is.

    So in this case having the source code could have helped.  It wouldn't necessarily have had to be open source, though. MS could have required the source be kept in escrow for just such an eventuality.  Or they could have a team that would review INF files before certifying a driver.  There are many ways this could have been avoided, but as you said it's pretty much too late to do anything about it now.
  • Anonymous
    June 16, 2006
    Manip, not that one :)  It turns out that there are at least two other laptop models from different OEMs with the same problem (another reason I didn't mention it).

    Cheong, I believe that we asked them and they weren't interested in supporting that chipset on Vista (reading between the lines in some emails).
  • Anonymous
    June 17, 2006
    > the OS doesn't get that far in booting -
    > this failure occurs BEFORE ntoskrnl.exe is loaded.

    I doubt it.

    ntldr loads ntoskrnl, HAL, and all the boot start drivers into memory, and then transfers control to ntoskrnl.  ntoskrnl is responsible for initializing the drivers (e.g. calling DriverEntry, doing the PNP dance, etc.).  It is impossible for a driver to fail before ntoskrnl is loaded.

    Also...  ntoskrnl initializes the kernel debugger VERY early in the boot process.  You absolutely can debug boot start drivers -- I've done it many times.
  • Anonymous
    June 17, 2006
    "So in this case, the authors of the modem driver decided that their driver was a boot time critical driver - which, as the documentation clearly states is only intended for drivers required to load the operating system."

    That's not accurate.

    "0x0 (SERVICE_BOOT_START)
    Indicates a driver started by the operating system loader.
    This value must be used for drivers of devices required for loading the operating system. "

    The documentation clearly states that drivers that are required for loading the operating system must use this start type.  It does NOT at ALL, let alone clearly or ambiguously, say that it is ONLY for drivers that are required for the loading of the operating system.

  • Anonymous
    June 17, 2006
    Why wouldn't the company want to make such a minor change to support that hardware? Is it because they're excruciatingly lazy, or because they think that if they make it work, they're going to get support calls in the future from people trying to find out what's up with potential bugs in the driver?

    I just hate to see old (but, evidently, perfectly serviceable) hardware go to waste because hardware vendors decide not to update their drivers anymore...
  • Anonymous
    June 19, 2006
    Ugh...not sure why anyone would want to run Vista on a Latitude C-series...it's best to leave them chugging away on 2K/XP, which they do quite happily with RAM upgrades.
  • Anonymous
    June 23, 2006
    I know it's not really the point, but wouldn't disabling the modem through BIOS prevent the driver from wanting to load in the first place?

    (Yeah, I'm kind of assuming you don't need it..)
  • Anonymous
    June 23, 2006
    >Please note: This is NOT an invitation for a "If only the drivers were open source,
    >then you could just fix it" discussion in the comments thread.  The vendor
    >for the modem driver owns the rights to their driver, they get to choose
    >whether or not they want to support it, not Microsoft.

    Thinking forward, I'd hope that MS would have just a little bit more control over the code that they ship with Vista.

    If you don't have the source, you can't do a thorough code review and make any worthwhile claims about stability or security. At least some user-space drivers limit their capability for damage to their domains, which is bad enough.

    If you don't have permission to maintain the drivers, then you've given IHVs control over Vista's market. One person's aging laptop is another computer that won't be getting Vista, whether MS wanted to sell another copy or not.
  • Anonymous
    June 30, 2006
    The comment has been removed
  • Anonymous
    July 04, 2006
    The comment has been removed
  • Anonymous
    July 05, 2006
    I used to have a machine where NTBOOTDD.SYS did indeed have to be loaded before the HAL and NTOSKRNL.EXE, because the HAL and NTOSKRNL.EXE were in a partition that the BIOS's INT13 wouldn't reach.

    I never figured out how to rename both a suitable version of ATAPI.SYS and a SCSI driver to both be C:\NTBOOTDD.SYS in order to use the boot loader's menu sensibly.  Eventually I got an even larger internal hard drive and no longer had to put an alternate installation on an external drive.

    Meanwhile I still think Microsoft has the source code of a modem's INF file and can change that one from boot start to demand start.