This one's for you John. The core OS team didn't forget you

Way back when, back in the very early days of this blog (actually it was the 3rd post to my blog), I wrote a story about John Vert complaining about CTRL-C not working on network commands.

Well, yesterday I got a piece of email from one of the developers in COSD.  I've sanitized it a bit, but here's the important part:

Microsoft Windows [Version 6.0.<build>]
(C) Copyright 1985-2005 Microsoft Corp.

d:\>dir \\<server>\dfg
The I/O operation has been aborted because of either a thread exit or an application request.

d:\>dir \\<server>\dfg
The I/O operation has been aborted because of either a thread exit or an application request.

So John, this one's for you. Even though it's been 13 years since I worked on that code, your complaint wasn't ignored, and it's finally been fixed.

I have no idea what build will contain the fix, or even if the fix will make the final product, but it's getting there.

As I type this, I can just imagine the /. headline: "Microsoft takes 13 years to fix a bug".  The reality is WAY more complicated than that.  Making this fix work required a significant amount of change to the I/O subsystem and a number of changes to the way that I/O cancellation works. The biggest piece of the picture is the new CancelSynchronousIo API that was added to Vista to handle just this situation. Without that support (as mentioned in the original article), it wouldn't have been possible to fix the problem.
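For the curious, here is a minimal sketch of what calling CancelSynchronousIo looks like from user mode. It is an illustration only, not the shell's actual code: the helper name cancel_synchronous_io is made up for this example, and it reaches the API through Python's ctypes. The one thing it takes as given is the documented contract: the caller passes a thread handle opened with THREAD_TERMINATE access, and the blocked call on that thread then fails with ERROR_OPERATION_ABORTED (995), which is the error text shown in the transcript above.

```python
import ctypes
import sys

# Win32 constants (values from the Windows SDK headers).
THREAD_TERMINATE = 0x0001        # access right CancelSynchronousIo requires
ERROR_OPERATION_ABORTED = 995    # what the cancelled call fails with
ERROR_NOT_FOUND = 1168           # the thread had no synchronous I/O pending


def cancel_synchronous_io(native_thread_id):
    """Hypothetical helper: cancel synchronous I/O pending on another thread.

    Returns True if a cancellation was requested and False if the thread
    had nothing to cancel. Raises NotImplementedError off Windows.
    """
    if sys.platform != "win32":
        raise NotImplementedError("CancelSynchronousIo is a Windows API")

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.OpenThread.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.OpenThread.restype = wintypes.HANDLE
    kernel32.CancelSynchronousIo.argtypes = [wintypes.HANDLE]
    kernel32.CancelSynchronousIo.restype = wintypes.BOOL
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL

    # The handle must carry THREAD_TERMINATE access for the cancel to work.
    handle = kernel32.OpenThread(THREAD_TERMINATE, False, native_thread_id)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        if kernel32.CancelSynchronousIo(handle):
            return True
        error = ctypes.get_last_error()
        if error == ERROR_NOT_FOUND:
            return False  # nothing was blocked, so nothing to cancel
        raise ctypes.WinError(error)
    finally:
        kernel32.CloseHandle(handle)
```

In a sketch like this, the worker thread would record its own ID with threading.get_native_id() before starting the blocking call, and a CTRL-C handler running on a different thread would pass that ID to the helper. A cancel request is only a request: the driver underneath still has to honor it, which is where most of the real work went.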

Comments

  • Anonymous
    April 19, 2006
    The comment has been removed

  • Anonymous
    April 19, 2006
    Not to create the time machine to go back 13 years with a list of bugs.

    Mike

  • Anonymous
    April 19, 2006
    Mike, I've never seen the redirector take more than a couple of seconds to stop (I actually do that every few days, go figure), and I've never seen it CRASH.

    If you've reported the crashes to MS, the redirector team should have your crash data and can figure out what went wrong.

    And the quality of 3rd party drivers is a significant issue.

  • Anonymous
    April 19, 2006
    > If you've reported the crashes to MS

    I've had around a dozen kernel crashes (BSODs) where Windows didn't offer to report the crashes to Microsoft because whatever bug caused the network connection to not work also prevented reports of its own crash.

    In user mode I've had a few hundred process crashes where dumpprep.exe and another Dr. Watson process were executing and nearly hanging the CPU but they never offered to send crash reports because whatever bug caused the network connection to not work also prevented reports of its own crash.  These didn't cause BSODs but still the only way out was to reboot.

  • Anonymous
    April 19, 2006
    The behavior that annoys me the most about this is when I try to use tab-completion:

    > dir \\misspelled-server\share<hits tab> <curses for 30 seconds>

    Will I be able to stop that?

  • Anonymous
    April 19, 2006
    Does anybody know how long it took other OSes to implement the ability to cancel synchronous IOs? Maybe I just don't know what to look for, but I couldn't find any other OS that implements it. All I can find are calls to cancel async IO (Solaris, Linux, VMS).

    It really is a shame that other systems don't implement synch IO cancellation, because it's really annoying when your whole group of Unix systems goes down due to a single NFS hard mount failure.

  • Anonymous
    April 20, 2006
    > The reality is WAY more complicated than that.

    Not really.  The richest software company in the world, with the best engineers money can buy, takes 13 years to make control-C work.  I think that's pretty simple actually.

    You can change the statement to say that your engineers were so incompetent they designed things so poorly that it took 13 years of valiant redesign and effort to fix the bug, but I'd argue that's even worse.

  • Anonymous
    April 20, 2006
    Vince,

    Have you ever had a class or read an in-depth book on operating systems engineering?

    Didn't think so.

    James

  • Anonymous
    April 20, 2006
    The comment has been removed

  • Anonymous
    April 20, 2006
    Why couldn't the problem have been solved earlier by calling TerminateProcess in the CTRL-C handler?

  • Anonymous
    April 20, 2006
    Why do people assume you have to hang out on Slashdot to be anti-MS?  I've been anti-MS since before slashdot was a glimmer in Rob Malda's eye.

    Taking 13 years to fix a bug like this is inexcusable.  If a company is going to hide its code and development processes and not have an open bug tracking system, then it will be judged by what info is released.

    How am I supposed to believe all of those smarmy "you can do anything with our software" MS ads if all I wanted to do was get control-C to cancel some IO?  In any case the 13 year bug will likely be a 15 or more one, because I am sure it's not going to be fixed in any sort of release any time soon.

  • Anonymous
    April 20, 2006
    The comment has been removed

  • Anonymous
    April 21, 2006
    The comment has been removed

  • Anonymous
    April 24, 2006
    > Vince, I don't know you or your experience, but it's
    > clear to me from your comments that you've never
    > ever written software for platform with widespread
    > use.

    I like how people can somehow analyze my software experience from a few posts I make on a blog.

    If by "widespread use" you mean code that is in Windows, well of course not.

    If you mean "is currently running on millions of computers", then yes.  Code of mine is included in the Linux kernel.  You're free to download the Linux source and view it, critique it all you want.  

    I'll note that I can't view any kernel code that you've written, or for that matter any of the kernel code your company produces.  So I'm the one who's at a disadvantage when considering your programming skills.

    >  As Ryan mentions above, this stuff is HARD,
    > especially if you want to get it right.

    Well of course, if you want to be whiny about it.  Honestly, all programming is hard.  That's no excuse.

  • Anonymous
    April 25, 2006
    What vince[sic] is missing here is that we're talking about behavior on builtin commands to the shell.  Maybe we can debate whether "dir" should be builtin or not but it is and as such, this isn't just a "simple" decision to terminate a process.

    As Gabe points out above, (just about) nobody else has support for cancelling in-flight synchronous I/O.  (VMS had it indirectly since all sync I/O was actually async I/O followed by an EF wait but I'm not sure that sys$waitef actually was interruptible...)

    Oh, wait, that's right.  Don't feed the troll.  Someday I'll learn.

  • Anonymous
    April 25, 2006
    Vince, you are very annoying. You are saying no more than: "MS, you are the bad boys, because you are not open source". So, please stop bothering us with your useless posts.

  • Anonymous
    April 26, 2006
    The comment has been removed

  • Anonymous
    April 26, 2006
    Rune,
     That's exactly the crux of the problem.  The 30 second timeout comes from TCP/IP, not from the user.  The timeouts that are appropriate for a networking stack aren't appropriate for interacting with a user.

  • Anonymous
    April 26, 2006
    OK, so how do I change the timeouts used by the TCP/IP stack? ;)

    Our own applications tend to use their own timeout mechanisms in order to detect a broken connection faster. When dealing with realtime stock information, you do not want to wait 30 seconds before being hooked up to an alternative server... (no, we do not utilise satellite connections to our end users)

    I read a whitepaper once on TCP/IP and all related registry settings, but I do not remember seeing anything about timeouts. Could we have some additional tweakings in Vista Server please? (while we're at it: where are the Vista Server betas? ;-) )

  • Anonymous
    April 26, 2006
    Rune,
     You don't.  You get a choice.  

    You can accept the timeouts in the TCP/IP stack (which are the exact same timeouts in every single TCP/IP stack in the world) and interoperate with all the other TCP/IP implementations.

    Or, you can mess with the timeouts and interoperate with somewhere around none of the TCP/IP stacks in the world.

    Microsoft originally shipped a TCP/IP stack with timeouts that were reasonable for a LAN (and thus more palatable to human beings), and got slammed hideously when it hit the internet because the MS TCP stack timed out too early for real-world situations.

    For the first service pack of NT 3.1, the timeouts were reset to match the values that everyone else expected.

  • Anonymous
    April 26, 2006
    The comment has been removed

  • Anonymous
    April 26, 2006
    The comment has been removed

  • Anonymous
    May 07, 2006
    Which is better:  Microsoft takes 13 years to fix a bug?  or Microsoft resolves to refuse to fix a bug, which Microsoft has acknowledged to be a bug, because Microsoft has repetitively shipped the same bug for maybe longer than 13 years?

    http://lab.msdn.microsoft.com/ProductFeedback/viewfeedback.aspx?feedbackid=797fe333-fcf1-478b-b1eb-829602fe7d15&lc=1033

    Don't worry about whether anything should be done about it.  (That's only if you had any inclinations of wondering whether to worry about it.  You didn't need to.  Microsoft's decision is quite clear enough.  But if the occasion ever arises for me to submit another bug report in this here public bug database, don't be surprised when the level of cynicism will be even higher.)

  • Anonymous
    May 19, 2006
    Wow, Norman, you seem pretty upset because the wizard declares DllMain with the generic type HANDLE instead of HINSTANCE?

    That doesn't cause any problem, it's only theoretical, and I didn't really get from your posts on usenet whether it caused a problem when using STRICT.  The two types are exactly equal in RAM: both are void pointers.

    They could easily have put an intern on fixing the wizard, but I can imagine from my company's experience the bug request getting lost by being badly logged.

    It says "Visual C++ generates bugs in DLLs".  That's false; that's like saying the house is on fire.  It's merely a semantic problem in the wizard's template.  It's even less important if, as you say, the docs used to be that way as well.

    When bugs are logged more specifically they have a better chance.  There is no anonymous company entity, somewhere there is a human that's reading and triaging what gets done over what.

    I don't work for Microsoft, but I can easily imagine someone looking at this bug toward the end of a cycle, thinking by the title it's a serious compiler bug, then seeing it's not, then thinking hey, it's been that way for years and no one ever complained.  Resolved : "Won't fix."  We do the same.

    And why did no one else complain?  Easy!  No one cares! Defining your DllMain with HANDLE or HINSTANCE results in the same compiled code. It's just syntactic sugar.

    This type of bug would be or will be logged by multiple users if it was really important, so it's no big deal if one of it gets resolved as 'won't fix'.  It's the critical compiler bugs, crashes, leaks, etc, you need to worry about.
