Gosh, programming is hard!

I've been programming since I was a wee tyke of 9.  Now, 28 years later, I'm amazed: while I have grown proficient in a lot of the skills of software design and engineering, it has nonetheless been proven too many times to count that it's unbelievably hard to write correct programs.

This is near and dear to my heart since I work on Windows.  I know the callous characterizations of Windows and Microsoft in general as “good enough” and as being driven by profit rather than engineering excellence, but I like to believe myself to be part of the engineering excellence cadre.  The scale and importance of Windows quality really drives this home in terms of costs, perception and, more recently, security ramifications.  (Note that engineering is all about economics, not just theoretical modelling.  Having the $$$ to continue to pay for your ongoing engineering costs is a factor that you ignore at your own peril.  I will not comment at all on the obvious ancillary topic of liability, for hopefully obvious reasons.)

The point I wanted to make tonight is that it's really, absolutely, mind-numbingly hard to get everything right.  It makes me question whether we're just doing something fundamentally wrong in how we build automata like our programs today.  (I'm a mathematician by training, not a computer scientist, so while I can talk a good game about complexity theory and compare and contrast imperative vs. functional languages, I can't mentally live in the everything's-a-bignum world that people who really entrench themselves in this part of programming theory/philosophy live in.)

Let's just take a look at some bugs that are all over everyone's code base.  Here's a great example:

i = i + 1;

Wow, that was rocket science!  But wait, it has a bug: when i already holds the largest value its type can represent, the increment overflows.  (Language lawyers who want to quote how most language designs leave the details of overflow situations to the implementation need not apply.  They'll want to say something like “there is no bug there”.  Auto-bignum support also need not apply, because I don't work on the kinds of components and applications which can afford to make statements like “oh, we'll just do a heap allocation for any number over 255”.  Languages which turn on checked arithmetic get one silver star here, but (a) the side effects of the checking probably introduce worse effects into the overall system correctness than the overflow and (b) even in C# you have to ask for this to be turned on.)
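To make point (a) a little more concrete, here is a minimal C++ sketch of how a checked increment can leave an object half-updated.  Everything in it (checked_increment, Counters, bump_naive, bump_with_rollback) is a hypothetical name made up for illustration, not code from any real component, and the 8-bit counters are chosen only so the overflow is easy to reach.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: an increment that throws instead of wrapping around.
inline void checked_increment(std::uint8_t& value) {
    if (value == std::numeric_limits<std::uint8_t>::max()) {
        throw std::overflow_error("increment would overflow");
    }
    ++value;
}

// Hypothetical object whose invariant is that i and j always move together.
struct Counters {
    std::uint8_t i = 0;
    std::uint8_t j = 0;

    // Naive version: if the second increment throws, j has already changed,
    // so the caller sees an exception AND a broken invariant.
    void bump_naive() {
        checked_increment(j);
        checked_increment(i);
    }

    // Rollback version: undo the first increment before rethrowing.
    void bump_with_rollback() {
        checked_increment(j);
        try {
            checked_increment(i);
        } catch (...) {
            --j;  // restore the state we had on entry
            throw;
        }
    }
};
```

With i sitting at 255, bump_naive throws and leaves j one higher than it was, while bump_with_rollback throws and leaves j untouched.  The naive version is the one that looks perfectly reasonable in a code review, which is the whole problem.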

Don't dismiss this example until you consider that such an overflow can easily lead to invalid global invariants.  It's easy to pick on buffer overflows here since they get the attention in the press, but at some point global invariant failure will lead to a large number of interesting exploits.  (For example, imagine that the programmer made such a horrible error as using a cardinal type whose precision is less than the number of object instances that can be simultaneously constructed.  Even just getting people to switch to size_t/SIZE_T isn't trivial and it just takes one.)
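Here is a small C++ sketch of the "counter narrower than the number of live objects" mistake.  NarrowTracker and WideTracker are hypothetical classes invented for this illustration, and the 16-bit width is an assumption picked so the wraparound happens quickly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tracker whose counter type is narrower than the number of
// objects that can be alive at the same time.
class NarrowTracker {
public:
    void on_construct() { ++live_; }               // wraps silently after 65535
    bool any_alive() const { return live_ != 0; }
    std::uint16_t live() const { return live_; }
private:
    std::uint16_t live_ = 0;
};

// The same tracker using size_t, which can count every object that fits
// in the address space.
class WideTracker {
public:
    void on_construct() { ++live_; }
    bool any_alive() const { return live_ != 0; }
    std::size_t live() const { return live_; }
private:
    std::size_t live_ = 0;
};
```

After 65536 constructions the narrow tracker's count is back at zero and any_alive() reports false, even though every one of those objects still exists.  Any code that frees shared state "when nothing is alive" is now working from a false invariant, and nothing crashed to tell you so.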

The most distressing thing here is that it took 20+ years for it to sink in that this is a serious coding bug, one that can lead to viruses and worms propagating around the world.  Maybe I'm just dumber than your average person, but I worry that everyone thinks that programming is getting easier with all the whiz-bang tools and techniques coming together.  From my perspective I'm just now finally understanding how truly and deeply hard it is.  If I could send a message to myself 5 years ago with what I've learned from working on Windows, I would have thought that the message was obviously a fake because of how absurd the issues sound on the face of it.

I don't know what I'm really going to do with this blog; I hope that somehow I can make it as useful as Raymond Chen's blog, but he's Useful; I'm just Relatively Effective.  :-)  I hope this is a good start; I've been looking for a forum for “programming deconstructionism” as I call it.  Maybe we'll all learn something; I know I still learn a lot that truly expands my mental models every day.  Once enough deconstruction has occurred, I'll enter into the re-constructionism period and try to navigate some safe passage between Scylla and Charybdis.

Oh, and I work on Fusion, the component / application composition and versioning model.  A big part of the reason “DLL hell” became the problem that it is today is that implementation errors forced so much backwards compatibility and forking of source code.  Our quality bar is extremely high because even now, as we evolve from XP to CLR 1.0/1.1 to Everett to Whidbey to Longhorn, every bug in our base implementations causes us untold grief.  This isn't anything unusual, but when the infrastructure has a break, we end up with terrible outcomes for customers, like people having to write programs that work on one platform but possibly not all of them.

Intellectual exercise for people who know Windows: if we know that oleaut32 is the code that registers type libraries, what are we supposed to do when oleaut32 versions?  One answer is nothing: anyone could have written those registry keys, oleaut32 was just helping the caller out, and it's the caller's responsibility to make sure they're right.  One answer that is theoretically satisfying is to unregister everything with the old oleaut32 and then reregister it with the new one.  But that's also not necessarily good: maybe the bug in the old oleaut32 was that it didn't unregister correctly; maybe you need the new oleaut32 to correctly unregister.  And if the fix is just a missing status code check in some unrelated code, why did we take the time to cycle the type library registrations just to apply a trivial security fix?

Next topic: exceptions - safe to throw as long as nobody ever catches them.

Comments

  • Anonymous
    February 09, 2004
    You only make it hard for yourself by choosing the wrong tools and overcomplicating it.

    That's the problem with software today.

    Mr. Fancy Pants developer wannabe syndrome.
  • Anonymous
    February 09, 2004
    No tools can make up for the inherent complexity in programming. There are basically only three reasonable answers to what i=i+1 can do in overflow. One is to make the behavior implementation defined. Second is to dynamically use a bignum package. Third is to fail presumably by raising an exception.

    The first makes this a bug. The second removes predictability of performance and introduces a failure mode (heap allocation failure) where the contract seems to be that a failure is not possible. The final raises an exception in which case what's the actual likelihood that someone wrote:

    m_j++; try { m_i++; } catch { m_j--; throw; }

    (I added the m_ here to highlight the fact that these aren't local variables. Given that all languages hide the scope resolution as a programming aid, the converse isn't true:

    j++; i++;

    isn't obviously wrong to either the author or the reader of this code.)

  • Anonymous
    February 09, 2004
    As a non-developer, I thoroughly enjoyed your post! Please keep up the great work. People like myself (IT strategist) would love to know more about the challenges faced by developers (especially in the Windows team) in continuing to produce history-making software.
  • Anonymous
    February 11, 2004
    It's not a technical problem, it's a social problem.
  • Anonymous
    August 06, 2004
    Hi Mike,

    Don't listen to moo. He doesn't really know what he is talking about. I think that what you said is true and programming is indeed hard. The reason it is hard is that it requires a person to perform with perfection and to pay a great deal of attention to detail (except for moo :)), and these are rarely human traits. Frederick P. Brooks, Jr. (http://www.cs.unc.edu/~brooks) stated some of these basic truths in his classic book "The Mythical Man-Month" more than 25 years ago. The book is an absolute classic of software engineering, and many of the truths Mr. Brooks set down there remain valid today.
  • Anonymous
    August 06, 2004
    btw, here is what I would do to solve this particular issue which you pointed out:

    -:)

    ///////////////////////////////////////////////////////////////
    // Safe math ...
    //

    // Primary template: a type is treated as signed unless declared otherwise.
    template <typename T>
    class CNumberTraits
    {
    public:
        enum
        {
            isUnsigned = false
        };
    };

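    // Specializes CNumberTraits for an unsigned type. The trailing backslashes
    // keep the whole specialization inside the macro; each use below supplies
    // the terminating semicolon.
    #define DECLARE_UNSIGNED(type)      \
        template <>                     \
        class CNumberTraits<type>       \
        {                               \
        public:                         \
            enum                        \
            {                           \
                isUnsigned = true       \
            };                          \
        }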

    DECLARE_UNSIGNED(unsigned char);
    DECLARE_UNSIGNED(unsigned short);
    DECLARE_UNSIGNED(unsigned int);
    DECLARE_UNSIGNED(unsigned long);
    DECLARE_UNSIGNED(unsigned __int64);

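    // Adds two unsigned values; fails instead of wrapping around.
    template <typename T>
    COREDEFS_INLINE
    HRESULT
    SafeAdd(
        IN T a,
        IN T b,
        OUT T *pr
        ) throw()
    {
        C_ASSERT(CNumberTraits<T>::isUnsigned);

        ASSERT(pr);

        T t = a + b;

        // A sum that wrapped around is always smaller than either operand.
        if (t >= a)
        {
            *pr = t;
            return S_OK;
        }

        return HRESULT_FROM_NT(STATUS_INTEGER_OVERFLOW);
    }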

    // Subtracts b from a; fails if the result would go below zero.
    template <typename T>
    COREDEFS_INLINE
    HRESULT
    SafeSub(
        IN T a,
        IN T b,
        OUT T *pr
        ) throw()
    {
        C_ASSERT(CNumberTraits<T>::isUnsigned);

        ASSERT(pr);

        T t = a - b;

        // A difference that wrapped around is always larger than a.
        if (t <= a)
        {
            *pr = t;
            return S_OK;
        }

        return HRESULT_FROM_NT(STATUS_INTEGER_OVERFLOW);
    }

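    // Multiplies two unsigned values; fails instead of wrapping around.
    template <typename T>
    COREDEFS_INLINE
    HRESULT
    SafeMul(
        IN T a,
        IN T b,
        OUT T *pr
        ) throw()
    {
        C_ASSERT(CNumberTraits<T>::isUnsigned);

        ASSERT(pr);

        // Zero operands cannot overflow, and handling them here keeps the
        // division below from dividing by zero.
        if (0 == a || 0 == b)
        {
            *pr = 0;
            return S_OK;
        }

        T t = a * b;

        // If the product wrapped, dividing it back out will not recover a.
        if ((t / b) != a)
        {
            return HRESULT_FROM_NT(STATUS_INTEGER_OVERFLOW);
        }

        *pr = t;

        return S_OK;
    }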

    // Divides a by b; the only failure for unsigned types is division by zero.
    template <typename T>
    COREDEFS_INLINE
    HRESULT
    SafeDiv(
        IN T a,
        IN T b,
        OUT T *pr
        ) throw()
    {
        C_ASSERT(CNumberTraits<T>::isUnsigned);

        ASSERT(pr);

        if (0 == b)
        {
            return HRESULT_FROM_NT(STATUS_INTEGER_DIVIDE_BY_ZERO);
        }

        *pr = a / b;

        return S_OK;
    }