What's Up With Hungarian Notation?

I mentioned Hungarian Notation in my last post -- a topic of ongoing religious controversy amongst COM developers. Some people swear by it, some people swear about it.

The anti-Hungarian argument usually goes something like this:

"What is the point of having these ugly, hard-to-read prefixes in my code which tell me the type? I already know the type because of the declaration! If I need to change the type from, say, unsigned to signed integer, I need to go and change every place I use the variable in my code. The benefit of being able to glance at the name and know the declaring type is not worth the maintenance headache."

For a long time I was mystified by this argument, because that's not how I use Hungarian at all. Eventually I discovered that there are two completely contradictory philosophical approaches to Hungarian Notation. Unfortunately, each can be considered "definitive", and the bad one is in widespread use.

The one I'll call "the sensible philosophy" is the one actually espoused by Charles Simonyi in his original article. Here's a quote from Simonyi's paper:

The basic idea is to name all quantities by their types. [...] the concept of "type" in this context is determined by the set of operations that can be applied to a quantity. The test for type equivalence is simple: could the same set of operations be meaningfully applied to the quantities in question? If so, the types are thought to be the same. If there are operations that apply to a quantity in exclusion of others, the type of the quantity is different. [...] Note that the above definition of type [...] is a superset of the more common definition, which takes only the quantity's representation into account. Naturally, if the representations of x and y are different, there will exist some operations that could be applied to x but not y, or the reverse.

(Emphasis added.)

What Simonyi is saying here is that the point of Hungarian Notation is to extend the concept of "type" to encompass semantic information in addition to storage representation information.

There is another philosophy which I call "the pointless philosophy". That's the one espoused by Charles Petzold in "Programming Windows". On page 51 of the fifth edition, he says:

Very simply, the variable name begins with a lowercase letter or letters that denote the data type of the variable. For example [...] the i prefix in iCmdShow stands for "integer".

And that's all! According to Petzold, Hungarian is for connoting the storage type of the variable.

All of the arguments raised by the anti-Hungarians (with the exception of "it's ugly") are arguments against the pointless philosophy! And I agree with them: that is in fact a pointless interpretation of Hungarian notation which is more trouble than it is worth.

But Simonyi's original insight is extremely powerful! When I see a piece of code that says

iFoo = iBar + iBlah;

I know that there are a bunch of integers involved, but I don't know the semantics of any of these. But if I see

cbFoo = cchBar + cbBlah;

then I know that there is a serious bug here! Someone is adding a count of bytes to a count of characters, which will break on any Unicode or DBCS platform, where a character can occupy more than one byte. Hungarian is a concise notation for semantics like "count", "index", "upper bound", and other common programming concepts.
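
To make the convention concrete, here is a minimal C++ sketch; the function and variable names are mine for illustration, not from any real library:

#include <cstddef>
#include <cstring>
#include <cwchar>

// cch = count of characters, cb = count of bytes. The prefixes record a
// semantic distinction that the declarations (both are std::size_t) do not.
void CopyName(wchar_t* pwszDest, std::size_t cchDest, const wchar_t* pwszSrc)
{
    std::size_t cchSrc = std::wcslen(pwszSrc);      // a count of characters
    std::size_t cbSrc  = cchSrc * sizeof(wchar_t);  // a count of bytes

    // Correct: characters compared to characters (+1 for the terminator).
    if (cchSrc + 1 > cchDest)
        return;

    // A guard like "if (cbSrc + 1 > cchDest)" would compare bytes to
    // characters; the mismatched prefixes make that bug visible on sight.
    std::memcpy(pwszDest, pwszSrc, cbSrc + sizeof(wchar_t));  // bytes, including terminator
}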

In fact, back in 1996 I changed every variable name in the VBScript string library to have its proper Hungarian prefix. I found a considerable number of DBCS and Unicode bugs just by doing that, bugs which would have taken our testers weeks to find by trial and error.

By using the semantic approach rather than the storage approach, we eliminate the anti-Hungarian arguments:

I already know the type because of the declaration!

No, the Hungarian prefix tells you the semantic usage, not the storage type. A cBar is a count of Bars whether the storage is a ushort or a long.

If I need to change the type from, say, unsigned to signed integer, I need to go and change every place I use the variable in my code.

Annotate the semantics, not the storage. If you change the semantics of a variable then you need to also change every place it is used!

The benefit of being able to glance at the name and know the declaring type is not worth the maintenance headache.

But the benefit of knowing that you will never accidentally assign indexes to counts, or add apples to oranges, is worth it in many situations.
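
In a language with user-defined types, the same semantic distinctions can even be promoted from naming convention into the type system. Here is a minimal C++ sketch, with hypothetical Cb and Cch wrapper types (they come from no real header), in which the buggy line from the earlier example fails to compile instead of merely looking wrong:

#include <cstddef>

// Distinct wrapper types: a count of bytes and a count of characters
// become different types instead of different prefixes.
struct Cb  { std::size_t value; };
struct Cch { std::size_t value; };

// Addition is defined only between like quantities.
inline Cb  operator+(Cb a, Cb b)   { return Cb{a.value + b.value}; }
inline Cch operator+(Cch a, Cch b) { return Cch{a.value + b.value}; }

int main()
{
    Cb  cbFoo{0}, cbBlah{16};
    Cch cchBar{8};

    cbFoo = cbBlah + Cb{4};      // fine: bytes plus bytes
    // cbFoo = cchBar + cbBlah;  // error: no operator+(Cch, Cb) exists
    return 0;
}

The tradeoff is ceremony: the wrapper types need code of their own, while the prefixes give a human-checkable version of the same discipline for free.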

UPDATE: Joel Spolsky has written a similar article: Making Wrong Code Look Wrong. Check it out!

Comments

  • Anonymous
    September 15, 2003
    Though you make a good point, I'd counter that by saying that (a) many developers now use some pretty sophisticated editors that can find the declaration very quickly, and (b) I try to keep my routines under three screens long. If it's a local variable, the type is nearby. If it's a member variable or a global variable then sure, sometimes it is a pain to find the storage type, but really, how often do you care whether that counter is a UINT or a DWORD?

  • Anonymous
    September 15, 2003
    I've always thought that it would be handy to make that sort of thing a compile-time error. Describe the set of prefixes you plan to use and how they can be associated with each other, then make the compiler scream at you if you do something Wrong.

  • Anonymous
    September 16, 2003
    There's always the Ada way of doing things: creating new integer types, and new subtypes. However, I suspect that most of us don't have the patience for that. As far as I'm aware, there aren't any mainstream languages apart from Ada that allow us to declare the exact valid range of an integer variable, or differentiate apples from oranges within the type system (i.e. with support from the compiler). There's less need for Hungarian prefixes in a strong type system like Ada's, or the user-defined type systems of C++, because the compiler can tell you you're making mistakes. However, if you have conversion operators and alternate constructors in C++, there's a chance of introducing type errors. (A rough sketch of this idea in C++ appears after the comments.)

  • Anonymous
    September 21, 2003
    As no one has commented on this yet, I will note that there is no argument against using it in a language like JScript where there are variants involved. It makes the code 10 times easier to read and looks pretty cool to boot :)

  • Anonymous
    June 17, 2004
    << ...there is no argument against using it in a language like JScript where there are variants involved. It makes the code 10 times easier to read and looks pretty cool to boot... >>

    Ugh. JScript is the WORST kind of language in which to use Hungarian notation, as there is no way to enforce typing.

    And I disagree with claims that Hungarian Notation makes code easier to read.

  • Anonymous
    June 20, 2004
    I love Hungarian for all the right reasons. I agree with the blogger's comments, but you've got to start somewhere...

    http://CodeInsight.com/Docs/Programming Standards.doc

  • Anonymous
    June 22, 2004
    If you do use Hungarian, it also seems like you need to set up some standards so everyone uses the SAME prefixes (however you plan on using them). If you have multiple people using different prefixes for the same thing, then you still end up having to go look at the definition because you don't know what the stupid thing is. Also, we'd always run into problems with people manufacturing prefixes for classes they create, so you end up with wpbmObject or something indecipherable like that.

    I used to use it but we hit so many problems due to people making up their own rules that we don't use it anymore. At first I was pretty skeptical about not using it but as long as you take care and name things appropriately it's not really missed. Of course if people named things appropriately in the first place then using it would probably not have caused as many problems as it did.

  • Anonymous
    December 27, 2006
    Hungarian doesn't have any place in production code. It goes beyond "messy" or "dirty". Most variable names imply their data type: ID is an int, Name is a string, and so on. For complex data types, you get into trouble when people start using their own abbreviations. For a class People, does it abbreviate to pplPerson or plePerson? And what if there's a pointer: ppple? Also, most IDEs nowadays (especially VS) provide abundant hover-over information which tells you the data type, among other things. Redundancy is bad.

  • Anonymous
    September 25, 2008
    To celebrate the 5th anniversary of this post... If a variable stores the count of something, put 'count' in its name. That way, you get a more readable version of the semantic indicator and no one can accuse you of being Hungarian. Modern IDEs solve the problem of typing long names (auto-completion) and also of looking up the representation (Find Declaration...)

  • Anonymous
    April 15, 2011
    Wow... what I learned Hungarian Notation was... years ago in college... was based on a definition similar to Petzold's... which is wrong. No wonder I never understood its purpose.

  • Anonymous
    April 15, 2011
    True Hungarian Notation is only useless to those of us who develop in modern tools where variable names are allowed to be long--like 30 or 40 characters.  If you were coding with tools where your variable names needed to be...let's say...16 characters or less...I can see the usefulness.
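
As a concrete illustration of the Ada-style approach suggested in the September 16, 2003 comment above, here is a rough C++ sketch of a range-checked integer. The RangedInt template is hypothetical; Ada performs the equivalent checks natively:

#include <stdexcept>

// A rough emulation of an Ada subtype: an integer that accepts only
// values in [Lo, Hi], checked when it is constructed or assigned.
template <long Lo, long Hi>
class RangedInt {
public:
    RangedInt(long v) : value_(checked(v)) {}
    RangedInt& operator=(long v) { value_ = checked(v); return *this; }
    // Note: this implicit conversion back to long is exactly the kind of
    // loophole the comment warns about; a stricter version would omit it.
    operator long() const { return value_; }
private:
    static long checked(long v) {
        if (v < Lo || v > Hi)
            throw std::out_of_range("value outside declared subtype range");
        return v;
    }
    long value_;
};

int main() {
    RangedInt<1, 12> month(6);   // OK: within the declared range
    month = 12;                  // OK
    // month = 13;               // compiles, but throws out_of_range at run time
    return 0;
}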