System.Xml or System.XML?

Brad Abrams recently asked me if I knew of any research that investigated whether or not mixed
case words (WebClient) are easier to recognize than all upper case words (WEBCLIENT).

 

I wasn't immediately aware of any research on this issue, so I turned to the folks
on the PPIG discussion list (sign up at https://www.ppig.org)
and asked if they knew. This resulted in a very interesting discussion, from which
the general consensus seemed to be that mixed case is more legible than upper
case. However, nobody was able to point me towards recent studies that clearly demonstrated
this. I was, however, given some fairly useful references to follow up (for example,
see the bibliography section at the end of www.knosof.co.uk/cbook/sent782.pdf). Following
these up, I came across a paper published in the journal Brain and Cognition that
investigated the effect of case alternation and word length on word recognition (see https://iipdm.haifa.ac.il/case_alternation.pdf).
Interestingly, they found that mixed case words take longer to recognize than all
upper case words. The effect is especially marked when the words are not recognizably
legal words. Additionally, the difference in recognition time increases when word
length increases.

 

This may well have some interesting consequences for API design. Should abbreviations be
represented in code in upper case or mixed case? For example, should we use System.Tla
or System.TLA (where TLA is some random three letter abbreviation)? The research might
indicate that we should use TLA but consensus opinion seems to prefer Tla. What
do you think? How about when the abbreviation is well known, such as XML? Should it
be Xml instead?
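
To make the word-boundary side of this question concrete, here is a minimal Python sketch of a deliberately naive identifier splitter. The function name `split_identifier` and its single rule (a new word starts wherever an upper case letter follows a lower case letter or a digit) are assumptions made purely for illustration. It is not how any .NET tool works and it is not a model of how people read. It only suggests that Pascal-cased abbreviations keep word boundaries mechanically recoverable, while all upper case abbreviations can run into the following word.

```python
import re


def split_identifier(name):
    """Split a concatenated identifier into words (hypothetical helper).

    Assumed rule: a word boundary sits wherever an upper case letter
    directly follows a lower case letter or a digit. Runs of upper case
    letters are never split, which is exactly the ambiguity of interest.
    """
    # Insert a space at each zero-width boundary, then split on the spaces.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return spaced.split()


if __name__ == "__main__":
    for identifier in ("XmlNode", "XMLNode", "ConvertXmlToHtml", "ConvertXMLToHTML"):
        print(identifier, "->", split_identifier(identifier))
    # XmlNode -> ['Xml', 'Node']
    # XMLNode -> ['XMLNode']
    # ConvertXmlToHtml -> ['Convert', 'Xml', 'To', 'Html']
    # ConvertXMLToHTML -> ['Convert', 'XMLTo', 'HTML']
```

A smarter rule could recover XML and Node from XMLNode by treating the last capital in a run as the start of the next word, but that rule is itself a guess about where the abbreviation ends.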

Comments

  • Anonymous
    December 03, 2003
    No matter how bad you think the .Net naming guidelines (http://msdn.microsoft.com/library/en-us/cpgenref/html/cpconNamingGuidelines.asp) are, they should be followed rigorously. Consistency is the name of the game. I just hate typing the name of a BCL class into VS.Net, using the correct naming guidelines, and finding my code doesn't compile. Have you tried using ClsCompliantAttribute lately?
  • Anonymous
    December 03, 2003
    I'm torn on this issue. On the one hand I think that Xml looks ugly, but on the other hand, XMLNode is even uglier. But ultimately it doesn't matter; like RichB said, consistency is much more important.
  • Anonymous
    December 04, 2003
    When I write about XML, I write "XML", not "Xml" or "xml". When I API about XML, I strongly prefer to do the same. Otherwise I need to stop and think "Oh yeah, I'm in BouncyCapsLand now; have to change the way I refer to XML". To my eye XmlNode is uglier than XMLNode, because I need to stop and figure out that they're really talking about XML. My preference is that all acronyms typically written all-uppercase should stay all-uppercase in APIs.
  • Anonymous
    December 04, 2003
    Consistency is paramount. OTOH I prefer (aesthetically) that common acronyms remain in all caps. On the gripping hand, acronyms that are common to one person may not be to another. So I'd have to conclude that, eye-pleasingness notwithstanding, it should be "System.Xml", "System.Io", and so forth.
  • Anonymous
    December 04, 2003
    Just add case insensitivity :-) I prefer my TLAs, ETLAs and MS-ETLAs to all be upper case.
  • Anonymous
    December 04, 2003
    I prefer just capitalizing the first letter of each word. It's much more readable that way, though it did take some getting used to, having done it the other way before.
  • Anonymous
    December 04, 2003
    I breezed over the case alternation pdf. They seem to be talking about something completely different, and therefore irrelevant to discussions on API naming. They seem to be discussing taking single words and mixing the case (e.g. HeLlO instead of hello; the confusion in mixed case is obvious here). What we want to discuss is concatenating multiple words. In this case, we need a visual cue to "split" the concatenation into individual words, thus camel or Pascal casing. The very existence of these two names for casing indicates to me that there is value in them above all caps.

    Now, when we start talking about well-known acronyms such as XML, everyone will have their own bias. In these cases, I subscribe to the "consistency" side of the argument. Make it consistent so I can guess the casing based on accepted rules.
  • Anonymous
    December 04, 2003
    If you're concerned about readability, then you need to talk to typographers, not programmers. Googling for "readability lowercase uppercase typography" will get you started, or you can run to the local bookstore and hit the graphic design section.
  • Anonymous
    December 04, 2003
    In general I do not have a strong opinion either way. However, consistency is important, and there are cases when I think the mixed case solution is better. With XMLNode I find it hard not to read XMLN as the acronym, whereas with XmlNode I think it is much clearer that it is Xml and Node. Sure, Xml looks weird compared to XML. But, when in the middle of a string without spaces, I think that being able to quickly and subconsciously separate the words is the first and most important aspect of readability.
  • Anonymous
    December 04, 2003
    Typographers will probably say that mixed case is faster to read but they're also used to spaces between words. No spaces = harder to recognize individual word shapes. I vote for mixed case with full words when possible. For cases like XML, stick with the current .NET guidelines.
  • Anonymous
    December 04, 2003
    When reading code with XmlNode, it is very likely to be scattered with other types beginning with Xml (XmlElement, XmlAttribute, XmlDocument). Taking the emphasis away from XML by casing it as Xml makes it easier to get the information you are interested in (is this an Element or an Attribute?). So: the current standard is great!
  • Anonymous
    December 04, 2003
    I prefer ALL CAPS for my three-letter acronyms, but the worst are two-letter abbreviations like ID or IO. Id and Io are so horrible looking I want to scream. So how about a compromise: two-letter abbreviations are ALL CAPS, and abbreviations of three letters or more are Pascal?
  • Anonymous
    December 04, 2003
    Personally, I prefer camelCase and PascalCase for words and abbreviations and UPPERCASE for acronyms. But HTML is already Html in the framework.

    IMO, to get relevant results, you should repeat the experiment taking those new things into account:
    - Don't test UPPERCASE vs. miXEdCAsE but UPPERCASE vs. camelCase and PascalCase.
    - Build the cases using technical terms concatenated without spaces.
    - Try with overheated CRT computer monitors and Courier New.
    - Then try with a crisp LCD, ClearType and a nice proportional font.
    - Take a population of healthy Microsoft dittoheads (like myself).
    - Give them an overdose of coffee.
  • Anonymous
    December 05, 2003
    That study is not relevant, as another poster pointed out. Of course randomly mixing case, not on word breaks, is horrible, but that is not an issue here. That study is misleading. In fact, you could argue that it reinforces the idea of using case to separate words: the reason random case changes are so hard to read is that case IS already used as a word and sentence separator in our brains. So when the case changes don't align with word breaks, confusion results.

    In all my experiences with users (I have produced public libraries since 1986), using a capital letter or underlines (now prohibited) for word breaks consistently makes it easier for them to understand. I don't think you need a study to see why. The eyeballs tell it all. The problem with using all caps for only certain acronyms is that the last letter of the acronym mixes in visually with the first letter of the next word. Consequently people sometimes think the last letter, or the whole acronym, is part of the next word. I have seen confusion sometimes. The other problem is consistency. People hate having to look up casing when they don't have IntelliSense. Not everyone uses VS, not all the time. They want to type without thinking.
  • Anonymous
    December 05, 2003
    I prefer all caps for any abbreviation of two letters or fewer, and mixed case for longer abbreviations.
  • Anonymous
    December 05, 2003
    I also prefer mixed-case, for many of the reasons listed above. I think it is most awkward to use mixed-case when the acronym is the only word in the name, or when the acronym is the last word in the name, e.g., System.XML looks better than System.Xml, GetElementFromID better than GetElementFromId, InnerHTML better than InnerHtml. But consistency is better: imagine having two methods called ConvertXmlToHTML and ConvertHtmlToXML...
  • Anonymous
    December 05, 2003
    Thanks for all the great comments. I agree that the study I originally pointed to is not really relevant - I'll try to be more careful in future. As many of you have pointed out though, regardless of whatever the studies tell us, we've set a precedent already with the names for some of the classes we've chosen and being consistent with those naming guidelines is much more important.
  • Anonymous
    December 05, 2003
    Great comments. Love the issue. As for ID vs Id for our property on Element... there is still some thinking to do. I list the current state here: http://longhornblogs.com/rrelyea/posts/1632.aspx
  • Anonymous
    December 06, 2003
    I have to agree that consistency must be the primary concern here. I would just like to add that comments like "I prefer this/that" are pretty irrelevant, because one of the purposes of having a guideline is that individual (and varying) preferences shouldn't influence the end result too much. The guidelines decided on a strict casing rule years ago, so there really isn't any use in debating them now. They just have to be followed.
  • Anonymous
    December 07, 2003
    It seems to me that you have to let go of trying to force-fit the format of an identifier in a programming language to a technology name or acronym that appears in print. Names like XML, HTML, etc. are easy to read in print because of the spaces that separate these acronyms from surrounding words. Since you don't have this luxury in most programming languages, I find PascalCasing to be the easiest to read in source code. The most important thing IMO is to pick your strategy and stick with it and not allow any exceptions. Every exception to this rule is another thing we have to remember in addition to the original rule.
  • Anonymous
    December 10, 2003
    I find it extremely non-intuitive to write TLAs like XML and HTML as Xml and Html. You wouldn't do it in any other context, so why in an API?
  • Anonymous
    February 02, 2004
    marklia got it right and concisely with: "In this case, we need a visual cue to "split" the concatenation into individual words, thus camel or Pascal casing." That is crucial. That applies to words that are concatenated.
    XMLSOAPSOA is harder to read than XmlSoapSoa ... and every day new TLAs are invented. We will be using more and more TLAs as time goes on and will want to concatenate them.

    Separate words that are acronyms, like the XML in System.XML should stay that way.

    (By the way, the C# GUI in VS.NET should help programmers get the right case, similar to the way VB does. It already knows the case of the defined symbols. It should use that info to help the programmer, but also allow them to ignore it if they like.)