Reserved and Contextual Keywords

Many programming languages, C# included, treat certain sequences of letters as “special”.

Some sequences are so special that they cannot be used as identifiers. Let’s call those the “reserved keywords” and the remaining special sequences we’ll call the “contextual keywords”. They are “contextual” because the character sequence might have one meaning in a context where the keyword is expected and another in a context where an identifier is expected.*

The C# specification defines the following reserved keywords:

abstract as base bool break byte case catch char checked class const continue decimal default delegate do double else enum event explicit extern false finally fixed float for foreach goto if implicit in int interface internal is lock long namespace new null object operator out override params private protected public readonly ref return sbyte sealed short sizeof stackalloc static string struct switch this throw true try typeof uint ulong unchecked unsafe ushort using virtual void volatile while

The implementation also reserves the magic keywords __arglist __makeref __reftype __refvalue which are for obscure scenarios that I might blog about in the future.

Those are the keywords that we reserved in C# 1.0; no new reserved keywords have been added since. It is tempting to do so, but we always resist. Were we to add a new reserved keyword then any program that used that keyword as an identifier would break upon recompilation. Yes, you can always use a keyword as an identifier if you really want: @typeof @goto = @for.@switch(@throw); is perfectly legal, though more than a little weird. But we prefer to avoid as many breaking changes as possible.
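To make that odd-looking line concrete, here is a minimal sketch in which it actually compiles. The types and members (a class named typeof, a static class named for with a method named switch, and the KeywordDemo wrapper) are invented purely for illustration; nothing like them exists in the framework.

```csharp
// Hypothetical types and members named after reserved keywords.
// The @ prefix turns each keyword into an ordinary (verbatim) identifier.
class @typeof
{
    public int Value;
}

static class @for
{
    // A method named "switch" that takes a parameter named "throw".
    public static @typeof @switch(int @throw)
    {
        return new @typeof { Value = @throw };
    }
}

static class KeywordDemo
{
    public static int Run()
    {
        int @throw = 42;
        @typeof @goto = @for.@switch(@throw);   // the statement from the text
        return @goto.Value;
    }
}
```

Calling KeywordDemo.Run() returns 42. The @ is not part of the name: a class declared as @typeof is simply the class typeof as far as metadata is concerned.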

We also have a whole bunch of contextual keywords.

The “preprocessor” † uses all the directives (#define, and so on) which of course were never valid identifiers in the first place. But it also uses contextual keywords hidden default disable restore checksum.

C# 1.0 had contextual keywords get set value add remove for properties, indexers and events. The attribute locations event and return are already reserved keywords; assembly module type method field property param typevar are contextual keywords in the context of an attribute.

C# 2.0 added where partial global yield alias.

C# 3.0 added from join on equals into orderby ascending descending group by select let var.

C# 4.0 added dynamic.

The async CTP added async and await.

Every time we add one of these we need to design the grammar carefully so that, if at all possible, the new contextual keyword does not change the meaning of an existing program that used the same word as an identifier.

For example, when defining a partial class, the partial must go immediately before the class. Since there was never a legal C# 1.0 program where partial appeared immediately before class, we knew that adding this new feature to the grammar could not break any existing program.
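A small sketch of what that buys you, assuming a made-up Order class: the word partial acts as a modifier only when it sits directly before class, and remains usable as an ordinary identifier everywhere else.

```csharp
// "partial" immediately before "class" is the modifier.
partial class Order
{
    public int Id;
}

// Second half of the same (hypothetical) class.
partial class Order
{
    public int NextId()
    {
        // Anywhere else, "partial" is still an ordinary identifier,
        // so code that used it as a name keeps compiling.
        int partial = Id + 1;
        return partial;
    }
}
```

With this in place, new Order { Id = 7 }.NextId() returns 8.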

Or, another example. Consider var x = 1; – that could have been a legal C# 2.0 program if there were a type called var with a user-defined implicit conversion from int. The semantic analyzer for declaration statements checks to see whether there is a type called var that is accessible at the declaration; if there is, then the normal declaration rules are used. Only if there is no such type do we analyze the statement as an implicitly typed local declaration.
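The two outcomes can be seen side by side in the following sketch. The class names WithoutVarType and WithVarType, and the nested type called var, are assumptions made up for the illustration; recent compilers may also warn about a type name that is entirely lowercase.

```csharp
static class WithoutVarType
{
    public static string Describe()
    {
        var x = 1;                  // no type named var in scope: x is an int
        return x.GetType().Name;    // "Int32"
    }
}

static class WithVarType
{
    // Hypothetical user-defined type that happens to be called "var",
    // with an implicit conversion from int.
    public class var
    {
        public int Wrapped;

        public static implicit operator var(int i)
        {
            return new var { Wrapped = i };
        }
    }

    public static string Describe()
    {
        var x = 1;                  // a type named var is accessible: ordinary declaration
        return x.GetType().Name;    // "var"
    }
}
```

The same statement, var x = 1;, declares an int in the first class and an instance of the nested var type in the second, which is how a C# 2.0 program that defined such a type keeps its original meaning.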

One might wonder why on earth we added five contextual keywords to C# 1.0, when there was no chance of breaking backwards compatibility. Why not just make get set value add remove into “real” keywords?

Because we could easily get away with making them contextual keywords, and it seemed likely that real people would want to name variables or methods things like get, set, value, add or remove. So we left them unreserved as a courtesy.
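As a rough illustration of that courtesy, the sketch below uses a made-up Registry class whose methods are named add, remove, get and set, alongside properties where get, set and value have their special meanings.

```csharp
using System.Collections.Generic;

// Hypothetical class, for illustration only.
class Registry
{
    private readonly Dictionary<string, int> items = new Dictionary<string, int>();
    private int last;

    // Ordinary methods and parameters named after contextual keywords.
    public void add(string key, int value) { items[key] = value; last = value; }
    public bool remove(string key) { return items.Remove(key); }
    public int get(string key) { return items[key]; }
    public void set(string key, int value) { items[key] = value; last = value; }

    public int Count
    {
        get { return items.Count; }    // here "get" introduces an accessor
    }

    public int Last
    {
        get { return last; }
        set { last = value; }          // here "value" is the implicit parameter
    }
}
```

The compiler treats get and set as keywords only in the position where a property accessor is expected, so the methods of the same name cause no conflict.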

Those were easy to make contextual, unlike, say, return. That’s a lot harder to make a contextual keyword because then return (10); would be ambiguous; is that calling the method named “return” or returning ten? So we didn’t make any of the other reserved keywords into contextual keywords.
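Because return is reserved, the two readings can never collide in practice. In the sketch below, which assumes a made-up ReturnDemo class with a method deliberately named return, the @ prefix is the only way to call that method, and the bare keyword always means the statement.

```csharp
static class ReturnDemo
{
    public static int Calls;

    // Hypothetical method named "return"; the @ is required because the word is reserved.
    static int @return(int x)
    {
        Calls++;
        return x;
    }

    public static int Run()
    {
        @return(10);    // unambiguously a call to the method above
        return (10);    // unambiguously a return statement
    }
}
```

After one call to ReturnDemo.Run(), the result is 10 and Calls is 1. Had return been contextual, the second statement in Run would have had both interpretations.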

*******

(*) An unfortunate consequence of this definition is that using is said to be a reserved keyword even though its meaning depends on its context; whether the using begins a directive or a statement determines its meaning.
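The two meanings of using can be seen in one short file. UsingDemo and ReadBack are names invented for this sketch.

```csharp
using System.IO;    // using directive: brings a namespace into scope

static class UsingDemo
{
    public static string ReadBack(string text)
    {
        // using statement: disposes the reader when the block exits
        using (StringReader reader = new StringReader(text))
        {
            return reader.ReadToEnd();
        }
    }
}
```

UsingDemo.ReadBack("hello") returns "hello"; the reader is disposed on the way out of the block.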

(†) An unfortunate name, since “preprocessing” is not done before regular language processing. In C#, the so-called “preprocessing” happens during lexical analysis.

Comments

  • Anonymous
    May 11, 2009
    Thanks for the great post Eric. I always find this kind of "behind the scenes" stuff interesting. It is interesting that you highlight the "using" keyword. I was discussing disposable objects with my colleagues the other day, and mentioned the very cool using syntax you can use in C#. They scratched their heads and said, "we thought 'using' was the equivalent of 'imports'". When I explained that it did both depending on where you type it, they were very surprised :)

    This is a tricky point of language design; when one keyword is used to represent two completely different concepts, it can be confusing. But introducing a new keyword per concept makes the language feel a bit bloated. I personally would have chosen "imports" or some such syntax for the directive form to ensure that it is not confused with the statement form, but I understand that it's a judgment call. We were designing a feature for C# 4.0 that got cut which was yet another form of "partial" class; basically, a way to share attribute metadata between the machine-generated and user-generated halves of a partial class. I pushed back on using the keyword "partial" for the feature because we would then have had THREE subtly different meanings for "partial" in C#, which I felt was two too many. (I was advocating adding another conditional keyword, "existing".) Unfortunately the point ended up moot since the feature was cut for lack of time. -- Eric

    By the way, I've just been reading your archives and I loved the "Riddle me this" Google posts :)

    Thanks! Those are among my favourites too. Since we have changed blog software it is now more difficult for me to extract and search the referrer logs, so I haven't written a fourth. -- Eric

  • Anonymous
    May 11, 2009
    There are .NET languages where it is dramatic: http://prismwiki.codegear.com/en/Language I've found a nice summary: http://jankajanos.spaces.live.com/blog/cns!C3E2695FC6F7B0A4!935.entry

  • Anonymous
    May 11, 2009
    Of course, the fact that 'var' is still a legal name for a type, means that you can have some legitimately compiling C# 3.0 code which uses it, that then suddenly breaks when you import a namespace that contains a class with that name.

    Sure, but of course this was a problem before "var". It is always possible that when you add a new reference, you introduce a new ambiguity. -- Eric

    So of course, one would expect a tool such as FxCop to dissuade you from creating classes called 'var'. Perhaps ironic, then, that an old version of FxCop's Microsoft.Cci.dll (which you have to reference in order to build FxCop rules) included a top-level (non-namespaced) class called 'var'...

    We considered adding a warning to the compiler if you use a type called "var" ambiguously, but never did implement it. -- Eric

  • Anonymous
    May 12, 2009
    Interesting related story: Why "yield return" rather than "yield"?

  • Anonymous
    May 14, 2009
    @Matthew Jones You should be able to do it like this without having to revert totally to the non-"syntactic sugar" way:

        var results = (from product in products
                       where product.productgroupid == deptid
                       orderby product.product_id descending
                       select product)
                      .Take(productspaged)
                      .OrderBy(product => product.product_id)
                      .Take(pagesize)
                      .ToList();

  • Anonymous
    May 24, 2009
    Many programming languages, including C#, treat certain character sequences as “special”. Some

  • Anonymous
    November 28, 2009
    I would like to add that some identifiers (assembly field method module param property @return type typevar) can also have special meaning when they are used as attribute targets. And 'value' is the name of implicit parameter in property, indexer and event accessors.