

Why no var on fields?

In my recent request for things that make you go hmmm, a reader notes that you cannot use "var" on fields. Boy, would I ever like that. I write this code all the time:

private static readonly Dictionary<TokenKind, string> niceNames =
    new Dictionary<TokenKind, string>()
    {
        {TokenKind.Integer, "int"}, ...

Yuck. It would be much nicer to be able to write

private static readonly var niceNames =
    new Dictionary<TokenKind, string>()...

You'd think this would be straightforward; we could just take the code that we use to determine the type of a local variable declaration and use it on a field. Unfortunately, it is not nearly that easy. Doing so would actually require a deep re-architecture of the compiler.

Let me give you a quick oversimplification of how the C# compiler works. First we run through every source file and do a "top level only" parse. That is, we identify every namespace, class, struct, enum, interface, and delegate type declaration at all levels of nesting. We parse all field declarations, method declarations, and so on. In fact, we parse everything except method bodies; those we skip, and come back to later.

Once we've done that first pass we have enough information to do a full static analysis to determine the type of everything that is not in a method body. We make sure that inheritance hierarchies are acyclic and whatnot. Only once everything is known to be in a consistent, valid state do we then attempt to parse and analyze method bodies. We can then do so with confidence because we know that the type of everything the method might access is well known.

There's a subtlety there. The field declarations have two parts: the type declaration and the initializer. The type declaration that associates a type with the name of the field is analyzed during the initial top-level analysis so that we know the type of every field before method bodies are analyzed. But the initialization is actually treated as part of the constructor; we pretend that the initializations are lines that come before the first line of the appropriate constructor.
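To make the "initializers are part of the constructor" point concrete, here is a small sketch with made-up names (InitOrderDemo, Log, Record). It shows only the observable ordering: an instance field's initializer runs before the constructor body, as if it had been written as that body's first line. It makes no claim about how the compiler represents this internally.

```csharp
using System.Collections.Generic;

// Illustrative class: all names here are invented for this sketch.
public class InitOrderDemo
{
    // Shared log used to observe the order in which code runs.
    public static readonly List<string> Log = new List<string>();

    // Field initializer: runs when the constructor runs, ahead of the
    // statements written in the constructor body.
    private readonly int value = Record("field initializer", 42);

    public InitOrderDemo()
    {
        Log.Add("constructor body");
    }

    public int Value => value;

    // Static helper so the initializer can record when it runs
    // (an instance field initializer may not reference `this`).
    private static int Record(string step, int result)
    {
        Log.Add(step);
        return result;
    }
}
```

Creating one instance leaves Log holding "field initializer" followed by "constructor body".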

So immediately we have one problem: if we have "var" fields, then the type of a field cannot be determined until its initializer expression is analyzed, and that happens after the point at which we already need to know the type of the field.

But it gets worse. What if the field initializer in a "var" field refers to another (static) "var" field? What if there are long chains, or even cycles in those references? There can be arbitrary expressions in those initializers, expressions which contain lambdas which contain expressions which require method type inference or overload resolution. All of these algorithms in the compiler were written with the assumption that when they run, the type of every top-level program entity is already known. All of those algorithms would have to be rewritten and tested in a world where top-level type information is being determined from them rather than being consumed by them.
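Here is a toy sketch of the extra step "var" fields would force on the top-level analysis. It rests on heavily simplified, hypothetical assumptions: every initializer is either a literal of known type or a plain copy of another field, so its type is just the type of whatever it refers to. Real initializers can contain lambdas, overload resolution, and method type inference, and those are exactly the hard part. This only illustrates ordering the work by dependency and rejecting cycles. VarFieldInference, Infer, and the "lit:" encoding are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model, not the real compiler's data structures.
// A "var" field's initializer is either "lit:<type>" (a literal whose type
// is known) or the name of another field it copies, as in `static var b = a;`.
public static class VarFieldInference
{
    public static Dictionary<string, string> Infer(
        IDictionary<string, string> declaredTypes,   // explicitly typed fields, known after the top-level pass
        IDictionary<string, string> varInitializers) // "var" fields and their initializers
    {
        var types = new Dictionary<string, string>(declaredTypes);
        var inProgress = new HashSet<string>();
        foreach (string name in varInitializers.Keys)
            Resolve(name, varInitializers, types, inProgress);
        return types;
    }

    private static string Resolve(
        string name,
        IDictionary<string, string> varInitializers,
        Dictionary<string, string> types,
        HashSet<string> inProgress)
    {
        // Already known: declared explicitly, or inferred earlier.
        if (types.TryGetValue(name, out string known))
            return known;

        if (!varInitializers.TryGetValue(name, out string initializer))
            throw new InvalidOperationException($"Unknown field '{name}'.");

        // Re-entering a field we are still inferring means the initializers form a cycle.
        if (!inProgress.Add(name))
            throw new InvalidOperationException($"Cycle detected while inferring the type of '{name}'.");

        string type = initializer.StartsWith("lit:", StringComparison.Ordinal)
            ? initializer.Substring("lit:".Length)
            : Resolve(initializer, varInitializers, types, inProgress);

        inProgress.Remove(name);
        types[name] = type;
        return type;
    }
}
```

With a declared as int, the chain c → b → a resolves c to int, while two fields x and y that copy each other are rejected as a cycle.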

It gets worse still. If you have "var" fields then the initializer could be of anonymous type. Suppose the field is public. There is not yet any standard in the CLR or the CLS about what the right way to expose a field of anonymous type is. We don't have good policies for documenting them, versioning them, or interoperating with them across languages. Doing this feature would potentially cause huge costs across the division.
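Here is a small sketch of why a public field of anonymous type has no sensible home. The class and method names are hypothetical. The method can only return the object as `object`, because there is no type name to write. With current Microsoft C# compilers the generated class is also internal and sealed, so other assemblies could not name it even if a field exposed it.

```csharp
public static class AnonymousTypeDemo
{
    // The compiler generates a class for `new { X = 1, Y = 2 }`, but source code
    // has no name for it, so the only declared return type available is object.
    // A public "var" field would face the same question: what type does it expose?
    public static object MakePoint() => new { X = 1, Y = 2 };
}
```

Inspecting MakePoint().GetType() through reflection shows a type whose IsPublic is false.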

Inferred locals have none of these problems; inferred locals never have cycles or refer to things that haven't been analyzed yet. Inferred locals never escape into public visibility.

So apparently this simple-seeming feature has the potential to cause really, really bad implementation issues in multiple ways, and all in order to avoid a small redundancy. This seems like it is possibly not worth the cost. If our goal is to remove the redundancy, I would therefore prefer to remove it the other way. Make this legal:

private static readonly Dictionary<TokenKind, string> niceNames =
    new()...

That is, state the type unambiguously in the declaration and then have the "new" operator be smart about figuring out what type it is constructing based on what type it is being assigned to. This would be much the same as how the lambda operator is smart about figuring out what its body means based on what it is being assigned to.
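C# 9.0 later added target-typed new expressions that work essentially as proposed here. The following minimal sketch assumes a C# 9.0 or later compiler. The TokenKind members and the Lookup helper are hypothetical stand-ins for the compiler's own code.

```csharp
using System.Collections.Generic;

// Hypothetical token kinds standing in for the compiler's own enum.
public enum TokenKind { Integer, String, Boolean }

public static class NiceNames
{
    // Target-typed new (C# 9.0+): the type is stated once, on the left, and
    // new() constructs the field's declared type.
    private static readonly Dictionary<TokenKind, string> niceNames = new()
    {
        { TokenKind.Integer, "int" },
        { TokenKind.String, "string" },
        { TokenKind.Boolean, "bool" },
    };

    public static string Lookup(TokenKind kind) => niceNames[kind];
}
```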

Thoughts?

Comments

  • Anonymous
    January 26, 2009
    I would say the ‘new’ operator is obviously a feature of the kind ‘nice to have but not really important’. In contrast to ‘var’ or the type inference for lambdas, which can greatly improve the readability of a method, the ‘new’ operator would save you at most one second of parsing – I mean, a field declaration is generally so obvious that you don’t have to read it twice or spend several minutes understanding it. Therefore the gained benefit is probably not worth the effort to implement/test/... it. But there is something in your post which confused me a little bit: “What if there are long chains, or even cycles in those references?” At first I wanted to respond: hey, it’s not possible to have cycles in the definition of a field, because you can’t refer to other fields inside the definition. But I just found out that is not true for static fields. An example: public static class Foo1 { public static List<int> Bar = new List<int>() { Foo2.Bar.Count, }; } public static class Foo2 { public static List<int> Bar = new List<int>() { Foo1.Bar.Count, }; } The code compiles and, as expected, throws a null-reference exception at runtime, because either Foo1.Bar is not constructed when Foo2.Bar is accessed or vice versa. Which brings me to my question: why is it possible to refer to other static fields inside the definition of a static field? Since no order of compilation is guaranteed at all, I can hardly think of any use for this.

  • Anonymous
    January 26, 2009
    Order of compilation is irrelevant. What is relevant is the order in which the static field initializers run, and that is well-defined. See section 10.12 of the specification for details. (It is arguably a bad programming practice to rely upon these details, but it is legal.)

  • Anonymous
    January 26, 2009
    Why not limit "var" fields to a small, well-defined set of constructs, such as constants and object creation expressions? This would probably cover 80% of cases, allow for future expansion of the feature, and doesn't look too binding for the future. As for compiler performance, parsing should already be done at this point, and you can easily detect from the AST whether var is valid. As for resolving the type for the top-level structure: for constants you already know it, and for an object creation expression it is the same as resolving the type specification to the left of the field's name.

  • Anonymous
    January 26, 2009
    Not a big fan of the new() idea; as MichaelGG said, it doesn't "feel" right. C# started off as an extremely clean language, but since C# 3.0 it feels as though a large number of kludges were added solely for LINQ. I recently did a demonstration of C# 3.0 for our development team, and most of them said "Ughhh" to the language extensions before I showed them LINQ. The real draw of C# was that it was straightforward and clean. The C# 3.0 extensions feel forced, as though the language is heading down the wrong path, loading on unnecessary solutions for fringe cases. Overall the language will suffer. Languages don't need to evolve with every product release; it really feels like at this point the C# language team is trying to justify its existence and not really improving the language (no offense!). C# 2.0 was as close to a "perfect" strongly typed language as you could get, and 3.0 really destroyed that. Let's not go further down that path. I'd rather see C# stay the way it is (now a somewhat mature language) and focus put into the compiler, BCL, and CLR. Sorry!

  • Anonymous
    January 26, 2009
    @Eric, The only reason C# 3.0 feels "dirty" is precisely because all these things were added only for specific cases, namely LINQ's cases. Nothing feels like it was designed for the language as a whole. OTOH, it tries to be a C-ish syntax language, and by the time you finish cleaning it up and simplifying the syntax... not sure you'd end up with anything C-like. C# 3 was a major step forward, but it was only a start, and I was so hoping that C# 4 would follow through with the apparent path set out. But as Eric Lippert said before, too many users thought this was too hard, too much, too complex. With the recent announcement of C# and VB going to be "equal, just different looking", it's clear what the future path for MS .NET languages is.

  • Anonymous
    January 26, 2009
    I don't like the idea of the "{type} {name} = new(...)" syntax much, mainly because of the inconsistency with the way that var works. It would feel very weird to be able to specify the type only on the RHS within method body as we do now, and then be able to specify it only on the LHS when it's a field. And then what happens in the future if you do re-architect the compiler so that declaring fields using var would be possible, and if the CLR/CLS does come up with a specification for anonymous types to be exposed and shared between languages? Then you're left with an inconsistent syntactic wart which arose from technical issues rather than being designed into the language as the best way to do things. Sure, it's redundant type information, but (a) there's a fair bit of that in C# anyway, and (b) it's not really that painful, especially if you have something like ReSharper which will fill in the RHS for you anyway by simply hitting TAB. I'd say either do it the way that would be ideal (var) or don't do it at all (until technically possible).

  • Anonymous
    January 26, 2009
    Visual Basic has a special syntax to avoid repeating the type in the most common case: dim x as Object = new Object() becomes: dim x as new Object(). Of course, that syntax only works in VB because it fit naturally with how things were already declared. C# goes 'type name', which would naively mean something like "new List<Widget>(128) widgets". I would suggest the following syntax: var x: new Object()

  • Anonymous
    January 28, 2009
    I don't think that an alternate new syntax should be introduced. Firstly, new syntax is extra mental weight, so it had better be worth it. Secondly, I don't think C# should encourage the use of constructors at all; constructors already have weird semantics as is. Many constructors have various implicit initializations, i.e. call one overload and you get a class loaded from the DB, another and you get an "uninitialized" object, another and you get an inline-initialized object, etc. These initializations are bad since they're unnamed; that is, a reader of code (and by extension the IntelliSense-using writer) cannot easily determine which overload to use. Constructors are one of the few kinds of method where people find it acceptable for the "same" method to have vastly different semantics among overloads. That's not a good habit. Adding such a syntax would seduce programmers into adding "handy" constructors and make a bad situation worse; even more functionality would be put into constructors. Constructors are already one of the most amorphous aspects of the language; they come across as a bag of various features only loosely coupled to a vague intent. Why are constructors the only static methods that can be required of a generic type parameter? Why are only constructors able to guarantee a non-null return value? Why are constructors not able to return null? Why are only constructors unable to return a subclass of their normal return value (leading to overcomplicated factory methods)? Why are only constructors able to require that a subclass call them? Why is the collection and object initializer syntax only available for constructors? I'd much prefer the language evolve toward dissociating these many features and making them generally useful than convolute the constructor even further. So, in the name of avoiding unnecessary syntax baggage, and in the name of making the language "general", I'd vote for not implementing such syntax.

  • Anonymous
    January 29, 2009
    Array initializers state the type only once: int[] values = { 0, 1, 2 }; Perhaps the syntax could be extended to collection initializers: List<int> values = { 0, 1, 2 }; Dictionary<TokenKind, string> niceNames = { }; And object initializers: Point point = { X = 0, Y = 1 };

  • Anonymous
    January 29, 2009
    PPS: I posted using Firefox 3 this time (previous attempts using IE7 did not work). Don't know if that made a difference but the posting appeared. Well, maybe that's just Microsoft trying to make the EU happy ;-)

  • Anonymous
    January 29, 2009
    I agree with Ilya, except that I would expand the proposal further: you can use var fields, but your initialization expression cannot access other var fields. This would probably cover 95% of the useful cases. But I do appreciate that a change like this messes up the architecture of the compiler. Igor Ostrovsky

  • Anonymous
    January 29, 2009
    First off, I also had trouble posting with IE7; this post was created with Firefox. As for the idea of introducing a new syntax for new, I think it is a bad idea for several reasons. First, unlike var, it could only be used to initialize members that are concrete types, since there would be no way to infer the type to construct when the left-hand side is an interface or abstract class. The compiler can certainly warn you, but it's an awkward inconsistency that doesn't buy you much. Second, and more important, this syntax may actually allow the semantics of a program to change subtly without the developer being aware. Take the following example: class Animal { public override string ToString() { return "Animal"; } } class Dog : Animal { public override string ToString() { return "Dog"; } } class Vet { public readonly Dog ThePatient = new(); } Now some brilliant developer comes along and, without thinking too deeply about it, says: hey, we should expose ThePatient as a reference to the base type Animal. As a result, the compiler infers that the type to create should now be Animal rather than Dog. The developer may not have intended this; they just didn't realize that this inference is taking place. (Yes, developers should pay attention to what they're doing and understand the language, but it's an easy thing to overlook.) The compiler won't complain; it will happily change the runtime type instantiated, potentially leading to subtle and difficult-to-track-down bugs. The var keyword doesn't have this issue because the compiler isn't deciding what type to instantiate, just what type of reference to assign to. In other words, var never results in a different method than you expect getting invoked. I think that allowing the compiler to make inferences about what runtime types to instantiate is a bad idea; this is a case where C# should favor correctness over convenience. IMHO.

  • Anonymous
    January 29, 2009
    I guess I'll go out on a limb here and say that I like the idea of having a clean syntax that doesn't force me to repeat things. List<KeyValuePair<string, LinkedList<TreeNode>>> list = new(...) is pretty clean, I think. I don't get any bad feelings at all from it, and the syntax seems entirely reasonable. This also comes down to an issue of maintenance. As code is being prototyped, the internal structures that I am using get shuffled around and changed a lot, and a syntax like this keeps me from having to constantly revisit places where the full type name would normally be required. I must say that I would prefer this syntax over the var syntax currently being used. I like the type to be textually tied to the identifier it is associated with, and having the type declaration on the LHS makes this clearer in my mind. It would also be nice to support something like this: List<KeyValuePair<string, LinkedList<TreeNode>>> list; list = new( ... ); so that the declaration and the new don't need to appear as part of the same statement. All in all, it seems like a nice mechanism to default to the declared type when creating an object, which is probably only going to become a bigger issue going forward with generics becoming pervasive. I tend not to worry about the corner cases, though, but it seems like it would be useful in a number of not-so-corner cases.

  • Anonymous
    January 29, 2009
    If this was to be implemented in the very conservative manner, I like Rising's syntax best: Dictionary<TokenKind, string> niceNames = { };

  • Anonymous
    January 30, 2009
    Isn't Anders considering a fairly deep re-architecture of the C# compiler in modularizing it? See the PDC 2008 talk on "The Future of C#" [1]; I took a few notes at [2]. I should think a major objective of you guys at MS should be to reduce syntactic cruft in situations where not much more complexity is added. See the Sapir-Whorf hypothesis [3]. Less cruft == ability to focus on what matters. Combine that with the limited amount of information the brain can actually process in short-term memory, and reduction of syntactic cruft, IMHO, becomes extremely important. Yes, -100 points -- give us some transparency into some specifics and we can discuss. :-p [1] http://channel9.msdn.com/pdc2008/TL16/ [2] http://luke.breuer.com/time/item/C_40/465.aspx [3] http://en.wikipedia.org/wiki/Sapir-Whorf_hypothesis

  • Anonymous
    January 30, 2009
    Re-architecting the compiler is probably -10000 and not -100 ;)

  • Anonymous
    January 31, 2009
    If you want to create an object of the same type as the member used to store it without having to repeat the type name, you can use "Stockton new" as an alternative to var (which may be useful for fields, where var cannot be used) as discussed. The downside is that you have to repeat the member name in the initialization. Here's how it looks:
        class Program
        {
            public static T New<T>(out T item) where T : new()
            {
                item = new T();
                return item;
            }
            static Dictionary<Int32, Int32> _member = New(out _member);
            static void Main(string[] args)
            {
                Dictionary<Int32, Int32> local = New(out local);
            }
        }
    In addition, we can extend this method to create concrete classes for corresponding interfaces with a couple of simple overloads:
        public static IDictionary<TKey, TValue> New<TKey, TValue>(out IDictionary<TKey, TValue> item)
        {
            item = new Dictionary<TKey, TValue>();
            return item;
        }
        public static IList<T> New<T>(out IList<T> item)
        {
            item = new List<T>();
            return item;
        }
    Now you can write this:
        IDictionary<Int32, Int32> local = New(out local);
    Not perfect, but no compiler changes required. BTW, I also had problems posting with IE8. This post was done with Safari.

  • Anonymous
    January 31, 2009
    Eric, I would much rather have consistent syntax var x for fields as it is for variables.  If that means that I have to wait until the CLR / compiler matures further then so be it.  But I, personal feeling, would rather have consistency if the feature is introduced. Thanks for reading --Avi

  • Anonymous
    February 02, 2009
    I had this big comment written, but it seems to have gotten lost, so I'll go with the short version: var fields => bad, because they sacrifice too much in terms of documentation. Local var is great, because it's obvious from the scope what the type is, but it will be as hard for a human to work out what the type of a field is meant to be as you are saying it will be for the compiler (yes, miracle of miracles, I'm not supporting a new feature). MyType x = new(1, 2, "happy"); => good, or at least better than what we have. I agree with the poster above that constructors are a bad thing and there are better ways; however, given that people use them, this simplifies the code a bit, and I don't see it as any different from dropping the type in array initialization, which has already been done. More and more I am ditching constructors (well, privatising them and only exposing a static factory method). As the posts above mention, there are a whole bunch of benefits and no real downsides; however, it doesn't help very much, because you still have to duplicate the type name, e.g. MyType x = MyType.New(1, 2, "happy"); However, this yet again can be solved if you allow type inference to work backwards. For instance, if I typed MyType x = Create(); where Create is defined as T Create<T>() where T : new() {...}, then it happily proceeds and assumes T is MyType. If we get that, then it's the perfect solution, because it solves way more than just this problem, and allows us to define anything we want. Thanks, Darren

  • Anonymous
    February 17, 2009
    This has been sitting in Connect for a while: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=388649&wa=wsignin1.0

  • Anonymous
    February 25, 2009
    Gee, that new() syntax proposal looks so much like this one for Java: http://bugs.sun.com/view_bug.do?bug_id=4879776 I'm glad C# 3 took the 'var' way instead of the way proposed in the link above. Going from left to right fits the way people read and write, but somehow looks weird...

  • Anonymous
    March 13, 2009
    I'd write this: var x = new(); Just to see how creative you got with the error message :) Something like, "You wanna do WHAT here?"

  • Anonymous
    March 21, 2009
    F# people somehow managed to handle that. Please go and ask them how to infer type for 'var', how to resolve long chains and cycles. No magic here IMO.

  • Anonymous
    August 13, 2009
    How about: private static readonly new Dictionary<TokenKind, string> niceNames() =  ... Oh, wait... that smacks of VB.Net.

  • Anonymous
    August 13, 2009
    Rising, what about calling non-default constructors?

  • Anonymous
    October 06, 2009
    Shocked by the negative response to the "new()" sugar. It makes perfect sense. It is cheap enough that the C# team may actually get around to implementing it, despite the comparatively small benefit. It is possibly the only realistic solution. It doesn't help to say that "language X can do this". What does that have to do with how much effort this would take to implement for C#? Still hoping that something like "new()" could be added for exactly the scenario mentioned in this blog post, which I somehow keep running into all the time myself.

  • Anonymous
    October 06, 2009
    I ALMOST like the new() sugar. My concern is the potential for confusion with existing meanings of new. Using a different keyword would address this cleanly. The one thing I would consider important is that the syntax for the RHS should be swappable between a field initializer and other contexts. I frequently cut/paste between locations (e.g. from/to a constructor body). If the syntax is not compatible, it will typically NOT be a viable shortcut (for my typical development style).

  • Anonymous
    January 10, 2010
    I don't see how it's that difficult to enable.  During the class-level parsing routine, simply leave all the var fields as type "var".  After that insert a step that recursively figures out what the "var" members are, and throw an error if anything is anonymous, a delegate, or circular.  Once that's done you can proceed to compile the methods and everything else just as before.  Is that really a huge refactor?

  • Anonymous
    June 15, 2012
    Eric I love your posts.....:) but really.... you should change the purple color.....:)
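You can test the Dog and Animal scenario raised in the comments directly, because C# 9.0 later shipped target-typed new expressions. The following minimal sketch assumes C# 9.0 or later, and that Dog derives from Animal as the commenter intended. RefactoredVet is a hypothetical name for the class after the field's declared type has been widened.

```csharp
public class Animal
{
    public override string ToString() => "Animal";
}

public class Dog : Animal
{
    public override string ToString() => "Dog";
}

public class Vet
{
    // Target-typed new: the object constructed follows the field's declared type.
    public readonly Dog ThePatient = new();
}

public class RefactoredVet
{
    // Same initializer text, but widening the declared type to Animal
    // changes which constructor runs, with no diagnostic.
    public readonly Animal ThePatient = new();
}
```

Here new Vet().ThePatient prints as "Dog", while new RefactoredVet().ThePatient prints as "Animal". This is the trade-off the commenter described.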