What Are The Semantics Of Multiple Implicitly Typed Declarations? Part Two

06/27/2006

Many thanks for all your input in my informal poll yesterday. The results were similar to other "straw polls" we've done over the last couple of months. In this particular poll the results were:

var a=A, b=B; where the expressions are of different types should:

have the same semantics as var a=A; var b=B;: 12
replace the var with some type for both: 3
give an error: 6

There were 18 comments; a few people voted twice, which is fine with me.

The way the feature is specified is that the var is to be replaced with the best type compatible with all the expressions, to maintain the invariant that parallel declarations like this always give the same type to each variable. Many people that we've polled believe that this is the "intuitively obvious" choice, including much of the language design team. A larger group of language users believes that "infer each variable type separately" is the "intuitively obvious" choice.
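To make the disagreement concrete, here is a minimal sketch of the two readings, written with declarations that compile today. The class and method names are made up for illustration, and the "one type for both" reading is spelled with an explicit double, since that is what var would have to become for an int initializer and a double initializer.

```csharp
using System;

static class DeclarationReadings
{
    // Reading 1: each variable gets the type of its own initializer,
    // exactly as if we had written "var a = 1; var b = 2.0;".
    public static string SeparateTypes()
    {
        var a = 1;     // inferred as int
        var b = 2.0;   // inferred as double
        return a.GetType().Name + "," + b.GetType().Name;
    }

    // Reading 2: a single type compatible with both initializers.
    // Spelled out explicitly here; this is what "var a = 1, b = 2.0;"
    // would have meant under the specified semantics.
    public static string SharedType()
    {
        double a = 1, b = 2.0;
        return a.GetType().Name + "," + b.GetType().Name;
    }

    static void Main()
    {
        Console.WriteLine(SeparateTypes()); // Int32,Double
        Console.WriteLine(SharedType());    // Double,Double
    }
}
```

The difference matters as soon as the variables are used: a / 2 is integer division under the first reading and floating-point division under the second.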

So what to do? We have a relatively unimportant edge-case feature where customers strongly disagree as to what the code "obviously" means, and the difference can lead to subtle bugs. That's clearly badness. Given this feedback, amply confirmed by you all, we are probably going to simply remove multiple implicitly typed declarations from the C# 3.0 language.

Thanks for your feedback!

Comments

Anonymous
June 27, 2006
Does this mean that you're going with the "give an error" option or are you not allowing "var a=1,b=1"?
Anonymous
June 27, 2006
The plan right now is to disallow the whole thing. That is, if var is being used as a contextual keyword, then you get one declaration per var, not a list of declarations per var.
Anonymous
June 27, 2006
so even though
int a=1, b;
is semantically equivalent to
int a=1; int b;

the exact same form with var instead of int will be illegal? Is there any other case where replacing a fixed type name with "var" would cause a declaration error?
Anonymous
June 27, 2006
Since "var" is only legal in a local variable declaration with an assignment, the answer to your question is "yes -- all other cases are such cases".

Also, any local variable context in which the type of the expression cannot be determined will also fail. For example, Func<int, int> f = c=>c+1; succeeds, var f = c=>c+1; fails because we have no idea what the desired type of the lambda is.
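A small sketch of that lambda case, with hypothetical class and method names. The failing declaration is left as a comment so the rest compiles.

```csharp
using System;

static class LambdaInference
{
    public static int AddOne(int x)
    {
        // The declared type Func<int, int> tells the compiler that c is an int.
        Func<int, int> f = c => c + 1;
        return f(x);
    }

    // The following does not compile: nothing says what type c has,
    // so there is no type for var to stand in for.
    // var g = c => c + 1;

    static void Main()
    {
        Console.WriteLine(AddOne(41)); // 42
    }
}
```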
Anonymous
June 28, 2006
The comment has been removed
Anonymous
June 28, 2006
Why even allow the var type at all then? Granted, I do not work with C# or any other C-derived, strongly-typed language, but it seems to me that the only purpose of it is to allow the declaration of variables without the programmer actually deciding what type of information they will hold...which is incredibly lazy and probably dangerous to some degree.
Anonymous
June 28, 2006
Two reasons. The not particularly good reason is that

Dictionary<string, List<int>> mydict = new Dictionary<string, List<int>>();

is somewhat redundant and gross looking.

The really good reason is "because C# 3.0 will have anonymous types". Obviously if a type cannot be named then there is no way to declare a variable of that type without some kind of type inference.
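Both reasons can be sketched in a few lines. The names here (VarDemo, Describe, the "primes" key, the person's fields) are invented for the example.

```csharp
using System;
using System.Collections.Generic;

static class VarDemo
{
    public static string Describe()
    {
        // Reason one: the type is stated once, on the right-hand side.
        var mydict = new Dictionary<string, List<int>>();
        mydict["primes"] = new List<int> { 2, 3, 5 };

        // Reason two: this type has no name we could write,
        // so var is the only way to declare the local.
        var person = new { Name = "Ann", Age = 41 };

        // Both locals are still statically typed; Name is a string, Age an int.
        return person.Name + " " + person.Age + " " + mydict["primes"].Count;
    }

    static void Main()
    {
        Console.WriteLine(Describe()); // Ann 41 3
    }
}
```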
Anonymous
June 29, 2006
So, then, what is the advantage of anonymous types? I'm not trying to be a pain, by the way, I'm just curious. As I said, I don't work with C-derived languages. I only do scripting, where the declaration of variables is almost always optional anyway; so I don't really understand the advantages/disadvantages of being required to define a type for a variable.
Anonymous
June 29, 2006
I will leave the enumeration of the advantages of static typing vs dynamic typing for another day.

There are two main advantages of anonymous types.

First, anonymous is generally goodness. We already have "anonymous variables" in C#. That is, you can write:

a = b + c * d;

See the anonymous variable in there? Of course you don't. We are so used to anonymous variables that we don't even see them anymore. The C# compiler of course is actually generating the equivalent of

temp = c * d;
a = b + temp;

C# 2.0, JScript, etc., have anonymous methods, which are also handy.

Anonymous types are just one more step in this direction. You ought to be able to say "I want a name, age, phone number triplet" and have that be a statically typed entity without having to give that thing a name.

Second, having anonymous types makes query comprehensions much easier to write:

var results = from c in customers where c.City == "London" select new {c.Name, c.Age};

Now suppose that we didn't have anonymous types or type inferencing. You'd have to write:

internal class NameAndAge { private string name; private int age; internal string Name { get ... blah blah blah, and then

IEnumerable<NameAndAge> results = from c in customers where c.City == "London" select new NameAndAge(c.Name, c.Age);

Now you decide that you want phone number in there as well and you have to define ANOTHER new type! What a pain! And then you have to update the type of results too.

The point of all of these new features is to make query comprehensions work painlessly without giving up static typing.
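Here is a self-contained version of the query above, as a sketch. The Customer class and the sample data are assumptions made up so that the example runs; only the query itself comes from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type, just enough to run the query.
class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public int Age { get; set; }
}

static class QueryDemo
{
    public static List<string> LondonSummaries(IEnumerable<Customer> customers)
    {
        // The element type of results is an anonymous type with
        // Name and Age properties; var is the only way to name it.
        var results = from c in customers
                      where c.City == "London"
                      select new { c.Name, c.Age };

        // The anonymous type stays inside this method; only strings leave it.
        return results.Select(r => r.Name + ":" + r.Age).ToList();
    }

    static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { Name = "Ann", City = "London", Age = 41 },
            new Customer { Name = "Bob", City = "Paris", Age = 35 },
        };

        foreach (var s in LondonSummaries(customers))
            Console.WriteLine(s); // Ann:41
    }
}
```

Adding a phone number to the projection would only mean changing the select clause; no other declaration needs to be touched.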
Anonymous
June 29, 2006
The comment has been removed
Anonymous
June 29, 2006
I would also prefer anonymous methods to be first-class. However, doing it right requires changing the CLR type system, not just the C# type system. (The versioning issues are considerable.)

Given that we're not going to change the CLR type system, I am hoping that there are things we can do to make this work. Suppose, for example, that refactoring to extract a method upon code that references an anonymous type caused a new nominal type to be emitted into your source code. Would that make you feel better?

I'm not saying that we're going to do that, of course. But it's definitely an idea that's been kicking around here for a while. :-)

(And sure, that foreach looks good to me.)
Anonymous
June 29, 2006
The comment has been removed
Anonymous
June 29, 2006
The comment has been removed
Anonymous
June 30, 2006
What sort of changes would be required for making anonymous types first class? What's the difference between a local variable and one that can be returned from or passed to a function?
Anonymous
June 30, 2006
Adding new stuff to the CLR type system has a major impact on all languages. For example, all languages are now required to be able to talk to generic types if they want to be CLI-compliant languages. That's a major burden on language implementors and we do not take imposing it lightly.

The difference between a local and something returned, in this case, is that a local never escapes into any context in which its type can be part of a publicly visible contract. If a method could be 'var' then we'd either have to say "only private/internal methods can be var", which is gross, or come up with some standardized, versionable, secure, safe way to represent public methods that return anonymous types.

By keeping anonymous types restricted to being used only inside contexts in which they cannot "leak out" we don't have to solve any of those hard problems. They can be solved in future versions of the CLR. We've got to ship this thing! If we wait for the type system to be perfect, we'll wait forever.
Anonymous
July 03, 2006
I think that's the best thing to do. If nobody can agree on what the compiler should do, then most developers will make many mistakes, and the same question will be asked a thousand times in the forums.

And it is not a big deal to go without it either way.
Anonymous
July 04, 2006
I'm with the "disallow the whole thing" or "error" option. When no one really knows what the code will do, remove the ambiguity and disallow the case entirely. This can save a lot of time and trouble.
Anonymous
July 06, 2006
Eric, would you allow var declarations for out parameters?
Anonymous
July 06, 2006
Nope. Just local variable declarations. Out parameters could leak information about anonymous types to the outside world.
Anonymous
July 06, 2006
How are they going to leak? The function is declared with a given type; it is only at the call site that I am asking for var.

For example given
int GetSomething(out bool alsoReturn)

To be able to call
var a = GetSomething(var out also);
would declare also to be bool.
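For readers on later compilers: C# 7.0 did add declarations at the call site for out arguments, with the keywords in the other order (out var). The sketch below assumes a C# 7.0 or later compiler; the body of GetSomething is invented so that the example runs.

```csharp
using System;

static class OutVarDemo
{
    // Same signature as in the comment above; the body is made up.
    static int GetSomething(out bool alsoReturn)
    {
        alsoReturn = true;
        return 42;
    }

    public static string Call()
    {
        // "also" is declared at the call site and inferred as bool
        // from the parameter type of GetSomething.
        var a = GetSomething(out var also);
        return a + " " + also.GetType().Name;
    }

    static void Main()
    {
        Console.WriteLine(Call()); // 42 Boolean
    }
}
```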
Anonymous
July 07, 2006
I finally purchased The Design and Evolution of C++, and it appears that Dr. Stroustrup would agree with you. In section 3.11.5.2 he says, "I would probably also ban uninitialized variables and abandon the idea of declaring more than one name in a declaration."

I suspect this may have more to do with declarations of the sort:

int *a, b, *c[10];

The declaration semantics of C# are simpler than C/C++ due to the lack of pointers, so it may not be as much of a concern, but it is nonetheless an interesting point.
Anonymous
July 07, 2006
First off, hey buddy, C# does too have pointers! It's perfectly legal to declare a pointer to char, int, etc in C#. That's what the "unsafe" keyword is for, so that you can create incredibly dangerous programs that crash in the same horrible ways that your unmanaged C++ programs used to. (Or, because they'll likely corrupt the garbage collector, new horrible ways.)

Second, the declaration semantics of C# are simpler because of two things:

* the lack of const
* the more consistent syntax for describing a type

The latter is the one you're getting at. In C#, all the modifiers to make a base type into an array, a pointer, etc, are actually part of the type. A careful reading of the C++ standard shows that C++ can't even get it self-consistent. As you note, in a local variable declaration, the * is part of the variable, not part of the type. But in a formal parameter list, the * is part of the type.
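A quick sketch of what "part of the type" means for a multiple declaration, using arrays so that no unsafe code is needed. The class and method names are made up.

```csharp
using System;

static class ArrayDeclarations
{
    public static int TotalLength()
    {
        // The [] belongs to the type, so it applies to every declarator:
        // both a and b are int[].
        int[] a = { 1, 2 }, b = { 3 };
        return a.Length + b.Length;
    }

    static void Main()
    {
        Console.WriteLine(TotalLength()); // 3
    }
}
```

Compare the C declaration int *a, b; where only a is a pointer and b is a plain int.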
Anonymous
July 10, 2006
Ahem... A thousand apologies. I've read some C# books, but never delved into it too very deeply. I was applying my Java knowledge to C#, with results as above. I do remember pointers in unsafe code, but foolishly chose to forget them.

I see the point of the more consistent declaration syntax now; I had mentally glossed over the difference between:

int[] a;
int a[];

I like const. What makes it difficult? Or is it things like:

char const * const a; //same as (?): const char * const a;
char const * b; //same as (?): const char * b;
char * const c;

that make it difficult because you can have non-const pointers to const data, const pointers to non-const data, and const pointers to const data and you have to keep track of all that? To say nothing of pointers to pointers to etc. and the subsequent explosion of const combinations.

Lastly, I enjoy your blog immensely. It makes me hope that one day I get to work on cool stuff like this, although it'll obviously take me quite a while to get up to speed on my declaration knowledge :-)
Anonymous
May 30, 2007
Eric, there is one mistake. Do forgive me for only pointing out your mistakes. :) "Adding new stuff to the CLR type system has a major impact on all languages. For example, all languages are now required to be able to talk to generic types if they want to be CLI-compliant languages. That's a major burden on language implementors and we do not take imposing it lightly." Try putting CLSCompliant(true) at the assembly level. Defining a public generic type or method in C# then emits a "not CLS-compliant" warning.
Anonymous
August 14, 2007
Raymond has an interesting post today about two subtle aspects of C#: how order of evaluation in an expression
Anonymous
December 27, 2007
Tanveer, CLS & CLI are two different things.
