Compound Assignment, Part One

03/29/2011

When people try to explain the compound assignment operators += -= *= /= %= <<= >>= &= |= ^= to new C# programmers they usually say something like “x += 10; is just a short way of writing x = x + 10;”. Now, though that is undoubtedly true for a local variable x of type int, that’s not the whole story, not by far. There are actually many subtle details to the compound assignment operators that you might not appreciate at first glance.

First off, suppose the expression on the left hand side has a side effect or is expensive to call. You only want it to happen once:

using System;

class C
{
    private int f;
    private int P { get; set; }
    private static C s = new C();
    private static C M()
    {
        Console.WriteLine("Hello");
        return s;
    }
    private struct Evil
    {
        public int f; // Mutable value type with a public field, evil!
        public int P { get; set; }
    }
    private static Evil[] evil = new Evil[2000];
    private static Evil[] N()
    {
        Console.WriteLine("Badness");
        return evil;
    }
    // ... other members of C go here ...
}

If somewhere inside C you have M().f += 10; then you only want M’s side effect to happen once. This is not the same as M().f = M().f + 10;
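To see the difference concretely, here is a minimal sketch. The Calls counter and the SideEffectDemo class are additions for illustration only (the class above prints "Hello" rather than counting), and the members are made public so that the demo can reach them.

```csharp
using System;

class C
{
    public int f;
    private static readonly C s = new C();
    public static int Calls;   // hypothetical counter: how often M() has run

    public static C M()
    {
        Calls++;               // the "side effect" we want to observe
        return s;
    }
}

static class SideEffectDemo
{
    // Number of times M() runs for one compound assignment.
    public static int CompoundCalls()
    {
        C.Calls = 0;
        C.M().f += 10;
        return C.Calls;
    }

    // Number of times M() runs for the naive textual expansion.
    public static int NaiveCalls()
    {
        C.Calls = 0;
        C.M().f = C.M().f + 10;
        return C.Calls;
    }

    static void Main()
    {
        Console.WriteLine(CompoundCalls()); // 1
        Console.WriteLine(NaiveCalls());    // 2
    }
}
```

The compound form evaluates the receiver expression once; the naive expansion evaluates it once on each side of the assignment.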

What is it the same as then? How about this:

C receiver = M();
receiver.f = receiver.f + 10;

Is that right? It seems to be, but suppose we make it a bit more complicated. Suppose we have N()[123].f += 10; instead. Is this then equivalent to

Evil receiver = N()[123];
receiver.f = receiver.f + 10;

Clearly not. We've made a copy of the contents of variable N()[123] and we are now mutating the variable containing the copy but we need to be mutating the original.

Once more we see how much pure concentrated evil mutable value types are!
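The copy problem is easy to reproduce. The following is a small sketch, not the compiler's actual expansion; CopyDemo and its two methods are names invented for the demonstration, and each method uses a fresh local array so that the results do not depend on call order.

```csharp
using System;

struct Evil
{
    public int f; // mutable value type, as in the article
}

static class CopyDemo
{
    // Mimics the incorrect expansion: the element is copied into a local first.
    public static int ViaCopy()
    {
        Evil[] evil = new Evil[2000];
        Evil receiver = evil[123];      // copies the struct out of the array
        receiver.f = receiver.f + 10;   // mutates only the copy
        return evil[123].f;             // the array element is untouched
    }

    // The real compound assignment mutates the array element itself.
    public static int InPlace()
    {
        Evil[] evil = new Evil[2000];
        evil[123].f += 10;
        return evil[123].f;
    }

    static void Main()
    {
        Console.WriteLine(ViaCopy());  // 0
        Console.WriteLine(InPlace());  // 10
    }
}
```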

To express the real semantics concisely we need a feature that C# does not have, namely, “ref locals”. C# has ref-typed parameters, but not ref-typed locals. When you make a ref-typed parameter essentially you are saying “this parameter is an alias for this variable”:

void N(ref int x) { x = 10; }
…
N(ref M().f);

That says “Evaluate the expression as a variable and then make the variable x refer to the same storage location as the variable”. Suppose we had the ability to do that with locals instead of just parameters. That is, we can make a local variable that is an alias for a (possibly non-local) variable. Then M().f += 10 would be equivalent to:

ref int variable = ref M().f;
variable = variable + 10;

And thus the side effect of M only happens once. Similarly N()[123].f += 10; where the array is of mutable value type becomes

ref int variable = ref N()[123].f;
variable = variable + 10;

and the side effect of N only happens once, and we mutate the field of the correct variable.

C# does not have the “ref local” feature though we could implement it if we wanted to; the CLR supports it. I think we have higher priorities though.
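For readers on a newer compiler: ref locals and ref returns were eventually added to the language in C# 7.0. Assuming that version or later, the expansion above can be written almost verbatim. This is a sketch under that assumption; the Calls counter and the RefLocalDemo class are illustrative additions.

```csharp
using System;

struct Evil
{
    public int f;
}

static class RefLocalDemo
{
    static Evil[] evil = new Evil[2000];
    public static int Calls;   // hypothetical counter: how often N() has run

    static Evil[] N()
    {
        Calls++;
        return evil;
    }

    // Hand-written form of N()[123].f += 10; using a ref local (C# 7.0+).
    public static int AddTen()
    {
        evil = new Evil[2000];              // fresh state for each run
        Calls = 0;
        ref int variable = ref N()[123].f;  // alias for the field in the array element
        variable = variable + 10;           // reads and writes the original storage
        return evil[123].f;
    }

    static void Main()
    {
        Console.WriteLine(AddTen()); // 10
        Console.WriteLine(Calls);    // 1
    }
}
```

Because variable is an alias rather than a copy, N runs once and the field of the array element is the thing that changes.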

What if instead of a variable we modified a property?

M().P += 10;

You might again think that this is syntactic sugar for

C receiver = M();
receiver.P = receiver.P + 10;

which is of course a syntactic sugar for:

C receiver = M();
receiver.set_P(receiver.get_P() + 10);

Again, we only want the side effect exercised once, though of course we have to call two different methods for the getter and the setter; that’s unavoidable.

But again, we have a problem if the receiver is a variable of value type. If we have N()[123].P += 10; then we have to generate

ref Evil receiver = ref N()[123];
receiver.set_P(receiver.get_P() + 10);

so that the property accessors are invoked on the original variable rather than on a copy of it.
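The observable behaviour can be checked without looking at the generated code. In this sketch (PropertyDemo and the Calls counter are made up for the example) the compound assignment updates the array element and runs N once, whereas going through a copied receiver leaves the element alone.

```csharp
using System;

struct Evil
{
    public int P { get; set; } // mutable property on a value type
}

static class PropertyDemo
{
    static Evil[] evil = new Evil[2000];
    public static int Calls;   // hypothetical counter: how often N() has run

    static Evil[] N()
    {
        Calls++;
        return evil;
    }

    // Compound assignment on a property of an array element.
    public static int Compound()
    {
        evil = new Evil[2000];
        Calls = 0;
        N()[123].P += 10;      // getter and setter both run on the element itself
        return evil[123].P;
    }

    // Incorrect expansion through a copied receiver.
    public static int ViaCopy()
    {
        evil = new Evil[2000];
        Evil receiver = N()[123];
        receiver.P = receiver.P + 10; // setter runs on the copy
        return evil[123].P;
    }

    static void Main()
    {
        Console.WriteLine(Compound()); // 10
        Console.WriteLine(Calls);      // 1
        Console.WriteLine(ViaCopy());  // 0
    }
}
```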

Similarly if we had an indexer defined on C:

M()[X()] += 10;

Now we have to keep track of both the receiver and the index to make sure they are not evaluated twice. That’s the same as:

C receiver = M();
int index = X();
receiver[index] = receiver[index] + 10;

and of course just as with properties, those too are just syntactic sugars for calls, and again, we need to make sure we get the refness right if the receiver is a variable of a mutable value type.
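A quick way to convince yourself of the evaluation order is to log each step. The sketch below assumes a simple indexer backed by an int array; the Log list, the X method returning 3, and the IndexerDemo class are all inventions for the demonstration.

```csharp
using System;
using System.Collections.Generic;

class C
{
    private readonly int[] data = new int[10];

    public int this[int i]
    {
        get { IndexerDemo.Log.Add("get"); return data[i]; }
        set { IndexerDemo.Log.Add("set"); data[i] = value; }
    }
}

static class IndexerDemo
{
    public static readonly List<string> Log = new List<string>();
    static readonly C s = new C();

    static C M() { Log.Add("M"); return s; }
    static int X() { Log.Add("X"); return 3; }

    // Records what runs, and in which order, for one compound assignment.
    public static string Run()
    {
        Log.Clear();
        M()[X()] += 10;
        return string.Join(",", Log);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // M,X,get,set
    }
}
```

The receiver and the index are each computed once, and only the getter and setter calls are "doubled up".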

And similarly with += and -= on events, though of course those are different because they are syntactic sugars for event add and remove methods.
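For events, the mapping to accessor calls can be made visible by writing the accessors by hand. This is a sketch only; Publisher, Changed and the Log list are hypothetical names, and the logging exists purely to show which accessor runs.

```csharp
using System;
using System.Collections.Generic;

class Publisher
{
    public static readonly List<string> Log = new List<string>();
    private EventHandler handlers;

    // Explicit accessors so that we can observe what += and -= call.
    public event EventHandler Changed
    {
        add { Log.Add("add"); handlers += value; }
        remove { Log.Add("remove"); handlers -= value; }
    }
}

static class EventDemo
{
    public static string Run()
    {
        Publisher.Log.Clear();
        var p = new Publisher();
        EventHandler h = (sender, e) => { };
        p.Changed += h;   // calls the add accessor; the event is never "read"
        p.Changed -= h;   // calls the remove accessor
        return string.Join(",", Publisher.Log);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // add,remove
    }
}
```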

Anyway, I don’t think I need to further belabour the point that side effects are only computed once and that determining the correct location to mutate is not as easy as you might think.

Another interesting aspect of the predefined compound operators is that if necessary, a cast – an allegedly “explicit” conversion – is inserted implicitly on your behalf. If you say

short s = 123;
s += 10;

then that is not analyzed as s = s + 10 because short plus int is int, so the assignment is bad. This is actually analyzed as

s = (short)(s + 10);

so that if the result overflows a short, it is automatically cut back down to size for you.
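Here is a small sketch of that truncation. NarrowingDemo and AddTen are names chosen for the example, and the explicit unchecked block is there only so that the result does not depend on whether the project is compiled with overflow checking turned on.

```csharp
using System;

static class NarrowingDemo
{
    public static short AddTen(short s)
    {
        // s = s + 10;   // would not compile: short + int is int (error CS0266)
        unchecked
        {
            s += 10;      // analyzed as s = (short)(s + 10)
        }
        return s;
    }

    static void Main()
    {
        Console.WriteLine(AddTen(123));            // 133
        Console.WriteLine(AddTen(short.MaxValue)); // -32759
    }
}
```

32767 + 10 is 32777 as an int, which does not fit in sixteen bits; the inserted cast keeps the low sixteen bits, giving -32759.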

A final subtle point is that for the predefined operators if the assignment without the compounded operation would not have been legal, then the compound assignment is not legal either. If you say

int i = 10;
short s = 123;
s += i;

then that’s not legal because s = i is not legal.
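The rule can be seen side by side with the forms that do compile. In this sketch (LegalityDemo is a made-up name) the illegal line is left commented out so that the example builds.

```csharp
using System;

static class LegalityDemo
{
    public static short Run()
    {
        int i = 10;
        short s = 123;

        // s += i;           // does not compile: int is not implicitly convertible to short
        s += 10;             // legal: the constant 10 is implicitly convertible to short
        s += (short)i;       // legal: the right-hand side is now a short
        s = (short)(s + i);  // legal: the cast is spelled out by hand
        return s;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 153
    }
}
```

The literal case works because a constant int whose value fits in a short has an implicit conversion to short, so s = 10 is legal and therefore s += 10 is too.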

Those design details are interesting in and of themselves; next time we’ll see how some of these subtleties affect some proposed extensions to the language.

Comments

Anonymous
March 28, 2011
Forgive my ignorance, but how is

ref int variable = ref M().f;
variable = variable + 10;

not equivalent to

C receiver = M();
receiver.f = receiver.f + 10;
Anonymous
March 28, 2011
I also didn't understand
Anonymous
March 28, 2011
I was going to ask the same thing as Patrick...
Anonymous
March 28, 2011
@Patrick: The resolution of receiver.f may have side effects, which would be run twice in your latter example, but only once in the former.
Anonymous
March 29, 2011
@Dave: Could you give an example of how the resolution of a field can have side effects?
Anonymous
March 29, 2011
@Eric - Another fascinating article - I never realized compound assignment was so complicated. Keep up the great posts. I was a little confused by the first example also. Why use the term "ref locals"? Wouldn't "f" normally be referred to as a "field" of class C?
Anonymous
March 29, 2011
The only such situation I can think of is if f is not a field but a property, in which case nothing gets run twice - the getter runs once and the setter runs once.
Anonymous
March 29, 2011
@Patrick Say f is a property. receiver.f will call receiver.f.get() Which could do: get { FirePropertyAccessedEvent(); return value; } In the first case I think this would be fired once, in the second case it would be fired twice.
Anonymous
March 29, 2011
@Random832 You're right - it would only be run once - so ignore my previous post :)
Anonymous
March 29, 2011
private property P { get; set; }

Should be:

private int P { get; set; }

Whoops, that was a silly editing error. Thanks. -- Eric
Anonymous
March 29, 2011
"that’s not legal because s = i is not legal." Not so.

DateTime d = DateTime.UtcNow;
TimeSpan s = new TimeSpan(1,0,0);
d += s;

I suspect that the real reason it isn't legal is because the s += 10 case is some kind of special rule for integer literal expressions, similar to the rule that allows s = 10 itself.

I was missing a word in the text; I should have noted that this rule only applies to the "predefined" operators that are built in to the language. It does not apply to user-defined operators, like that defined between DateTime and TimeSpan. Thanks for the note. -- Eric
Anonymous
March 29, 2011
"Capturing the receiver turns a variable of value type into a different variable of value type, and therefore can change which copy is mutated." What this doesn't clarify is when can you A) have side effects and B) not have already made a copy that won't outlive the expression. The answer is when you've got an expression with side effects that returns a reference type, which you then proceed to assign to a field within a [possibly multiply nested] valuetype field* of that object. Which suggests an obvious solution - capture that intermediate 'receiver'. If you use an expression with side effects to get an index to an array element, capture that too.

*I am loosely defining array elements to be "fields" for the purpose of this.

Sure, that would work. I note that I'm looking for a concise way to specify the rule. And of course, that concise thing is in practice what we actually emit during IL generation; we make a reference to the variable and put the ref on the stack for later. -- Eric

**this ignores that the expression may involve a method that returns a value type by reference, but it's not clear that C# supports this any more than it does "ref locals", other than the one used internally for multidimensional arrays.

Right, the feature is not generally supported but there are a small number of special cases where we take advantage of it. I wrote a prototype of C# a few years ago which did support this feature generally and it worked quite nicely, but I don't think it will make it into the language proper for quite some time, if ever. -- Eric
Anonymous
March 29, 2011
I hope that the proposed ??= operator we talked about a while ago will be discussed in your next article! x ??= y would be 'simply' defined as 'x = x ?? y'. I use that pattern somewhat often for properties, like so: private T _data = null; public T Data { get { return _data = _data ?? new T(); } }
Anonymous
March 29, 2011
On a somewhat related matter: When you have an indexer property on a class that returns a struct, I am often inclined to write: MyObject[i].f += 1; This won't work; it won't even compile. For the right reasons, I suppose. C# doesn't allow methods to return refs, and I am aware that I should not hold my breath until you do. But it would be fabulous to understand why that must be so...
Anonymous
March 29, 2011
Ferdinand: Are you asking why methods can't return refs? It's because you could end up returning a reference to a local variable, which is on the stack. Once your method returns, though, that local variable ceases to exist. Do you understand why now? It's actually a fairly common bug in C code for a function to return a pointer to a local variable. It often leads to unusual results because the value can stay there on the stack for some time before being overwritten by something else.

Ah, but it is legal in the CLR for a method to return a ref. Now, it should not be verifiable for a method to return a ref to a local that is going away, as you note. But it is possible to determine when a method definitely returns a ref to a local that is going away, or possibly returns a ref to a local that is going away, and make that code not verifiable. Obviously it is easy to determine if a method definitely returns a ref to its own local; just do a flow analysis and see where the address that is being returned on the stack came from. The more complicated scenario is:

struct S { public int f; }
ref int M()
{
    S s = new S();
    return ref N(ref s);
}
ref int N(ref S s) { return ref s.f; }

See the problem? M indirectly returns a reference to s.f, but s.f is on the stack of the call to M! N is fine; the verifier knows that even if s.f is on the stack, it is on the stack lower than N. The verifier would have to be written to handle this situation and disallow it. -- Eric

Anonymous
March 29, 2011
@Gabe: Given the CLR type system, you could restrict ref-typed C# expressions (return or otherwise) to the typesafe subset fairly easily: in particular, if C is an arbitrary reference-typed expression (including arrays) and F is an access of a value-typed field, the following form is safe: "ref" C ("." F | "[" expr "]")*. Notably, returning ref arguments should be illegal; otherwise this is permissible:

ref int Inner(ref int a) { return a; }
ref int Outer(int a) { return Inner(ref a); }

which returns a ref to an argument which has been popped off the stack. Of course, Inner could be defined in a C++/CLI assembly, but it would not be typesafe (/clr:safe) in that case.

OT: One of the more annoying bugs in Adobe's Actionscript 3 compiler is it naively expanding "e += v" to "e = e + v", duplicating arbitrary expressions on the LHS. No matter how unreasonable I feel your hatred of mutable structs is, Eric, I'm continuously thankful you can make a compiler that works correctly.
Anonymous
March 29, 2011
Just as I thought! Once you get it, it's fabulous!
Anonymous
March 29, 2011
Nice article.. though i could only understand whole of it by going through it at least twice. :) And sorry but your comment confused me even more. :( I'm hoping you will explain it (as you've said) in your update.
Anonymous
March 29, 2011
Eric, less than a week ago we were debating the lack of the "ref" keyword for inline declarations as well. On a desktop platform the problem may not be important, but we are talking about the .NET Micro Framework. In that environment memory usage and performance are very important. That's how I realized that the only way to alter a structure is to pass it as "ref" to a dedicated function. Why is there no other way? Well, it was only a curiosity. I never really felt that gap, also because structs scare me a lot. OK, nothing else. Just for your info. Cheers
Anonymous
March 29, 2011
Is the "ref local" feature available through reflection? How could "N()[123].f += 10" be executed efficiently through reflection?
Anonymous
March 30, 2011
@Rising - I struggled with the concept of "ref local" until I thought about it in terms of a call to a method:

static void CompoundAdd(ref int variable, int i) { variable = variable + i; }

N()[123].f += 10; would be replaced with:

CompoundAdd(ref C.N()[123].f, 10);

Basically, the "ref local" feature would allow you to inline the CompoundAdd method.
Anonymous
March 30, 2011
Don't worry mutable structs. I still love you.
Anonymous
March 31, 2011
If mutable structs would require a ref local for fields, why do properties work for the same issue? Wouldn't they have the same problem?
Anonymous
April 01, 2011
@Eric > Ah, but it is legal in the CLR for a method to return a ref. Now, it should not be verifiable for a method to return a ref to a local that is going away, as you note. But it is possible to determine when a method definitely returns a ref to a local that is going away, or possibly returns a ref to a local that is going away, and make that code not verifiable. Interestingly enough, if looking at this strictly from Ecma-335 perspective, it's legal for a method to be declared as returning a ref in verifiable code, but it's not legal to call one. P III, 1.8.1.2.1 "Verification types": "A method can be defined as returning a managed pointer, but calls upon such methods are not verifiable. When returning byrefs, verification is done at the return site, not at the call site. [Rationale: Some uses of returning a managed pointer are perfectly verifiable (e.g., returning a reference to a field in an object); but some not (e.g., returning a pointer to a local variable of the called method). Tracking this in the general case is a burden, and therefore not included in this standard. end rationale]" Now in practice, .NET extends CLI verification rules such that calling methods like that is verifiable, but the body of the method may or may not be depending on where the ref comes from. However, this is an extension, not part of the standard, so conforming assemblies may not rely on it. In fact, .NET rule for this is actually subtly incompatible with CLI verification rule, because, according to the latter, a byref-returning method that is never called would always be verifiable - which will not be the case with .NET.
Anonymous
October 05, 2011
I second Chad's motion for an "??=" operator. It's weird that C# has this inconsistency (that almost any operator except ?? can participate in compound assignment.) Of course I would also like a conditional dot operator: loyc-etc.blogspot.com/.../i-want-conditional-dot-operator.html - oh well.
