

Why Do Initializers Run In The Opposite Order As Constructors? Part Two

As you might have figured out, the answer to last week's puzzle is "if the constructors and initializers run in their actual order, then an initialized readonly field of reference type is guaranteed to be non-null in any possible call. That guarantee cannot be met if the initializers run in the expected order."

Suppose counterfactually that initializers ran in the expected order, that is, derived class initializers run after the base class constructor body. Consider the following pathological cases:

class Base
{
    public static ThreadsafeCollection t = new ThreadsafeCollection();
    public Base()
    {
        Console.WriteLine("Base constructor");
        if (this is Derived) (this as Derived).DoIt();
        // would deref null if we are constructing an instance of Derived
        Blah();
        // would deref null if we are constructing an instance of MoreDerived
        t.Add(this);
        // would deref null if another thread calls Base.t.GetLatest().Blah();
        // before derived constructor runs
    }
    public virtual void Blah() { }
}
class Derived : Base
{
    readonly Foo derivedFoo = new Foo("Derived initializer");
    public void DoIt()
    {
        derivedFoo.Bar();
    }
}
class MoreDerived : Derived
{
    public override void Blah() { DoIt(); }
}

Calling methods on derived types from constructors is dirty pool, but it is not illegal. And stuffing not-quite-constructed objects into global state is risky, but not illegal. I'm not recommending that you do any of these things -- please, do not, for the good of us all. I'm saying that it would be really nice if we could give you an ironclad guarantee that an initialized readonly field is always observed in its initialized state, and we cannot make that guarantee unless we run all the initializers first, and then all of the constructor bodies.

Note that of course, if you initialize your readonly fields in the constructor, then all bets are off. We make no guarantees as to the fields not being accessed before the constructor bodies run.
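Here is a minimal sketch of both halves of that claim, using the real initialization order. The names Log, Note, Describe, initialized and assignedInBody are invented for this illustration and are not part of the example above. The base constructor makes a virtual call, and the override reports what it sees in one readonly field set by an initializer and one set in the constructor body.

```csharp
using System;
using System.Collections.Generic;

static class Log
{
    public static readonly List<string> Lines = new List<string>();

    // Records a message and returns it, so it can be used inside a field initializer.
    public static string Note(string message)
    {
        Lines.Add(message);
        return message;
    }
}

class Base
{
    readonly string baseField = Log.Note("Base initializer");

    public Base()
    {
        Log.Note("Base constructor");
        Describe(); // virtual call from a constructor: legal, but discouraged
    }

    public virtual void Describe() { }
}

class Derived : Base
{
    // Set by an initializer: already assigned when Base() runs.
    readonly string initialized = Log.Note("Derived initializer");

    // Set in the constructor body: still null when Base() runs.
    readonly string assignedInBody;

    public Derived()
    {
        Log.Note("Derived constructor");
        assignedInBody = "set in constructor body";
    }

    public override void Describe()
    {
        Log.Note("initialized is null: " + (initialized == null));
        Log.Note("assignedInBody is null: " + (assignedInBody == null));
    }
}

static class Program
{
    static void Main()
    {
        new Derived();
        foreach (string line in Log.Lines) Console.WriteLine(line);
        // Prints:
        //   Derived initializer
        //   Base initializer
        //   Base constructor
        //   initialized is null: False
        //   assignedInBody is null: True
        //   Derived constructor
    }
}
```

The override runs before the Derived constructor body, yet the initialized field is already set. The field assigned in the body is observed as null, which is the "all bets are off" case.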

Next time on FAIC: how to get a question not answered.

Comments

  • Anonymous
    February 18, 2008
    Carrying on from the previous article... I think the C++ model of (as you put it) "objects that mutate their own runtime type" is appropriate. The constructor is what makes an object of a type. Until the constructor is run, the type's invariants aren't met (a type theorist would say that it's not yet of that type). Once the destructor has run, the invariant is once again not met (it's no longer of the fully-derived type). This leads to some surprises (pure virtual function calls being the obvious ones). On the other hand, to me, the CLR model is deeply weird. As I understand things, even before my constructor runs, my member functions can be called, and my member variables can be read or even changed. Any hope of a well-defined notion of a class invariant is lost. Now, you could argue that the problem is the same in both cases -- that essentially, trying to treat a not-yet-constructed object as its derived type before the derived constructor is run is simply an error -- but I would disagree. In C++, you can't get into trouble without explicitly (static_)casting to the derived class, but in C#, you can get into trouble if you call a virtual function from the base class's constructor.

  • Anonymous
    February 18, 2008
    To Eric: If constructor is just a method, why bother to have constructors at all?

  • Anonymous
    February 18, 2008
    Bill Wagner published an article on the subject in Visual Studio Magazine in December, 2007 http://visualstudiomagazine.com/columns/article.aspx?editorialsid=2377

  • Anonymous
    February 18, 2008
    > To Eric: If constructor is just a method, why bother to have constructors at all? An excellent question. There are languages which have no constructors -- you want to run code to initialize an object, you go right ahead and run that code. The reason we have constructors is because the "run a particular method exactly once when an object is created but never again" is a very common pattern, so common that the designers of several languages have deemed it worthy of inclusion in and enforcement by the language and runtime.

  • Anonymous
    February 18, 2008
    To Eric: Then why not restrict calls from the constructor to base(), this() and static helpers?

  • Anonymous
    February 18, 2008
    Because then you end up with duplicated code. Consider a mutable object which represents an enumerator over some sort of collection. A common pattern is class C {  public C() { Reset(); }  public void Reset() { ...  } } With your way, either you have to force the user to call Reset() after construction, which produces an opportunity for a bug, or you duplicate the code in Reset(), which is an opportunity for maintenance problems.

  • Anonymous
    February 18, 2008
    To Eric: You're buying the ability to never see a NULL readonly member at the price of allowing calls to derived class members before the derived class constructor runs. This means the derived class cannot implement an invariant that would hold throughout its lifetime (post-constructor and pre-destructor).

  • Anonymous
    February 18, 2008
    I'm not following you. How does the order of running initializers before constructors make it impossible to implement an invariant?

  • Anonymous
    February 18, 2008
    To Eric: There shall be no way to call instance methods before instance is constructed, so that post constructor will be the first one called after instance construction.

  • Anonymous
    February 18, 2008
    Well, the other possibility was to have "A is B" return false (because the object is not B yet) and thus disallow downcasting to a not-yet-constructed type. The way you have it, the invariant in

        class A
        {
            public A() { Invariant(); }
            public virtual void Invariant() { }
        }
        class B : A
        {
            private int i;
            public B(int i_) { i = i_; }
            public override void Invariant() { Debug.Assert(i == 1); }
        }

    does not hold if A calls Invariant(). In fact, strictly speaking it's accessing an uninitialized variable. It's just that the runtime system went and zeroed out all the fields, so they seem initialized.

  • Anonymous
    February 18, 2008
    Correct. That's why it's a bad programming practice to call virtual methods from constructors. The reason it is a bad idea to call virtual methods from constructors is that a method on a derived class might run before the derived class constructor runs. That is, the tradeoff made is that we are trading the benefit of "an object is always of one type throughout its entire lifetime" against the benefit of "it is always safe to call a virtual method, even from a constructor". What I am confused about is your statement that this restriction on when you should call virtual methods is a consequence of the fact that initializers run before constructors. That is saying it backwards. Rather, both the fact that you should not call virtual methods in constructors, AND the fact that initializers run before constructors, are consequences of the fact that object type is not mutable. Does that make sense?

  • Anonymous
    February 18, 2008
    Okay, from this discussion, I can see why it makes sense to initialize class member variables of both Base and Derived before running any constructors; despite the assumption from the previous article, that behaviour didn't surprise me at all.  Doing things in between calling the Base and Derived constructors would be more surprising to me. What I don't understand is why it's necessary to do Derived's members before Base's.  According to my experiments, member initializers can't refer to "this" anyhow, so nobody can (accidentally) get any reference to any of either Base's or Derived's member initializers during the initialization phase. If that's true, there seems to be no obvious reason to run Derived's first; it's irrelevant either way.  It will probably never affect me, but just for symmetry with constructors, running Base's initializers before Derived's seems to make more sense.

  • Anonymous
    February 18, 2008
    The implementation of the "reverse order" semantics is simple -- have the constructor run the initializers, then call the base class constructor, then run the constructor body. Suppose you wanted the base class initializers to run before the derived class initializers. Imagine you are the compiler developer; how would you implement it?
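    One way to watch that recipe at work is the sketch below; it is an illustration, not the compiler's actual output. The types A, B, C and the StepLog helper are made up for the purpose. Each constructor runs its own initializers, then calls its base constructor, then runs its body, so the initializers fire derived-to-base and the bodies base-to-derived.

```csharp
using System;
using System.Collections.Generic;

static class StepLog
{
    public static readonly List<string> Steps = new List<string>();

    // Records a step and returns a value so it can sit in a field initializer.
    public static int Step(string name)
    {
        Steps.Add(name);
        return Steps.Count;
    }
}

class A
{
    int a = StepLog.Step("A initializer");
    public A() { StepLog.Step("A body"); }
}

class B : A
{
    int b = StepLog.Step("B initializer");
    public B() : base() { StepLog.Step("B body"); }
}

class C : B
{
    int c = StepLog.Step("C initializer");
    public C() : base() { StepLog.Step("C body"); }
}

static class Program
{
    static void Main()
    {
        new C();
        Console.WriteLine(string.Join(" -> ", StepLog.Steps));
        // Prints:
        // C initializer -> B initializer -> A initializer -> A body -> B body -> C body
    }
}
```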

  • Anonymous
    February 18, 2008
    To Eric: Have a "hidden" initializer that runs the base initializer and then runs itself; then you run the constructor, which calls the base constructor. By the way, do you know why VB.NET specifies "Java style" initialization in its language definition? Did they have any reasons, or did they do it just because VB programmers (myself included) are stupid and won't know better?

  • Anonymous
    February 18, 2008
    > Have "hidden" initializer that runs base initializer then runs itself then you run constructor which calls base constructor. And then how does the base constructor know that it doesn't have to run its hidden initializer?  It had better not run it again, otherwise we've just initialized all the base stuff twice. > Do you know why VB.NET specifies "Java style" initialization in language definition? Nope. I have not attended VB design team meetings since 2001, and even then I was only there as an expert on the differences between VB and VBScript. I have no idea why they made the specific design choices that they did; you should ask a VB expert. Paul Vick, say.

  • Anonymous
    February 18, 2008
    Constructor doesn't invoke initializer. Initializers invoke initializers, constructors invoke constructors. I asked Paul Vick; his principles are well known and published: Working in a natural way is a higher priority than language purity.

  • Anonymous
    February 18, 2008
    It's like baking a cake! You source and buy the ingredients (initialising) before mixing & baking (constructing), and once it has finished baking you can use it for whatever purpose you intended - usually eating... It wouldn't make sense to start mixing before sourcing the ingredients; you would get halfway through and realise you need to go to the shop for baking powder... Or maybe it's just me who sees it like baking a cake...

  • Anonymous
    February 19, 2008
    > Working in a natural way is a higher priority than language purity. I agree -- but it's not obvious to me which is more unnatural: running all initializers before all constructors, or having supposedly immutable fields take on different values during initialization/construction. I have to say after many years of Java I find the C# approach rather attractive.

  • Anonymous
    February 19, 2008
    At least I have found why the MyClass keyword was introduced in VB.NET. It allows one to call virtual functions of the class even if they are overridden in the derived class. Thus one can simulate C++ behavior by prefixing virtual function calls in the constructor with MyClass.

  • Anonymous
    February 19, 2008
    > Constructor doesn't invoke initializer. Initializers invoke initializers, constructors invoke constructors. OK, so who gets the ball rolling?  You've got to tell the CLR in the metadata of the assembly which method to call when an object is constructed by "new".  You cannot tell it the method that runs just the initializers, and you cannot tell it the method that runs just the constructors. What are you going to do? I anticipate your answer -- generate a third method that runs both, and have that be the "real" constructor. So in short, you're suggesting that every constructor declaration potentially create three different methods, one which implements initializers, one which implements constructor bodies, and one which calls the other two. This added complication would not maintain any invariant about the class, since the order of initialization makes no difference to the class itself -- the whole point is that the instance is not inspectable until after the initializers run. The only difference it makes is if there is a side effect in two or more of the initializers, and you care about the order in which those side effects are effected, and you want them to go base to derived. I do not see "side effects are effected in a different order" as a compelling reason to massively complicate the code generator for constructors. It's complicated enough already, believe me!

  • Anonymous
    February 19, 2008
    I got your point. Thank you for your patience with all the clarifications.

  • Anonymous
    February 19, 2008
    You're welcome! Thanks for asking a good question and bearing with me through the answer. :-)

  • Anonymous
    February 19, 2008
    To Eric: I retract my first comment, I didn't understand the point about C# objects "always being of one type". Everything pretty much follows from that requirement. In fact, you could even initialize derived members first, too. Would create less of a surprise for those derived virtual functions that can be called before the constructor is.

  • Anonymous
    March 12, 2008
    Eric said: "You've got to tell the CLR in the metadata of the assembly which method to call when an object is constructed by "new".  You cannot tell it the method that runs just the initializers, and you cannot tell it the method that runs just the constructors. What are you going to do?" Have the newobj instruction call the initializer, then the specified constructor.  Since there is only one initializer for any type there's no need to pass it to newobj.

  • Anonymous
    March 20, 2008
    As a long time (35+ years) developer, I have been involved in this debate since C++ was introduced. There are definite pros and cons to both approaches, and the safest bet is to never call virtual methods from within the constructor. Unfortunately it is very difficult (even with FxCop) to get a call flagged when it goes to a non-virtual member whose body calls a virtual method. One BIG advantage of the C++ model applies to the development of library code. In the C++ model (initializers run right before the opening brace of the constructor), the initialization behavior of a base class is 100% invariant with regard to the derived class. This definitely produces more predictable results. However it does also impose limitations

  • Anonymous
    March 27, 2008
    It is conceptually bizarre that the type of an object varies throughout its lifetime. It is also conceptually bizarre that you can end up with an instantiated object of an abstract type that is not of a more specific concrete type.   The cure is worse than the disease, in this case.  In C#, objects always have the type that you asked for from the moment of their creation, and you will never see an instance of an abstract type that is not also of a concrete type.  The C++ way of solving this problem is just plain weird.  I'm not aware of any other OOP language that does it that way.

  • Anonymous
    March 27, 2008
    I recently ran Analyze over a core library and found that my base constructor was assigning a value to an abstract property, i.e. calling a virtual method. By design, my derived class wanted to maintain the private field, which doesn't seem such a good idea anymore when the property is to be accessible via the base class. Refactoring meant relocating the private field to the base class, providing a virtual base implementation for the public property, and then overriding this in the derived class to get the behaviour I needed. Analyze gave no more than a warning. Yet I was very impressed to get that much.

  • Anonymous
    March 28, 2008
    > It is conceptually bizarre that the type of an object varies throughout its lifetime. I don't see that, myself. When one constructs, say, a Door, there's a period in time when it's just an Aperture (which for the purpose of this example is the base class of Door). During that period, if I try to travelThrough() the object, I want to do the Aperture thing, not the Door thing (the latter would fail, since the object has no handle to pull yet). Or, look at it another way. Let's view a type as a set of values. Base class X is a pair of ints (a, b), such that a <= b. Derived class Y has another int c, such that a <= c <= b. Let's suppose we construct an object of eventual type Y, passing in (3, 5, 4) as (a, b, c). First, an X base object is constructed with values (3, 5). This base object is of type X, since it's in the set of possible Xs. But the complete object is not of type Y yet, since (in CLR) the value of (3, 5, 0) is not in the set of possible Ys. Only once Y's constructor has run and set c such as to enforce Y's invariant is the complete object in the set of possible Ys and as such of type Y. > It is also conceptually bizarre that you can end up with an instantiated object of an abstract type that is not of a more specific concrete type. Yes, that certainly goes against a lot of conventional teaching ("abstract means not instantiatable"). But that's the cost of allowing constructors and destructors to enforce class invariants. Which (for C++ at least) is invaluable. On the other hand, I find it conceptually bizarre to allow methods on an object of type D to be called before D's constructor is called, and before D's members have been constructed.

  • Anonymous
    March 30, 2008
    Thanks Eric. This was a useful refresher for me. Sure, C++ does things differently and I still have a lot of fondness for that language even if I get to use it less and less these days. But I'm just as happy with how Java/.Net do things. I think the important thing here is to know how the language you're using works, what the limits are and what the benefits are. I don't think that either method is universally right or wrong, just different. Eric should be commended for his perseverance with this discussion.

  • Anonymous
    June 04, 2008
    Question: Can somebody provide a real-life (non-pathological) example where all this (order of member initialization) matters? Opinion: 1) One must never provide a language feature that would require that much confused discussion. 2) If such a highly arguable feature has been provided, one must never write code which uses that feature. Conclusion: never write code that depends on order of member initialization.

  • Anonymous
    July 12, 2008
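    When we implement class inheritance and create an instance of a derived class, the instance fields of both the base class and the derived class must be initialized, and both of their constructors must be called. In what order does all of this run? Let's work through this quiz together.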

  • Anonymous
    November 17, 2008
    I think the problem lies in the fact that the initializers are in a sense constructors. Basically there are 2 types of constructors, running in opposite directions. I am not sure what real benefit the initializers bring. Don't tell me it makes code shorter because it sounds like a bad joke. Readonly members? There are alternatives. You don't have to duplicate code because you can create a private method that initializes the members. It cannot be overridden, it gives no headaches. You call it from all of the constructors and all is nice and leads to no contradictions or unexpected situations. As for calling methods of the derived class from the constructor, you can do it in C++ and it is idiotic. I don't think this should trigger big changes in the language but rather in the brain of the guy who does it.

  • Anonymous
    December 18, 2008
    Sorry Eric, but you haven't explained a solution to the actual problem. The interesting problem was not why constructors should be invoked after initialisers; I think everyone with a bit of analytical insight agrees that this is a sensible design decision. The actual problem was why initialisation should be done in a certain order. My conjecture here is that it doesn't really matter much in which order initialisations are carried out as long as they are performed before all constructors, meaning that the order of initialisation is not a serious source of ambiguity and unexpected behaviour. The reason for this, I would say, is that prior to initialisation the object is not observable: it might already physically exist in memory but no reference to it is exposed yet. Clearly, the aggregated objects should also have no knowledge of the context in which they are initialised. The exposure of the object reference is what creates ambiguities and unexpected behaviour, and the earliest that happens is when the (first) constructor is called. Prior to this we cannot observe what happens, at least if we confine our observation to the state of the object to be instantiated, so it is irrelevant from a logical point of view. This is maybe not quite true: I could imagine pathological cases where instantiation of the aggregated objects might have some side effect that depends on their order, but this would have to involve some global state and seems rather contrived. I'm not really a (C#) programmer so I might be missing something; if anyone had a genuine example where the order of initialisation prior to constructor invocation is observable in the sense described above, and more importantly creates potential for misunderstanding and errors of the sort that happen when interleaving initialisation and constructor calls, that would be of interest to justify this design decision.
Btw, even if there is no favourable order it may still make sense to impose one to avoid unintentional exploitation of unspecified behaviour. Best, Frank