Practice thinking like a compiler tester
I don’t know why but for some reason I find this little recursive algorithm I ran across in the compiler the other day to be completely hilarious.
For every class in your program we have to emit metadata. For various historical reasons, there are strict rules we must follow about the order in which metadata is emitted:
- Every class must be emitted exactly once. (We cannot omit emitting any class and we cannot emit the same class twice.)
- A derived class must be emitted after its base class.
- A nested inner class must be emitted after its outer class.
(To simplify this discussion I am omitting interfaces. A class must also be emitted after all of its interfaces, but for the purpose of this discussion they are just like having multiple base classes.)
We have a recursive algorithm that goes something (but not quite!) like this:
foreach(Class c in TopLevelClasses)
Emit(c);
…
void Emit(Class c)
{
if (c == null || c.Emitted) return; // don’t emit twice!
Emit(c.BaseClass);
Emit(c.OuterClass);
// OK, now we are ready to really emit c,
// because its base and outer classes are really emitted.
ReallyEmit(c); // This must only happen once per class, so mark us as emitted.
c.Emitted = true;
// Recurse on c’s inner classes, so that we don’t miss them.
foreach(Class i in c.InnerClasses)
Emit(i);
}
Does that look correct to you?
Note that at this point we have already guaranteed that base classes form a tree. You cannot start going up the base class chain and end up back where you started. So there is no infinite recursive descent along the base class. Similarly, we already know that outer/inner classes make a well-formed tree. We’re not going to go up the chain of outer classes and end up back where we started, so there is no infinite recursive descent there either.
We are also not concerned about generic classes, whether the classes in the generic type constraints have been emitted, etc. Assume C# 1.0 classes.
Nevertheless, there is still a bug in this algorithm. Can you find a set of legal C# classes which causes this algorithm to violate one of the preconditions I mentioned?
This one is pretty easy. I’ll post an example tomorrow, propose a fix to the algorithm, and pose a slightly harder version of the puzzle then.
Comments
Anonymous
April 10, 2007
There is no already emitted check like if (!c.Emitted) { ReallyEmit(c); }Anonymous
April 10, 2007
The comment has been removedAnonymous
April 10, 2007
Assume the following:
- Outer is a top level class
- Inner is a public class nested within Outer
- DerivedFromInner is a top level class derived from Outer.Inner If DerivedFromInner is processed by the foreach loop BEFORE Outer, then Inner will be emitted twice due to the fact that c.Emitted is only set to true AFTER the call to Emit(c.OuterClass)
Anonymous
April 10, 2007
I do not know enough C# to write a set of classes to expose a bug, but I am thinking you might want to write the top of Emit like this: if (c == null ) return; // got no class Emit(c.BaseClass); Emit(c.OuterClass); if ( c.Emitted) return; // don’t emit twice!Anonymous
April 10, 2007
Rich posted while I was typing...looks like he posited the bug case.Anonymous
April 10, 2007
RichM: How do you propose Derived be processed before Outer?Anonymous
April 10, 2007
kfarmer: Outer and DerivedFromInner are both top level classes and would therefore be contained in the TopLevelClasses collection. The problem statement does not indicate how items are ordered within TopLevelClasses. Since this is undefined, I am assuming it would be possible (but not certain) for DerivedFromInner to be processed before Outer.Anonymous
April 10, 2007
Nice job RichM, that is exactly the example I have on my whiteboard right now. kfarmer: the compiler can choose to enumerate the top level classes however it wishes to. The algorithm it actually uses is something like: EmitFiles(files){ foreach(file f in sourcefiles) emitNamespace(f.rootnamespace); } EmitNamespace(n) { foreach(class c in n.toplevelclasses) Emit(c); foreach(Namespace nc in n.childnamespaces) EmitNamespace(n); } Top level classes in a namespace are enumerated in the order they appear in the source code.Anonymous
April 10, 2007
So essentially what you have is not a tree, but a DAG (directed acyclic graph), where an edge from A to B indicates A is B's base or outer class. Then all you need to do is perform a topological sort using a depth-first search: Stack<Class> stack = new Stack<Class>(); foreach(Class c in TopLevelClasses) Emit(c); foreach(Class c in stack) ReallyEmit(c); void Emit(Class c) { if (!c.Emitted) { foreach(Class i in c.InnerClasses) Emit(i); foreach(Class d in c.DerivedClasses) Emit(d); stack.Push(c); c.Emitted = true; } }Anonymous
April 10, 2007
So if I take as given that your strict rules are consistent, then class Outer : Outer.Inner { public class Inner { } } is not permissible? (Nor if there were some chain of classes separating Outer and Outer.Inner in the inheritence tree)?Anonymous
April 11, 2007
c: From VS2k5: Error 1 Circular base class dependency involving 'Scratch.Outer' and 'Scratch.Outer.Inner' C:...ProjectsScratchScratchProgram.cs 16 11 Scratch I always kind of wondered what error 1 was. :-)Anonymous
April 11, 2007
Reader RichM found the same solution to the puzzle I posed yesterday that I did: public class V : S.TAnonymous
September 06, 2007
Yesterday I posed a slightly harder version of the puzzle I posted the day before . Reader Steve found