Jaa


Implementing the virtual method pattern in C#, Part Three

(This is part three of a three-part series; part one is here; part two is here.)

Last time we saw how you could emulate virtual methods in a language that only had static methods by creating fields of delegate type, and then choosing what delegates go into the fields. However, this is not very space-efficient. Suppose there were a hundred virtual methods on Animal instead of two. That means that every class derived from Animal has a hundred fields, and in most of them, those fields are exactly the same, all the time. You make three hundred giraffes and each one of them will have exactly the same delegates in those hundred fields. This seems redundant and wasteful.

What the CLR actually does is it bundles those up into a virtual function dispatch table, or "vtable" for short. A vtable is a collection of delegates; the purpose of the vtable is to answer the question "if a virtual method was invoked on an object of this runtime type, which method should be called?"

sealed class VTable
{
public readonly Func<Animal, string> Complain;
public readonly Func<Animal, string> MakeNoise;
public VTable(Func<Animal, string> complain, Func<Animal, string> makeNoise)
{
this.Complain = complain;
this.MakeNoise = makeNoise;
}
}

Suppose we want to construct vtables for Giraffe, Cat and Dog to answer the questions "if these methods were invoked on an object of compile-time type Animal but runtime type Giraffe/Cat/Dog, which methods should actually be called?" That's easy enough:

static VTable GiraffeVTable = new VTable(Giraffe.Complain, Animal.MakeNoise);
static VTable CatVTable = new VTable(Cat.Complain, Cat.MakeNoise);
static VTable DogVTable = new VTable(Dog.Complain, Animal.MakeNoise);

And now we can just have every class have a reference to the appropriate vtable. We rewrite the classes again as follows:

abstract class Animal
{
public VTable VTable;
public static string MakeNoise(Animal _this)
{
return "";
}
// No factory; Animal is abstract.
}
class Giraffe : Animal
{
private static VTable GiraffeVTable = new VTable(Giraffe.Complain, Animal.MakeNoise);
public bool SoreThroat { get; set; }
public static string Complain(Animal _this)
{
return (_this as Giraffe).SoreThroat ? "What a pain in the neck!" : "No complaints today.";
}
public static Giraffe Construct()
{
// There are no more instance constructors; this just allocates the memory.
Giraffe giraffe = new Giraffe();
// Ensure that virtual method fields are initialized before other code is run.
giraffe.VTable = Giraffe.VTable;
// Now do the rest of the initialization that the constructor would have done.
}
}
class Cat : Animal
{
private static VTable CatVTable = new VTable(Cat.Complain, Cat.MakeNoise);
public bool Hungry { get; set; }
public static string Complain(Animal _this)
{
return (_this as Cat).Hungry ? "GIVE ME THAT TUNA!" : "I HATE YOU ALL!";
}
public static string MakeNoise(Animal _this)
{
return "MEOW MEOW MEOW MEOW MEOW MEOW";
}
public static Cat Construct()
{
Cat cat = new Cat();
cat.VTable = Cat.VTable;
// Now do the rest of the initialization that the constructor would have done.
}
}
class Dog : Animal
{
private static VTable DogVTable = new VTable(Dog.Complain, Animal.MakeNoise);
public bool Small { get; set; }
public static string Complain(Animal _this)
{
return "Our regressive state tax code is... SQUIRREL!";
}
public static string MakeNoise(Dog _this) // Remember, we forgot to say "override"
{
return _this.Small ? "yip" : "WOOF";
}
public static Dog Construct()
{
Dog dog = new Dog();
dog.VTable = Dog.VTable;
// Now do the rest of the initialization that the constructor would have done.
}
}

And now we have to rewrite the call sites again:

string s;
Animal animal = Giraffe.Create();
s = animal.VTable.Complain(animal);
s = animal.VTable.MakeNoise(animal);
animal = Cat.Create();
s = animal.VTable.Complain(animal);
s = animal.VTable.MakeNoise(animal);
Dog dog = Dog.Create();
animal = dog;
s = animal.VTable.Complain();
s = animal.VTable.MakeNoise();
s = Dog.MakeNoise(dog);

That is basically how the virtual method pattern is implemented behind the scenes. The C# compiler analyzes every method that is declared as abstract, virtual or override and figures out which "vtable slot" that method corresponds to; it then emits metadata that has enough information to allow the runtime to lay out a vtable. For every call site, the C# compiler works out whether to call an instance method directly or do an indirection through the vtable, and emits the code accordingly. When the object is allocated at runtime, the CLR consults the metadata to figure out which vtable to plug into the hidden vtable field of each object. The vtable is of course not actually a class, and its fields are not actually delegates; the CLR has a less heavyweight way to represent a reference to a method internally. But the concept is the same.

Obviously I've omitted a great many details here. For example, it is not legal to invoke a virtual or instance method with a null receiver, but it is legal for a static method to be called with a null first argument. By rights, when rewriting the call sites to go from instance or virtual calls to static calls, I should have been putting in null checks as well. I've also not at all described how "base" calls work (though perhaps you can figure it out from this sketch). I've also not said how this works for boxed value types that override virtual methods of object, though again, perhaps you can figure it out. I've ignored issues of security and accessibility. I've omitted describing what happens when a class creates a "new virtual" method. And I've said nothing about how interfaces work; the CLR uses special techniques to solve interface overriding problems that I might discuss in a later blog.

The real takeaway here is that by choosing to embed the "virtual and instance calls" pattern in the language, you get to take advantage of its power without having to go through all the pain of declaring those fields and setting up those vtables yourself. The compiler and CLR teams get to experience that pain for you; you're welcome.

You also get safety! The pattern as implemented here depends on everyone following the rules of the pattern carefully. For instance, what stops you from calling giraffe.VTable.Complain(dog)?  Nothing, that's what. The C# compiler ensures that it is impossible to actually call a virtual method with a "this" that does not match the type expected by the function, but in our pattern-based implementation of virtual methods that error is not trapped until the conversion fails at runtime, if it is caught at all.

(This is part three of a three-part series; part one is here; part two is here.)

Comments

  • Anonymous
    March 24, 2011
    interestingly if you take your previous part's code and instead transform it so that you only have one type, but allow changing (a copy of) the vtable after the empty constructor you have something broadly like prototype based inheritance.

  • Anonymous
    March 24, 2011
    Are there any differences in this mapping compared the mapping that allows you to emulate C++ virtual methods in C using structs and function pointers? C++ requires you to think about how to deal with vtables in a world with multiple inheritance. Aside from that, there's no particularly interesting difference between this sketch and how you'd do something similar in C. -- Eric Well, except using delegates instead of function pointers. Right; C# does not allow you to manipulate references to methods directly; it requires that they be wrapped up in a delegate. However if we so chose we could add "function pointer types" to C# and generate "calli" instructions to invoke them; it seems unlikely that this feature will be implemented any time soon though. -- Eric

  • Anonymous
    March 24, 2011
    > sealed class VTable > { >   public readonly Func<Animal, string> Complain; >   public readonly Func<Animal, string> MakeNoise; >   public AnimalVTable(Func<Animal, string> complain, Func<Animal, string> makeNoise) >   { >     this.Complain = complain; >     this.MakeNoise = makeNoise; >   } > } I think your classes name should be AnimalVTable.

  • Anonymous
    March 24, 2011
    > For example, it is not legal to invoke a virtual or instance method with a null receiver, but it is legal for a static method to be called with a null first argument It is legal in the CLR to call (but not callvirt) a non-virtual method with a null 'this'; the method runs until it tries to dereference the this pointer, at which point a NullReferenceException is thrown.

  • Anonymous
    March 24, 2011
    How does the VTable deal with other functions (other than Complain and MakeNoise, for example).  Suppose I had a Tiger class, to which I added a Bite() function, which wasn't in Animal or any of the other classes, what would the VTable look like?  Or is that a discussion for another day?

  • Anonymous
    March 24, 2011
    Please do discuss calling interface methods in another post.

  • Anonymous
    March 24, 2011
    > It is legal in the CLR to call (but not callvirt) a non-virtual method with a null 'this'; the method runs until it tries to dereference the this pointer, at which point a NullReferenceException is thrown. C# emits callvirt in the vast majority of cases actually. (Including non-virtual calls) I believe the only exceptions are static and calls to base class methods. This ensures a null-reference exception happens at the "appropriate" point. (I assume)

  • Anonymous
    March 24, 2011
    Here's another vote for the interfaces post.  I've always wordered how you could make multiple interfaces map onto the same methods in a generic fashion.

  • Anonymous
    March 24, 2011
    -> "the CLR uses special techniques to solve interface overriding problems that I might discuss in a later blog" Please do so, as they are more interesting then simple virual methods,  I can see how to make them work, but not in a fast way...

  • Anonymous
    March 24, 2011
    The comment has been removed

  • Anonymous
    March 24, 2011
    John: If Tiger.Bite isn't a virtual function, it doesn't go into the vtable -- its calls are all just converted to regular static function calls. If Bite is added as a virtual function by Tiger, it goes into another slot in Tiger's vtable. In this example, though, it's not really possible because VTable is a sealed class. If this were a more realistic example, AnimalVTable would derive from ObjectVTable, and TigerVTable would derive from AnimalVTable: public class TigerVTable : AnimalVTable {  public readonly Func<Tiger, string> Bite;  public TigerVTable(Func<Animal, string> complain, Func<Animal, string> makeNoise, Func<Tiger, string> Bite)    : base(complain, makeNoise)  {    this.Bite = bite;  } }

  • Anonymous
    March 24, 2011
    The comment has been removed

  • Anonymous
    March 24, 2011
    @Gabe: Right, I was assuming that a VTable class specific to Tiger, containing a Bite method would be generated, though I knew that was impossible because VTable is sealed.  Perhaps there's a reason for that and Eric will explain it in a future post in the series.

  • Anonymous
    March 24, 2011
    I guess that for interfaces each implementing class fills out an appropriate vtable. Right?

  • Anonymous
    March 24, 2011
    "C++ requires you to think about how to deal with vtables in a world with multiple inheritance." Speaking of multiple inheritance (or, multiple something anyway), is this series going to cover how interfaces work?

  • Anonymous
    March 25, 2011
    Some very very very (about 50 trailing 'very's here) advanced stuff about interface implmentation in CLR blogs.msdn.com/.../51381.aspx These two posts may also be relevant blogs.msdn.com/.../51425.aspx blogs.msdn.com/.../51495.aspx Everytime I go back to read them, some of the my neurons die.

  • Anonymous
    March 25, 2011
    There is also this MSDN Magazine article with some nice diagrams. msdn.microsoft.com/.../cc163791.aspx All that stuff is very handy if you ever want to write a privilege escalation (sandbox breaking) exploit for CLR, as patching the vtable and then executing the method is by far the easiest way to inject your code - once you find a way to fool the verifier to do so, that is :)

  • Anonymous
    March 27, 2011
    @Tanveer, Pavel: Thanks for the links - I was lost for several hours there :). The latest virtual dispatch mechanism is fairly different, I haven't found a canonical source but this analysis of the CLR2 method seems to be close to the current behaviour: blogs.microsoft.co.il/.../JIT-Optimizations_2C00_-Inlining_2C00_-and-Interface-Method-Dispatching-2800_Part-1-of-N_2900.aspx Presumably there's still some form of vtable backing the JITter implementation.