Dela via


Curiouser and curiouser

Here's a pattern you see all the time in C#:

class Frob : IComparable<Frob>

At first glance you might ask yourself why this is not a "circular" definition; after all, you're not allowed to say "class Frob : Frob"(*). However, upon deeper reflection that makes perfect sense; a Frob is something that can be compared to another Frob. There's not actually a real circularity there.

This pattern can be genericized further:

class SortedList<T> where T : IComparable<T>

Again, it might seem a bit circular to say that T is constrained to something that is in terms of T, but actually this is just the same as before. T is constrained to be something that can be compared to T. Frob is a legal type argument for a SortedList because one Frob can be compared to another Frob.

But this really hurts my brain:

class Blah<T> where T : Blah<T>

That appears to be circular in (at least) two ways. Is this really legal?

Yes it is legal, and it does have some legitimate uses. I see this pattern rather a lot(**). However, I personally don't like it and I discourage its use.

This is a C# variation on what's called the Curiously Recurring Template Pattern in C++, and I will leave it to my betters to explain its uses in that language. Essentially the pattern in C# is an attempt to enforce the usage of the CRTP.

So, why would you want to do that, and why do I object to it?

One reason why people want to do this is to enforce a particular constraint in a type hierarchy. Suppose we have

abstract class Animal
{
public virtual void MakeFriends(Animal animal);
}

But that means that a Cat can make friends with a Dog, and that would be a crisis of Biblical proportions! (***) What we want to say is

abstract class Animal
{
public virtual void MakeFriends(THISTYPE animal);
}

so that when Cat overrides MakeFriends, it can only override it with Cat.

Now, that immediately presents a problem in that we've just violated the Liskov Substitution Principle. We can no longer call a method on a variable of the abstract base type and have any confidence that type safety is maintained. Variance on formal parameter types has to be contravariance, not covariance, for it to be typesafe. And moreover, we simply don't have that feature in the CLR type system.

But you can get close with the curious pattern:

abstract class Animal<T> where T : Animal<T>
{
public virtual void MakeFriends(T animal);
}

class Cat : Animal<Cat>
{
public override void MakeFriends(Cat cat) {}
}

and hey, we haven't violated the LSP and we have guaranteed that a Cat can only make friends with a Cat. Beautiful.

Wait a minute... did we really guarantee that?

class EvilDog : Animal<Cat>
{
public override void MakeFriends(Cat cat) { }
}

We have not guaranteed that a Cat can only make friends with a Cat; an EvilDog can make friends with a Cat too. The constraint only enforces that the type argument to Animal be good; how you use the resulting valid type is entirely up to you. You can use it for a base type of something else if you wish.

So that's one good reason to avoid this pattern: because it doesn't actually enforce the constraint you think it does. Everyone has to play along and agree that they'll use the curiously recurring pattern the way it was intended to be used, rather than the evil dog way that it can be used.

The second reason to avoid this is simply because it bakes the noodle of anyone who reads the code. When I see List<Giraffe> I have a very clear idea of what the relationship is between the "List" part -- it means that there are going to be operations that add and remove things -- and the "Giraffe" part -- those operations are going to be on Giraffes. When I see "FuturesContract<T> where T : LegalPolicy" I understand that this type is intended to model a legal contract about a transaction in the future which has some particular controlling legal policy. But when I read "Blah<T> where T : Blah<T>" I have no intuitive idea of what the intended relationship is between Blah and any particular T. It seems like an abuse of a mechanism rather than the modeling of a concept from the program's "business domain".

All that said, in practice there are times when using this pattern really does pragmatically solve problems in ways that are hard to model otherwise in C#; it allows you to do a bit of an end-run around the fact that we don't have covariant return types on virtual methods, and other shortcomings of the type system. That it does so in a manner that does not, strictly speaking, enforce every constraint you might like is unfortunate, but in realistic code, usually not a problem that prevents shipping the product.

My advice is to think very hard before you implement this sort of curious pattern in C#; do the benefits to the customer really outweigh the costs associated with the mental burden you're placing on the code maintainers?


(*) Due to an unintentional omission, some past editions of the C# specification actually did not say that this was illegal! However, the compiler has always enforced it. In fact, the compiler has over-enforced it, sometimes accidentally catching non-cycles and marking them as cycles.

(**) Most frequently in emails asking "is this really legal?"

(***) Mass hysteria!

Comments

  • Anonymous
    February 02, 2011
    I think the curiously recurring generic pattern has one use you haven't covered, which I consider more legitimate than the examples in your blog: allowing statically-typed access to a derived type. I've only used CRGP once (so far), in this type:  nitokitchensink.codeplex.com/.../62063 which implements INotifyPropertyChanged on behalf of a derived class. It enables me to provide this method:  protected void OnPropertyChanged<TProperty>(Expression<Func<TDerived, TProperty>> expression); in a way so that when the implementer of the derived class is writing:  OnPropertyChanged(x => x. they get completion with IntelliSense.

  • Anonymous
    February 03, 2011
    At the cost of one down cast (disregarding the InvalidCastException if you guessed wrong) per call you can get the same information if not using CRGP.

  • Anonymous
    February 03, 2011
    As somebody who has been guilty of using this pattern before, I find it speaks more to the lack of a LISP-like macro feature or mixins in C# more than anything else. Its an attempt to do metaprogramming in a language that doesn't have a lot of support for it.

  • Anonymous
    February 03, 2011
    This is a poor man's self type annotations that C# does not have, but, say, Scala and Fortress do. As you've pointed out, it does not enforce that self type is correct, which also means that compiler does not know the correct type of "this" in class body, but it's better than nothing. First, it quite clearly documents that T is supposed to be self type, not just any type, and second, the compiler will know correct type of T in the body.

  • Anonymous
    February 03, 2011
    I've been using it in several cases, but typically only for code which is not directly exposed to "end-users" (e.g. private classes, or abstract classes used through derived classes). Usually I call the type parameter TSelf and the derived classes are often sealed classes. One such example can be found here: code.google.com/.../GrammarObject.cs

  • Anonymous
    February 03, 2011
    I have a similar problem in my Protocol Buffers (serialization framework) port. We have the concept of a "message" type (immutable) and a corresponding "builder" type (mutable). So we end up with a generic interface which is partly self-referential and partly mutually-referential to its corresponding one. Extracts of both:  public interface IMessage<TMessage, TBuilder> : IMessage<TMessage>      where TMessage : IMessage<TMessage, TBuilder>      where TBuilder : IBuilder<TMessage, TBuilder>  {      TBuilder ToBuilder();      // ... other stuff  }  public interface IBuilder<TMessage, TBuilder> : IBuilder      where TMessage : IMessage<TMessage, TBuilder>      where TBuilder : IBuilder<TMessage, TBuilder>  {      TMessage Build();      // ... other stuff  } I don't like it, but it generally makes the rest of the code cleaner. Fortunately most client code only cares about the concrete types (which are generated from schema files). I'm not sure whether I'd say that it's a pity that C# doesn't allow for a richer way of expressing generic type relationships, but there are certainly times where it introduces ugliness.

  • Anonymous
    February 03, 2011
    I'm hoping this is the start of a series of "hypothetical future version of C#" posts. :) Sometimes C# backs you into a corner, and this is the just the least terrible way to handle it.

  • Anonymous
    February 03, 2011
    The comment has been removed

  • Anonymous
    February 03, 2011
    >> This is a poor man's self type annotations that C# does not have, but, say, Scala and Fortress do. As Eric noted, a self-typed argument is inherently covariant, while LSP requires arguments to be contravariant. You actually need path-dependent types to sort this out, but they are tricky to get right themselves. Ironically, a type safety problem in Eiffel, which has non-path-dependent self-type ("other: like Current"), and allows their use in method arguments, is known as "catcall" in Eiffel parlance.

  • Anonymous
    February 03, 2011
    So why aren't there covariant return types on virtual methods? It seems like an obvious feature to have, which makes me think that it was considered, but hasn't met the feature bar for any release yet to date. Covariance on virtual method return types is a fairly frequently requested feature, and I'd use it if I had it. It's never made the bar because (1) the CLR does not support it; we'd have to either generate lots of helper code behind the scenes to make it work, or convince the CLR team to change (the C++/CLI team did the former) (2) in most cases you can generate the necessary helper functions yourself quite easily; just make a "new" method that has the right return type that delegates its implementation to the virtual method, and (3) Anders does not consider it to be a particularly important feature. - Eric

  • Anonymous
    February 03, 2011
    Heh, interesting scenario (Cat, Dog, MakeFriends). I suspect that the Best Tool(tm) for that job is typeclasses (e.g. as in Haskell). C# of course does not have type classes. I vaguely recall seeing an example somewhere where an author had simulated typeclasses in C#, it required manual work on the part of the programmer. In the end, for non-mission-critical code, I concur with your conclusion: don't enforce the constraint at compile time, but rather assert the type in the implementation of MakeFriends. If you need the certainty of compile time checking, consider a different programming language that is designed to provide it.

  • Anonymous
    February 03, 2011
    > You actually need path-dependent types to sort this out I think you're confusing path dependent types with F-bounded polymorphism (www.cs.cmu.edu/.../f-bounded.pdf, the proverbial Eiffel example is mentioned on the second page). And the thing is, C# almost has it. You can specify F[t] (e.g. IComparable<T>) and you can express condition t<:F[t] in where for functions (though a shorter syntax may be handy). The problems are

  1. you have to explicitly declare that t<:F[t] for your type t, but there's nothing stopping you from declaring t<:F[u]
  2. F bound is inherited, which also creates cases where t<:F[u] for t!=u So if there was a way to force t<:F[t] when implementing (which is what self type annotation does) and also, say, only allow implementing F[t] on sealed classes, we should be fine. @Eric, btw, in C++ any template that is supposed to receive derived type as one of the arguments is called CRTP. So IComparable<T> also qualifies. There's no equivalent to IComparable<T> where T:IComparable<T>, as C++ does not have a type system for templates.
  • Anonymous
    February 03, 2011
    HarHar! You can sprinkle in even more generics for a laugh...4 years ago this design happened unto me. I'd probably do it differently now, but it was a wonderful neuron twister back then...enjoy! http://realfiction.net/go/57

  • Anonymous
    February 03, 2011
    The comment has been removed

  • Anonymous
    February 03, 2011
    > That's where you need path-dependent types, for type system to be able to reason about whether the argument you're passing is of the same dynamic type as the receiver. Can you give a link on how path-dependent types help? I've only read about path-dependent types in the context of Scala, and Scala doesn't solve the problem. Also, the problem with inheriting F[t] is much more subtle than with implementing F[u] -- it doesn't break type safety, just allow using operations that discard new data in derived classes. But I'm not sure what else you can do in languages with subtyping.

  • Anonymous
    February 03, 2011
    The comment has been removed

  • Anonymous
    February 03, 2011
    Here's an example of trying to "enforce a particular constraint in a type hierarchy" : www.typesandotherdistractions.com/.../decorating-recursive-types.html but I wouldn't argue that at the loss of some type safety, casts would be a much more readable option. It's presented for amusement more than anything else...

  • Anonymous
    February 03, 2011
    The logic in these two paragraphs is faulty: "We have not guaranteed that a Cat can only make friends with a Cat; an EvilDog can make friends with a Cat too. The constraint only enforces that the type argument to Animal be good; how you use the resulting valid type is entirely up to you. You can use it for a base type of something else if you wish. "So that's one good reason to avoid this pattern: because it doesn't actually enforce the constraint you think it does. Everyone has to play along and agree that they'll use the curiously recurring pattern the way it was intended to be used, rather than the evil dog way that it can be used." As a matter of fact, a Cat can only "MakeFriends" with a Cat. Just because you can create an entirely new subclass and "MakeFriends" with a Cat does not change your original Cat class. The "flaw" here is no different than any other situation where one might create any arbitrary class and write a "MakeFriends" method.

  • Anonymous
    February 03, 2011
    The comment has been removed

  • Anonymous
    February 04, 2011
    WindJammer said (on Fri, Feb 4 2011 4:27 PM ): "As a matter of fact, a Cat can only "MakeFriends" with a Cat. Just because you can create an entirely new subclass and "MakeFriends" with a Cat does not change your original Cat class. The "flaw" here is no different than any other situation where one might create any arbitrary class and write a "MakeFriends" method." Actually, this is true; maybe the MakeFrends() was not the best example... While an Evil dog might make friends with a Cat, a Cat won't do that... And that's the problem with the example - it assumes that making friends operation has nothing to do with the friend-to-be object. Speaking of that, how about: abstract class Animal<T> where T : Animal<T> {    public virtual bool MakeFriends(T animal);    public virtual bool AcceptFriendship(Animal<T> animal); } class Cat : Animal<Cat> {    public override bool MakeFriends(Cat cat)    {        return cat.AcceptFriendship(this);    }    //err... Animal<Cat> would be animal that might try to make friends with a cat...    public sealed override bool AcceptFriendship(Animal<Cat> animal)    {        if (animal is Cat)            // ok...        else            // I don't wanna be friends with you...    } } Of course, this involves some extra work, and partly defeats the initial purpose of the CRTP here. Dumping generics altogether wouldn't make that much of a difference in this case. I guess it's better to reconsider the design before opting for such Frankenstein-constructs. Also, a method semantically equivalent to MakeFriends() is probably not the best candidate? Lucero said (on Thu, Feb 3 2011 5:53 PM): "I've been using it in several cases, but typically only for code which is not directly exposed to "end-users" (e.g. private classes, or abstract classes used through derived classes)." I think that's a good practice.

  • Anonymous
    February 04, 2011
    I've seen this pattern used differently - as a kind of factory.  So an abstract base class can create instances of the specific concrete classes that derive from it, usually in static methods which are invoked DerivedType.StaticMethod(). The huge problem with that is you can only have one layer of inheritance from that abstract base.

  • Anonymous
    February 04, 2011
    The comment has been removed

  • Anonymous
    March 04, 2011
    What about this usage of CRTP: interface IHierarchical<T> where T : IHierarchical<T> {    T Parent { get; set; } }

  • Anonymous
    July 26, 2011
    I just want to say, Eric, that when you say the CRTP "really hurts my brain", it makes me feel much better about myself. (Even if I suspect you're being modest.)