How I started to really understand generics
First off, a little explanation of the post title: it's not that I didn't understand generics before. I already had a pretty good understanding of how they work, how they are implemented in the CLR, what facilities the compiler provides, etc.
It's just that recently I started seeing some patterns and analogies that I hadn't seen before, or hadn't paid enough attention to. It's a somewhat fuzzy feeling, but I'll still try to describe these insights; hopefully you'll see what I mean. They have helped me think more clearly about generics and use them in a richer way.
For a start, let's consider a method called Process that processes command line arguments and modifies the argument list by removing the arguments that it could process:
public ArgumentList Process(ArgumentList source)
{
// removes the arguments it could process from source...
return source;
}
This design is not very good because it mutates the object passed in as source. A method should either be free of side effects and return something, or mutate state and return void, but not both at the same time. This is Bertrand Meyer's command-query separation principle; I think I first read about it in Martin Fowler's Refactoring (see the "Separate Query from Modifier" refactoring), but I'm not sure.
But that's not the thing I wanted to talk about today. This design has another serious flaw: it essentially truncates its return type to be ArgumentList, so by calling this method we basically "seal" the output type to be ArgumentList:
CustomArgumentList customList = new CustomArgumentList();
ArgumentList processed = Process(customList);
Even if we originally had a CustomArgumentList, once we pass it into our method, the method "truncates" the type and all we have left on the output is the basic ArgumentList. We lose type information and cannot enjoy the benefits of static typing and polymorphism to the fullest.
So what do generics have to do with all this? Well, generics allow us to "unseal" the return type of our method and preserve all the static type information available:
public T Process<T>(T source)
where T : ArgumentList
{
// removes the arguments it could process from source...
return source;
}
This way we can call Process on our CustomArgumentList and receive a CustomArgumentList back:
CustomArgumentList customList = new CustomArgumentList();
CustomArgumentList processed = Process(customList);
See, type information is not sealed anymore and can freely "flow" from the input type to the output type. Generics enabled us to use polymorphism and preserve full static typing where it wasn't possible before. It might be obvious to many of you, but it was a wow-effect discovery for me. Note how we used type inference to make a call to Process<T> look exactly like a call to Process.
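To make the contrast concrete, here is a minimal, self-contained sketch. The post doesn't show what ArgumentList contains or what processing actually does, so the Items list, the Source property, the RemoveSwitches step and the names ProcessSealed, ArgumentProcessor and ProcessDemo are all hypothetical. The only point is the static type at each call site.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins: the post doesn't show what ArgumentList contains.
class ArgumentList
{
    public List<string> Items { get; } = new List<string>();
}

class CustomArgumentList : ArgumentList
{
    // A member that only the derived type has.
    public string Source { get; set; } = "command line";
}

static class ArgumentProcessor
{
    // Hypothetical processing step: removes switches (arguments starting with "/").
    static void RemoveSwitches(ArgumentList list)
    {
        list.Items.RemoveAll(a => a.StartsWith("/"));
    }

    // Non-generic: the static return type is always ArgumentList.
    public static ArgumentList ProcessSealed(ArgumentList source)
    {
        RemoveSwitches(source);
        return source;
    }

    // Generic: the static type of the argument flows through to the result.
    public static T Process<T>(T source) where T : ArgumentList
    {
        RemoveSwitches(source);
        return source;
    }
}

static class ProcessDemo
{
    public static string Run()
    {
        var customList = new CustomArgumentList();
        customList.Items.AddRange(new[] { "/verbose", "input.txt" });

        ArgumentList sealedResult = ArgumentProcessor.ProcessSealed(customList);
        // sealedResult.Source would not compile: its static type is just ArgumentList.

        // T is inferred as CustomArgumentList, so no cast is needed.
        CustomArgumentList processed = ArgumentProcessor.Process(customList);
        return processed.Source + ": " + string.Join(" ", processed.Items);
    }
}
```

ProcessDemo.Run() returns "command line: input.txt": the derived-only Source property is reachable on the generic result without a cast, while the non-generic result would need one.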
Another observation: when talking about covariance and contravariance, you can divide type usages into two groups, "in" and "out". Return types are "out" positions, where covariance makes sense, and input parameters are "in" positions, where contravariance makes sense. Notice how "out" type usages essentially truncate the type information and limit ("seal") it to one base type. Generics are there to "unseal" this limitation and allow derived types to be returned without losing type information.
I hope this wasn't too fuzzy and hand-wavy. It's just that these little discoveries make me very excited, and I'd like to share the way I see these things.
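C# 4.0, which shipped after this post was written, turned exactly these two groups into keywords: a type parameter of a generic interface or delegate that is used only in output positions can be marked out (covariant), and one used only in input positions can be marked in (contravariant). The BCL's IEnumerable<out T> and Action<in T> are declared this way. A small sketch; the Animal/Cat classes and VarianceDemo are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class hierarchy used only for this illustration.
class Animal
{
    public virtual string Name => "animal";
}

class Cat : Animal
{
    public override string Name => "cat";
}

static class VarianceDemo
{
    public static string Run()
    {
        // "out" usage: IEnumerable<out T> only hands T values out,
        // so a sequence of cats can be treated as a sequence of animals.
        IEnumerable<Cat> cats = new List<Cat> { new Cat() };
        IEnumerable<Animal> animals = cats; // covariant conversion

        var names = new List<string>();
        foreach (Animal animal in animals)
        {
            names.Add(animal.Name);
        }

        // "in" usage: Action<in T> only takes T values in,
        // so a handler for any animal can stand in for a handler of cats.
        Action<Animal> recordName = a => names.Add(a.Name);
        Action<Cat> recordCatName = recordName; // contravariant conversion
        recordCatName(new Cat());

        return string.Join(",", names);
    }
}
```

Variance of this kind is a different tool from the generic Process<T> above: it relaxes conversions between already-constructed generic types, whereas Process<T> keeps the caller's static type flowing from argument to result.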
Let's consider another insight: what's the difference between:
- class MyType<T> : IEquatable<T> and
- class MyType<T> where T: IEquatable<T>
Update: I'd like to explain in a little more detail what's going on here. In the first line, we're declaring that the type MyType<T> implements the IEquatable<T> interface (complies with the IEquatable<T> contract). Put more simply, MyType<T> is IEquatable<T>: it can be compared with objects of type T. The <T> part reads "of T", which means MyType<T> has an Equals method that accepts a parameter of type T. Hence, the first line imposes the constraint on the type MyType<T> itself.
The second line, on the other hand, imposes the constraint on the type parameter, not on the type itself. Now the type argument you specify must be a type that has an Equals method accepting a parameter of type T (objects of type T can be compared with other objects of type T). I would imagine the second line to be useful more often than the first.
Also, another line is possible:
class MyType<T> : IEquatable<MyType<T>>
This line would mean that MyType can be compared with itself (and, independently of this fact, use some other type T).
Thinking about this difference made another aspect of generics clearer for me: with generics, there are always at least two different types involved: the actual type and the generic type parameters.
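Here is a small sketch of all three declarations side by side. The class names MyType1, MyType2 and MyType3 and their members (Value, Holds) are made up for this illustration. A production type implementing IEquatable<T> would normally also override Equals(object) and GetHashCode; that is omitted here for brevity.

```csharp
using System;
using System.Collections.Generic;

// 1. The constraint is on MyType1<T> itself: a MyType1<T> can be compared with a T.
class MyType1<T> : IEquatable<T>
{
    public T Value { get; }
    public MyType1(T value) { Value = value; }

    public bool Equals(T other) => EqualityComparer<T>.Default.Equals(Value, other);
}

// 2. The constraint is on the type parameter: T values can be compared with
// other T values, so MyType2<T> may call Equals(T) on them.
// MyType2<T> itself does not implement IEquatable at all.
class MyType2<T> where T : IEquatable<T>
{
    public T Value { get; }
    public MyType2(T value) { Value = value; }

    // Assumes Value is not null (relevant when T is a reference type).
    public bool Holds(T candidate) => Value.Equals(candidate);
}

// 3. MyType3<T> can be compared with another MyType3<T>; T is unconstrained.
class MyType3<T> : IEquatable<MyType3<T>>
{
    public T Value { get; }
    public MyType3(T value) { Value = value; }

    public bool Equals(MyType3<T> other) =>
        other != null && EqualityComparer<T>.Default.Equals(Value, other.Value);
}
```

The two kinds of type are visible in how each one can be used: MyType2<object> is rejected by the compiler, because object doesn't implement IEquatable<object>, while MyType1<object> and MyType3<object> are accepted, since their constraints apply to the declared type rather than to T.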
Armed with this wisdom, let's move on to a third mental exercise (again, it might all be simple and obvious to some readers, in which case I apologize and am very honored to have such readers). Is the following valid?
class MyGenericType<T> : T
{
}
It turns out it's not valid - we cannot inherit from a generic type parameter (Update: in the original post I wrote that this would lead to multiple inheritance - that's not true). This illustrates my previous points - with generics, there are always at least two different types involved, and we shouldn't mix them arbitrarily.
Finally, to at least partially save this post from being a complete theoretical disaster, I'd like to share some code I wrote recently. Suppose you're implementing the Command design pattern and have various concrete commands: CutCommand, CopyCommand, CloseAllDocumentsCommand, etc. Now the requirement is to support unified exception throwing and catching: every command should throw a generic exception InvalidOperationException<T>, where T is the type of the command.
You might find this weird (who needs generic exceptions?), but in my real project at work we have some very clever exception classification logic, which requires each exception to have a different type. Basically, each exception should know which type threw it. Based on this strongly typed exception source information, we do some magic to greatly simplify logging and to classify exceptions into buckets based on their root cause. Anyway, here's the source code:
using System;
class Command
{
public virtual void Execute() { }
}
class InvalidOperationException<T> : InvalidOperationException
where T : Command
{
public InvalidOperationException(string message) : base(message) { }
// some specific information about
// the command type T that threw this exception
}
static class CommandExtensions
{
public static void ThrowInvalidOperationException<TCommand>(
this TCommand command, string message)
where TCommand : Command
{
throw new InvalidOperationException<TCommand>(message);
}
}
class CopyCommand : Command
{
public override void Execute()
{
// after something went wrong:
this.ThrowInvalidOperationException("Something went wrong");
}
}
class CutCommand : Command
{
public override void Execute()
{
// after something went wrong:
this.ThrowInvalidOperationException("Something else went wrong");
}
}
Note how two seemingly identical calls to ThrowInvalidOperationException will, in fact, throw different exceptions. The call in CopyCommand will throw an InvalidOperationException<CopyCommand>, while the same call from CutCommand will throw an InvalidOperationException<CutCommand>.
Here (attention!) we infer the type of the command from the static type of the "this" reference. If you try calling the method without the "this." part, the compiler will fail to resolve the call: an extension method can only be called on an explicit receiver, so a bare ThrowInvalidOperationException("...") is looked up as a member of the current class, and there is no such member.
So this is a situation where extension methods and generic type inference play nicely together to enable what I think is a very interesting trick. We preserve type information all the way from where an exception is thrown, "remember" the type that threw the exception, and still have full type information available when we catch and process it. It's a nice example of how generics help type information "flow" from one part of your code to another. This is also in line with my first example, where type information "flows" from the input parameter of a method to its return value.
Note: for those of you who have spent some time with Prolog, doesn't this give you a sense of déjà vu? Prolog allows values to "flow" between predicates: every argument of a predicate (method) can act as "in" or "out", input or output, depending on whether it is already bound when the predicate is called, and unification takes care of the rest. I don't know if this analogy is valid at all, but I feel that the C# compiler does its type inference in a somewhat similar manner.
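Here is a self-contained sketch of the catching side. It repeats the types from the listing above so that it compiles on its own, and adds two hypothetical pieces: a FailFromBase helper on Command and an ExceptionClassifier. The real classification logic mentioned earlier isn't shown in the post, so this only illustrates the idea. It also shows one subtlety: the type argument is inferred from the static type of "this", so a call written inside the base Command class produces InvalidOperationException<Command> even when the object is a CopyCommand.

```csharp
using System;

class Command
{
    public virtual void Execute() { }

    // Hypothetical helper declared in the base class. Here "this" has the
    // static type Command, so TCommand is inferred as Command even when the
    // object at run time is a CopyCommand.
    public void FailFromBase()
    {
        this.ThrowInvalidOperationException("Failed in the base class");
    }
}

class InvalidOperationException<T> : InvalidOperationException
    where T : Command
{
    public InvalidOperationException(string message) : base(message) { }
}

static class CommandExtensions
{
    public static void ThrowInvalidOperationException<TCommand>(
        this TCommand command, string message)
        where TCommand : Command
    {
        throw new InvalidOperationException<TCommand>(message);
    }
}

class CopyCommand : Command
{
    public override void Execute()
    {
        this.ThrowInvalidOperationException("Something went wrong");
    }
}

class CutCommand : Command
{
    public override void Execute()
    {
        this.ThrowInvalidOperationException("Something else went wrong");
    }
}

// Hypothetical classifier: an illustration only, not the real logic.
static class ExceptionClassifier
{
    public static string RunAndClassify(Action action)
    {
        try
        {
            action();
            return "no exception";
        }
        catch (InvalidOperationException<CopyCommand>)
        {
            return "copy";
        }
        catch (InvalidOperationException<CutCommand>)
        {
            return "cut";
        }
        catch (InvalidOperationException ex) when (ex.GetType().IsGenericType)
        {
            // Any other InvalidOperationException<T>: recover T via reflection.
            return ex.GetType().GetGenericArguments()[0].Name;
        }
    }
}
```

With this in place, a CopyCommand failure lands in the first catch clause and a CutCommand failure in the second, while FailFromBase falls through to the reflection-based fallback, which reports Command. That last case is worth keeping in mind when the helper is called from shared base-class code.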
Comments
Anonymous
August 19, 2008
That little bit about the typed exceptions is really cool. I already have ideas as to how to use this.
Anonymous
August 19, 2008
Have you considered returning InvalidOperationException<TCommand> (rather than void) from the extension method? Then your callsites become: "throw this.NewInvalidOperationException(...);" The nice thing with that pattern is that the debugger callstack ends up in the right place (on break when thrown) rather than one method too deep.
Anonymous
August 19, 2008
Jacob - that sounds like a good idea :)
Anonymous
August 30, 2008
A nice post, Kirill... Never knew of such "Typed" Exceptions :-)
Anonymous
September 08, 2008
In the "interesting usage of generics" thread I'll add this pattern. Note that this is a made-up example and I know there are better ways to do what it does, but it's just an example. The pattern is useful whenever you have a generic factory class, not just XmlSerializer.
using System.IO;
using System.Xml.Serialization;
public class MyBase<T> where T : MyBase<T>
{
private static XmlSerializer serializer = new XmlSerializer(typeof(T));
protected virtual void AfterDeserialization() { }
public static T CreateFromXml(Stream stream)
{
T newObj = (T)serializer.Deserialize(stream);
newObj.AfterDeserialization();
return newObj;
}
}
Basically the idea is that you can have a single generic base class implement a factory pattern for multiple derived classes.
It also points out another interesting thing about generics: static members on a generic class are PER T. Using a pattern like this, you have to make sure you understand what that means. You CANNOT properly do
class MyDerivedA : MyBase<MyDerivedA> { }
class MyDerivedB : MyDerivedA { }
since the call to MyDerivedB.CreateFromXml will use the same serializer as MyDerivedA, resulting in a typecast error.
Anonymous
September 08, 2008
How does 'class MyGenericType<T> : T' lead to multiple inheritance?
Anonymous
September 08, 2008
Hi Kirill, thank you for the article, but I fail to see what this way of using generics improves. Why not just add the ThrowInvalidOperationException method to the Command class, and instantiate the exception with this.GetType()? That's just as safe... Marking it virtual would even allow you to override the behavior in derived commands where the extension/static method approach doesn't... Additionally, I don't see the advantage of adding extension/(static) methods to classes that you have created yourself, or am I missing the point here? Sorry I don't mean to be negative but I think you've found a complex solution to a simple problem :)
Anonymous
September 09, 2008
The advantage of using extension methods there is that "this" is the most derived type at the point where it is used. Funnily enough, you could use the pattern I outlined above as well, where you have
Command<T> where T : Command<T>
then CopyCommand : Command<CopyCommand>, etc., and finally
void ThrowInvalidOperationException<T>(message)
I like the extension method pattern much better, though, since it elegantly supports more than one layer of derivation. With mine you'd end up having:
Command<T> where T : Command<T>
CopyCommandBase<T> : Command<T> where T : CopyCommandBase<T>
CopyCommand : CopyCommandBase<CopyCommand>
SpecialCopyCommand : CopyCommandBase<SpecialCopyCommand>
Note that you can't derive SpecialCopyCommand directly from CopyCommand. If you do, then all the T's will reference CopyCommand, NOT the most derived type as you wanted.
Anonymous
September 09, 2008
uhh....
public abstract class Command<T> : ICommand<Object> where T : Command<T>, ICommand<Object>
{
public virtual void Execute() { throw new NotImplementedException(); }
protected InvalidOperationExceptionTyped GetException(String message)
{
return new InvalidOperationExceptionTyped(this, message);
}
#region Nested type: InvalidOperationExceptionTyped
public class InvalidOperationExceptionTyped : InvalidOperationException
{
private readonly Command<T> _typeUsed;
public InvalidOperationExceptionTyped(Command<T> typeUsed, string message)
: base("Type was " + typeUsed + " message was " + message)
{
_typeUsed = typeUsed;
}
public Command<T> CommandType { get { return _typeUsed; } }
}
#endregion
object ICommand<object>.Instance { get { return Instance; } }
public Command<T> Instance { get { return this; } }
void ICommand<object>.Execute() { Execute(); }
}
public class CopyCommand : Command<CopyCommand>
{
public override void Execute() { throw GetException("Something went wrong"); }
}
public abstract class CutCommandBase : Command<CutCommandBase>
{
public abstract override void Execute();
}
public abstract class CutCommandShapeBase : CutCommandBase
{
public abstract override void Execute();
}
public class CutCommandShapeCircle : CutCommandShapeBase
{
public override void Execute() { throw GetException("This is a little crazy, I admit"); }
}
public class CutCommandPoint : CutCommandBase
{
public override void Execute() { throw GetException("Something went wrong"); }
}
public interface ICommand<T>
{
T Instance { get; }
void Execute();
}
P.S. This is a joke (grin). damon
Anonymous
September 11, 2008
This was an ace article! What this means for us is that our DAL, which uses interfaces, fills objects with data from our datastore. We were using ArrayLists because it was impossible to declare a list of interfaces and then add objects that implement that interface to it. The method described above now allows me to get a List<T> passed in, restrict T to types that implement a certain interface, then create new T's and add them to the list! Thanks very much for this! It's going to save us loads of code and will probably be more efficient too. Win-win; you don't often get that.
Anonymous
September 11, 2008
So what is the difference between these two? class MyType<T> : IEquatable<T> and class MyType<T> where T: IEquatable<T>
Anonymous
September 11, 2008
Alec, in the first definition, MyType<T> implements IEquatable<T>. In the second, MyType<T> does not implement IEquatable at all; rather, it requires that T implement IEquatable<T>, presumably so that MyType<T> can call members of IEquatable<T> in its definition.
Anonymous
September 14, 2008
Here are some more unsorted thoughts on generics to continue this post (which has some interesting comments
Anonymous
September 17, 2008
Quote: "Let's consider another insight: what's the difference between class MyType<T> : IEquatable<T> and class MyType<T> where T: IEquatable<T> If you immediately grasped what's going on, then good. Because it takes me a while to understand what's going on. Thinking about this difference made another aspect of generics clearer for me: with generics, there are always at least two different types involved: the actual type and the generic type parameters."
The thing is, I didn't grasp what is going on, and you failed to explain it; you just ploughed on with the next example. Any chance you could explain further so the rest of us can catch up?
Anonymous
September 18, 2008
Alec, Barry, I've updated the post with some more explanations regarding class MyType<T> : IEquatable<T> and class MyType<T> where T: IEquatable<T>. Please don't take those examples as real-life: they are very contrived and only serve to explain things about generics, type parameters and constraints. Feel free to ask more questions.