Linq to bits !

Following my previous post, let's see how to extend Linq to bits...

To have a good understanding of what I am working on in this post, it's highly recommended to read my previous post: Enumerating Enums. In fact, this is part II.

Using Linq natural extensions to enumerations we have all the benefits of methods like Intersect, Union or Contains for free. But let me remind the problem: (weekend & Day.Saturday) is one of the fastest instruction almost unchanged from the time we were writing with assembler.
To compute weekend.Intersect(otherdays), let's see how Linq is doing:

 static IEnumerable<T> IntersectIterator<T>(IEnumerable<T> first, IEnumerable<T> second) {
    Dictionary<T, object> dict = new Dictionary<T, object>();
    foreach (T element in first) dict[element] = null;
    foreach (T element in second) {
        if (dict.ContainsKey(element)) dict[element] = dict;
    }
    foreach (KeyValuePair<T, object> pair in dict) {
        if (pair.Value != null) yield return pair.Key;
    }
}

This code is generic and can be performed for any kind of enumeration. In our particular case, we know we can make it better. Linq is really cool and can be extended to change the way our code is interpreted.

To do such we will have to implement IQueryable<T> which is a subinterface of IEnumerable<T>. Just doing this, all sequences of our objects (weekend.Intersect(Day.Saturday)) are compiled to expressions instead of IL. Implementing IQueryable<S> IQueryable<T>.CreateQuery<S>(System.Expressions.Expression expression), we will have the opportunity to analyze the expression to do something else.

 public struct MyEnum<T> : IQueryable<T>, IEnumerable<T> 
{

    ...

    #region IQueryable<T> Members

    IQueryable<S> IQueryable<T>.CreateQuery<S>(System.Expressions.Expression expression)
    {
        return new BinaryQuery<S>(expression);
    }

    public System.Expressions.Expression Expression
    {
        get
        {
            return Expression.Constant(this);
        }
    }

    ...

    #endregion
}

A lot of other methods are needed to implement IQueryable<T> but in our case, you may think I am lazy (which is another question), we do not need them, so I will let them throw the NotImplemented exception.
You can notice that we created a new BinaryQuery class to analyze of the expression and implement IQueryable<S>. Doing such is really more simple because CreateQuery<S> needs to return IQueryable<S> again, so the user can choose to query the sequence again and again.

Now let's see the expression analysis. It's quite simple (especially compared to DLinq :) ). We have to recursively navigate the expression tree and replace method calls (Intersect, Union) by our binary operations. If the node is a constant, we just retrieve the value. If the node is a method call, depending on its name, we call the right binary operation.

 public static MyEnum<T> ParseExpression(Expression expression)
{
    if (expression.NodeType == ExpressionType.MethodCall)
    {
        MethodCallExpression method = expression as MethodCallExpression;
        if (method.Method.DeclaringType == typeof(System.Query.Queryable))
        {
            switch (method.Method.Name)
            {
                case "Intersect":
                    return ParseExpression(method.Parameters[0]) 
                        & ParseExpression(method.Parameters[1]);
                    break;
                case "Union":
                    return ParseExpression(method.Parameters[0]) 
                        | ParseExpression(method.Parameters[1]);
                    break;
                default:
                    throw new NotImplementedException("Not supported");
                    break;
            }
        }
        else
            throw new NotImplementedException();
    }
    else if (expression.NodeType == ExpressionType.Constant)
    {
        ConstantExpression exp = expression as ConstantExpression;
        if (exp.Value is T)
            return new MyEnum<T>((T) exp.Value);
        else if (exp.Value is MyEnum<T>)
            return (MyEnum<T>)exp.Value;
        else
            throw new NotImplementedException();
    }
    throw new NotImplementedException();
}

At the end of this process, we have to return a MyEnum<T> which is the result of our calculation.

This is almost done. Some extensions are breaking the sequence. For example, .Count<T>() returns an int or .Contains<T>() returns a bool. Their interpretation needs to implement another method: public S Execute<S>(Expression expression).
The binary operation to test if a set contains a value is a binary and.

 public S Execute<S>(System.Expressions.Expression expression)
{
    if (expression.NodeType == ExpressionType.MethodCall)
    {
        MethodCallExpression method = expression as MethodCallExpression;
        if (method.Method.DeclaringType == typeof(System.Query.Queryable))
        {
            switch (method.Method.Name)
            {
                case "Contains":
                    T t = (BinaryQuery<T>.ParseExpression(method.Parameters[0]) 
                        & BinaryQuery<T>.ParseExpression(method.Parameters[1]));
                    object result = Convert.ToInt32(t) > 0;
                    return (S) result;
                    break;
                default:
                    throw new NotImplementedException();
                    break;
            }
        }
        else
            throw new NotImplementedException();
    }
    else
        throw new NotImplementedException();
}

This time, it's done ! Writing "weekend.Intersect(Day.Saturday)" or "weekend.Contains(Day.Saturday)" does not call the default implementation any more but our implementation using binary operations.

I remind you that our enumeration supports naturally the Linq syntax:

 var q =
    from day in weekend.Intersect(set1)
    select day;

foreach (Day d in q)
    Console.WriteLine(d);

Conclusion

  • Enumerating values is efficient.
  • The nullable like structure MyEnum<T> costs a little due to operator overriding.
  • Extending Linq to support our operations is a nice exercise. Things are more efficient than using default extension methods but stays far away from the basic efficiency.
  • However, considering that Enums are not huge datas, you can choose to use this solution given the clarity of the code.
  • This remains a small sample and the whole possible extensions are not implemented, but the main technical issues are solved. 

The whole solution (including source code of my previous post) is attached in this post.

Mitsu

PS: My next post should talk about binding and data virtualization.

LinqToBits.zip

Comments

  • Anonymous
    October 08, 2006
    Mitsu Furuta, évangéliste chez Microsoft France, vient d'ouvrir un nouveau blog sur la plateforme MSDN:http://blogs.msdn.com/mitsu/Celui-ci...

  • Anonymous
    October 11, 2006
    Compiler and Language Eric Lippert Luke Hoban (PM) Mads Torgersen Peter Hallam (DEV) Rok Yu (DEV Lead)

  • Anonymous
    October 12, 2006
    What about supporting all underlying types of enums, ant not only Int32? enums can be of any integral types (except char), that is SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64. PS: Salut Mitsu :-)

  • Anonymous
    October 12, 2006
    Hi Simon and thanks for your question, Using Int32 was a common choice that can be easily replaced. If you want to dynamically use the underlying type of the Enum, it's quite difficult. On the first hand, generics do not support constraints on Enums, so available functions are limited. On the second hand, binary operators and more generally lamda calcul are not supported on generic parameters. So values must be of a real numeric types. I only found Convert methods that support conversion from <T>. That's why I had to choose a "hard coded" numerical target type. Mitsu

  • Anonymous
    October 23, 2006
    Hi Mitsu, This is an interesting exercise to discover IQueryable, but it should be possible to implement Intersect, Union and Contains directly in the MyEnum class, I think. Am I missing something? Fabrice

  • Anonymous
    October 23, 2006
    Hi Fabrice, As I am only using the expression tree for method interception, I could have replace my code using regular methods in the MyEnum class, you are right. My first idea, but I had no time to do it was to analyse and optimize the expression tree with binary rules. (ex: !(a&!b) => !a|b) There is some other advantages implementing IQueryable but let's say it was just an interesting exercise :-). Mitsu

  • Anonymous
    December 08, 2007
    Zoals ik ben alleen met behulp van de expressie boom methode voor het aftappen, had ik vervang mijn code met reguliere methoden in de MyEnum klasse, je hebt gelijk.

  • Anonymous
    December 08, 2007
    Compsiler esn Taal Ersic Lisppert Luke Hoban (PM) Mads Torgersen Peter Hallam (DEV) Rok Yu (DEV Lead)

  • Anonymous
    December 08, 2007
    Op de tweede hand, binaire operatoren en meer in het algemeen lamda calcul worden niet ondersteund op genetische parameters. Dus waarden moeten worden van een echte numerieke typen. Ik alleen gevonden Wisselkoers methoden die steun conversie van <T>. Dat is de reden waarom ik moest kiezen voor een "harde gecodeerde" numerieke doelgroep.

  • Anonymous
    March 19, 2008
    I'm trying to get this working in VS2008 but a lot of things seem to have changed since this example was created. Aside the namespaces having to be replaced with System.Linq etc i am having trouble with IQueryable<S> IQueryable<T>.CreateQuery<S>(System.Expressions.Expression expression) IQueryable doesn't have a CreateQuery member and I'm not sure how to fix it. Can you provide a quick fix or is there any chance of updating this for the .NET 3.5?

  • Anonymous
    March 23, 2008
    Yes, the expression API changed in RTM version. I will update it when I get some time. You can read Matt Warren excellent series about Linq extensibility: http://blogs.msdn.com/mattwar/archive/2007/07/30/linq-building-an-iqueryable-provider-part-i.aspx