Compartir a través de


Adventures in F#--Discriminated Unions

Jomo Fisher-- Easily my favorite feature of F# so far is the combination of discriminated union and pattern matching. Together, these allow you to concisely represent a complex language-like data structure and operations over that structure. We used this pattern extensively in LINQ to SQL (though in C#) and it can be seen in some form or another in most compilers.

As in my past F# bloglets--Lay of the Land, Probing Type Inference, Function Type Inference, Poking Tuples with a Stick--I'm going to compile some simple F# and then use Lutz Roeder's Reflector to disassemble into C#. For this article I'm going to look at discriminated unions and save pattern matching for later.

Here's a simple discriminated union:

type MyUnion =
Int of int
| String of string
| TwoStrings of string * string

This represents a type which may hold an integer, a string or a pair of strings. Here's the C#:

[Serializable, CompilationMapping(SourceLevelConstruct.SumType)]

public class MyUnion : IStructuralHash, IComparable

{

    public const int tag_Int = 0;

    public const int tag_String = 1;

    public const int tag_TwoStrings = 2;

    public MyUnion();

    public override int CompareTo(object that);

    public sealed override int CompareTo(Test.MyUnion that);

    public override bool Equals(object that);

    public override int GetHashCode();

    public sealed override int GetStructuralHashCode(ref int nRemainingNodes);

    [CompilationMapping(SourceLevelConstruct.Alternative, 0)]

    public static Test.MyUnion Int(int Int1);

    public bool IsInt();

    public bool IsString();

    public bool IsTwoStrings();

    [CompilationMapping(SourceLevelConstruct.Alternative, 1)]

    public static Test.MyUnion String(string String1);

    [CompilationMapping(SourceLevelConstruct.Alternative, 2)]

    public static Test.MyUnion TwoStrings(string TwoStrings1, string TwoStrings2);

    [CompilationMapping(SourceLevelConstruct.Field, 0, 0)]

    public int Int1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 1, 0)]

    public string String1 { get; }

    public int Tag { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 2, 0)]

    public string TwoStrings1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 2, 1)]

    public string TwoStrings2 { get; }

    [Serializable]

    public class _Int : Test.MyUnion

    {

        public int _Int1;

        public _Int(int _Int1);

    }

    [Serializable]

    public class _String : Test.MyUnion

    {

        public string _String1;

        public _String(string _String1);

    }

    [Serializable]

    public class _TwoStrings : Test.MyUnion

    {

        public string _TwoStrings1;

        public string _TwoStrings2;

        public _TwoStrings(string _TwoStrings1, string _TwoStrings2);

    }

}

That's a lot of code, let's walk through it.

[Serializable, CompilationMapping(SourceLevelConstruct.SumType)]

public class MyUnion : IStructuralHash, IComparable

{

It looks like discriminated unions are marked Serializable like Tuples. That SourceLevelConstruct.Sum surely is marking this class as a discriminated union so that the F# compiler can recognize it when reference through an assembly.

That IStructuralHash interface has appeared a few times now. I'm starting to think its important. When I click on it in reflector there' a nice comment waiting for me along with the equivalent C#:

"F# structural types such as tuples, records and discriminated unions support a form of cooperative structural hashing, via implementations of interface IStructuralHash. Implmentations of this interface are added to concrete types (records, discriminated unions and classes) automatically, though types can also define their own implementations of this interface, thus altering the hashing semantics of the type. The byref argument points to a count of the number of significant nodes remaining to be hashed in the cooperative hash. Substructures and leaf nodes (such as integers) should be hashed by calling Microsoft.FSharp.Core.LanguagePrimitives.StructuralHashParam, but only if the hash count is non-zero. If the hash count is zero StructuralHashParam must not be called. Structural comparison is supported via a similar scheme, using implementations of the System.IComparable interface."

[Serializable, CompilationMapping(SourceLevelConstruct.ObjectType)]

public interface IStructuralHash

{

    // Methods

    int GetStructuralHashCode(ref int);

}

It's an interface that represents participation in a cooperative hashing process. I need to think about this, it’s not clear to me why this is necessary or what the point of the byref int.

    public const int tag_Int = 0;

    public const int tag_String = 1;

    public const int tag_TwoStrings = 2;

 

These look like discriminator values that indicate which type of thing the current MyUnion is holding. These are returned by the Tag property which is further down.

    public MyUnion();

    public override int CompareTo(object that);

    public sealed override int CompareTo(Test.MyUnion that);

    public override bool Equals(object that);

    public override int GetHashCode();

    public sealed override int GetStructuralHashCode(ref int nRemainingNodes);

 

This is just the constructor, and implementation of IStructuralHash and IComparable. The CompareTo method is implemented so that all Ints will come first, then all Strings, then all TwoStrings (this is the order they occur in the orignal F# source code).

     public bool IsInt();    public bool IsString();    public bool IsTwoStrings();

These are predicate-style methods to determine which sort of thing is held. These methods are implemented with the equivalent of C#’ ‘is’ operator.

    [CompilationMapping(SourceLevelConstruct.Alternative, 0)]

    public static Test.MyUnion Int(int Int1);

    [CompilationMapping(SourceLevelConstruct.Alternative, 1)]

    public static Test.MyUnion String(string String1);

    [CompilationMapping(SourceLevelConstruct.Alternative, 2)]

    public static Test.MyUnion TwoStrings(string TwoStrings1, string TwoStrings2);

These are factory methods f constructing each of the union types. That attribute must be indicating to the compiler that these are the special factory methods for this union.

    [CompilationMapping(SourceLevelConstruct.Field, 0, 0)]

    public int Int1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 1, 0)]

    public string String1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 2, 0)]

    public string TwoStrings1 { get; }

    [CompilationMapping(SourceLevelConstruct.Field, 2, 1)]

    public string TwoStrings2 { get; }

These properties access the individual fields in the union.

    public int Tag { get; }

 

This returns tag_Int, tag_String, tag_TwoStrings depending on what the underlying type is.

 
     [Serializable]    public class _Int : Test.MyUnion    {        public int _Int1;        public _Int(int _Int1);    }     [Serializable]    public class _String : Test.MyUnion    {        public string _String1;        public _String(string _String1);    }     [Serializable]    public class _TwoStrings : Test.MyUnion    {        public string _TwoStrings1;        public string _TwoStrings2;        public _TwoStrings(string _TwoStrings1, string _TwoStrings2);    } 
 These are nested classes representing the data in each of the individual union types. It’s interesting to me that they are public and not sealed.

This posting is provided "AS IS" with no warranties, and confers no rights.

Comments

  • Anonymous
    September 17, 2007
    I'm really enjoying this series, I hope you keep it up.

  • Anonymous
    September 17, 2007
    I agree very much with this article. Union types/pattern matching are one of my favourite features of F#, certainly the thing I miss most when I have to write C#. Chapter 13 in Founcations of F# provides a similar discussion but more focused on how to use union types generated by F# from C#.

  • Anonymous
    September 19, 2007
    The F#.NET Journal articles describe sum types and pattern matching in detail: http://www.ffconsultancy.com/products/fsharp_journal/ The first article is free and introduces these concepts. Also, you might like the "Benefits of OCaml" series, as this is largely applicable to F# as well: http://www.ffconsultancy.com/ocaml/benefits/ A great way to tinker with F# definitions is to put them into an interactive session and look at the response (that gives the inferred type). For example: > let f (a, b) = (b, a);; val f : 'a * 'b -> 'b * 'a You can tell a lot about a definition from its inferred type!

  • Anonymous
    October 09, 2007
    This exploration of F# is very insightful.  Seeing the C# equivalent finally makes unions click for me.  Thanks!

  • Anonymous
    January 09, 2008
    I was writing some F# code this week and ran into problem. Consider the following code: type Thingey = This | That | SomethingElse Which looks like an enum. So I assumed that, like things inheriting from System.Enum, an instance of the type had a ToString

  • Anonymous
    February 09, 2008
    Cool, its alot of work done, behind the scenes...when u see the c# code..

  • Anonymous
    May 26, 2009
    第一篇,从零开始编写我们的第一个F#程序。 什么是F#,我为何要学它? F#是一种.NET平台上的 函数式编程 语言。就像C#和VB.NET,F#可以利用.NET的核心类库,如 WPF , WCF ,