I like LINQ...

  • Especially LINQ To XML. There are many exciting aspects of LINQ in general, and of LINQ To XML in particular, but what I like most about LINQ To XML is its use as an option for (de)serializing between XML and CLR types.

    The established pattern for XML serialization in .NET usually goes as follows:

    · Use a tool such as Wsdl.exe or Xsd.exe (if you are using the XmlSerializer) or Svcutil.exe (if you are using the DataContractSerializer) to generate a set of CLR types from the metadata (XML schema) that defines the XML document structure.

    · Write some code to construct an XmlSerializer or DataContractSerializer and call the appropriate API to transfer the CLR state held in an instance of a CLR type (or an object tree) to XML data, or vice versa.
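    For comparison, here is a minimal sketch of the second step of that pattern using XmlSerializer. To keep it self-contained it serializes a List<string> rather than a type generated by Xsd.exe or Svcutil.exe, so the sample data and variable names are illustrative only. Notice that the serializer, not the caller, decides the element names (ArrayOfString and string).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Illustrative data: a List<string> stands in for a code-generated CLR type.
var cast = new List<string> { "Toby McGuire", "Willem Defoe" };

// Construct the serializer for the CLR type once and reuse it in both directions.
var serializer = new XmlSerializer(typeof(List<string>));

// CLR object -> XML text.
using var writer = new StringWriter();
serializer.Serialize(writer, cast);
string xml = writer.ToString();

// XML text -> CLR object.
var roundTripped = (List<string>)serializer.Deserialize(new StringReader(xml));

Console.WriteLine(xml);
Console.WriteLine(string.Join(", ", roundTripped));
```

    Changing those element names typically means attributing the type itself, for example with [XmlRoot] or [XmlElement].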

    This is a very effective, easy-to-use pattern, and it suits a large number of XML serialization scenarios, especially web services, where serializing inbound and outbound messages is part of the plumbing and all you do is provide the contract and a compatible type system. So I am not discounting the established norm at all, and I would recommend it wherever it applies.

    However, where I find LINQ To XML has the upper hand is in the degree of control it gives you during (de)serialization. Passing a tree of objects to a serializer and getting XML out the other end (or vice versa) feels a little like "black magic": the control you do get comes from modifying the CLR types themselves at authoring time, through attributes or specific interface implementations. But what if you had no control over the CLR types? What if they were designed independently of how they need to be serialized, or cannot be modified because of other design constraints? Yes, you might call that "bad design", but it happens for many reasons, and not all of them are bad.

    Let's take an example. I have the following two XML documents, which use a fictitious schema for defining some movie metadata:

    Movies.xml

    <Movies>
      <Movie Name="SpiderMan" Genre="Action/Adventure" Rating="PG-13">
        <Cast>
          <Member>Toby McGuire</Member>
          <Member>Willem Defoe</Member>
        </Cast>
      </Movie>
      <Movie Name="BatMan" Genre="Action/Adventure" Rating="PG-13">
        <Cast>
          <Member>Christian Bale</Member>
          <Member>Liam Neeson</Member>
        </Cast>
      </Movie>
      <Movie Name="X-Men" Genre="Action/Adventure" Rating="PG-13">
        <Cast>
          <Member>Hugh Jackman</Member>
          <Member>Patrick Stewart</Member>
        </Cast>
      </Movie>
    </Movies>

    Actors.xml

    <Actors>
      <Actor FirstName="Toby" LastName="McGuire">
        <Gender>Male</Gender>
        <Age>30</Age>
      </Actor>
      <Actor FirstName="Liam" LastName="Neeson">
        <Gender>Male</Gender>
        <Age>50</Age>
      </Actor>
      <Actor FirstName="Hugh" LastName="Jackman">
        <Gender>Male</Gender>
        <Age>32</Age>
      </Actor>
      <Actor FirstName="Patrick" LastName="Stewart">
        <Gender>Male</Gender>
        <Age>60</Age>
      </Actor>
      <Actor FirstName="Christian" LastName="Bale">
        <Gender>Male</Gender>
        <Age>35</Age>
      </Actor>
      <Actor FirstName="Willem" LastName="Defoe">
        <Gender>Male</Gender>
        <Age>52</Age>
      </Actor>
    </Actors>

    I also have the following set of types:

    using System.Collections.Generic;

    public class Movie
    {
        public string Name { get; set; }
        public string Genre { get; set; }
        public string Rating { get; set; }
        public IEnumerable<CastMember> Cast { get; set; }
    }

    public class CastMember
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string Gender { get; set; }
        public int Age { get; set; }
    }

    You can immediately see the challenge of deserializing this XML into, say, a collection of Movie instances using traditional means. Each <Member> element in a <Movie>'s <Cast> is simply the concatenation of a cast member's first and last names, while the actual cast member details are defined in a separate, unrelated XML document. The Movie type, on the other hand, is closely tied to the CastMember type: its Cast property is a collection of CastMember. With traditional XML serialization, there is no straightforward way to combine these two separate XML data sources into a single CLR representation. Let's look at what this takes with LINQ To XML instead.

    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    XElement docMovies = XElement.Load("Movies.xml");
    XElement docActors = XElement.Load("Actors.xml");

    // Project over the <Movie> elements, creating a Movie for each one
    IEnumerable<Movie> movies =
        from movie in docMovies.Elements("Movie")
        select new Movie
        {
            Name = (string)movie.Attribute("Name"),
            Genre = (string)movie.Attribute("Genre"),
            Rating = (string)movie.Attribute("Rating"),
            // Join each <Member> in the <Cast> with the matching <Actor> in Actors.xml
            // on "FirstName LastName", then project each match into a new CastMember
            Cast = from member in movie.Element("Cast").Elements("Member")
                   join actor in docActors.Elements("Actor")
                       on (string)member equals
                          (string)actor.Attribute("FirstName") + " " + (string)actor.Attribute("LastName")
                   select new CastMember
                   {
                       FirstName = (string)actor.Attribute("FirstName"),
                       LastName = (string)actor.Attribute("LastName"),
                       Age = (int)actor.Element("Age"),
                       Gender = (string)actor.Element("Gender")
                   }
        };

    Looks simple, doesn't it? It is. Projecting over the collection of <Movie> elements under the document root and creating a new Movie instance for each is the fairly obvious part. The beauty lies in how LINQ lets us use a join to match each <Member> against the Actors.xml data source on the concatenated first and last name, pulling in the matching cast member details to build the composite CLR object.
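    A self-contained sketch of the same join, under a few assumptions: the two documents are trimmed and inlined with XElement.Parse instead of loaded from disk, anonymous types stand in for the Movie and CastMember classes, and a hypothetical third <Member> (Kirsten Dunst) with no entry in Actors.xml has been added to show what the join does with unmatched names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Trimmed, inline versions of Movies.xml and Actors.xml; the third <Member>
// is a hypothetical addition with no matching <Actor>.
XElement docMovies = XElement.Parse(@"<Movies>
  <Movie Name=""SpiderMan"" Genre=""Action/Adventure"" Rating=""PG-13"">
    <Cast>
      <Member>Toby McGuire</Member>
      <Member>Willem Defoe</Member>
      <Member>Kirsten Dunst</Member>
    </Cast>
  </Movie>
</Movies>");

XElement docActors = XElement.Parse(@"<Actors>
  <Actor FirstName=""Toby"" LastName=""McGuire""><Gender>Male</Gender><Age>30</Age></Actor>
  <Actor FirstName=""Willem"" LastName=""Defoe""><Gender>Male</Gender><Age>52</Age></Actor>
</Actors>");

// Anonymous types stand in for Movie and CastMember; ToList() materializes
// the lazy queries so the results can be indexed.
var movies = (from movie in docMovies.Elements("Movie")
              select new
              {
                  Name = (string)movie.Attribute("Name"),
                  Cast = (from member in movie.Element("Cast").Elements("Member")
                          join actor in docActors.Elements("Actor")
                              on (string)member equals
                                 (string)actor.Attribute("FirstName") + " " + (string)actor.Attribute("LastName")
                          select new
                          {
                              FirstName = (string)actor.Attribute("FirstName"),
                              LastName = (string)actor.Attribute("LastName"),
                              Gender = (string)actor.Element("Gender"),
                              Age = (int)actor.Element("Age")
                          }).ToList()
              }).ToList();

foreach (var m in movies)
{
    Console.WriteLine(m.Name);
    foreach (var c in m.Cast)
        Console.WriteLine($"  {c.FirstName} {c.LastName}, {c.Gender}, {c.Age}");
}
```

    Because join is an inner join, the unmatched Kirsten Dunst entry is silently dropped, so SpiderMan ends up with two cast members. If a missing actor should instead be reported or kept, a group join (join ... into) followed by a check for empty groups is one option; which behavior is right depends on the data.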

    Let's look at a similar example in reverse. Let's say we have the following CLR types and the following code snippet that populates some instances:

    using System.Collections.Generic;

    public class Movie
    {
        public string Name { get; set; }
        public string Genre { get; set; }
        public string Rating { get; set; }
        public IEnumerable<string> CastNames { get; set; }
    }

    public class CastMember
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string Gender { get; set; }
        public int Age { get; set; }
    }

    List<Movie> movieList = new List<Movie>
    {
        new Movie { Name = "SpiderMan", Rating = "PG-13", Genre = "Action",
                    CastNames = new List<string> { "Toby McGuire", "Willem Defoe" } },
        new Movie { Name = "BatMan", Rating = "PG-13", Genre = "Action",
                    CastNames = new List<string> { "Christian Bale", "Liam Neeson" } },
        new Movie { Name = "X-Men", Rating = "PG-13", Genre = "Action",
                    CastNames = new List<string> { "Hugh Jackman", "Patrick Stewart" } }
    };

    List<CastMember> castList = new List<CastMember>
    {
        new CastMember { FirstName = "Toby", LastName = "McGuire", Gender = "Male", Age = 30 },
        new CastMember { FirstName = "Willem", LastName = "Defoe", Gender = "Male", Age = 55 },
        new CastMember { FirstName = "Christian", LastName = "Bale", Gender = "Male", Age = 35 },
        new CastMember { FirstName = "Liam", LastName = "Neeson", Gender = "Male", Age = 55 },
        new CastMember { FirstName = "Hugh", LastName = "Jackman", Gender = "Male", Age = 32 },
        new CastMember { FirstName = "Patrick", LastName = "Stewart", Gender = "Male", Age = 60 }
    };

    Now say we want to serialize the two collections, movieList and castList, into the following XML:

    <Movies>
      <Movie Name="SpiderMan" Rating="PG-13" Genre="Action">
        <Cast>
          <Member FirstName="Toby" LastName="McGuire">
            <Gender>Male</Gender>
            <Age>30</Age>
          </Member>
          <Member FirstName="Willem" LastName="Defoe">
            <Gender>Male</Gender>
            <Age>55</Age>
          </Member>
        </Cast>
      </Movie>
      <Movie Name="BatMan" Rating="PG-13" Genre="Action">
        <Cast>
          <Member FirstName="Christian" LastName="Bale">
            <Gender>Male</Gender>
            <Age>35</Age>
          </Member>
          <Member FirstName="Liam" LastName="Neeson">
            <Gender>Male</Gender>
            <Age>55</Age>
          </Member>
        </Cast>
      </Movie>
      <Movie Name="X-Men" Rating="PG-13" Genre="Action">
        <Cast>
          <Member FirstName="Hugh" LastName="Jackman">
            <Gender>Male</Gender>
            <Age>32</Age>
          </Member>
          <Member FirstName="Patrick" LastName="Stewart">
            <Gender>Male</Gender>
            <Age>60</Age>
          </Member>
        </Cast>
      </Movie>
    </Movies>

    Again, the challenge is fairly obvious: the Movie and CastMember types have no direct association, since Movie.CastNames is defined as a collection of strings, each the concatenation of an actor's first and last names. In the serialized XML, however, the <Member> element describes a cast member in full: all four properties of the CastMember type are represented, two as attributes and two as nested elements. This is not insurmountable with traditional serialization, but LINQ To XML makes it trivial.

    using System.Linq;
    using System.Xml.Linq;

    XElement moviesRoot =
        new XElement("Movies",
            from movie in movieList
            select new XElement("Movie",
                new XAttribute("Name", movie.Name),
                new XAttribute("Rating", movie.Rating),
                new XAttribute("Genre", movie.Genre),
                new XElement("Cast",
                    // Join each cast name with its CastMember on "FirstName LastName"
                    from memberName in movie.CastNames
                    join castMember in castList
                        on memberName equals castMember.FirstName + " " + castMember.LastName
                    select new XElement("Member",
                        new XAttribute("FirstName", castMember.FirstName),
                        new XAttribute("LastName", castMember.LastName),
                        new XElement("Gender", castMember.Gender),
                        new XElement("Age", castMember.Age)))));

    moviesRoot.Save("MoviesOut.xml");

    One of the XElement constructor overloads, XElement(XName name, params object[] content), takes its content as a variable-length parameter list, and we take full advantage of it. The code above constructs an XElement for the root <Movies> element and passes a projection over movieList as its content; that projection constructs a new XElement for each <Movie> element. Because the content is a variable-length list, the attributes for <Movie> can be passed right alongside its child <Cast> element, and the same approach builds the <Member> children of <Cast>. The trick, again, is the ability to join Movie.CastNames with castList on the concatenation of FirstName and LastName, which lets us create the required composite XML from two otherwise unassociated types.
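    A minimal sketch of that constructor behavior in isolation. The values come from the SpiderMan example, and genre is set to null (an assumption for illustration) to show one more useful property of the content list: null entries are simply skipped, which makes optional attributes easy to express.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string[] castNames = { "Toby McGuire", "Willem Defoe" };
string? genre = null; // hypothetical missing value, to show that null content is ignored

// Every argument after the element name lands in the params object[] content array:
// an XAttribute becomes an attribute, an XElement becomes a child element, a null is
// skipped, and an IEnumerable<XElement> (here a LINQ query) is enumerated and each
// item added as a child.
var movie = new XElement("Movie",
    new XAttribute("Name", "SpiderMan"),
    new XAttribute("Rating", "PG-13"),
    genre == null ? null : new XAttribute("Genre", genre),
    new XElement("Cast",
        from name in castNames
        select new XElement("Member", name)));

Console.WriteLine(movie);
```

    The same rules are what let the original code nest one query inside another as the content of <Cast>.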

    In summary, I would urge you to take a close look at LINQ To XML. It can be an important tool in your arsenal, especially when serialization is accompanied by non-trivial transformations in either direction.
