

Quick Aside on XML - The Wrong Way to Use XML

XML is a great way to collect and author data so that any number of applications can consume it. There are some exceptionally powerful tools that can consume XML formats and enforce levels of data validation, but they are only useful if the implementation makes sense. After working on some legacy tools and trying to design some new test systems, I’ve realized that there’s a right way to use XML and a horribly wrong way to use XML.

Wrong Way #1: Parsing XML With XElement

Don’t get me wrong, XElement/XPath searching is a great way to look through XML when you’re just trying to retrieve specific values from generic XML files.
For example:
I want to find every instance within 10 types of XML files where the server value is IIS6.0 and change it to IIS8.0.

XElement and XPath can quickly search through the XML files regardless of their structure (provided that they are valid XML files) and change those values. But the limitation of this solution is that you’re searching through the files generically and treating XML as just a structured input type. In addition, it just doesn’t scale. Even with fancy regex/XPath queries, you can only do so much.
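For the one job this approach does well, here is a minimal sketch of that IIS6.0 to IIS8.0 replace. The post’s tools are .NET (XElement, XPath); this illustration uses Python’s standard `xml.etree.ElementTree` instead, and it assumes, hypothetically, that the value lives either in a `<server>` element or in a `server` attribute. The function knows nothing about what each file means, which is exactly why it works on any structure and why it stops being useful once you need real data handling.

```python
import xml.etree.ElementTree as ET


def replace_server_values(xml_text: str, old: str = "IIS6.0", new: str = "IIS8.0") -> str:
    """Structure-agnostic replace: rewrites any <server> element text or any
    server="..." attribute equal to `old`, wherever it appears."""
    root = ET.fromstring(xml_text)
    # iter() visits every element (root included), regardless of nesting or layout.
    for elem in root.iter():
        if elem.tag == "server" and (elem.text or "").strip() == old:
            elem.text = new
        if elem.get("server") == old:
            elem.set("server", new)
    return ET.tostring(root, encoding="unicode")


# Two differently shaped documents, handled by the same generic pass.
docs = [
    "<config><site><server>IIS6.0</server></site></config>",
    '<deploy><node server="IIS6.0" name="web01"/></deploy>',
]
for doc in docs:
    print(replace_server_values(doc))
```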

Wrong Way #2: Manually Translating XML to Objects

Another quick-fix solution I’ve seen implemented a number of times is using Wrong Way #1 to build up an object by hand. First you create the object that you’re trying to build, then you populate its values by reading them out of the XML and setting them one by one. At this point, any way of extracting information from a file is just as good; the one slight advantage of XML is that the available APIs can help query values.

This pattern’s translation layer also tends to create maintenance problems, because every step of the translation from XML document to object is written out by hand. It also makes it hard to enforce a mapping between the object and the XML document, since that mapping exists only implicitly in the translation code. Changes to either the XML format or the object are obscured by the translation layer:

Figure 1: Don’t Do This.
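To make the pattern concrete, here is a rough sketch of a hand-written translation layer. It is written in Python with the standard library rather than .NET, and the `ServerConfig` class and tag names are hypothetical. Every node name and conversion appears twice, once for loading and once for saving, and nothing checks either copy against the class.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Hypothetical object the XML is translated into."""
    name: str
    port: int
    enabled: bool


def load_server_config(xml_text: str) -> ServerConfig:
    root = ET.fromstring(xml_text)
    # Each field is read by a hard-coded node name with its own conversion.
    # A misspelled tag either slips through silently (name becomes None,
    # enabled becomes False) or fails with an unrelated TypeError (port).
    name = root.findtext("name")
    port = int(root.findtext("port"))
    enabled = root.findtext("enabled") == "true"
    return ServerConfig(name=name, port=port, enabled=enabled)


def save_server_config(cfg: ServerConfig) -> str:
    # The reverse mapping is a second hand-written copy that must be kept in
    # sync with load_server_config whenever the class or the format changes.
    root = ET.Element("serverConfig")
    ET.SubElement(root, "name").text = cfg.name
    ET.SubElement(root, "port").text = str(cfg.port)
    ET.SubElement(root, "enabled").text = "true" if cfg.enabled else "false"
    return ET.tostring(root, encoding="unicode")
```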

Wrong Way #3: Never Using a Schema

If you’ve taken the time to plan out your XML document layout and design an object model for how you’re going to use all that sweet XML data, you’ve probably also created a schema. So, use it! Schemas loaded into editors like Visual Studio can make manually authoring XML much faster. In addition, if you enforce your schema at document load time, you can ensure that you’re only loading documents that the system can handle.

XML Done Right

First, design the object model. The important logic is going to work with objects, not with XML files or random values pulled from them. Make it strongly typed and serializable. The people consuming your data will appreciate you for it.

Second, let the XmlSerializer do the translation work for you. It won’t trip over misspelled node names or botch value conversions (most of the time). The other benefit of this model is that when you want to save the object state, all you have to do is serialize the object out to a file. Done. An XML file is saved and available for later.
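Python has no built-in equivalent of .NET’s XmlSerializer, so the following is only a minimal sketch of the idea under stated assumptions: a small reflection-based serializer, `to_xml`/`from_xml` (hypothetical names), that derives the mapping from a dataclass’s field names and type hints, roughly like XmlSerializer’s default of one element per public property under a root named after the type. It handles only flat classes with `str`, `int`, `float`, and `bool` fields; `ServerConfig` is a hypothetical model.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import Type, TypeVar, get_type_hints

T = TypeVar("T")


@dataclass
class ServerConfig:
    """Hypothetical object model, designed first; the XML shape follows from it."""
    name: str
    port: int
    enabled: bool


def _parse_bool(text: str) -> bool:
    # Accept the lexical forms allowed for xs:boolean.
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def to_xml(obj) -> str:
    """Serialize a flat dataclass: root named after the class, one child per field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        value = getattr(obj, f.name)
        # Write bools as lowercase "true"/"false" to match xs:boolean.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        ET.SubElement(root, f.name).text = text
    return ET.tostring(root, encoding="unicode")


def from_xml(cls: Type[T], xml_text: str) -> T:
    """Deserialize using the class's own field names and type hints as the mapping."""
    root = ET.fromstring(xml_text)
    if root.tag != cls.__name__:
        raise ValueError(f"expected <{cls.__name__}>, got <{root.tag}>")
    hints = get_type_hints(cls)
    values = {}
    for f in fields(cls):
        text = root.findtext(f.name)
        if text is None:
            # A misspelled or missing node fails loudly instead of becoming None.
            raise ValueError(f"missing element <{f.name}>")
        kind = hints[f.name]
        values[f.name] = _parse_bool(text) if kind is bool else kind(text)
    return cls(**values)
```

Saving state is then one call, e.g. `Path("server.xml").write_text(to_xml(cfg))` with `pathlib.Path`. Because both directions come from the same class definition, the mapping can’t drift out of sync with the object; the flip side is that renaming a field also renames its element, so existing files would need migrating.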

Third, with the object model in hand, you can quickly create a schema. Again, with this schema you can enforce validation at load time and get XML IntelliSense when hand-editing XML files.
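In .NET, the XML Schema Definition Tool (xsd.exe) can generate a schema from compiled classes. As a rough Python illustration of the same idea, `schema_for` (a hypothetical helper) walks a dataclass and emits an XSD with one typed element per field, under a root element named after the class. The Python-to-XSD type mapping and the `ServerConfig` model are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import get_type_hints

XS_NS = "http://www.w3.org/2001/XMLSchema"
# Assumed mapping from Python field types to built-in XSD types.
# xs:int is 32-bit; swap in xs:integer if values may be unbounded.
XSD_TYPES = {str: "xs:string", int: "xs:int", float: "xs:double", bool: "xs:boolean"}


@dataclass
class ServerConfig:
    """Hypothetical object model the schema is derived from."""
    name: str
    port: int
    enabled: bool


def schema_for(cls) -> str:
    """Emit an XSD: <xs:element name=Class> containing a sequence of typed fields."""
    ET.register_namespace("xs", XS_NS)  # serialize with the conventional xs: prefix
    schema = ET.Element(f"{{{XS_NS}}}schema")
    root_el = ET.SubElement(schema, f"{{{XS_NS}}}element", name=cls.__name__)
    complex_type = ET.SubElement(root_el, f"{{{XS_NS}}}complexType")
    seq = ET.SubElement(complex_type, f"{{{XS_NS}}}sequence")
    hints = get_type_hints(cls)
    for f in fields(cls):
        ET.SubElement(seq, f"{{{XS_NS}}}element", name=f.name, type=XSD_TYPES[hints[f.name]])
    return ET.tostring(schema, encoding="unicode")


print(schema_for(ServerConfig))
```

The resulting file could be associated with XML files in an editor for IntelliSense, or enforced at load time with a schema-aware validator (the Python standard library does not include one).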

Hopefully, by planning your XML/object model with these tips in mind, you can save yourself a lot of maintenance trouble and make feature design easier.

Figure 2: Do This

Comments

  • Anonymous
    April 11, 2012
    I have used all three wrong ways in the past. I know you are correct; using the XmlSerializer is the right way to go. My only bugbear is all the ugly attributes you have to add, and also the ugly workarounds to get non-serializable objects like TimeSpan to serialize. What's your take on this?

  • Anonymous
    April 11, 2012
    What's your take on using this technique to work with multi-gigabyte-sized XML files?  Frequently seen in ETL situations, it's been difficult to work with them since SAX was deprecated.

  • Anonymous
    April 11, 2012
    Chip, that's a good question. To be honest, I'm not sure. My troubles were coming from managing hundreds of standalone XML files that were >100 KB each. If I were to guess, I'd try mapping the large XML file to a dataset or something like that, while still following an object model design.