XmlNameTable Revisited

After my post on the XmlNameTable: The Shiftstick of System.Xml, I thought I would follow it up with a second post that discusses it in greater detail, prompted by the feedback comments. The XmlNameTable is an internal piece of implementation exposed on the System.Xml APIs, which is why I drew the parallel to the stick shift on a car. Most people in the USA drive automatic cars, and for most of them the ability to tune the performance of the car via a stick shift is unnecessary. In fact it is an inconvenience, given that people typically eat, drink, do their hair and talk on the cell phone simultaneously.

Internals - Internally the XmlNameTable is vital to the XmlReader. As characters are read from the stream, names are added to the XmlNameTable via the Add(Char[], Int32, Int32) method (note: the caller creates no string here), and a hashing algorithm is applied to the character array to create a hash value that is stored for later lookups. All element and attribute names, namespaces and prefixes are stored in the internal hash table. The hashing algorithm was OK for V1.1, and in the V2 release we did much more research to make it significantly better, to help get that 2x performance improvement. This process is called string atomization and is done by the XmlReader classes such as the XmlTextReader. All “strings” must be added to the XmlNameTable. You cannot choose to ignore any, otherwise the XmlReader implementation breaks, because object reference comparison is performed everywhere internally. The benefit of the XmlNameTable is not only object comparison, but also that it greatly reduces the number of strings that need to be allocated during parsing. Parsing would be very slow if there were no XmlNameTable, since a new string would be allocated for each occurrence of each name.
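
To make the atomization behaviour concrete, here is a minimal sketch using the public NameTable class. The buffer contents and offsets are made up for illustration; the real reader works on its own internal buffer. It shows that adding the same name twice from a character array hands back the same string object, so later comparisons can be done by reference.

```csharp
using System;
using System.Xml;

class AtomizationDemo
{
    static void Main()
    {
        NameTable nt = new NameTable();

        // Stand-in for a parser buffer (illustrative only).
        // "price" starts at offset 1 (start tag) and at offset 13 (end tag).
        char[] buffer = "<price>9.99</price>".ToCharArray();

        // Add(char[], int, int) atomizes straight from the buffer;
        // the caller never builds a string for the name.
        string first = nt.Add(buffer, 1, 5);
        string second = nt.Add(buffer, 13, 5);

        // Both calls return the one stored instance.
        Console.WriteLine(Object.ReferenceEquals(first, second)); // True

        // Get looks a name up without adding it; null means "not atomized yet".
        Console.WriteLine(nt.Get("title") == null); // True
    }
}
```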

User created - You could choose to implement your own XmlNameTable, since it is an abstract class (the default implementation is the NameTable class). However, writing a fast general-purpose implementation is hard, while one tuned to a specific case is achievable. Using the CLR Hashtable class instead is not a good idea, for example, since you have to create strings in order to add them to the Hashtable, which adds a significant performance overhead. However, you could imagine a scenario where you optimize for the names you expect to be repeated most often in your document, i.e. have a faster lookup for them with a B-tree or similar.
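
As a rough sketch of that last idea, the class below subclasses XmlNameTable, checks a short list of expected names first and falls back to the stock NameTable for everything else. The class name ExpectedNamesTable and the linear scan are my own illustrative choices (a B-tree or similar could replace the scan), and whether this actually beats the built-in hashing is something you would have to measure for your documents.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical name table: a fast path for a few expected names,
// with the default NameTable handling all other names.
class ExpectedNamesTable : XmlNameTable
{
    private readonly string[] hot;
    private readonly NameTable fallback = new NameTable();

    public ExpectedNamesTable(params string[] expectedNames)
    {
        hot = new string[expectedNames.Length];
        for (int i = 0; i < hot.Length; i++)
            hot[i] = fallback.Add(expectedNames[i]); // one atom per name, shared by both paths
    }

    // Compares the buffer slice against each expected name without creating a string.
    private string FindHot(char[] array, int offset, int length)
    {
        foreach (string name in hot)
        {
            if (name.Length != length) continue;
            int i = 0;
            while (i < length && name[i] == array[offset + i]) i++;
            if (i == length) return name;
        }
        return null;
    }

    public override string Add(char[] array, int offset, int length)
    {
        string atom = FindHot(array, offset, length);
        return atom != null ? atom : fallback.Add(array, offset, length);
    }

    public override string Add(string array)
    {
        return fallback.Add(array);
    }

    public override string Get(char[] array, int offset, int length)
    {
        string atom = FindHot(array, offset, length);
        return atom != null ? atom : fallback.Get(array, offset, length);
    }

    public override string Get(string array)
    {
        return fallback.Get(array);
    }
}

class ExpectedNamesDemo
{
    static void Main()
    {
        XmlNameTable nt = new ExpectedNamesTable("Invoice", "LineItem");
        object lineitem = nt.Get("LineItem");

        XmlTextReader reader = new XmlTextReader(
            new StringReader("<Invoice><LineItem/><LineItem/></Invoice>"), nt);

        int count = 0;
        while (reader.Read())
        {
            // Reference comparison works because the reader atomizes through nt.
            if (reader.NodeType == XmlNodeType.Element && (object)reader.LocalName == lineitem)
                count++;
        }
        reader.Close();
        Console.WriteLine(count);
    }
}
```

Because the expected names are also stored in the fallback table, every lookup path returns the same string object for a given name, which is the property the reader depends on.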

Usability - Although the XmlNameTable is useful, you should not feel that you have to use it everywhere in your code. That is why most examples in the .NET documentation do not show it being used (although the ones that do show the XmlNameTable do not show the best approach; a V2 doc fix has been made here). One performance tenet is “don't optimize your code unless it gives measurable and needed benefits”, so if non-XmlNameTable parsing is good enough for your scenario then you're done. However, I would always recommend sharing the XmlNameTable across components, noting that in V1.1 the NameTable implementation is not thread safe. There will be a thread-safe version in V2. Hence, in V1.1, do not create separate threads with XmlTextReaders that use the same NameTable and expect this to work.
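
If you do need to parse on several threads in V1.1, one possible workaround (my own sketch, not something the framework provides) is to give each thread its own NameTable and share it only among the readers on that thread. The PerThreadNameTable helper below is hypothetical. Note that atoms from different tables are different objects, so the names you compare against must be atomized once per thread as well.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Xml;

// Hypothetical helper: lazily creates one NameTable per thread.
sealed class PerThreadNameTable
{
    [ThreadStatic]
    private static NameTable table;

    private PerThreadNameTable() { }

    public static NameTable Current
    {
        get
        {
            if (table == null)
                table = new NameTable();
            return table;
        }
    }
}

class PerThreadDemo
{
    static void CountInvoices()
    {
        NameTable nt = PerThreadNameTable.Current;

        // Atomize against this thread's table, not a table owned by another thread.
        object invoice = nt.Add("Invoice");

        XmlTextReader reader = new XmlTextReader(
            new StringReader("<Invoices><Invoice/><Invoice/></Invoices>"), nt);

        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && (object)reader.LocalName == invoice)
                count++;
        }
        reader.Close();
        Console.WriteLine(count);
    }

    static void Main()
    {
        Thread a = new Thread(new ThreadStart(CountInvoices));
        Thread b = new Thread(new ThreadStart(CountInvoices));
        a.Start();
        b.Start();
        a.Join();
        b.Join();
    }
}
```

The trade-off is that names are stored once per thread rather than once per process, which is usually a small price compared to a corrupted table.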

Here is an example that reads several XML documents from a directory using the same NameTable for each XmlTextReader, and then reuses that NameTable when loading a separate XML document.

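// Requires: using System.IO; using System.Xml;

// Shared by every reader and document below. Not thread safe in V1.1.
static XmlNameTable nt = new NameTable();

static void GlobalNameTable()
{
      int invoicecount = 0, lineitemcount = 0;

      // Atomize the expected names up front. They are held as object so
      // that == below compares references rather than string contents.
      object book = nt.Add("book");
      object price = nt.Add("price");
      object invoice = nt.Add("Invoice");
      object lineitem = nt.Add("LineItem");
      object lineitems = nt.Add("LineItems");
      object description = nt.Add("Description");
      object cust = nt.Add("CustomerName");

      string[] files = Directory.GetFiles("input");
      foreach (string file in files)
      {
            // Create the reader, handing it the shared NameTable.
            XmlTextReader reader = new XmlTextReader(file, nt);
            while (reader.Read())
            {
                  // Count start tags only; an end tag reports the same LocalName.
                  if (reader.NodeType != XmlNodeType.Element)
                        continue;

                  object localname = reader.LocalName;
                  if (invoice == localname)
                        invoicecount++;
                  if (lineitem == localname)
                        lineitemcount++;
            }
            reader.Close();
      }

      // The same NameTable can be reused independently by an XmlDocument.
      XmlDocument doc = new XmlDocument(nt);
      doc.Load("anotherfile.xml");
}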

Comments

  • Anonymous
    April 29, 2004
    Could you expand a bit on how to correctly support XmlNameTable in custom implementations of XPathNavigator? What strings need to be put into the table in order for, for instance, the .NET XSLT transform engine to work correctly? What methods/properties of an XPathNavigator subclass need to return strings from the nametable, and which don't strictly need to?

    I assume that in particular the Value property doesn't need to, because that requirement would imply you need to put into memory the entire store that the custom XPathNavigator is exposing - one of the prime reasons to subclass XPathNavigator is exactly NOT having to do that.

    But what about other properties/methods? To what extent do existing "clients" of XPathNavigator (e.g. the XPath expression engine and the XSLT transformer) assume the strings are interned in an XmlNameTable?

    And what is the preferred way of getting a nametable in my custom XPathNavigator? Should I share it between instances? Should I create a new nametable for each really new instance (but not for a Cloned instance)? Should the nametable be passed from the client code as a constructor argument?

    And there are many methods/properties that are supposed to return String.Empty when they do not apply. Is that really String.Empty, or is that the equivalent of String.Empty as stored in the nametable?

    It is hard to find correct examples of XPathNavigator subclasses. There are several of them floating around on the net, and there have been some MSDN magazine articles about them, but I have yet to see a sample that handles namespaces correctly, for instance (there must always be at least one namespace defined, namely the one bound to 'xml:', but all samples I have seen always return String.Empty...); so I doubt that all of these samples handle XmlNameTable correctly...
  • Anonymous
    May 03, 2004
    Mark,
    When I said even Hashtable implements a better algorithm, I didn’t mean you can use it while implementing XmlNameTable; I was just saying its rehashing algorithm is superior to the very simple one implemented in NameTable.

    Can I ask for a method PeekNextElement on XmlReader/XmlTextReader?
    The scenario I am thinking of is parsing a SOAP message: once I parse the header I have to parse the body, which can contain soap:fault. Before returning the Body as an XmlReader to the user I need to make sure that the Body doesn’t contain a fault, but once I read the element and it’s not a fault there is no way to backtrack.

    Also, can I ask for something like a member on XmlTextReader (or maybe even XmlReader) that returns the byte position of the last element read? This is somewhat similar to LineNumber/LinePosition, except that those are only useful to display to a user. This is just a speed improvement over the GetRemainder() method in the cases where I already have a stream in memory and don’t have to waste time/space on reading it again into a StreamBuffer.

    Thank you very much.
  • Anonymous
    June 01, 2004
    hi,

    interesting!

    when you know that you are processing a lot of xml documents with the same schema, could you let XmlTextReader create the NameTable for you for the first document and then use this NameTable for subsequent XmlTextReaders?