XmlNameTable: The Shiftstick of System.Xml
I spent much of today in a customer lab on performance in .NET applications, covering best practices for System.Xml. As always, the majority of people used XML somewhere in their application and needed to understand the performance implications of using one technique over another. One approach that I covered, among many others, is the use of the XmlNameTable class. This seemingly insignificant yet crucial class surfaces on all the classes in System.Xml that do some form of processing (XmlTextReader, XPathNavigator and XmlDocument) and, like the shift stick on a car (gear stick if you live in the UK), it is an implementation detail that allows you to tune the performance of your XML processing.
Here is an example of it in use. Take this portion of an example XML document called invoices.xml that lists a number of LineItems for a given named customer.
<Invoices xmlns="https://example.invoice/invoices">
<Invoice>
<CustomerName>Levi</CustomerName>
<LineItems>
<LineItem>
<ID>18148</ID>
<Price>564</Price>
<Description>A description</Description>
</LineItem>
</LineItems>
</Invoice>
...
</Invoices>
The following code uses the XmlTextReader with and without an XmlNameTable. The XmlNameTable enables object reference comparison rather than string value comparison, and is useful in documents with many repeating known elements, attributes or namespaces. As the reader parses, it automatically adds each of these names to the XmlReader's XmlNameTable, a process called atomization, so every occurrence of a given name is returned as the same string object. This allows you to add your own names to the name table up front and then perform efficient object reference comparisons rather than character-by-character string comparisons.
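As a brief aside, atomization can be seen in isolation with a minimal sketch. The class and method names here (AtomizationSketch, ReaderReturnsAtom) and the inline XML string are illustrative assumptions, not part of the original sample; the sketch only shows that a name added to a NameTable beforehand comes back from the reader as the same object.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
static class AtomizationSketch
{
    // Returns true when the reader hands back the very same string object
    // that was registered in the NameTable before parsing started.
    public static bool ReaderReturnsAtom()
    {
        NameTable nt = new NameTable();
        object invoice = nt.Add("Invoice");   // atomize the name up front

        // Adding an equal name again returns the already stored instance.
        if (!ReferenceEquals(invoice, nt.Add("Invoice")))
        {
            return false;
        }

        // Inline document (an assumption) so the sketch needs no file on disk.
        string xml = "<Invoice><LineItems/></Invoice>";
        XmlTextReader reader = new XmlTextReader(new StringReader(xml), nt);
        try
        {
            reader.MoveToContent();           // positions the reader on <Invoice>
            object localname = reader.LocalName;
            return ReferenceEquals(invoice, localname);
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main()
    {
        Console.WriteLine(ReaderReturnsAtom());
    }
}
```

With that behaviour in mind, the timing comparison itself follows.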
static void RunPerfNameTable()
{
    Console.WriteLine("** XmlNameTable vs No XmlNameTable **");
    // Warm up run
    PerfNoNameTable("invoices.xml");
    for (int i = 0; i < 5; i++)
    {
        PerfNoNameTable("invoices.xml");
        PerfNameTable("invoices.xml");
    }
    Console.ReadLine();
}
static void PerfNoNameTable(string filename)
{
    int start = 0, stop = 0, invoicecount = 0, lineitemcount = 0;
    Console.WriteLine("Reading XML without NameTable comparison");
    start = Environment.TickCount;
    for (int i = 0; i < 80; i++)
    {
        // Create the reader.
        XmlTextReader reader = new XmlTextReader(filename);
        while (reader.Read())
        {
            // String value comparison, character by character
            if ("Invoice" == reader.LocalName)
            {
                invoicecount++;
            }
            if ("LineItem" == reader.LocalName)
            {
                lineitemcount++;
            }
        }
        // Release the underlying file before the next pass
        reader.Close();
    }
    stop = Environment.TickCount;
    Console.WriteLine("XmlTextReader document parsing time in ms WITHOUT NameTable: " + (stop - start).ToString());
}
static void PerfNameTable(string filename)
{
    int start = 0, stop = 0, invoicecount = 0, lineitemcount = 0;
    // Add the names of interest to the name table before parsing
    NameTable nt = new NameTable();
    object invoice = nt.Add("Invoice");
    object lineitem = nt.Add("LineItem");
    Console.WriteLine("Reading XML WITH NameTable comparison");
    start = Environment.TickCount;
    for (int i = 0; i < 80; i++)
    {
        XmlTextReader reader = new XmlTextReader(filename, nt);
        while (reader.Read())
        {
            // Cache the value of the reader.LocalName property in a local variable
            object localname = reader.LocalName;
            // Comparison between object references. This just compares pointers
            if (invoice == localname)
            {
                invoicecount++;
            }
            // Comparison between object references. This just compares pointers
            if (lineitem == localname)
            {
                lineitemcount++;
            }
        }
        // Release the underlying file before the next pass
        reader.Close();
    }
    stop = Environment.TickCount;
    Console.WriteLine("XmlTextReader document parsing time in ms WITH NameTable: " + (stop - start).ToString());
}
The crucial piece of code shown above is this line, which does two things:
object localname = reader.LocalName;
1) It caches the result of the LocalName property. In the V1 implementation of the XmlReader, each access to this property costs two virtual method calls, one public and one internal, so reading it once per node avoids repeating them.
2) It allows the LocalName to be compared multiple times by object reference, as in this line of code:
if (invoice == localname)
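Why the sample holds the names in variables of type object can be shown with a small sketch. The class and method names (ComparisonSketch, ValueEqual, ReferenceEqual) are illustrative assumptions. In C#, == on two string operands compares contents, while == on two object operands compares references, which is the cheap comparison the name table makes safe.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class ComparisonSketch
{
    static readonly string A = "Invoice";
    // Built at run time so that it is a distinct object with equal contents.
    static readonly string B = new string(new char[] { 'I', 'n', 'v', 'o', 'i', 'c', 'e' });

    // string == string: character-by-character value comparison.
    public static bool ValueEqual()
    {
        return A == B;
    }

    // object == object: reference comparison, a single pointer check.
    public static bool ReferenceEqual()
    {
        return (object)A == (object)B;
    }

    static void Main()
    {
        Console.WriteLine(ValueEqual());      // True
        Console.WriteLine(ReferenceEqual());  // False
    }
}
```

The two strings here are deliberately not atomized, so the reference comparison fails even though the contents match. Names that come from the same XmlNameTable do not have this problem, because equal names are always the same object.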
The end result is a performance increase of around 6-9% for parsing a 230 KB XML file on a machine with a PIII processor and 1 GB of memory. This is not enormous, but in scenarios where there is a high throughput of XML documents or the documents are large (>200 KB), using the XmlNameTable gives you enough of a performance benefit to make it worthwhile, especially if your processing starts to span multiple XML components in a pipelining scenario and the XmlNameTable is shared across them, i.e. XmlTextReader->XmlDocument->XslTransform.
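A minimal sketch of the first hop of such a pipeline, XmlTextReader->XmlDocument, might look like the following. The class and method names (SharedNameTableSketch, ComponentsShareTable, RootNameIsAtom) and the inline XML are assumptions for illustration. It passes one NameTable to both constructors and checks that the reader and the document are using it. The last method also compares the document's root name by reference, on the assumption that XmlDocument atomizes names through the table it was given.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
static class SharedNameTableSketch
{
    const string Xml = "<Invoice><CustomerName>Levi</CustomerName></Invoice>";

    // Loads the sample XML through a reader and a document that share nt.
    static XmlDocument LoadWith(NameTable nt, out bool readerUsesTable)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(Xml), nt);
        readerUsesTable = ReferenceEquals(reader.NameTable, nt);
        XmlDocument doc = new XmlDocument(nt);
        doc.Load(reader);
        reader.Close();
        return doc;
    }

    // True when both components use the one table and parsing has
    // added the document's names to it.
    public static bool ComponentsShareTable()
    {
        NameTable nt = new NameTable();
        bool readerUsesTable;
        XmlDocument doc = LoadWith(nt, out readerUsesTable);
        return readerUsesTable
            && ReferenceEquals(doc.NameTable, nt)
            && nt.Get("CustomerName") != null;
    }

    // Reference comparison against a name held by the document.
    public static bool RootNameIsAtom()
    {
        NameTable nt = new NameTable();
        object invoice = nt.Add("Invoice");
        bool ignored;
        XmlDocument doc = LoadWith(nt, out ignored);
        object rootName = doc.DocumentElement.LocalName;
        return invoice == rootName;
    }

    static void Main()
    {
        Console.WriteLine(ComponentsShareTable());
        Console.WriteLine(RootNameIsAtom());
    }
}
```

Because every component draws its names from the same table, a name atomized once at the start of the pipeline can be compared by reference at any later stage.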
Comments
- Anonymous
April 27, 2004
Great post! Thanks!
- Anonymous
April 27, 2004
Mark,
NameTable's implementation is not very efficient (even Hashtable implements a better algorithm) and there is no way to tell XmlReader(s) not to use any XmlNameTable at all.
Are any improvements planned in this area?
Also, is there a place we can express our wishes for the next version?
For example, I would like to have a ValueBytes property on the reader in addition to Value.
Consider the case where binary data is encoded in a CDATA section and the XML encoding is ASCII: it's huge overkill to convert the whole thing to a string.
Thanks.
Dmitriy
- Anonymous
April 28, 2004
The comment has been removed
- Anonymous
April 28, 2004
Dmitriy - Send your ideas and requests to me for what you would like to see in future versions or post them on this blog.
The XmlNameTable has been rewritten in the V2 release of .NET and is significantly faster. So 'yes' to the improvements here.
With regard to reading data sequentially (stream-like), there is a ReadChars() method today on the XmlTextReader that allows you to read characters into an array, which works on element text nodes. In the V2 release we have added typed read methods, including one where you can specify the type that the value should be read into. For example
ReadValueAs(typeof(Stream)) or
ReadValueAs(typeof(TextReader))
This enables you to read the contents of an attribute or element into a stream and to read byte values as opposed to characters. You are still bound by the overall encoding of the file (i.e. you have to have your file in ASCII if you want to read ASCII). This enables you to read binary data that has been encoded as Base64 or BinHex into your text file encoding. What you cannot do is call these methods on CDATA section node types. What was your scenario where you wanted this rather than an element text node that has been escaped?
- Anonymous
April 28, 2004
Jiho -
>>I can't use switch statement for the node comparison
Yes this is a limitation of switch. You have to use a conditional statement.
>>what is the implication for declaring localname variable as a string type rather than an object? XmlReader.LocalName returns a string type anyway right?
It will attempt to do a string comparison. You need to enforce object reference comparison by casting to object.
'Yes' I see that Oleg posted this before me. It never harms to have a best practice repeated! The additional aspect of this post is that caching the local name value from the reader also provides a perf improvement, which in V1.1 is nearly as much as the XmlNameTable lookup.
Thanks. Mark
- Anonymous
April 28, 2004
The comment has been removed
- Anonymous
April 29, 2004
Mark,
What about providing a concrete class derived from XPathExpression, so that XPathExpression(s) could be cached?
Currently the only portable way to cache XPathExpression(s) is to create a dummy class that derives from XPathNavigator, create a dummy instance and call Compile on that. The resulting XPathExpression(s) can now be reused (call SetContext), but some of them leak, for example with XPathDocuments. So my solution was to clone them.
Anyway, it would be nice to have a clean way of doing it.
Thanks again
- Anonymous
April 29, 2004
Mark,
Thanks for clarifying the object reference question.
Also, I never meant to say that because your post appeared later than Oleg's it is any less helpful, and if it appeared that way I apologize. I merely meant it as an informational link so that you or other visitors may gain more exposure. It is actually rather helpful to see multiple postings on the same topic/issue, since it confirms that the pattern discussed is indeed valid.
Thanks and keep up the good work!
- Anonymous
April 29, 2004
Dmitriy - I raised a design request to look at an easier way to create an XPathExpression, probably with a static method on the XPathExpression class rather than an instance method.
Equally there is a very good chance that ReadBinHex/ReadBase64 will be added to the XmlReader since we have had this request from others. I am looking into this for .NET Beta 2 planning in the next few weeks.
I have raised another design request for the ReadBinHex and ReadBase64 methods to work on CDATA sections. This is a new request (no one has asked for this before) so I will have to see how this goes.
Thanks for the feedback and look forward to V2.