How to: Stream XML Fragments from an XmlReader
When you have to process large XML files, it might not be feasible to load the whole XML tree into memory. This topic shows how to stream fragments using an XmlReader.
One of the most effective ways to use an XmlReader to read XElement objects is to write your own custom axis method. An axis method typically returns a collection such as IEnumerable<XElement>, as shown in the example in this topic. In the custom axis method, after you create each XML fragment by calling the ReadFrom method, return it to the caller by using yield return. This gives your custom axis method deferred execution semantics: each fragment is read from the XmlReader only when the caller asks for the next item.
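The deferred execution described above can be observed with a small sketch. The class DeferredAxisDemo, the method StreamElements, the counter FragmentsRead, and the sample markup are hypothetical names introduced only for illustration. The sketch assumes the input is well-formed XML.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class DeferredAxisDemo
{
    // Hypothetical counter: how many fragments have been materialized so far.
    public static int FragmentsRead;

    // Hypothetical axis method: yields every element called 'name', one at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string name)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (true)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom builds the fragment and advances the reader past it.
                    XElement el = (XElement)XNode.ReadFrom(reader);
                    FragmentsRead++;
                    yield return el;
                }
                else if (!reader.Read())
                {
                    break; // end of input
                }
            }
        }
    }

    static void Main()
    {
        var query = StreamElements(
            new StringReader("<Root><Item/><Item/><Item/></Root>"), "Item");
        Console.WriteLine(FragmentsRead); // 0: building the query reads nothing
        query.First();                    // pulls exactly one fragment
        Console.WriteLine(FragmentsRead); // 1
    }
}
```

Because First stops enumerating after one item, the remaining Item elements are never turned into XElement objects.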
When you create an XML tree from an XmlReader object, the XmlReader must be positioned on an element. The ReadFrom method does not return until it has read the close tag of the element. When it returns, the reader is positioned on the node that follows that close tag.
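The following sketch illustrates how ReadFrom consumes an element up to and including its close tag, by reporting where the reader is positioned afterward. The class ReadFromPositionDemo, the method NodeAfterReadFrom, and the sample markup are assumptions made for illustration only.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class ReadFromPositionDemo
{
    // Hypothetical helper: reads the first <A> element with ReadFrom and
    // describes the node that the reader is positioned on afterward.
    public static string NodeAfterReadFrom(string markup)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(markup)))
        {
            reader.ReadToFollowing("A"); // position the reader on <A>
            XNode.ReadFrom(reader);      // consumes <A> through </A>
            return reader.NodeType + ":" + reader.Name;
        }
    }

    static void Main()
    {
        // Prints "Element:B": the reader is already on the next element.
        Console.WriteLine(NodeAfterReadFrom("<Root><A>1</A><B>2</B></Root>"));
    }
}
```

One consequence is that a loop that calls Read immediately after ReadFrom moves past the node the reader has already reached, which matters when elements are adjacent with no white space between them.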
If you want to create a partial tree, you can instantiate an XmlReader, position the reader on the node that you want to convert to an XElement tree, and then create the XElement object.
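A partial tree can be sketched as follows, assuming that the node of interest is the first element with a given name. The class PartialTreeDemo, the method ReadFirst, and the sample markup are hypothetical. ReadToFollowing is only one of several ways to position the reader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class PartialTreeDemo
{
    // Hypothetical helper: returns the first element called 'name' as an
    // XElement tree, or null if the document contains no such element.
    public static XElement ReadFirst(string markup, string name)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(markup)))
        {
            if (!reader.ReadToFollowing(name))
                return null;                          // element not found
            return (XElement)XNode.ReadFrom(reader);  // build only this subtree
        }
    }

    static void Main()
    {
        XElement item = ReadFirst(
            "<Root><Header>skip</Header><Item Id=\"7\"><Name>x</Name></Item></Root>",
            "Item");
        Console.WriteLine((string)item.Attribute("Id"));  // 7
        Console.WriteLine((string)item.Element("Name"));  // x
    }
}
```

Only the Item subtree is materialized. The Header element is read by the XmlReader and then discarded.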
The topic How to: Stream XML Fragments with Access to Header Information contains information and an example on how to stream a more complex document.
The topic How to: Perform Streaming Transform of Large XML Documents contains an example of using LINQ to XML to transform extremely large XML documents while maintaining a small memory footprint.
Example
This example creates a custom axis method that you can query by using a LINQ query. The custom axis method, StreamRootChildDoc, is designed specifically to read a document that has a repeating Child element.
Note
The following example uses the yield return construct of C#. Because there is no equivalent feature in Visual Basic 2008, this example is provided only in C#.
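// Namespaces used by this example: System, System.Collections.Generic,
// System.IO, System.Linq, System.Xml, System.Xml.Linq.
static IEnumerable<XElement> StreamRootChildDoc(StringReader stringReader)
{
    using (XmlReader reader = XmlReader.Create(stringReader))
    {
        reader.MoveToContent();
        // Walk the document and yield each Child element as an XElement.
        while (true)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "Child")
            {
                // ReadFrom consumes the whole Child element and leaves the reader
                // on the node that follows it, so Read is not called here.
                // Calling it would skip a Child that immediately follows.
                XElement el = XNode.ReadFrom(reader) as XElement;
                if (el != null)
                    yield return el;
            }
            else if (!reader.Read())
            {
                break; // end of input
            }
        }
    }
}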
static void Main(string[] args)
{
    string markup = @"<Root>
      <Child Key=""01"">
        <GrandChild>aaa</GrandChild>
      </Child>
      <Child Key=""02"">
        <GrandChild>bbb</GrandChild>
      </Child>
      <Child Key=""03"">
        <GrandChild>ccc</GrandChild>
      </Child>
    </Root>";

    IEnumerable<string> grandChildData =
        from el in StreamRootChildDoc(new StringReader(markup))
        where (int)el.Attribute("Key") > 1
        select (string)el.Element("GrandChild");

    foreach (string str in grandChildData) {
        Console.WriteLine(str);
    }
}
This example produces the following output:
bbb
ccc
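In this example, the source document is very small. However, even if there were millions of Child elements, the axis method would still have a small memory footprint, because only one Child element is materialized as an XElement at a time. To get that benefit with a large document, read the input from a file or stream rather than from a string held in memory.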