XmlDocument.PreserveWhitespace and memory usage

The PreserveWhitespace property is sometimes misunderstood. As the MSDN documentation for the property describes, it affects only non-significant whitespace nodes. Significant whitespace is always preserved.

So what is significant whitespace? The documentation for the XmlSignificantWhitespace class explains: it's the whitespace within the scope of an xml:space='preserve' attribute, or the whitespace between elements in a mixed content node (one that contains both elements and text).
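To see the distinction in practice, here is a minimal sketch, assuming a current .NET SDK (C# 9 or later, for top-level statements). The sample markup and the NodeTypes helper are hypothetical, made up for illustration. It loads the same document with both settings and lists the node types under the root element and its children. The spaces inside the xml:space='preserve' element should come back as SignificantWhitespace either way, while the indentation before <c/> should show up as a Whitespace node only when PreserveWhitespace is true. It sticks to the xml:space case and leaves mixed content aside.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical sample: the spaces inside <a> fall under xml:space='preserve'
// (significant); the "\n  " between </a> and <c/> is plain indentation.
const string Sample = "<root><a xml:space='preserve'> <b/> </a>\n  <c/></root>";

foreach (bool preserve in new[] { true, false })
{
    Console.WriteLine($"PreserveWhitespace={preserve}: " +
        string.Join(", ", NodeTypes(Sample, preserve)));
}

// Hypothetical helper: loads the markup with the given setting and returns,
// in document order, the node types of <root>'s children and of their children.
static List<XmlNodeType> NodeTypes(string xml, bool preserveWhitespace)
{
    var doc = new XmlDocument { PreserveWhitespace = preserveWhitespace };
    doc.LoadXml(xml);
    var types = new List<XmlNodeType>();
    foreach (XmlNode child in doc.DocumentElement.ChildNodes)
    {
        types.Add(child.NodeType);
        foreach (XmlNode grandchild in child.ChildNodes)
            types.Add(grandchild.NodeType);
    }
    return types;
}
```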

For many documents, non-significant whitespace is, well, not significant. It's just there so elements align and are more readable to humans, but processors aren't expected to do anything with it.

So if your program doesn't need it, there's no reason to pay the price of creating objects to represent it and keeping them in your document tree. If you use "typical" indentation, where every element is on its own line and the indentation grows as elements nest deeper, then you may actually save yourself some memory by getting rid of the whitespace. Thankfully, PreserveWhitespace defaults to false for XmlDocument, but I've seen cases where folks set it to true because they aren't sure what it does and it feels safer not to throw anything away.
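A minimal sketch of that cost, assuming a current .NET SDK (C# 9 or later) and two hypothetical helpers, BuildIndented and CountWhitespaceNodes: it builds a document with one indented element per line and counts the XmlWhitespace nodes left directly under the root element. With PreserveWhitespace set to true you should get one whitespace node for every gap between tags (elementCount + 1 of them), each a separate object in the tree; with false, none.

```csharp
using System;
using System.Text;
using System.Xml;

string sample = BuildIndented(1000);
Console.WriteLine($"Whitespace nodes (PreserveWhitespace=true):  {CountWhitespaceNodes(sample, true)}");
Console.WriteLine($"Whitespace nodes (PreserveWhitespace=false): {CountWhitespaceNodes(sample, false)}");

// Hypothetical helper: builds a document in the "typical" indented style,
// one child element per line, each indented by two spaces.
static string BuildIndented(int elementCount)
{
    var sb = new StringBuilder();
    sb.AppendLine("<root>");
    for (int i = 0; i < elementCount; i++)
        sb.AppendLine("  <element with='attributes' />");
    sb.AppendLine("</root>");
    return sb.ToString();
}

// Hypothetical helper: counts the insignificant whitespace (XmlWhitespace)
// nodes that end up directly under <root> after loading.
static int CountWhitespaceNodes(string xml, bool preserveWhitespace)
{
    var doc = new XmlDocument { PreserveWhitespace = preserveWhitespace };
    doc.LoadXml(xml);
    int count = 0;
    foreach (XmlNode child in doc.DocumentElement.ChildNodes)
    {
        if (child.NodeType == XmlNodeType.Whitespace)
            count++;
    }
    return count;
}
```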

Here's a little snippet that shows the impact (your mileage may vary, of course, depending on CPU architecture, framework version, compiler settings, etc.).

using System;
using System.Text;
using System.Xml;

// Written as top-level statements (C# 9 or later); on older compilers,
// put everything below inside a Main method.
bool[] preserves = new bool[] { true, false };
foreach (bool preserve in preserves)
{
  // Start each iteration from a freshly collected heap.
  GC.Collect();

  // Build a document: 1000 elements, one per indented line.
  StringBuilder s = new StringBuilder();
  s.AppendLine("<root>");
  for (int i = 0; i < 1000; i++)
  {
    s.AppendLine(" <element with='attributes' />");
  }
  s.AppendLine("</root>");

  string text = s.ToString();
  XmlDocument doc = new XmlDocument();
  doc.PreserveWhitespace = preserve;
  doc.LoadXml(text);

  Console.WriteLine(
    "PreserveWhitespace: " + preserve);
  Console.WriteLine(
    "Memory used before collection: {0}",
    GC.GetTotalMemory(false));
  GC.Collect();
  Console.WriteLine(
    "Memory used after full collection: {0}",
    GC.GetTotalMemory(true));

  Console.WriteLine();
  // Keep the document reachable until the measurements are done.
  GC.KeepAlive(doc);
}

And this is the output I get on my machine.

PreserveWhitespace: True
Memory used before collection:     456616
Memory used after full collection: 280208

PreserveWhitespace: False
Memory used before collection:     405480
Memory used after full collection: 232072

So (280208 - 232072) / 280208 = 48136 / 280208 ≈ 17% less memory after a full collection, which isn't too bad for a line of code you don't even have to write (because it's the default, after all).

Enjoy!