Splitting Up XML Text Nodes

Will the XML DOM I get from a message exactly match the XML DOM that was originally sent?

This isn't a real question I've seen, but it's often hidden inside another question. For example, some people extract content from an XML document by building a DOM and reading the fourth child of the second child of the root node as a string. This is a really bad idea: even a simple XML document can be turned into many subtly different trees of nodes.
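To make the positional problem concrete, here is a minimal sketch. The `PositionVersusName` class, its method names, and the sample `<order>` document are all hypothetical and exist only for illustration. It contrasts indexing into child nodes with addressing an element by name and letting `InnerText` join whatever text nodes happen to be underneath it.

```csharp
using System;
using System.Xml;

static class PositionVersusName
{
    // Hypothetical sample document; the comment is there only to shift child positions.
    public const string Sample =
        "<order><!-- rush --><id>7</id><customer><name>Alice</name></customer></order>";

    // Fragile: the answer depends on exactly which nodes the parser produced.
    public static XmlNodeType TypeOfFirstChild(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.ChildNodes[0].NodeType;
    }

    // More robust: find the element by name, then let InnerText
    // concatenate every text node beneath it.
    public static string CustomerName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode name = doc.SelectSingleNode("/order/customer/name");
        return name == null ? null : name.InnerText;
    }
}
```

With this sample, the first child of the root is the comment rather than the `<id>` element, so any code counting children is already off by one. The name-based lookup still returns "Alice".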

Here's one example. Take this program, which reads an XML document and prints out each of its text nodes.

string body =
@"<poem>
   'Twas brillig, and the slithy toves
   Did gyre and gimble in the wabe;
   All mimsy were the borogoves,
   And the mome raths outgrabe.
   'Beware the Jabberwock, my son!
   The jaws that bite, the claws that catch!
   Beware the Jubjub bird, and shun
   The frumious Bandersnatch!'
   He took his vorpal sword in hand:
   Long time the manxome foe he sought--
   So rested he by the Tumtum tree,
   And stood awhile in thought.
   And as in uffish thought he stood,
   The Jabberwock, with eyes of flame,
   Came whiffling through the tulgey wood,
   And burbled as it came!
   One, two! One, two! And through and through
   The vorpal blade went snicker-snack!
   He left it dead, and with its head
   He went galumphing back.
   'And hast thou slain the Jabberwock?
   Come to my arms, my beamish boy!
   O frabjous day! Callooh! Callay!'
   He chortled in his joy.
   'Twas brillig, and the slithy toves
   Did gyre and gimble in the wabe;
   All mimsy were the borogoves,
   And the mome raths outgrabe.
</poem>";

XmlReader reader = XmlReader.Create(new StringReader(body));
while (reader.Read())
{
   if (reader.NodeType == XmlNodeType.Text)
   {
      Console.WriteLine("[TEXT]");
      Console.WriteLine(reader.Value);
   }
}

[TEXT]

   'Twas brillig, and the slithy toves
   Did gyre and gimble in the wabe;
   All mimsy were the borogoves,
   And the mome raths outgrabe.
   'Beware the Jabberwock, my son!
   The jaws that bite, the claws that catch!
   Beware the Jubjub bird, and shun
   The frumious Bandersnatch!'
   He took his vorpal sword in hand:
   Long time the manxome foe he sought--
   So rested he by the Tumtum tree,
   And stood awhile in thought.
   And as in uffish thought he stood,
   The Jabberwock, with eyes of flame,
   Came whiffling through the tulgey wood,
   And burbled as it came!
   One, two! One, two! And through and through
   The vorpal blade went snicker-snack!
   He left it dead, and with its head
   He went galumphing back.
   'And hast thou slain the Jabberwock?
   Come to my arms, my beamish boy!
   O frabjous day! Callooh! Callay!'
   He chortled in his joy.
   'Twas brillig, and the slithy toves
   Did gyre and gimble in the wabe;
   All mimsy were the borogoves,
   And the mome raths outgrabe.

One block of text went in and one text node came out. Now, what happens when all we do is write that XML into a message body and then read it back out again? The title of this post has already given away the gist, but try to guess the exact output.

reader = XmlReader.Create(new StringReader(body));
Message message = Message.CreateMessage(MessageVersion.Default, "", reader);
reader = message.GetReaderAtBodyContents();
while (reader.Read())
{
   if (reader.NodeType == XmlNodeType.Text)
   {
      Console.WriteLine("[TEXT]");
      Console.WriteLine(reader.Value);
   }
}

[TEXT]

   'Twas brillig, and the slithy toves
   Did gyre and gimble in the wabe;
   All mimsy were the borogoves,
   And the mome raths outgrabe.
   'Beware the Jabberwock, my son!
   The jaws that bite, the claws that catch!
   Beware the Jubjub bird, and shun
[TEXT]

   The frumious Bandersnatch!'
   He took his vorpal sword in hand:
   Long time the manxome foe he sought--
   So rested he by the Tumtum tree,
   And stood awhile in thought.
   And as in uffish thought he stood,
   The Jabberwock, with eyes of flame,

[TEXT]
  Came whiffling through the tulgey wood,
   And burbled as it came!
   One, two! One, two! And through and through
   The vorpal blade went snicker-snack!
   He left it dead, and with its head
   He went galumphing back.
   'And hast thou slain the Jabber
[TEXT]
wock?
   Come to my arms, my beamish boy!
   O frabjous day! Callooh! Callay!'
   He chortled in his joy.
   'Twas brillig, and the slithy toves
   Did gyre and gimble in the wabe;
   All mimsy were the borogoves,
   And the mome raths outgrabe.

The nodes coming back from the reader are completely different, even though the XML content is exactly the same once the pieces are pasted back together. In this case, WCF splits text nodes into chunks of 256 bytes to avoid making potentially unbounded memory allocations. Code that consumes the body should therefore never assume that a run of text arrives as a single node.
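One way to stay independent of how the text was chunked is to concatenate every text-like node inside the element you care about. The following is only a sketch: `TextNodeDemo` and `ReadAllText` are hypothetical names, not part of WCF. It relies only on standard `XmlReader` members, so it should work the same way with a plain reader or one returned by `GetReaderAtBodyContents`.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class TextNodeDemo
{
    // Hypothetical helper: joins all text found inside the current element,
    // no matter how many separate nodes the reader reports it as.
    public static string ReadAllText(XmlReader reader)
    {
        var text = new StringBuilder();
        reader.MoveToContent();                 // position on the start element
        if (reader.IsEmptyElement) return "";   // <poem/> has no content to read
        int depth = reader.Depth;
        // Stop when the matching end element (same depth) is reached.
        while (reader.Read() && reader.Depth > depth)
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    text.Append(reader.Value);
                    break;
            }
        }
        return text.ToString();
    }
}
```

For example, `TextNodeDemo.ReadAllText(XmlReader.Create(new StringReader("<p>one <![CDATA[two]]> three</p>")))` returns "one two three", although the reader reports three separate nodes. Passing `message.GetReaderAtBodyContents()` instead should give back the whole poem as one string regardless of where the chunk boundaries fall. For elements with simple content, the built-in `XmlReader.ReadElementContentAsString` does a similar job.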

Next time: Building a Secure Composite Duplex

Comments

  • Anonymous
    November 15, 2006
    I've been told that I'm exceeding the maximum number of items that can be serialized or deserialized.

  • Anonymous
    February 19, 2007
    I haven't forgotten about the goal to put together a table of contents for all of these articles. The