C#: XmlDocument SelectNodes() vs. GetElementsByTagName()
This wiki article was prompted by the following post on the MSDN Forums: http://social.msdn.microsoft.com/Forums/en-US/1ce92598-f9a1-4df4-903d-350ac7170e34/inconsistent-use-of-xmlnodelist-with-selectednodes-and-getelementsbytagname?forum=netfxbcl
The question is: what is the difference between the XmlDocument methods SelectNodes() and GetElementsByTagName()? Both return an XmlNodeList, and the following two tests illustrate what the documentation is trying to convey. The first test removes the person nodes that were selected with the SelectNodes() method.
[TestMethod]
public void TestSelectNodesBehavior()
{
    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<root><person><id>1</id><name>j</name></person><person><id>2</id><name>j</name></person><person><id>1</id><name>j</name></person><person><id>3</id><name>j</name></person><business></business></root>");
    // SelectNodes() evaluates an XPath expression; here it matches the 4 person elements.
    XmlNodeList nodeList = doc.SelectNodes("/root/person");
    Assert.AreEqual(5, doc.DocumentElement.ChildNodes.Count, "There should have been a total of 5 nodes: 4 person nodes and 1 business node");
    Assert.AreEqual(4, nodeList.Count, "There should have been a total of 4 nodes");
    // Removing each node from its parent does not shrink nodeList.
    foreach (XmlNode n in nodeList)
    {
        n.ParentNode.RemoveChild(n);
    }
    Assert.AreEqual(1, doc.DocumentElement.ChildNodes.Count, "There should have been only 1 business node left in the document");
    // The list still holds all 4 references, even though they are no longer in the document.
    Assert.AreEqual(4, nodeList.Count, "The list should still contain 4 nodes");
}
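To make the "references back to the document" point more concrete, here is a minimal sketch, assuming a standalone console program instead of an MSTest class. It repeats the removal from the test and then checks each node still held by the SelectNodes() list: every one of them is detached, because XmlNode.ParentNode is documented to be null for a node that has been removed from the tree. The class SelectNodesDemo and its RemovePersons method are hypothetical names. Like the test, it reads Count before modifying the document.

```csharp
using System;
using System.Xml;

// Hypothetical console sketch, not part of the original tests.
public static class SelectNodesDemo
{
    // Same XML as the tests in this article.
    public const string Xml =
        "<root><person><id>1</id><name>j</name></person><person><id>2</id><name>j</name></person>" +
        "<person><id>1</id><name>j</name></person><person><id>3</id><name>j</name></person>" +
        "<business></business></root>";

    // Removes every /root/person node, then reports what the SelectNodes() list still holds.
    public static (int matched, int listCount, int rootChildren, bool allDetached) RemovePersons()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        XmlNodeList persons = doc.SelectNodes("/root/person");
        // Read Count before touching the document, as the MSTest version does.
        int matched = persons.Count;

        foreach (XmlNode n in persons)
        {
            n.ParentNode.RemoveChild(n);
        }

        bool allDetached = true;
        foreach (XmlNode n in persons)
        {
            // ParentNode is null once a node has been removed from the tree.
            allDetached &= n.ParentNode == null;
        }

        return (matched, persons.Count, doc.DocumentElement.ChildNodes.Count, allDetached);
    }

    public static void Main()
    {
        var r = RemovePersons();
        Console.WriteLine($"matched={r.matched} listAfter={r.listCount} rootChildren={r.rootChildren} detached={r.allDetached}");
    }
}
```

The removed nodes still belong to the same XmlDocument (their OwnerDocument is unchanged), so they could be re-inserted elsewhere with AppendChild if needed.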
The next test performs the same task (removing the person nodes) but selects the nodes with the GetElementsByTagName() method. Although both methods return the same type, XmlNodeList, the two lists behave differently.

The list returned by SelectNodes() holds references to the matched nodes in the document. Removing those nodes from the document inside a foreach loop does not change the list, which is why nodeList.Count is still 4 after the loop in the first test. (Note that the documentation for SelectNodes() states that the returned list is only valid while the underlying document remains unchanged, so this behavior should not be relied on too heavily.)

The list returned by GetElementsByTagName() is live: it directly reflects the matching elements currently in the document. Every node removed from its parent also disappears from the list, so the list cannot be safely modified inside a foreach loop. Instead, the test uses a while loop that keeps removing the first node until the list is empty.
[TestMethod]
public void TestGetElementsByTagNameBehavior()
{
    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<root><person><id>1</id><name>j</name></person><person><id>2</id><name>j</name></person><person><id>1</id><name>j</name></person><person><id>3</id><name>j</name></person><business></business></root>");
    // GetElementsByTagName() returns a live list of every descendant element named "person".
    XmlNodeList nodeList = doc.GetElementsByTagName("person");
    Assert.AreEqual(5, doc.DocumentElement.ChildNodes.Count, "There should have been a total of 5 nodes: 4 person nodes and 1 business node");
    Assert.AreEqual(4, nodeList.Count, "There should have been a total of 4 nodes");
    // Each removal also removes the node from nodeList, so keep removing the first item.
    while (nodeList.Count > 0)
    {
        nodeList[0].ParentNode.RemoveChild(nodeList[0]);
    }
    Assert.AreEqual(1, doc.DocumentElement.ChildNodes.Count, "There should have been only 1 business node left in the document");
    Assert.AreEqual(0, nodeList.Count, "All the person nodes should have been removed from the list");
}
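If the same removal code has to work whichever method produced the list, one common approach (a sketch, not the approach used in the tests above) is to copy the matches into an ordinary List&lt;XmlNode&gt; first and iterate over that copy. Because the copy is not tied to the document, removing nodes cannot change the collection being enumerated, whether or not the source XmlNodeList is live. The names SafeRemovalDemo, RemoveAll, and Run are hypothetical, and the sample assumes a console program with System.Linq available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical console sketch, not part of the original tests.
public static class SafeRemovalDemo
{
    // Same XML as the tests in this article.
    public const string Xml =
        "<root><person><id>1</id><name>j</name></person><person><id>2</id><name>j</name></person>" +
        "<person><id>1</id><name>j</name></person><person><id>3</id><name>j</name></person>" +
        "<business></business></root>";

    // Copies the nodes into a List<XmlNode> before removing them, so the loop
    // does not depend on whether the source XmlNodeList is live.
    public static int RemoveAll(XmlNodeList nodes)
    {
        List<XmlNode> snapshot = nodes.Cast<XmlNode>().ToList();
        foreach (XmlNode n in snapshot)
        {
            n.ParentNode?.RemoveChild(n);
        }
        return snapshot.Count;
    }

    // Returns (nodes removed, children left under the root element).
    public static (int removed, int remaining) Run(bool useSelectNodes)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNodeList nodes = useSelectNodes
            ? doc.SelectNodes("/root/person")
            : doc.GetElementsByTagName("person");
        int removed = RemoveAll(nodes);
        return (removed, doc.DocumentElement.ChildNodes.Count);
    }

    public static void Main()
    {
        Console.WriteLine($"SelectNodes: {Run(true)}");
        Console.WriteLine($"GetElementsByTagName: {Run(false)}");
    }
}
```

For a live list specifically, iterating backwards by index (from Count - 1 down to 0) also works, because removing the last item never shifts the positions of the items before it.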