Transforming XML data into a rowset using the nodes() method
Usually there are a couple of scenarios in which you want to take XML data and convert it into a regular relational rowset. In SQL 2000, you probably used OpenXML for this, which works pretty well if you are only working on a single piece of XML data, but as soon as you want to convert a rowset of XML instances into a rowset of relation columns things begin to break down pretty quickly.
In SQL Server 2005, the solution is to use the XML datatype's nodes() method. The method works like a table valued function that accepts a single parameter, which is an XQuery statement which will result in a set of nodes. We then return references (basically a pointer) from the nodes() TVF to each node which is selected by the XQuery statement. You can then apply the XML datatype "scalar" functions such as exist(), value(), and query() to these references.
By way of an example, lets unpack some of this. Lets say I have a piece of XML which represents some kind of contact list. It has a bunch of people, their various phone numbers, and some metadata about the people (in this case, their age).
What I want to do is query out the names and ages of each person. I will use the nodes() method to get a reference to each <Person /> element, and then use value() method on each of these to pull out the properties that I am interested in.
create table contacts (pk int primary key identity(1,1), contactXml xml)
insert into contacts values ('
<People>
<Person age="35">
<Name>Pete</Name>
<Phone>
<Mobile>555-555-1234</Mobile>
<Home>555-555-0001</Home>
</Phone>
</Person>
<Person age="40">
<Name>Paul</Name>
<Phone>
<Mobile>555-555-4567</Mobile>
</Phone>
</Person>
<Person age="24">
<Name>Susan</Name>
<Phone>
<Home>555-555-2323</Home>
</Phone>
</Person>
</People>
')
select
pk,
person.ref.value('(Name/text())[1]', 'nvarchar(30)') as [name],
person.ref.value(<'@age'>, 'smallint') as [age]
from contacts c
cross apply c.contactXml.nodes('/People/Person') person(ref)
The cool thing about nodes() method is that it is also composable. I can continue to drill into the data to get the phone numbers for each contact as well by applying the nodes() method again to the references which are returned by the first nodes() invocation.
select
pk,
person.ref.value('(Name/text())[1]', 'nvarchar(30)') as [name],
person.ref.value(<'@age'>, 'smallint') as [age],
phoneNumbers.ref.value('local-name(.)', 'nvarchar(30)') as [numberType],
phoneNumbers.ref.value('text()[1]', 'nvarchar(30)') as [number]
from contacts c
cross apply c.contactXml.nodes('/People/Person') person(ref)
cross apply person.ref.nodes('Phone/*') phoneNumbers(ref)
The result of this query is the following rowset:
pk name age numberType number
----------- ------------------------------ ------ ------------------------------ ------------------------------
1 Pete 35 Mobile 555-555-1234
1 Pete 35 Home 555-555-0001
1 Paul 40 Mobile 555-555-4567
1 Susan 24 Home 555-555-2323
The nodes() method also allows us to filter, and since we are doing a cross apply, this filter will reduce the total number of rows that we see output (as opposed to outer apply, which is more like an outer join). So if we only want to see people that have Mobile numbers, then we can alter the XQuery statement in the second nodes() invocation from 'Phone/*' to 'Phone/Mobile'.
In general, nodes() method is a great way to manipulate XML data both when it comes into the server for persisting as a relation rowset, or for converting your already stored XML data into a rowset for returning to the client. Using XML Indexes are also a great way to increase nodes() performance since the reference that is returned via the nodes() method is part of the primary key (along with the primary key of the base table) of our clustered XML Index. As a consequence of this, if you are using nodes() heavily in your application, you definitely want to look into utilizing an XML Index to increase query performance.
Comments
- Anonymous
December 15, 2005
What if your example:
1) Had 100,000 person elements
and
2) The XML was coming into in as a @parameter, in other words, it's being passed into the database for shredding and bulk insertion into a Person table.
Would you still use a nodes('/People/Person) approach? - Anonymous
December 16, 2005
Well, I guess it would depend on my performance expectations. In this particular case, I threw together a quick test without really optimizing anything - default database settings, fully logged, in other words nothing really tweaked out for performance. I got what I would consider to be pretty acceptable times.
What other approach were you thinking about? - Anonymous
December 16, 2005
Well, I was working on a small project to learn some of the new 2005 features.
I built 'yet another' app to import IIS logs to a database. I have logs in an XML format.
Performance wasn't a top priority, but I was glad to see the XML approach import 300-400 records per second on very modest hardware. I could import 60,000 log entries in under 5 minutes, which is great for a batch task running in the middle of the night.
Problem is, I've found the application useful for day to day use, but now the database has 2 million records and the import takes 8 hours!
I've done maintenance (reindex / defrag / etc), but can't get back to a reasonable speed.
I'm just wondering if I'm using the feature in a way it was not intended. Plenty of other options exist to transform XML into records, but I'm just curious what I've done wrong here so I can learn from the experience. - Anonymous
December 17, 2005
What I've done is redo the import with a SQLCLR sproc. The sproc uses an XmlDocument and SelectSingleNodes to tear apart the XML and issue an insert for each element. Import time for 60,000 records is now under 2 minutes, while the nodes() approach was around 10 hours :/ - Anonymous
December 21, 2005
Scott, I think it would be beneficial to move this over to the MSDN forums for SQL Server XML functionality. The url is http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=89&SiteID=1. If you could provide some code samples there and describe what you are seeing it will have more visibility to the SQL engine team as a whole. Thanks!