Efficient Property Promotion from XML Instances (Part 2)

Last week I covered the basics of property promotion i.e. automatically extracting data from an XML instance and storing this as relational data in order to improve query readability and/or performance. If you missed this previous post, you can access it at https://blogs.msdn.com/mrorke/archive/2005/05/24/421576.aspx.

In the earlier post, I made the assumption that there was always a single value that we wanted to extract per XML instance i.e. the promoted properties had a 1-to-1 relationship with the instance they were extracted from. Often though, XML is used to aggregate lots of different data values into a single place. In these cases, we may often find that we want to extract multiple values for a given property from a single XML instance. Take the XML instance described by the XSD schema shown below:

 create xml schema collection CompanyDataXSD as N'
<xs:schema xmlns:xs="https://www.w3.org/2001/XMLSchema" xmlns:company="https://www.example.com/mycompany"
    targetNamespace="https://www.example.com/mycompany" elementFormDefault="qualified">
  <xs:element name="company">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="employee">
          <xs:complexType>
            <xs:all>
              <xs:element name="name" type="xs:string" />
            </xs:all>
            <xs:attribute name="id" type="xs:ID" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>'

This simple schema allows us to store multiple 'employee' elements (each with a name and an employee ID). We then create a table with an XML data-type instance that references this schema and insert some data:

 create table CompanyData( id int primary key identity, xml_data xml( CompanyDataXSD ))
 insert into CompanyData values( '
<company xmlns="https://www.example.com/mycompany">
  <employee id="EMP1"><name>Joe</name></employee>
  <employee id="EMP2"><name>Fred</name></employee>
  <employee id="EMP3"><name>Tom</name></employee>
</company>
')

Notice that this is the same data I used in the previous posting - only this time the individual employee elements are all contained inside a single XML instance. To re-produce the data from the first posting, we can run the following nodes() query to extract the various employee elements:

 with xmlnamespaces( default 'https://www.example.com/mycompany' )
  select ref.query('.')
from CompanyData cross apply xml_data.nodes('/company/employee') as T(ref)

The nodes() method allows us to crack a particular XML instance up into multiple rows. In the original posting, we wanted to extract the employee names and IDs out of the XML data and store these in standard relational columns. We can update the query above to do just this:

 with xmlnamespaces( default 'https://www.example.com/mycompany' )
select
  id as [Database ID], 
  ref.value( './@id', 'nvarchar(10)' ) as [Employee ID],
  ref.value( './name', 'nvarchar(100)' ) as [Name]
from CompanyData cross apply xml_data.nodes('/company/employee') as T(ref)

Executing this query produces the following result:

 Database ID  Employee ID  Name
1            EMP1         Joe
1            EMP2         Fred
1            EMP3         Tom

Which is exactly what we had in the original post, other than the fact that the 'Database ID' in this case is always the same - indicating that the data all came from the same instance.

Since the result of the queries above is a set of rows, this type of query can not be used to define computed columns. We can still create a view to encapsulate this data as follows:

 create view EmployeeDataProperties
with schemabinding
as
with xmlnamespaces( default 'https://www.example.com/mycompany' )
select
  id as [Database ID],
  ref.value( './@id', 'nvarchar(10)' ) as [Name],
  ref.value( './name', 'nvarchar(100)' ) as [Employee ID]
from dbo.CompanyData cross apply xml_data.nodes('/company/employee') as T(ref)

As explained in the first posting though, we are unable to index this view (and thus materialize it) since it contains XML data-type methods. This means that each access of the view causes the query above to be re-executed instead of just reading the values from an index. The work-around in the first posting was to wrap each call to an XML data-type method into a UDF and use these UDFs in the view definition. This will not work in this case, since we are returning multiple rows for each invocation of the query.

It is possible to wrap this functionality into a table valued function, as shown below:

 create function GetEmployeeDetails( @input xml( CompanyDataXSD ))
returns @employeeDetails table( 
  [Database ID] int,
  [Employee ID] nvarchar(10),
  [Name] nvarchar(100)
)
as
begin
  with xmlnamespaces( default 'https://www.example.com/mycompany' )
    insert @employeeDetails select
      id as [Database ID],
      ref.value( './@id', 'nvarchar(10)' ) as [Employee ID],
      ref.value( './name', 'nvarchar(100)' ) as [Name]
    from dbo.CompanyData cross apply xml_data.nodes('/company/employee') as T(ref)
  return
end

Unfortunately, we are still unable to create a view over this. The choice of whether to use a view directly, or create a table valued function to encapsulate this functionality depends entirely on your usage scenario and which syntax looks/works better for you.

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.

Comments