Indexes on xml Data Type Columns

Artigo
12/03/2008

XML instances are stored in xml type columns as large binary objects (BLOBs). These XML instances can be large, and the stored binary representation of xml data type instances can be up to 2 GB. Without an index, these binary large objects are shredded at run time to evaluate a query. This shredding can be time-consuming. For example, consider the following query:

WITH XMLNAMESPACES ('https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS "PD")

SELECT CatalogDescription.query('
  /PD:ProductDescription/PD:Summary
') as Result
FROM Production.ProductModel
WHERE CatalogDescription.exist ('/PD:ProductDescription/@ProductModelID[.="19"]') = 1

To select the XML instances that satisfy the condition in the WHERE clause, the XML binary large object (BLOB) in each row of table Production.ProductModel is shredded at run time. Then, the expression (/PD:ProductDescription/@ProductModelID[.="19"]) in the exist() method is evaluated. This run-time shredding can be costly, depending on the size and number of instances stored in the column.

If querying XML binary large objects (BLOBs) is common in your application environment, it helps to index the xml type columns. However, there is a cost associated with maintaining the index during data modification.

XML indexes fall into the following categories:

Primary XML index
Secondary XML index

The first index on the xml type column must be the primary XML index. Using the primary XML index, the following types of secondary indexes are supported: PATH, VALUE, and PROPERTY. Depending on the type of queries, these secondary indexes might help improve query performance.

Primary XML Index

The primary XML index is a shredded and persisted representation of the XML BLOBs in the xml data type column. For each XML binary large object (BLOB) in the column, the index creates several rows of data. The number of rows in the index is approximately equal to the number of nodes in the XML binary large object.

Each row stores the following node information:

Tag name such as an element or attribute name.
Node value.
Node type such as an element node, attribute node, or text node.
Document order information, represented by an internal node identifier.
Path from each node to the root of the XML tree. This column is searched for path expressions in the query.
Primary key of the base table. The primary key of the base table is duplicated in the primary XML index for a back join with the base table, and the maximum number of columns in the primary key of the base table is limited to 15.

This node information is used to evaluate and construct XML results for a specified query. For optimization purposes, the tag name and the node type information are encoded as integer values, and the Path column uses the same encoding. Also, paths are stored in reverse order to allow matching paths when only the path suffix is known. For example:

//ContactRecord/PhoneNumber where only the last two steps are known

/Book/*/Title where the wildcard character (*) is specified in the middle of the expression.

The query processor uses the primary XML index for queries that involve xml data type methods and returns either scalar values or the XML subtrees from the primary index itself. (This index stores all the necessary information to reconstruct the XML instance.)

For example, the following query returns summary information stored in the CatalogDescriptionxml type column in the ProductModel table. The query returns <Summary> information only for product models whose catalog description also stores the <Features> description.

WITH XMLNAMESPACES ('https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS "PD")

SELECT CatalogDescription.query('
  /PD:ProductDescription/PD:Summary
') as Result
FROM Production.ProductModel
WHERE CatalogDescription.exist ('/PD:ProductDescription/PD:Features') = 1

With regard to the primary XML index, instead of shredding each XML binary large object instance in the base table, the rows in the index that correspond to each XML binary large object are searched sequentially for the expression specified in the exist() method. If the path is found in the Path column in the index, the <Summary> element together with its subtrees is retrieved from the primary XML index and converted into an XML binary large object as the result of the query() method.

Note that the primary XML index is not used when retrieving a full XML instance. For example, the following query retrieves from the table the whole XML instance that describes the manufacturing instructions for a specific product model.

USE AdventureWorks;

SELECT Instructions
FROM Production.ProductModel 
WHERE ProductModelID=7;

Secondary XML Indexes

To enhance search performance, you can create secondary XML indexes. A primary XML index must first exist before you can create secondary indexes. These are the types:

PATH secondary XML index
VALUE secondary XML index
PROPERTY secondary XML index

PATH Secondary XML Index

If your queries generally specify path expressions on xml type columns, a PATH secondary index may be able to speed up the search. As described earlier in this topic, the primary index is helpful when you have queries that specify exist() method in the WHERE clause. If you add a PATH secondary index, you may also improve the search performance in such queries.

Although a primary XML index avoids having to shred the XML binary large objects at run time, it may not provide the best performance for queries based on path expressions. Because all rows in the primary XML index corresponding to an XML binary large object are searched sequentially for large XML instances, the sequential search may be slow. In this case, having a secondary index built on the path values and node values in the primary index can significantly speed up the index search. In the PATH secondary index, the path and node values are key columns that allow for more efficient seeks when searching for paths. The query optimizer may use the PATH index for expressions such as those shown in the following:

/root/Location which specify only a path

/root/Location/@LocationID[.="10"] where both the path and the node value are specified.

The following query shows where the PATH index is helpful:

WITH XMLNAMESPACES ('https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS "PD")

SELECT CatalogDescription.query('
  /PD:ProductDescription/PD:Summary
') AS Result
FROM Production.ProductModel
WHERE CatalogDescription.exist ('/PD:ProductDescription/@ProductModelID[.="19"]') = 1

In the query, the path expression /PD:ProductDescription/@ProductModelID and value "19" in the exist() method correspond to the key fields of the PATH index. This allows for direct seek in the PATH index and provides better search performance than the sequential search for path values in the primary index.

VALUE Secondary XML Index

If queries are value based, for example, /Root/ProductDescription/@*[. = "Mountain Bike"] or //ProductDescription[@Name = "Mountain Bike"], and the path is not fully specified or it includes a wildcard, you might obtain faster results by building a secondary XML index that is built on node values in the primary XML index.

The key columns of the VALUE index are (node value and path) of the primary XML index. If your workload involves querying for values from XML instances without knowing the element or attribute names that contain the values, a VALUE index may be useful. For example, the following expression will benefit from having a VALUE index:

//author[LastName="someName"] where you know the value of the <LastName> element, but the <author> parent can occur anywhere.
/book[@* = "someValue"] where the query looks for the <book> element that has some attribute having the value "someValue".

The following query returns ContactID from the Contact table. The WHERE clause specifies a filter that looks for values in the AdditionalContactInfoxml type column. The contact IDs are returned only if the corresponding additional contact information XML binary large object includes a specific telephone number. Because the <telephoneNumber> element may appear anywhere in the XML, the path expression specifies the descendent-or-self axis.

WITH XMLNAMESPACES (
  'https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactInfo' AS CI,
  'https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactTypes' AS ACT)

SELECT ContactID 
FROM   Person.Contact
WHERE  AdditionalContactInfo.exist('//ACT:telephoneNumber/ACT:number[.="111-111-1111"]') = 1

In this situation, the search value for <number> is known, but it can appear anywhere in the XML instance as a child of the <telephoneNumber> element. This kind of query might benefit from an index lookup based on a specific value.

PROPERTY Secondary Index

Queries that retrieve one or more values from individual XML instances might benefit from a PROPERTY index. This scenario occurs when you retrieve object properties by using the value() method of the xml type and when the primary key value of the object is known.

The PROPERTY index is built on columns (PK, Path and node value) of the primary XML index where PK is the primary key of the base table.

For example, for product model 19, the following query retrieves the ProductModelID and ProductModelName attribute values using the value() method. Instead of using the primary XML index or the other secondary XML indexes, the PROPERTY index may provide faster execution.

WITH XMLNAMESPACES ('https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS "PD")

SELECT CatalogDescription.value('(/PD:ProductDescription/@ProductModelID)[1]', 'int') as ModelID,
       CatalogDescription.value('(/PD:ProductDescription/@ProductModelName)[1]', 'varchar(30)') as ModelName        
FROM Production.ProductModel   
WHERE ProductModelID = 19

Except for the differences described later in this topic, creating an XML index on an xml type column is similar to creating an index on a non-xml type column. The following Transact-SQL DDL statements can be used to create and manage XML indexes:

Creating a Primary XML Index

To create a primary XML index, use the CREATE PRIMARY XML INDEX Transact-SQL DDL statement. Not all options available for non-XML indexes are supported on XML indexes.

Note the following when you are creating an XML index:

To create a primary XML index, the table that contains the XML column being indexed, called the base table, must have a clustered index on the primary key. This makes sure that if the base table is partitioned, the primary XML index can be partitioned by using the same partitioning scheme and partitioning function.
If an XML index exists, the clustered, primary key of the table cannot be modified. You will have to drop all XML indexes on the table before modifying the primary key.
A primary XML index can be created on a single xml type column. You cannot create any other type of index with the XML type column as a key column. However, you can include the xml L type column in a non-XML index. Each xml type column in a table can have its own primary XML index. However, only one primary XML index per xml type column is permitted.
XML indexes exist in the same namespace as non-XML indexes. Therefore, you cannot have an XML index and a non-XML index on the same table with the same name.
IGNORE_DUP_KEY and ONLINE options of are always set to OFF for XML indexes. You can specify these options with a value of OFF.
The filegroup or partitioning information of the user table is applied to the XML index. Users cannot specify these separately on an XML index.
The DROP_EXISTING index option can drop a primary XML index and create a new primary XML index, or drop a secondary XML index and create a new secondary XML index. However, this option cannot drop a secondary XML index to create a new primary XML index or vice versa.
Primary XML index names have the same restrictions as view names.

You cannot create an XML index on an xml type column in a view, on a table valued variable with xml type columns, or xml type variables.

To change an xml type column from untyped to typed XML, or vice versa, by using the ALTER TABLE ALTER COLUMN option, no XML index on the column should exist. If one does exist, it must be dropped before the column type change is tried.
The option ARITHABORT must be set to ON when an XML index is created. To query, insert, delete, or update values in the XML column using XML data type methods, the same option must be set on the connection. If it is not, the XML data type methods will fail.

Note

Information about an XML index can be found in catalog views. However, sp_helpindex is not supported. Examples provided later in this topic show how to query the catalog views to find XML index information.

Creating a Secondary XML Index

Use the CREATE XML INDEX Transact-SQL DDL statement to create secondary XML indexes and specify the type of the secondary XML index that you want.

Note the following when you are creating secondary XML indexes:

All indexing options that apply to a nonclustered index, except IGNORE_DUP_KEY and ONLINE, are permitted on secondary XML indexes. The two options must always be set to OFF for secondary XML indexes.
The secondary indexes are partitioned just like the primary XML index.
DROP_EXISTING can drop a secondary index on the user table and create another secondary index on the user table.

You can query the sys.xml_indexes catalog view to retrieve XML index information. Note that the secondary_type_desc column in the sys.xml_indexes catalog view provides the type of secondary index:

SELECT  * 
FROM    sys.xml_indexes

The values returned in the secondary_type_desc column can be NULL, PATH, VALUE, or PROPERTY. For the primary XML index, the value returned is NULL.

Modifying an XML Index

The ALTER INDEX Transact-SQL DDL statement can be used to modify existing XML and non-XML indexes. However, not all the ALTER INDEX options are available to XML indexes. The following options are not valid when modifying XML indexes:

The rebuild and set option IGNORE_DUP_KEY is not valid for XML indexes. The rebuild option ONLINE must be set to OFF for secondary XML indexes. The option DROP_EXISTING is not permitted in the ALTER INDEX statement. When rebuilding the index, connection options must be set as described in Setting Options (XML Index).
The modifications of the primary key constraint in the user table are not automatically propagated to XML indexes. The user must drop the XML indexes first and then re-create them.
If ALTER INDEX ALL is specified, it applies to both non-XML and XML indexes. Indexing options may be specified that are not valid for both types of indexes. In this case, the whole statement fails.

Dropping an XML Index

The DROP INDEX Transact-SQL statement can be used to drop existing primary or secondary XML and non-XML indexes. However, no options of DROP INDEX apply to XML indexes. If you drop the primary XML index, any secondary indexes that are present are also deleted.

The DROP syntax with TableName**.**IndexName is being phased out and is not supported for XML indexes.

Examples

The following examples show XML indexes being created, modified, and dropped.

A. Creating and dropping a primary XML index

In the following example, an XML index is created on an xml type column.

DROP TABLE T
GO
CREATE TABLE T (Col1 INT PRIMARY KEY, XmlCol XML)
GO
-- Create Primary XML index 
CREATE PRIMARY XML INDEX PIdx_T_XmlCol 
ON T(XmlCol)
GO
-- Verify the index creation. 
-- Note index type is 3 for xml indexes.
-- Note the type 3 is index on XML type.
SELECT *
FROM sys.xml_indexes
WHERE object_id = object_id('T')
AND name='PIdx_T_XmlCol' 
-- Drop the index.
DROP INDEX PIdx_T_XmlCol ON T

When a table is dropped, all the XML indexes on it are also automatically dropped. However, an XML column cannot be dropped from a table if an XML index exists on the column.

In the following example, an XML index is created on an xml type column. For more information, see Typed vs. Untyped XML.

CREATE TABLE TestTable(
 Col1 int primary key, 
 Col2 xml (Production.ProductDescriptionSchemaCollection)) 
GO

Now, you can create a primary XML index on Co12.

CREATE PRIMARY XML INDEX PIdx_TestTable_Col2 
ON TestTable(Col2)
GO

B. Creating secondary XML indexes

The following example illustrates how secondary XML indexes are created. The example also shows information about the XML indexes that you created.

CREATE TABLE T (Col1 INT PRIMARY KEY, XmlCol XML)
GO
-- Create primary index.
CREATE PRIMARY XML INDEX PIdx_T_XmlCol 
ON T(XmlCol)
GO
-- Create secondary indexes (PATH, VALUE, PROPERTY).
CREATE XML INDEX PIdx_T_XmlCol_PATH ON T(XmlCol)
USING XML INDEX PIdx_T_XmlCol
FOR PATH
GO
CREATE XML INDEX PIdx_T_XmlCol_VALUE ON T(XmlCol)
USING XML INDEX PIdx_T_XmlCol
FOR VALUE
GO
CREATE XML INDEX PIdx_T_XmlCol_PROPERTY ON T(XmlCol)
USING XML INDEX PIdx_T_XmlCol
FOR PROPERTY
GO

You can query the sys.xml_indexes to retrieve XML indexes information. The secondary_type_desc column provides the secondary index type.

SELECT  * 
FROM    sys.xml_indexes

You can also query the catalog view for index information.

SELECT *
FROM sys.xml_indexes
WHERE object_id = object_id('T')

You can add sample data and then review the XML index information.

INSERT INTO T VALUES (1,
'<doc id="123">
<sections>
<section num="2">
<heading>Background</heading>
</section>
<section num="3">
<heading>Sort</heading>
</section>
<section num="4">
<heading>Search</heading>
</section>
</sections>
</doc>')
GO
-- Check XML index information.
SELECT *
FROM   sys.dm_db_index_physical_stats (db_id(), object_id('T'), NULL, NULL, 'DETAILED')
GO
-- Space usage of primary XML index
DECLARE @index_id int
SELECT  @index_id = i.index_id
FROM    sys.xml_indexes i 
WHERE   i.name = 'PIdx_T_XmlCol' and object_name(i.object_id) = 'T'
 
SELECT *
FROM sys.dm_db_index_physical_stats (db_id(), object_id('T') , @index_id, DEFAULT, 'DETAILED')
go
--- Space usage of secondary XML index (for example PATH secondary index)  PIdx_T_XmlCol_PATH
DECLARE @index_id int
SELECT  @index_id = i.index_id 
FROM    sys.xml_indexes i 
WHERE  i.name = 'PIdx_T_XmlCol_PATH' and object_name(i.object_id) = 'T'
 
SELECT *
FROM sys.dm_db_index_physical_stats (db_id(), object_id('T') , @index_id, DEFAULT, 'DETAILED')
go
 
-- Space usage of all secondary XML indexes for a particular table
SELECT i.name, object_name(i.object_id), stats.* 
FROM   sys.dm_db_index_physical_stats (db_id(), object_id('T'), NULL, DEFAULT, 'DETAILED') stats
JOIN sys.xml_indexes i ON (stats.object_id = i.object_id and stats.index_id = i.index_id)
WHERE secondary_type is not null
-- Drop secondary indexes.
DROP INDEX PIdx_T_XmlCol_PATH ON T
GO
DROP INDEX PIdx_T_XmlCol_VALUE ON T
GO
DROP INDEX PIdx_T_XmlCol_PROPERTY ON T
GO
-- Drop primary index.
DROP INDEX PIdx_T_XmlCol ON T
-- Drop table T.
DROP TABLE T
Go

C. Modifying an XML index

In the following example, an XML index is created and then modified by setting the option ALLOW_ROW_LOCKS to OFF. When ALLOW_ROW_LOCKS is OFF, rows are not locked and access to the specified indexes is obtained by using page-and table-level locks.

CREATE TABLE T (Col1 INT PRIMARY KEY, XmlCol XML)
GO
-- Create primary XML index. 
CREATE PRIMARY XML INDEX PIdx_T_XmlCol 
ON T(XmlCol)
GO
-- Note the type 3 is index on XML type.
SELECT *
FROM sys.xml_indexes
WHERE object_id = object_id('T')
AND name='PIdx_T_XmlCol'

-- Modify and set an index option.
ALTER INDEX PIdx_T_XmlCol on T 
SET (ALLOW_ROW_LOCKS = OFF)

D. Disabling and enabling an XML index

By default, an XML index is enabled. If an XML index is disabled, the queries running against the XML column do not use the XML index. To enable an XML index, use ALTER INDEX with the REBUILD option.

CREATE TABLE T (Col1 INT PRIMARY KEY, XmlCol XML)
GO
CREATE PRIMARY XML INDEX PIdx_T_XmlCol ON T(XmlCol)
GO
ALTER INDEX PIdx_T_XmlCol on T DISABLE
Go
-- Verify index is disabled.
SELECT *
FROM sys.xml_indexes
WHERE object_id = object_id('T')
AND name='PIdx_T_XmlCol'
-- Rebuild the index.
ALTER INDEX PIdx_T_XmlCol on T REBUILD
Go

E. Creating an XML index by using the DROP_EXISTING index option

In the following example, an XML index is created on a column (XmlColx). Then, another XML index with the same name is created on a different column, (XmlColy). Because the DROP_EXISTING option is specified, the existing XML index on (XmlColx) is dropped and a new XML index on (XmlColy) is created.

DROP TABLE T
GO
CREATE TABLE T(Col1 int primary key, XmlColx xml, XmlColy xml)
GO
-- Create XML index on XmlColx.
CREATE PRIMARY XML INDEX PIdx_T_XmlCol 
ON T(XmlColx)
GO
-- Create same name XML index on XmlColy.
CREATE PRIMARY XML INDEX PIdx_T_XmlCol 
ON T(XmlColy) 
WITH (DROP_EXISTING = ON)
-- Verify the index is created on XmlColy.d.
SELECT sc.name 
FROM   sys.xml_indexes si inner join sys.index_columns sic 
ON     sic.object_id=si.object_id and sic.index_id=si.index_id
INNER  join sys.columns sc on sc.object_id=sic.object_id 
AND    sc.column_id=sic.column_id
WHERE  si.name='PIdx_T_XmlCol' 
AND    si.object_id=object_id('T')

This query returns the column name on which the specified XML index is created.

Compartilhar via