SQL Server 2008 XML Indexing
Here are some notes on "SQL Server 2008 XML Indexing" I took while attending an advanced class on SQL Server taught by Adam Machanic (https://sqlblog.com/blogs/adam_machanic/default.aspx).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
XML Indexing
- Engine needs to XML in order to process it
- Solution: XML Indexes – pre-parsed representation of XML data
- Include tages, values, path from the XML instance, turned into a hidden table
- You can actually see the table if connected via admin connection
- XML indexes almost always speed up XML queries, but they do come at a cost
- Use typed XML as much as you can. Optimizer will like it.
- See https://msdn.microsoft.com/en-us/library/ms191497.aspx
XML Indexing - Details
- Types: Primary (base) and Secondary (PATH, VALUE, PROPERTY – Non-clustered on the primary)
- Storage: Increases XML storage cost. Estimate as 3 times the storage requirement of XML instance in base table.
- To create the primary XML index, table must have a clustered primary key
- Stored in the same FG/partition as the base table
- Primary XML index includes column for ordpath, to keep track of hierarchy
- Ordpath also used in SQL Server 2008 for HierachyID data type
- Primary index has 11 columns, including id, nid (node), hid (hierarchy), value, lvalue, PK[n]
- See https://msdn.microsoft.com/en-us/library/bb500237.aspx
XML Indexing – Demo
- Create XML Schema, Create Table with XML column
- Gathering data for products, turning into some nice XML
- Put XML data into a table (no index)
- Query WHERE X.Exist (‘//Name[.=”Adjustable Race”’), look at the plan
- Create an index, run the query again, look at the plan again
- Look at sys.xml_indexes
XML Indexing – Secondary Indexes
- Path index – (‘/a/b[.=”abc”]) - Good for path queries (index on hid, value)
- Value index – (‘//b[.=”abc”]) - Optimized to find the value and we don’t know the path (index on value, hid)
- Property index – .value(‘/a[1]/b[1]’)='abc' - Optimized for name-value pair storage (index on PK[n], hid, value)
- See https://msdn.microsoft.com/en-us/library/bb522562.aspx
XML FullText Searching
- XPath and XQuery expressions – case insensitive
- FTS will index element values only – tags and attributes will be ignored
- FTS can do initial filter, then more exact XML methods
Shred vs. Store Decision
- Performance considerations – relational query is faster, shredding and re-combining is slow
- Data Structure considerations – XML is extremely flexible, but loosely structured
- Data lifecycle considerations – if you input and output XML, don’t shred
- Storing as XML data type – XML data type will check, strip, eliminate extra spaces.
- Storing as [N]VARCHAR(MAX) – If you care about white space or never do any XML operations
- There are cases where you need to store both
- Schema binding – Typed XML will be validated, schema will improve XML query performance
- Schema binding – Uses more space, no wildcard-based queries, schema migration can be difficult
- You can add relational constraints on top of those expressed as XSD
Where to put the XML
- Need a clustering key in the table
- Consider using a separate table if you can’t have a clustering key
- Consider using in-row XML column if smaller than 8,000 bytes
- You can “promote a property” to a relational column for better indexing and relational querying using UDF
Markup choice
- Element-centric - <a><b>val</b></a>
- - Element values considered to be content
- - Verbose, more human readable. Maybe yield simpler qiery plans
- Attribute-centric - <a b=”val”/>
- - Lower storage requirements
Queries
- Context node is a dot (.) - X.Exist(‘/a/b[@attr=”abc”]’)=1 not same as X.Exist(‘/a/b[.@attr=”abc”]’)=1
- Avoid wildcars – Node level - X.Exist(‘//b[@attr=”abc”]’)=1 – Attribute level - X.Exist(‘/a/b/@*[.=”abc”]’)=1
- Singletons – /a/b - /a[1]/b[1] – (/a/b)[1]
- Outer Data binding
- Merging Value Executions
- Favor Exist() over Value()
Shredding
- OPENXML – backward compatible, will be deprecated. Always use sp_xml_removedocument
- Nodes() – newer, current, faster (test both to be sure).
Comments
- Anonymous
January 01, 2003
Thank you for submitting this cool story - Trackback from DotNetShoutout - Anonymous
May 20, 2014
Pingback from XML Large DataBase Is The Best? | Click & Find Answer !