INF: How to retrieve data from a large XML string
We encountered an interesting issue some time back. We already had XML data inserted into a SQL Server 2008 R2 database however we were getting below error when retrieving (also called shredding) values from the XML data.
Msg 8621, Level 17, State 1, Line 1
The query processor ran out of stack space during query optimization. Please simplify the query.
Based on error, one obvious solution would be to rewrite the query by changing XML data but since we were getting the XML data from a different team/source so we did not have much control on the source XML data.
Example
======
We can also get the error reproduced in SQL Server 2005 (even in SQL Server 2008 and in SQL Server 2008 R2). The source XML was a single level (one element below base/root element) XML of below format however had around 1800 elements in this single level. For example, below XML has around 2000 elements called element0, element1…element1999. This is not a multi- level XML since it does not contain further nested XML:
<root>
<element0> dummy data 0</element0>
<element1> dummy data 1</element1>
--removed elements in between so as to keep this brief
<element1999> dummy data 1999</element1999>
</root>
The expected output was in below format (the “…” indicates columns between element1 and element1999) however we were not reaching the output phase because query was not even getting compiled due to the 8621 error:
element0
|
element1
|
…
|
element1999
|
dummy data 0
|
dummy data 1
|
…
|
dummy data 1999
|
As an example with two elements and retrieving values from XML variable, we can re-write the above XML data into an XML variable using below and retrieve results using a SELECT query:
DECLARE @x xml
SET @x = CAST(N'
<root>
<element0> dummy data 0</element0>
<element1> dummy data 1</element1>
</root>
'
AS xml)
--SELECT @x AS 'XmlData' /* uncomment this line if you want to see the XML */
Select a.value('element0[1]', 'nvarchar(max)') element0
, a.value('element1[1]', 'nvarchar(max)') element1
FROM @x.nodes('/*') nodeset(a)
As another example with two elements and retrieving values from a XML column in a table, we can insert the XML data into a XML columns using below and retrieve results using a SELECT query:
USE tempdb
-- drop the table if it already exists else create it.
IF OBJECT_ID('XmlData') IS NOT NULL
DROP TABLE XmlData
CREATE TABLE XmlData(Data xml)
go
-- insert xml data.
INSERT INTO XmlData (Data)
SELECT CAST(N'
<root>
<element0> dummy data 0</element0>
<element1> dummy data 1</element1>
</root>
' AS xml)
go
-- Select data from table
Select Data.value('(/root/element0)[1]', 'nvarchar(max)') 'element0'
,Data.value('(/root/element1)[1]', 'nvarchar(max)') 'element1'
from XmlData
go
-- cleanup
DROP TABLE XmlData
Another “optimal” way would be to use below query. It is optimal since combination of the nodes() and value() methods can be more efficient in generating the rowset when it has several columns and, perhaps, when the path expressions used in its generation are complex.
USE tempdb
-- drop the table if it already exists else create it.
IF OBJECT_ID('XmlData') IS NOT NULL
DROP TABLE XmlData
CREATE TABLE XmlData(Data xml)
go
-- insert dummy xml data.
INSERT INTO XmlData (Data)
SELECT CAST(N'
<root>
<element0> dummy data 0</element0>
<element1> dummy data 1</element1>
</root>
' AS xml)
go
-- Select data from table
SELECT nref.value('(/root/element0)[1]', 'nvarchar(max)') 'element0',
nref.value('(/root/element1)[1]', 'nvarchar(max)') 'element1'
FROM XmlData
CROSS APPLY data.nodes('/root') AS t2(nref) --'Data' is column of XML data type in table XmlData
GO
-- cleanup
DROP TABLE XmlData
Simple! Huh? However, as mentioned earlier, our XML data had around 1800 elements. The SELECT query was around 100,000 characters since it retrieved each value with “.value” for each of the 1800 elements in the XML data. The output of this SELECT query was to be consumed by application per the requirement.
When we run a query retrieving XML data, SQL Server 2005 (and above SQL Server 2008 / SQL Server 2008 R2) use SQLXML. Beginning with SQL Server 2005, SQL Server natively supports XML typed data using the xml data type. Prior to SQL Server 2005, SQL Server 2000 did not have a XML data type and could only parse XML data so as to extract XML values using sp_xml_preparedocument / sp_xml_removedocument / OPENXML. This parsing was done using MSXML parser (Msxml2.dll) however had memory limitations as mentioned in https://msdn.microsoft.com/en-us/library/aa260385(SQL.80).aspx. Also SQL Server 2000 did not have an inbuilt XML data type which is present from SQL Server 2005. However, for this issue, despite using SQL Server 2005 (and even SQL Server 2008 or SQL Server 2008 R2), we were getting the 8621 error when SQL Server query optimizer parsed the query to retrieve values of the 1800 elements in the XML data.
Why we are getting error?
We were getting this error because SQL Server was running out of stack space when SQL Server query optimizer parsed the query. Processing a single SQL statement is the most basic way that SQL Server executes SQL statements and this is done using ‘SQL Server Query Optimizer’. ‘Parsing’ is one of the phases in query optimization prior to query execution.
For example, when were run below query, it returns two values (of element0 and of element1) and also displays the execution plan (in text format) used by SQL query optimizer:
USE tempdb
-- drop the table if it already exists else create it.
IF OBJECT_ID('XmlData') IS NOT NULL
DROP TABLE XmlData
CREATE TABLE XmlData(Data xml)
Go
-- insert dummy xml data.
INSERT INTO XmlData (Data)
SELECT CAST(N'
<root>
<element0> dummy data 0</element0>
<element1> dummy data 1</element1>
</root>
' AS xml)
GO
SET STATISTICS PROFILE ON --this to display the execution plan(text)
GO
SELECT nref.value('(/root/element0)[1]', 'nvarchar(max)') element0
, nref.value('(/root/element1)[1]', 'nvarchar(max)') element1
FROM XmlData
CROSS APPLY Data.nodes('/root') as T2(nref) --‘Data’ is column of XML data type in table XmlData
GO
SET STATISTICS PROFILE OFF
GO
DROP TABLE XmlData
The above query executed using SQL Server Management Studio will also display the execution plan which includes the below two Stream Aggregate / UDX nodes. The UDX (Extended Operators) nodes indicate implementation of XQuery and XPath operations in SQL Server:
--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[XmlData].[Data]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[XmlData].[Data]))
| |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[XmlData].[Data]))
| | |--Table Scan(OBJECT:([tempdb].[dbo].[XmlData]))
| | |--Filter(WHERE:(STARTUP EXPR([tempdb].[dbo].[XmlData].[Data] IS NOT NULL)))
| | |--Table-valued function
| |--Stream Aggregate(DEFINE:([Expr1021]=MIN(CASE WHEN [tempdb].[dbo].[XmlData].[Data]
| |--UDX((XML Reader.[id], XML Reader.[nid], XML Reader.[tid], XML Reader.[value],
| |--Nested Loops(Inner Join, OUTER REFERENCES:(XML Reader with XPath filter.
| |--Top(TOP EXPRESSION:((1)))
| | |--Compute Scalar(DEFINE:([Expr1012]=0x58, [Expr1038]=getdescenda
| | |--Table-valued function
| |--Table-valued function
|--Stream Aggregate(DEFINE:([Expr1036]=MIN(CASE WHEN [tempdb].[dbo].[XmlData].[Data] IS NU
|--UDX((XML Reader.[id], XML Reader.[nid], XML Reader.[tid], XML Reader.[value], XML
|--Nested Loops(Inner Join, OUTER REFERENCES:(XML Reader with XPath filter.[id],
|--Top(TOP EXPRESSION:((1)))
| |--Compute Scalar(DEFINE:([Expr1027]=0x58, [Expr1039]=getdescendantlim
| |--Table-valued function
|--Table-valued function
However the above execution plan is for query that’s only retrieving two values from XML data, namely element0 and element1 in our example.
Imagine a query that requires to retrieve 1800 values from XML data! This was our requirement. For retrieving such a large number of values, SQL Server would create one UDX node for each of the 1800 values and so we could potentially see around 1800 nodes. However, when SQL Server query optimizer creates these nodes, it uses a section of memory called the stack. Stack is memory space reserved for each thread that’s executed in Windows and this is a limited amount of space which can reach its limit for 1800 element query due to multiple nodes mentioned above and thus give error 8621. More details on Stack at https://msdn.microsoft.com/en-us/library/ms686774(VS.85).aspx .
This error 8621 means that the SQL Query Processor does not have enough space in the stack to accommodate a new row/frame for a new operator during processing. The issue happens because each call to .value() method for the element at the first hierarchical level of the XML document causes the stack size to grow during the query optimization until it runs out of stack space. This is the reason why when retrieving lesser number of values, the issue doesn’t occur:
User Mode default Stack Size Limitations in Windows
Architecture
|
OS thread stack size
|
SQL thread stack size
|
32bit
|
1 MB
|
0.5 MB
|
X64
|
2 MB
|
2 MB
|
IA64
|
4 MB
|
4 MB
|
A manual breakpoint memory dump would show that the XML Reader with XPath filter is actually making a remote query to the sqlxml DLL to parse and get the required data similar to a FTS query, and this query get repeated over and over again for each .value call in the TSQL. So, more the number of nodes, more the repetition of the above function code which will generate the following in the plan:
Resolution/Workaround
In this case, we were already on 64bit and the best/only way around is normally to rewrite the query (remember we can’t change source XML in our scenario). Re-writing query for XML data required using a workaround. We first retrieved the element values in an unpivoted form and then PIVOTed the data to give required output. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output. Please refer below for more details. Please feel free to test and run below query:
You can run below to get the 8621 error:
/************ BELOW TO GET THE 8621 ERROR ***************/
--<1> Generate large XML of single-level
--this is an untyped XML since it is not associated with an XSD schema
SET NOCOUNT ON
DECLARE @GenerateXML nvarchar (max)
DECLARE @ExecStr nvarchar(max)
DECLARE @i int
SET @GenerateXML = '<root>'
SET @ExecStr = 'Select'
SET @i = 0
--create 2000 elements
WHILE @i < 2000
BEGIN
SET @GenerateXML = @GenerateXML + '<element' + CAST(@i AS nvarchar(4)) + '> dummy data ' + CAST(@i AS nvarchar(4)) + '</element' + CAST(@i AS nvarchar(4))+'>'
IF @i <> 0 --not 1st iteration
SET @ExecStr=@ExecStr+ ','
SET @ExecStr = @ExecStr+ ' Data.value(' + CHAR(39) + 'element' + CAST(@i AS nvarchar(4)) + '[1]' + CHAR(39)+ ', ' + CHAR(39)+ 'nvarchar(max)' + CHAR(39)+ ') element' + CAST(@i AS nvarchar(4))
SET @i = @i + 1
END
SET @GenerateXML = @GenerateXML + '</root>'
--<2> Insert XML data into table
USE tempdb
IF OBJECT_ID('XmlData') IS NOT NULL
DROP TABLE XmlData
CREATE TABLE XmlData(Data xml)
INSERT INTO XmlData (Data)
SELECT CAST(@GenerateXML AS xml)
--SELECT * FROM XmlData
--<3> Generate and XML query
SET @ExecStr = @ExecStr + ' FROM XmlData'
--SELECT @ExecStr
EXEC sp_executesql @ExecStr
DROP TABLE XmlData
--returned below error in around 35 seconds on my desktop
Msg 8621, Level 17, State 1, Line 1
The query processor ran out of stack space during query optimization. Please simplify the query.
Please run below workaround script to avoid the 8621 error:
/************ BELOW IS THE RE-WRITTEN QUERY ***************/
--restarted SQL Server service to ensure clean repro (tempdb initial size is 8MB default,pre-sizing may help)
--<1> Generate large XML of single-level
--this is an untyped XML since it is not associated with an XSD schema
SET NOCOUNT ON
DECLARE @GenerateXML nvarchar (max)
DECLARE @ExecStr nvarchar(max)
DECLARE @i int
SET @GenerateXML = '<root>'
SET @ExecStr = 'Select'
SET @i = 0
--create 2000 elements
WHILE @i < 2000
BEGIN
SET @GenerateXML = @GenerateXML + '<element' + CAST(@i AS nvarchar(4)) + '> dummy data ' + CAST(@i AS nvarchar(4)) + '</element' + CAST(@i AS nvarchar(4))+'>'
IF @i <> 0 --not 1st iteration
SET @ExecStr=@ExecStr+ ','
SET @ExecStr = @ExecStr+ ' Data.value(' + CHAR(39) + 'element' + CAST(@i AS nvarchar(4)) + '[1]' + CHAR(39)+ ', ' + CHAR(39)+ 'nvarchar(max)' + CHAR(39)+ ') element' + CAST(@i AS nvarchar(4))
SET @i = @i + 1
END
SET @GenerateXML = @GenerateXML + '</root>'
--<2> Insert XML data into table
USE tempdb
IF OBJECT_ID('XmlData') IS NOT NULL
DROP TABLE XmlData
CREATE TABLE XmlData(Data xml, DataID bigint IDENTITY(1, 1)) --*****add a unique column
INSERT INTO XmlData (Data)
SELECT CAST(@GenerateXML AS xml)
--SELECT * FROM XmlData
--<3> Generate and XML query
--update statistics of source table if any
UPDATE STATISTICS XmlData WITH FULLSCAN
--above statement completes in 0 seconds
--all below took around 1minute 13 seconds on my Core 2 Duo 3.00GHz x64 with 4 GB RAM)
/**** Query table XmlData for each column and save result in keep_results ****/
--we're saving the results so we can create an INDEX so as to optimize the PIVOT
IF OBJECT_ID('keep_results') IS NOT NULL
DROP TABLE keep_results
CREATE TABLE [dbo].[keep_results](
[target_column_name] [nvarchar](250) NULL,
[value] [nvarchar](MAX) NULL, --this is data type of all columns in given query
[dataid] bigint NOT NULL --this column so as to have an unique identifier for each row. This required for the PIVOT.
) ON [PRIMARY];
--below is the workaround query however returns results which need to be PIVOTed through query further below
WITH Data_CTE (name, value, dataid) --if using an unique column with existing data in original table
AS
(
SELECT nref.value('local-name(.)', 'nvarchar(max)') target_column_name
, nref.value('(./text())[1]', 'nvarchar(max)') value
, dataid --can use existing column if its unique
FROM XmlData CROSS APPLY Data.nodes('root/*') AS R(nref) --'Data' is name of column with XML data type in table XmlData
--PLEASE NOTE THAT "root" mentioned above per the original XML data is case-sensitive so "root" is not equal to "Root"
WHERE nref.value('(./text())[1]', 'nvarchar(max)') IS NOT NULL --filtering NULL values if any since it drastically reduces time required for Index by PIVOT later since most of PIVOT cost is on Index Scan which is proportional to number of rows in this scenario
)
INSERT INTO tempdb.dbo.[keep_results] SELECT * FROM Data_CTE
--optimize this table for PIVOT
CREATE CLUSTERED INDEX idx_keep_results_DataID ON keep_results(DataID)
CREATE NONCLUSTERED INDEX idx_keep_results_target_column_name ON keep_results(target_column_name)
--above two queries complete in 2minutes
/**** Use PIVOT to return the results ****/
DECLARE @Xml xml
SET @Xml = (SELECT TOP 1 Data FROM XmlData)
DECLARE @cols NVARCHAR(MAX)
SELECT @cols = STUFF(( SELECT TOP 100 PERCENT
'],[' + t2.target_column_name
FROM
(
--below dynamically generates the columns for the PIVOT however assumes that the 1st XML has all required elements
SELECT nref.value('local-name(.)', 'nvarchar(max)') target_column_name
FROM @Xml.nodes('root/*') AS R(nref)
)
AS t2
ORDER BY '],[' + t2.target_column_name
FOR XML PATH('')
), 1, 2, '') + ']'
--SELECT @cols
--SELECT LEN(@cols)
DECLARE @query NVARCHAR(MAX)
SET @query =
N'SELECT *
FROM
(
SELECT dataid, target_column_name, value
FROM tempdb.dbo.[keep_results]
) AS source_data
PIVOT
(
MIN(value)
FOR target_column_name
IN (' + @cols + ')
) AS pvt'
--SELECT @query
EXECUTE(@query)
DROP TABLE keep_results
DROP TABLE XmlData
/************ please note ***************/
I’ve tested above on SQL Server 2005 Express and on SQL Server 2008 R2. If you get any errors, please retry above after running below:
USE tempdb
DROP TABLE keep_results
DROP TABLE XmlData
Additional information unrelated to this issue
From https://msdn.microsoft.com/en-us/library/ms345117(SQL.90).aspx
Up to 128 levels of the XML hierarchy are accommodated; XML instances containing longer paths are rejected during insertion and modification.
Similarly, up to the first 128 bytes of a node's value are indexed; longer values are accommodated within the system and are not indexed.
But the above doesn’t apply directly to above example because everything in that XML is at level 1.
Happy Learning!
Regards,
Vijay Rodrigues
SE, SQL Server Support
Reviewed by
Balmukund Lakhani
TL, SQL Server Support
Shamik Ghosh
TL, SQL Server Support
Amit Banerjee
SEE, SQL Server Support