“QUOTED_IDENTIFIER” causes Unexpected Query Plan for Persisted Computed Column query
Author: Shaun Tinline-Jones
Technical Reviewers: Thomas Kejser, Steve Howard, Jaime Alva Bravo, Kun Cheng, Jimmy May
Note: Validation for this post was performed in the SQL CAT Customer Lab on an HP Proliant DL580 G7, Intel Xeon Nehalem E7-4870 2.40 GHz 4 socket, 10 physical cores, 20 logical cores for a total of 40 physical cores, 80 logical cores; 1TB RAM. SQL Server 2008 R2 was installed on a Fusion-io ioDrive Duo 1.28TB MCL using driver version 2.3.1
We recently engaged a Tier 1 Global ISV in our labs, where the objective was to achieve the highest possible business transaction throughput for their application.
During testing and optimization we encountered an interesting behavior: we kept incurring a clustered index scan, even after applying the known techniques for obtaining an index seek. In this blog, I will share the reproducible steps that reveal the following observation:
“If the table has a persisted computed column*, the query optimizer will choose a clustered index scan over a clustered index seek.”
*This applies to the case, which is most common, where the compute expression doesn’t produce a constant.
Here is the code to create the table & SPs that we’ll refer to for describing this behavior:
IF OBJECT_ID('dbo.testComputedCol') IS NOT NULL
    DROP TABLE dbo.TestComputedCol
GO
IF OBJECT_ID('dbo.GetResults01') IS NOT NULL
    DROP PROC dbo.GetResults01
GO
IF OBJECT_ID('dbo.GetResults02') IS NOT NULL
    DROP PROC dbo.GetResults02
GO
CREATE TABLE dbo.TestComputedCol (Col1 int PRIMARY KEY CLUSTERED, Col2 int)
GO
CREATE NONCLUSTERED INDEX NDX_TestComputedCol_Col2 ON dbo.TestComputedCol (Col2)
GO
SET NOCOUNT ON
DECLARE @Count int = 1;
WHILE @Count < 1000
BEGIN
    INSERT INTO dbo.TestComputedCol (Col1, Col2) VALUES (@Count, @Count*10)
    SET @Count += 1
END
GO
SET QUOTED_IDENTIFIER OFF
GO
CREATE PROC dbo.GetResults01
AS
SELECT * FROM dbo.TestComputedCol WHERE Col1 = 10
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROC dbo.GetResults02
AS
SELECT * FROM dbo.TestComputedCol WHERE Col1 = 10
GO
To baseline the scenario we are describing, we run both SPs and confirm that we experience the index seek operation.
EXEC dbo.GetResults01
EXEC dbo.GetResults02
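One way to confirm the operator without the graphical plan is a text showplan. This is a minimal sketch, assuming the objects created above exist. With SHOWPLAN_TEXT on, the procedures are compiled but not executed, and at this baseline stage the plan text for each should mention a Clustered Index Seek.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO
-- Returns the estimated plan text for the statements inside each
-- procedure instead of executing them.
EXEC dbo.GetResults01;
EXEC dbo.GetResults02;
GO
SET SHOWPLAN_TEXT OFF;
GO
```

The same batch can be rerun after each change later in this post to see whether the operator has switched between a seek and a scan.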
In our scenario we recognized the need to partition the data to avoid the last page insert contention issue, and therefore implemented a computed column similar to the one in the ALTER TABLE statement further below. The documentation states that QUOTED_IDENTIFIER must be ON when a computed column is created.
Side Note: For resolving the Last Page Insert Contention challenge, we are left with two choices: troubleshoot the reason for the differing behavior, or change the insert behavior to include setting a value for the partition bucket. We chose to troubleshoot the issue. This blog is not striving to optimize the partition hash; however, it’s important to recognize that resolving the Last Page Insert Contention through a computed column is not optimal.
In its literal sense, QUOTED_IDENTIFIER is required to parse double-quoted strings as identifiers. A recommended practice is to delimit identifiers with square brackets (“[” and “]”) instead. It is therefore not intuitive that the QUOTED_IDENTIFIER setting influences queries against computed columns.
NOTE: When a table is created, the QUOTED_IDENTIFIER option is always stored as ON in the metadata for the table, irrespective of the current setting. However when creating a computed column, the setting must be set to ON.
SET QUOTED_IDENTIFIER ON
GO
ALTER TABLE dbo.TestComputedCol
ADD ComputedCol AS CAST(Col1%10 AS tinyint) PERSISTED NOT NULL
In our solution, prior to adding the computed column, a frequently called SP generated a query plan very similar to what you see below:
EXEC dbo.GetResults01
However, after the computed column was added, executing the same SP generated the following query plan, which contains a clustered index scan rather than a seek.
At first this may seem unexpected, and frustrating. However, some may infer from the article Creating Indexes on Computed Columns that this is expected behavior. Take note that this is only an inference, or a source of data points, as we did not add an index to the computed column. Does this happen if neither the SELECT list nor the WHERE clause references the computed column? We modified the SP to follow the recommended practice of explicitly specifying only the required columns:
SELECT Col1, Col2 FROM dbo.TestComputedCol WHERE Col1 = 10
Unfortunately, you cannot shake the scan this way!
The only way to resolve the issue is to ensure that QUOTED_IDENTIFIER is ON during the execution of the batch. The challenge is finding the right place to make that setting effective at runtime.
Side Note: In this scenario the computed column will be the partitioning column, which requires the PERSISTED property. The above issue doesn’t occur if the computed column is not persisted.
The default value for QUOTED_IDENTIFIER is ON; however, it can be changed at a variety of points. Therefore you may never encounter the above issue, yet when it does occur it may not be intuitive to check the QUOTED_IDENTIFIER setting, or to determine at which point along the path it was changed.
The setting can be applied:
- As a connection attribute
- As a database option
- As an object attribute
- As a runtime SET option
In our case, we unexpectedly started incurring the index scan. We identified these index scans when the table partitioning strategy didn’t produce the benefits we sought. Troubleshooting revealed that when we ran the code (from the SP) as an isolated batch in a Query Analyzer window, we got the seek behavior. If we ran the SP in the same Query Analyzer window but in a separate batch, we got the scan behavior. When we ran the code using the osql utility, we got scan behavior for both batch executions. If we changed the computed column to a constant value, then we got the seek behavior.
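The batch-level behavior described above can be sketched as follows. This is an illustration only, assuming dbo.TestComputedCol already has the persisted computed column. The comments on the plans restate what this post observed on SQL Server 2008 R2 and are not a guarantee for every version.

```sql
-- SET QUOTED_IDENTIFIER is applied at parse time, so it is kept in its
-- own batch, ahead of the statement it should affect.
SET QUOTED_IDENTIFIER OFF;
GO
-- Ad hoc batch compiled with the session setting OFF
-- (observed in this post: clustered index scan).
SELECT Col1, Col2 FROM dbo.TestComputedCol WHERE Col1 = 10;
GO
SET QUOTED_IDENTIFIER ON;
GO
-- Same statement compiled with the session setting ON
-- (observed in this post: clustered index seek).
SELECT Col1, Col2 FROM dbo.TestComputedCol WHERE Col1 = 10;
GO
-- A stored procedure uses the setting captured when it was created,
-- not the session setting, so GetResults01 still behaves as if OFF.
EXEC dbo.GetResults01;
GO
```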
The clue to finding a resolution is in the differences between the osql.exe and SSMS outcomes. These utilities have different defaults for connection attributes. The resolution lies in creating the stored procedure with QUOTED_IDENTIFIER set to ON. An interesting observation is that it’s redundant to set QUOTED_IDENTIFIER within the body of the stored procedure; the setting in effect at creation time takes precedence.
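A minimal sketch of that resolution, assuming the objects from the earlier script: the procedure is dropped and recreated in a batch that follows an explicit SET QUOTED_IDENTIFIER ON, and the captured setting is then read back from sys.sql_modules.

```sql
-- The setting in effect when CREATE PROC is parsed is stored with the module.
SET QUOTED_IDENTIFIER ON;
GO
IF OBJECT_ID('dbo.GetResults01') IS NOT NULL
    DROP PROC dbo.GetResults01;
GO
CREATE PROC dbo.GetResults01
AS
SELECT * FROM dbo.TestComputedCol WHERE Col1 = 10;
GO
-- uses_quoted_identifier = 1 means the module was created with the setting ON.
SELECT OBJECT_NAME(object_id) AS module_name, uses_quoted_identifier
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.GetResults01');
GO
```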
Let me share with you how ALM (Application Lifecycle Management) practices of the ISV caused this issue to slither its way into the solution.
NOTE: Keep in mind that there are many variations of what I’m about to describe.
A developer creates a module, such as an SP, using a tool such as SSMS, VSTS, osql, sqlcmd, or others, but most often SSMS. When the developer is done unit testing, a script file is checked into the source control system. Below is an example of such a file:
CREATE PROC dbo.GetResults01
AS
SELECT * FROM dbo.TestComputedCol WHERE Col1 = 10
GO
The Build Process then generates a deployment script. There are a variety of ways that deployment scripts can be generated and run against the target SQL Server and database instance. The point in the above scenario is that the QUOTED_IDENTIFIER setting for this object is dependent on the current connection attribute of the deployment script generator.
If the deployment script does not specify the QUOTED_IDENTIFIER setting for the objects, then the setting will depend on the connection settings in effect when the script is run. The Build Process for the ISV leverages the osql.exe utility, which has a default setting of OFF. It’s not uncommon for build processes to use sqlcmd.exe, which would cause the same behavior.
NOTE: If the connection does not set this attribute, then the database setting will take precedence. By default, ODBC and OLEDB set the option to ON. For DB-Library the default is OFF. Therefore it is very unlikely that the database default is ever enforced.
| Tool | Default Quoted Identifier Setting |
| --- | --- |
| osql.exe | OFF |
| sqlcmd.exe | OFF |
| SQLCMD Scripts | ON |
| SSMS | ON |
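Given these differing defaults, a deployment script can check what the tool gave its connection and then set the option itself. The sketch below is illustrative rather than the ISV's actual script; it uses SESSIONPROPERTY to read the current session value.

```sql
-- What did the tool give this connection? 1 = ON, 0 = OFF.
SELECT SESSIONPROPERTY('QUOTED_IDENTIFIER') AS quoted_identifier_on;
GO
-- Make the rest of the script independent of the tool default.
SET QUOTED_IDENTIFIER ON;
GO
-- Reports 1 from here on, whichever tool runs the script.
SELECT SESSIONPROPERTY('QUOTED_IDENTIFIER') AS quoted_identifier_on;
GO
```

Both osql.exe and sqlcmd.exe also accept the -I switch, which enables quoted identifiers for the connection, so the build process can set the option on the command line instead of in every script.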
Below are T-SQL queries to identify current QUOTED_IDENTIFIER settings:
SELECT DATABASEPROPERTYEX(DB_NAME(), 'IsQuotedIdentifiersEnabled')
SELECT * FROM sys.databases WHERE is_quoted_identifier_on = 0
SELECT quoted_identifier, * FROM sys.dm_exec_sessions WHERE session_id = @@SPID
They each have cases where they are useful, though I particularly like:
SELECT * FROM sys.objects
WHERE OBJECTPROPERTYEX(object_id, 'IsQuotedIdentOn') = 0 AND TYPE <> 'S'
This last query reveals those objects that deviate from the recommended standard of having QUOTED_IDENTIFIER set to ON.
In conclusion, there are connection settings that can influence the behavior of queries and their output. It is therefore important to standardize on a setting; the more closely it aligns with the industry defaults, the easier it is to adhere to the standard.
Historically, prior to computed columns, the quoted identifier setting merely determined whether double-quoted text was treated as an identifier or as a string literal. It is recommended that identifiers are delimited by square brackets, and string literals by single quotes. It is further recommended to keep the quoted identifier setting ON in all cases, even if it means explicitly adding this to script files before objects are created.
For our scenario, we conclude that it is not intuitive that the reason behind the index scan is the quoted identifier setting. We have therefore standardized on the setting being ON, and as part of validating a build deployment we run the above queries to confirm that all appropriate objects have it set to ON.
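As one possible shape for that validation step, the following sketch fails the deployment when any user module was created with the setting OFF. It is an assumption about how such a check could be written, not the ISV's actual build code; it relies on the uses_quoted_identifier column of sys.sql_modules.

```sql
DECLARE @Offenders int;

-- Count user modules (procedures, views, functions, triggers)
-- that were created with QUOTED_IDENTIFIER OFF.
SELECT @Offenders = COUNT(*)
FROM sys.sql_modules AS m
JOIN sys.objects AS o ON o.object_id = m.object_id
WHERE m.uses_quoted_identifier = 0
  AND o.is_ms_shipped = 0;

IF @Offenders > 0
BEGIN
    -- List the offending modules so they can be recreated with the setting ON.
    SELECT SCHEMA_NAME(o.schema_id) AS schema_name, o.name, o.type_desc
    FROM sys.sql_modules AS m
    JOIN sys.objects AS o ON o.object_id = m.object_id
    WHERE m.uses_quoted_identifier = 0
      AND o.is_ms_shipped = 0;

    RAISERROR('%d module(s) were created with QUOTED_IDENTIFIER OFF.', 16, 1, @Offenders);
END
```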
Comments
- Anonymous
September 06, 2016
Shaun Tinline-Jones: Thanks for doing the hard work and placing this in the public domain. It is obviously a very difficult problem to diagnose, because of its salience. I am glad there was a particular problem, shall we say a performance riddle, that you were trying to tackle and because of its perplexity, Index Seek Versus Index Scan, and in you refusing to just accept that you had to live with Index Scan the community as a whole benefited. Blessings, Daniel Adeniji