Tuning Enovia SmarTeam -- Statistics
[Prior Post in Series] [Next Post in Series]
Adding statistics helps SQL Server to determine which tables to scan first when there are joins. To understand statistics better, see Statistics Used by the Query Optimizer in Microsoft SQL Server 2005. Statistics are typically added after indexes because if there is an index on the column then there is usually not a need for a statistic.
I have run a sample trace log through Database Engine Tuning Advisor (DTA) and received recommendations which I have saved as XML.
My approach for this demonstration tuning is simple:
- Identify all column names recommended per table.
- Put statistics on all tables with these column names.
This produces a good solution given the light-weight trace log and does not require much analysis time.
Getting the list of recommended statistics columns
As cited in the introduction, I use the XML results file to speed analysis. I opened one of my saved XML files and then pasted it below after doing some simple modifications.
- Delete:
- <?xml version="1.0" encoding="utf-16"?>
- Change:
- <DTAXML xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xmlns="https://schemas.microsoft.com/sqlserver/2004/07/dta">
- To
- <DTAXML>
As cited before, this avoids issues with XML namespaces from my TSQL below.
DECLARE @Xml Xml
Set @Xml='{paste here}'
Select ColName,Count(1) FROM
(SELECT node.value('(.)[1]','sysname') as ColName FROM @Xml.nodes('//Statistics/Column') as Ref(node)) DTA
GROUP BY ColName
ORDER BY COUNT(1) DESC
Which produced results similar to those shown below.
The first column is the column name and the second one is the number of tables it occurs on.
The stored procedure below creates and then executes TSQL to add statistics across all tables that match the patterns identified in the DTA recommendations. These patterns are modified as follows:
- Only one statistic is added per table
- The columns in the statistics are added in the order that they appear in the table
This is a quick reasonable solution. If you have the time, a more detailed analysis is best.
This approach reduces the recommendations into just 16 patterns, shown below
The Code Solution
The TSQL below does some transformations on the XML and then creates statistics on all tables matching the pattern. The Statistics name is generated from the table name and the column's ordinal number(aka position in the table) in the statistics so the procedure may be executed as needed without duplicating statistics.
CREATE PROC dta_Statistics @Xml Xml
AS
SET NOCOUNT ON
DECLARE @CMD nvarchar(max),@ColName nvarchar(max), @Table nvarchar(max),@LastTable nvarchar(max),@StatID nvarchar(max)
DECLARE @ColCnt int,@ColIdx int
CREATE Table #temp(TName varchar(128),CName varchar(128), ColNo int)
INSERT INTO #temp(TName,CName)
SELECT DISTINCT
TableName,
node2.value('.','sysname') as ColName FROM (
SELECT
node.value('(parent::*/parent::*/parent::*/Name)[1]','sysname') as TableName,
node.query('.') as sNode
FROM @Xml.nodes('//Create/Statistics') as Ref(node)
) DTA
CROSS APPLY snode.nodes('(Statistics/Column/Name)') Ref(node2)
UPDATE #Temp
SET ColNo=Ordinal_Position
FROM #Temp JOIN INFORMATION_SCHEMA.Columns
ON TName=Table_Name
And Replace(Replace(CName,'[',''),']','')=Column_Name
CREATE Table #temp2(InCol varchar(max), ColCnt int, StatId varchar(max))
SET @Cmd=''
SET @StatId=''
SET @LastTable=''
SET @ColCnt=0
DECLARE PK_Cursor CURSOR FOR
Select TName,CName,ColNo FROM #temp ORDER BY TName,ColNo
OPEN PK_Cursor;
FETCH NEXT FROM PK_Cursor INTO @Table,@ColName,@ColIdx
WHILE @@FETCH_STATUS = 0
BEGIN
IF @LastTable <> @TABLE
BEGIN
IF Len(@CMD) > 0
INSERT INTO #temp2 (InCol,ColCnt,StatID) Values(@CMD,@ColCnt,@StatID)
SET @LastTable = @TABLE
SET @CMD=''
SET @StatId=''
SET @ColCnt=0
END
IF Len(@CMD) > 0
BEGIN
SET @CMD=@CMD+','
SET @StatId=@StatId+'_'
END
SET @CMD=@CMD+@ColName
SET @StatID=@StatID+Cast(@ColIdx as varchar(11))
SET @ColCnt=@ColCnt+1
FETCH NEXT FROM PK_Cursor INTO @Table,@ColName,@ColIdx
END
IF Len(@CMD) > 0
INSERT INTO #temp2 (InCol,ColCnt,StatId) Values(@CMD,@ColCnt,@StatId)
CLOSE PK_Cursor;
DEALLOCATE PK_Cursor;
CREATE Table #temp3(TName varchar(max), InCol varchar(max), StatId varchar(max) )
SET @CMD=''
DECLARE P2_Cursor CURSOR FOR
Select InCol,ColCnt,StatId FROM #temp2 ORDER BY InCol,ColCnt
OPEN P2_Cursor;
FETCH NEXT FROM P2_Cursor INTO @ColName,@ColCnt,@StatId
WHILE @@FETCH_STATUS = 0
BEGIN
IF Len(@CMD) > 0
SET @CMD=@CMD+'
UNION
'
SET @CMD=@CMD+'Select Table_Name,'''+@ColName +''',''STATS_''+Table_Name+''_''+'''+@StatID+'''
FROM Information_Schema.Columns JOIN Sys.Objects ON NAME=Table_Name AND Objects.Type=''U''
WHERE COLUMN_Name in ('+REPLACE(REPLACE(@ColName,'[',''''),']','''')+') GROUP BY TABLE_NAME HAVING Count(1)='
+Cast(@ColCnt as varchar(11))
FETCH NEXT FROM P2_Cursor INTO @ColName,@ColCnt,@StatId
END
CLOSE P2_Cursor;
DEALLOCATE P2_Cursor;
CREATE TABLE #Temp4 (TName sysname,Cols nvarchar(max), StatId sysname)
INSERT INTO #Temp4 EXEC (@Cmd)
DECLARE P3_Cursor CURSOR FOR
Select TName,Cols,StatId FROM #temp4
OPEN P3_Cursor;
FETCH NEXT FROM P3_Cursor INTO @Table,@ColName,@StatId
WHILE @@FETCH_STATUS = 0
BEGIN
IF NOT EXISTS(SELECT 1 FROM Sys.Stats where Name=@STATID)
BEGIN
SET @CMD='CREATE STATISTICS ['+@STATID+'] ON ['+@Table
+'] ('+@ColName+')'
EXEC(@CMD)
PRINT @CMD
END
FETCH NEXT FROM P3_Cursor INTO @Table,@ColName,@StatId
END
CLOSE P3_Cursor;
DEALLOCATE P3_Cursor;
I did a PRINT @CMD above so you can see the creation of the statistics code, for example:
CREATE STATISTICS [STATS_USER_GROUP_8_9_10_11]
ON [USER_GROUP] ([USR_OBJECT_ID],[GRP_OBJECT_ID],[CNTX_OBJECT_ID],[CNTX_CLASS_ID])
CREATE STATISTICS [STATS_USERS_1_2_3_10]
ON [USERS] ([OBJECT_ID],[CLASS_ID],[LOGIN],[FIRST_NAME])
For this database, a total of 291 statistics were added.
Summary
The above solution is a good solution but is likely not perfect. Some of the statistics are candidates for further optimization. For example, consider this set:
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_16]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_50]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_51]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_54]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_57]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_58]
Analysis and testing would reveal if dropping these 6 indexes and replacing them with one of the following would improve performance better:
- [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10]
- [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_16_50_51_54_57_58]
There is a balance between the time spent improving performance and the performance gain, and common sense should be used.
A second item that I did not do was checking for indexes using the columns specified. The TSQL code above was already pushing the limit for complexity in a blog post.