Jaa


Optimized Non-clustered Index Maintenance in Per-Index Plans

In my last post, I showed how SQL Server 2005 only updates non-clustered indexes when the data in the index actually changes.  For my example, I used a simple update statement that results in a per-row or narrow plan.  In this post, I'll show how this optimization works in a per-index or wide update plan.

Let's use the same schema and update statement from last week.  To get a per-index plan, we can trick the optimizer into believing that the table is much larger than it really is by running some UPDATE STATISTICS statements with the ROWCOUNT and PAGECOUNT options.  (We could, of course, make the table larger by actually inserting more rows, but the UPDATE STATISTICS route is faster.)

CREATE TABLE T (PK INT, A INT, B INT, C INT, D INT, E INT)
CREATE UNIQUE CLUSTERED INDEX TPK ON T(PK)

CREATE INDEX TB ON T(B)
CREATE INDEX TCD ON T(C,D)
CREATE INDEX TE ON T(E)

INSERT T VALUES(0, 10, 20, 30, 40, 50)

UPDATE STATISTICS T WITH ROWCOUNT = 100000, PAGECOUNT = 10000
UPDATE STATISTICS T (TB) WITH ROWCOUNT = 100000, PAGECOUNT = 1000
UPDATE STATISTICS T (TCD) WITH ROWCOUNT = 100000, PAGECOUNT = 1000
UPDATE STATISTICS T (TE) WITH ROWCOUNT = 100000, PAGECOUNT = 1000

SET STATISTICS PROFILE ON

UPDATE T SET B = 29, C = 39, D = 49

By running this statement with SET STATISTICS PROFILE ON, we can see the exact number of rows processed by each index update operator:

Rows   Executes
2      1        |--Sequence
2      1             |--Index Update(OBJECT:([T].[TB]), SET:([PK1036] = [T].[PK],[B1037] = [T].[B]) WITH ORDERED PREFETCH)
2      1             |    |--Sort(ORDER BY:([T].[B] ASC, [T].[PK] ASC, [Act1035] ASC))
2      1             |         |--Filter(WHERE:(NOT [Expr1031]))
2      1             |              |--Table Spool
2      1             |                   |--Split
1      1             |                        |--Clustered Index Update(OBJECT:([T].[TPK]), SET:([T].[B] = [Expr1025],[T].[C] = [Expr1026],[T].[D] = [Expr1027]))
1      1             |                             |--Compute Scalar(DEFINE:([Expr1031]=[Expr1031], [Expr1032]=[Expr1032]))
0      0             |                                  |--Compute Scalar(DEFINE:([Expr1031]=CASE WHEN [Expr1003] THEN (1) ELSE (0) END, [Expr1032]=CASE WHEN [Expr1004] AND [Expr1005] THEN (1) ELSE (0) END))
0      0             |                                       |--Compute Scalar(DEFINE:([Expr1025]=(29), [Expr1026]=(39), [Expr1027]=(49)))
0      0             |                                            |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [T].[B] = (29) THEN (1) ELSE (0) END, [Expr1004]=CASE WHEN [T].[C] = (39) THEN (1) ELSE (0) END, [Expr1005]=CASE WHEN [T].[D] = (49) THEN (1) ELSE (0) END))
1      1             |                                                 |--Top(ROWCOUNT est 10000)
1      1             |                                                      |--Clustered Index Scan(OBJECT:([T].[TPK]), ORDERED FORWARD)
2      1             |--Index Update(OBJECT:([T].[TCD]), SET:([PK1038] = [T].[PK],[C1039] = [T].[C],[D1040] = [T].[D]) WITH ORDERED PREFETCH)
2      1                  |--Filter(WHERE:(NOT [Expr1032]))
2      1                       |--Sort(ORDER BY:([T].[C] ASC, [T].[D] ASC, [T].[PK] ASC, [Act1035] ASC))
2      1                            |--Table Spool

Let's review the various operators in this query plan.  First, observe that the portion between the clustered index scan and clustered index update operators looks essentially identical to the per-row plan from my last post.  In particular, notice how we have similar compute scalars to compute which indexes have really changed and need to be updated.  [Expr1031] indicates whether index TB must be updated and [Expr1032] indicates whether index TCD must be updated.  The only real difference between this portion of the plan and a per-row plan is that in this plan the clustered index update only updates the clustered index and does not update either of the non-clustered indexes.  The non-clustered indexes are updated by separate index update operators.

Next, we have a split operator.  The split operator converts a single row update into a delete followed by an insert.  This is why a single input row to the split turns into two output rows.  The split and sort operators in this plan are purely for performance and can be ignored for the purpose of this discussion.

The table spool is a common subexpression spool.  It eagerly consumes all of its input rows and then plays back these rows multiple times.  In this example, the spool plays back its rows twice: once for the update of index TB and once for the update of index TCD.  Note that the table spool below the index update of TCD is a secondary spool.  It has no input.  Instead, it plays back the rows from the primary table spool which is located between the index update of TB and the clustered index update.

The sequence operator glues the two portions of the plan together.  As its name suggests, it executes each child subtree in a sequential fashion.

Finally, the filter operators are new in SQL Server 2005.  They check the results of the compute scalars to determine whether each non-clustered index needs to be updated for each row.  In this example, both indexes need to be updated for the single row we are updating and so we see that the row passes through the filters and the index update operators each process two rows.  (Recall that the split converted the single row update into two rows: a delete followed by an insert.)

If we repeat the same update, we can see that this time the filters discard the rows as the data is not changing and the non-clustered indexes do not need to be updated.  Notice that both index update operators process no rows:

Rows   Executes
0      1        |--Sequence
0      1             |--Index Update(OBJECT:([T].[TB]), SET:([PK1036] = [T].[PK],[B1037] = [T].[B]) WITH ORDERED PREFETCH)
0      1             |    |--Sort(ORDER BY:([T].[B] ASC, [T].[PK] ASC, [Act1035] ASC))
0      1             |         |--Filter(WHERE:(NOT [Expr1031]))
2      1             |              |--Table Spool
2      1             |                   |--Split
1      1             |                        |--Clustered Index Update(OBJECT:([T].[TPK]), SET:([T].[B] = [Expr1025],[T].[C] = [Expr1026],[T].[D] = [Expr1027]))
1      1             |                             |--Compute Scalar(DEFINE:([Expr1031]=[Expr1031], [Expr1032]=[Expr1032]))
0      0             |                                  |--Compute Scalar(DEFINE:([Expr1031]=CASE WHEN [Expr1003] THEN (1) ELSE (0) END, [Expr1032]=CASE WHEN [Expr1004] AND [Expr1005] THEN (1) ELSE (0) END))
0      0             |                                       |--Compute Scalar(DEFINE:([Expr1025]=(29), [Expr1026]=(39), [Expr1027]=(49)))
0      0             |                                            |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [T].[B] = (29) THEN (1) ELSE (0) END, [Expr1004]=CASE WHEN [T].[C] = (39) THEN (1) ELSE (0) END, [Expr1005]=CASE WHEN [T].[D] = (49) THEN (1) ELSE (0) END))
1      1             |                                                 |--Top(ROWCOUNT est 10000)
1      1             |                                                      |--Clustered Index Scan(OBJECT:([T].[TPK]), ORDERED FORWARD)
0      1             |--Index Update(OBJECT:([T].[TCD]), SET:([PK1038] = [T].[PK],[C1039] = [T].[C],[D1040] = [T].[D]) WITH ORDERED PREFETCH)
0      1                  |--Filter(WHERE:(NOT [Expr1032]))
2      1                       |--Sort(ORDER BY:([T].[C] ASC, [T].[D] ASC, [T].[PK] ASC, [Act1035] ASC))
2      1                            |--Table Spool

We could also use the other techniques which I demonstrated in my previous post (that is, checking the sys.dm_db_index_operational_stats DMV, checking locks held, and checking logical reads) to verify that the non-clustered indexes were not really updated.  I'll leave trying these other techniques as an exercise.

Comments

  • Anonymous
    August 24, 2007
    Hi Craig,Nice post.I was wondering if it is possible to force the query optimizer to use a narrow or a wide plan. I thought the USE PLAN query hint could be the solution but then I realized that it cannot be specified with INSERT, UPDATE, or DELETE statements. Is there a way to do it?Thanks,
  • Anonymous
    August 29, 2007
    There is no easy way to force narrow or wide plans.  For testing purposes (on a development system only), you can fake the stats as I've demonstrated above.  Alternatively, Stefano from the optimizer team suggests forcing a wide plan by including a non-persisted computed column in the index.  Note that in some cases, the optimizer cannot choose a narrow plan.  (This is why we can force a wide plan by including a non-persisted computed column in an index.)
  • Anonymous
    August 30, 2007
    Also, starting with SQL Server 2008 CTP5, the USE PLAN query hint will work with INSERT, UPDATE, and DELETE statements.  However, because USE PLAN requires a valid XML plan, it is most useful only when you can generate the plan you want through some other means.
  • Anonymous
    September 27, 2007
    In my last post, I wrote about how aggregation WITH ROLLUP works. In this post, I will discuss how aggregation
  • Anonymous
    October 25, 2007
    One of the most important uses of CTEs is to write recursive queries. In fact, CTEs provide the only