Jaa


Partitioned Indexes in SQL Server 2008

In my last post, I looked at how SQL Server 2008 handles scans on partitioned tables.  I explained that SQL Server 2008 treats partitioned tables as tables with a logical index on the partition id column and that SQL Server 2008 implements partition elimination by performing a logical index seek on the partition id column.  Specifically, I showed some examples using a heap.  In this post, I'll continue this discussion and explore how SQL Server 2008 handles scans and seek on partitioned indexes.

Let's begin with a simple example:

CREATE PARTITION FUNCTION PF(INT) AS RANGE FOR VALUES (0, 10, 100)
CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY])

CREATE TABLE T1 (A INT, B INT)
CREATE CLUSTERED INDEX T1A ON T1(A) ON PS(A)

As I noted in my last post, the SQL Server query processor logically treats this partitioned index as a multi-column index on ([PtnId], A).  We can scan this logical index just like any real index:

SELECT * FROM T1

  |--Clustered Index Scan(OBJECT:([T1].[T1A]))

This query uses the same plan as any other clustered index scan.  It implicitly scans all of the partitions.  Aside from the "Partitioned" attribute in the graphical and XML plans, there is no difference between this plan and a similar clustered index scan on a non-partitioned table.  We can also perform a seek on the logical index:

DECLARE @I INT
SELECT @I = 0
SELECT * FROM T1 WHERE A = @I

  |--Clustered Index Seek(OBJECT:([T1].[T1A]), SEEK:( [PtnId1000]=RangePartitionNew([@I],(0),(0),(10),(100)) AND [T1].[A]=[@I]) ORDERED FORWARD)

Notice that SQL Server derives the predicate on the [PtnId] column from the explicit predicate on column A.  SQL Server then seeks on both columns of the logical index - [PtnId] and column A - just as if the logical index were a real index.  So far, everything is pretty straightforward.  Now, let's consider a slightly more complex scenario:

DECLARE @I INT
SELECT @I = 0
SELECT * FROM T1 WHERE A < @I

Both this and the prior example have predicates on column A.  In both cases, SQL Server derives a predicate on the [PtnId] column.  However, in the prior example, the derived predicate was an equality predicate that limited the seek to a single partition.  In this example, due to the inequality, the derived predicate is actually a range of partitions:

  |--Clustered Index Seek(OBJECT:([T1].[T1A]), SEEK:( [PtnId1000] >= (1) AND [PtnId1000] <= RangePartitionNew([@I],(0),(0),(10),(100)) AND [T1].[A] < [@I]) ORDERED FORWARD)

If you are familiar with the rules on index seeks, you will notice something odd about this plan.  Ordinarily, SQL Server can only perform an index seek on the second column of a multi-column index if there exists an equality predicate on the first column.  Clearly, there is not an equality predicate on the [PtnId] column in this query and yet there is a predicate on column A.  What's going on?  The query processor supports a special type of index seek on partitioned tables.  Basically, the query processor first uses the predicate on the [PtnId] column to identify those partitions into which to seek and then seeks again using the predicate on column A.  Because the seek operator skips over some rows in each partition that do not match the predicate on column A, we refer to this type of seek as a skip scan.  Note that this operation really is a seek - it is not a residual predicate.  The storage engine only returns those rows that pass both predicates.  Also note that SQL Server 2008 only supports this skip scan functionality on the [PtnId] column of a partitioned table.

Now let's change the schema slightly:

CREATE TABLE T2 (A INT, B INT)
CREATE CLUSTERED INDEX T2A ON T2(A) ON PS(B)

This new table is indexed on column A but partitioned on column B.  Once again, SQL Server logically treats the index as a multi-column index on ([PtnId], B).

DECLARE @I INT
SELECT @I = 0
SELECT * FROM T2 WHERE A = @I

Since the table is partitioned on column B, we cannot use the predicate on column A to derive a predicate on the [PtnId] column.  Fortunately, the query processor can still perform a skip scan by adding a predicate on the [PtnId] column that explicitly scans all partitions:

  |--Clustered Index Seek(OBJECT:([T2].[T2A]), SEEK:([PtnId1000] >= (1) AND [PtnId1000] <= (4) AND [T2].[A]=[@I]) ORDERED FORWARD)

Finally, before I wrap up this post, I'd like to point out that, while it's not often useful, we can add an explicit predicate on the [PtnId] column of a table.  To do so, we use the $PARTITION function:

DECLARE @PtnId INT
SELECT @PtnId = 1
SELECT * FROM T1 WHERE $PARTITION.PF(A) = @PtnId

  |--Clustered Index Scan(OBJECT:([T1].[T1A]), SEEK:( [PtnId1000]=[@PtnId] ) ORDERED FORWARD)

Just as with the derived [PtnId] predicates, SQL Server can perform a seek on the [PtnId] column.  When using this syntax, it is important to use the correct $PARTITION function and the correct column or SQL Server will not recognize it as the [PtnId] column for this table.

Books Online has more information on these and other partitioned table changes in SQL Server 2008.

Comments