Index Build strategy in SQL Server - Part 4-1: Offline Serial/Parallel Partitioning (Non-aligned partitioned index build)

Recall that in the previous posts on index build, we defined "aligned" as the case when base object and in-build index use the same partition schema, and "non-aligned" to be the case when heap and index use different partition schemes, or the case when heap is not partitioned. In this post, we will talk about the two scenarios of non-aligned partitioned index build, source partitioned and source not partitioned.

Source Not Partitioned
Consider the following query.

Create

Partition Function pf (int)

as

range right for values (1, 100, 1000)

Create

Partition Scheme ps as Partition pf

ALL

TO ([PRIMARY])

Create

table t (c1 int, c2 int) –the table is created on the primary filegroup by default

Create

clustered Index idx_t on t(c1) on ps(c1) -- non-aligned index build

The serial plan is straightforward.

 Index Insert (write data to the in-build index)
   |
 Sort (order by index key)
   |
 Scan (read data from source)

The sort iterator creates one sort table per target partition (there are four partitions in this example so we will construct four sort tables concurrently). By default, we use the user database for sort to spill data. As we mentioned before, we free each extent from sort table after all its rows are copied. By doing this, for each partition, we can reduce the disk space requirement from 3*partition size (source + sort table + b-tree) to just about 2.2*partition size. Therefore, each file group requires 2.2*(size of all partitions that belong to this file group) of disk space. If the users specify sort_in_tempdb, then all the sort tables reside in the tempdb. Therefore, we require 2.2*(Size of the whole index) of free space in tempdb.

Index insert iterator can start building index after the sort iterator finishes sorting all sort tables. Therefore, we will have as many sort tables as the number of partitions at the same time. Recall that each sort table requires at least 40 pages. So, the minimum required memory will be #PT*40pages.

When it comes to parallel plan, it looks like

 X (Exchange)
   |
 Index Insert
   |
 Sort
   |
 Scan

Each worker thread is assigned (Partition Count)/(Degree Of Parallelism) number of partitions (e.g. if we have 4 partitions and 4 worker threads each gets 1 partition), which can be skewed. The sort iterator creates one sort table per assigned partition. Each worker scans the source once and retrieves the rows that belong to its partition(s), the retrieved rows will be inserted into the corresponding sort table depending on which partition they belong to.

After all sort tables got populated, the index builder starts consuming rows from sort tables, it consumes one sort table after another, building b-tree(s) in each partition's file group. Currently workers do not share partitions. Therefore, it is possible for one worker to finish all assigned partitions and idle while another worker is busy inserting rows.

The disk space and memory requirements are exactly the same as the serial plan. This is because in both cases, we cannot start building the index until all the sort tables are populated.

Comments