What happens to non-clustered indexes when the table structure is changed?
Here’s a topic that’s cropped up several times during Q&As at TechEd this year – what happens to non-clustered indexes when changes are made to the underlying table? Are they always rebuilt or not?
Before we get into that discussion, I’ll give you a little background.
One way to describe a non-clustered index is whether it is a covering index or not. A covering index is one which has all the table columns necessary to satisfy a query, and so there is no need to go back to the underlying table (which I’ll call the base table) to fetch additional columns. Most non-clustered indexes are not covering indexes and so it is necessary for each non-clustered index record to include a link back to the corresponding base table record. This link takes various forms, depending on the structure of the base table:
- Heap: A heap does not impose any logical ordering on the records within it, so the only way to find a particular record within a heap is to do a full table scan or to know the physical location of the record (i.e. which record slot on which data page) – called the physical record ID or physical RID. Clearly doing a full table scan every time a query needs to go from a non-clustered index record back to a heap is not feasible, so each non-clustered index record stores the physical RID of the corresponding heap record.
- Clustered index: A clustered index does impose a logical ordering on the records within it and the ordering exactly matches the definition of the clustered index keys – known as the cluster key. The fastest way to find a particular record in a clustered index is to use the cluster key to navigate through the clustered index b-tree to find the exact record. So, non-clustered indexes over a clustered index include the cluster key of the base table record in each non-clustered index record. This is known as the logical RID. There is a twist though, depending on whether the clustered index is unique or not.
- Non-unique clustered index: Every record in a clustered index HAS to be unique, otherwise there would be no way to deterministically navigate to a particular record using the index b-tree. In a non-unique clustered index, SQL Server has to add a special column, called the uniquifier, to each record, so that if multiple records have the same cluster key values, the uniquifier column will be the tie-breaker. This uniquifier column is added as part of the cluster key, and so it is also present in all non-clustered index records as part of the logical RID.
- Unique clustered index: Every record in a unique clustered index is already unique and so no extra column is required in the cluster key.
Ok - background out of the way. Now let's discuss what happens to non-clustered indexes when various actions are performed to the base table.
SQL Server 2000
- Going from a heap to a clustered index: Changing the fundamental structure of a table from a heap to a clustered index means you're changing the way that records can be looked up in the table, from physical to logical. This means that all the physical RIDs that are stored in the non-clustered index records need to be changed to logical RIDs. So - in this case, all non-clustered indexes are rebuilt after the clustered index has been built.
- Going from a clustered index to a heap: This is the opposite of the case above - you're changing the way records are looked up from logical to physical. This means that all the logical RIDs that are stored in the non-clustered index records need to be changed to physical RIDs. So - in this case, all non-clustered indexes are rebuilt after the clustered index has been dropped.
- Rebuilding a unique clustered index: This operation has no effect on the cluster key, so there is no need for the non-clustered indexes to be rebuilt. Various early Service Packs of SQL Server 2000 had bugs that caused this behavior to be broken - and this is the cause of much of the confusion around this behavior.
- Rebuilding a non-unique clustered index: In SQL Server 2000, when a non-unique clustered index (which contains uniquifier columns) is rebuilt, all the uniquifier values are regenerated. This essentially means that the cluster keys have changed and so all the non-clustered indexes are rebuilt after the clustered index rebuild has completed.
- Changing the clustered index schema: This is simple - any change to the clustered index schema that changes the cluster keys will cause all the non-clustered indexes to be rebuilt.
SQL Server 2005
- Going from a heap to a clustered index: This is the same as SQL Server 2000 - all non-clustered indexes are rebuilt, regardless of whether the clustered index was created online or offline.
- Going from a clustered index to a heap: Again, all non-clustered indexes are rebuilt, regardless of whether the clustered index is dropped online or offline.
- Rebuilding a unique clustered index: This is the same as SQL Server 2000 - the cluster keys aren't changing and so the non-clustered indexes are not rebuilt.
- Rebuilding a non-unique clustered index: Aha! This is different from SQL Server 2000. SQL Server 2005 will RE-USE the old uniquifier values so the cluster keys don't change. This means that non-clustered indexes are NOT rebuilt in this case - that's very cool!
- Changing the clustered index schema:
- Changing the cluster key: This behavior has to be the same as SQL Server 2000 - any change to the cluster keys forces all the non-clustered indexes to be rebuilt after the new clustered index has been created.
- Changing anything else (e.g. partitioning it or moving it): This is one of the cases that confuses people in SQL Server 2005. Applying a partitioning scheme to a clustered index, or moving it to a different filegroup, doesn't change the cluster key at all - so there's no need to rebuild the non-clustered indexes.
So the rule of thumb to remember is: if the physical or logical RID contained in the non-clustered index records needs to change then all the non-clustered indexes need to be rebuilt. Its as simple as that.
Hopefully this will help to clear up some confusion - let me know of any questions!
Comments
Anonymous
June 07, 2007
Quite informative. Thank you, PaulAnonymous
June 07, 2007
Hi Paul, You mentioned in "changing the clustered index schema - changing anything esle" in SQL Server 2005 that if you move the clustered index to a different filegroup one would not have to rebuild the non-clustered indexes. What if the filegroup is in another drive, wouldn't the base table also be moved thus changing the physical RID? Thanks LeighAnonymous
June 11, 2007
This is a very helpful article. Your rule of thumb says it all. If we start thinking of rebuilding indexes in that manner, it's pretty straight forward then. Thank you, Pal, for this helpful information. TejasAnonymous
June 11, 2007
Hi Leigh, Moving a clustered index has no effect on non-clustered indexes as the RID that is included in the non-clustered index is purely logical - it has nothing to do with the physical location of the clustered index. ThanksAnonymous
June 12, 2007
We have a heap table with non-clustered indexes on it ,If we partition it and move the half the data & related index from current filegroup to other filegroup which is on other drive will it affect the non-clustered index ie will they be rebuilt . In 2000 does the non-clustered index (on Heap )needs a rebuilt when we move it to other file group on other disk .Anonymous
June 13, 2007
KS - yes - ANY time the physical or logical RID changes, non-clustered indexes have to be rebuilt. In your example, the physical RID of the heap rows will change when they're moved to the new filegroups. ThanksAnonymous
June 13, 2007
Last update: June 13 , 2007 Document version 0.6 Preface If you have something to add, or want to takeAnonymous
June 13, 2007
Last update: June 13 , 2007 Document version 0.6 Preface If you have something to add, or want to takeAnonymous
January 02, 2008
Moving a unique clustered index to a different filegroup, or anywhere else, doesn't affect the nonclustered indexes because the leaf level of the nonclsutered index references sthe clustering key of the clustered index, not a RID or logical RID. The engine works through the B-Tree of the clustered index using the clustering key. A search using a non-clustered index where there is a clsutered index, works through two B-Trees to get to the full data row: one search through the B-Tree for the nonclustered index, and one for the clustered index.Anonymous
August 29, 2013
Paul Your articles are always full of knowledge and really gives good insight into SQL server Storage engine and query processing .Thanks for contributingAnonymous
February 09, 2015
Hi paul, this is very informative. I have gone through most of your articles and training courses. You have provided the best information. Thanks for the myths course. It cleared out many of my questions in mind. And this information is also very useful. Thanks, Vinny