
SQLCAT Blog: Managing Schema Changes (Part 2)

 

Question: What are best practices for managing schema changes in SQL Server 2005?  What are the fastest, least intrusive techniques for high availability? 

 

Answer:  In Part 1, we outlined the behavior of SQL Server 2005 schema changes.  In Part 2, we will look at best-practice techniques for de-coupling the batch updates required for data type changes and most length changes (see "Managing Schema Changes (Part 1)").  These techniques allow us to make schema changes quickly and with the least impact on performance and concurrency for large tables, such as the Billion row scenario.

 

Scenario:   Let's assume we have a 5-character ZipCode column that we want to increase to the 10-character format (e.g., 12345-0001).  We can change the ZipCode length from char(5) to char(10) using:

 

alter table MyTable alter column ZipCode char(10)

 

However, this will incur a concurrency-crippling batch update (as seen in Profiler) to change the column length in our Billion row table scenario.  If you'd like to capture this in Profiler, see "Managing Schema Changes (Part 1)".

Best Practice

Adding columns with the NULL property (excluding DEFAULT ... WITH VALUES, of course) is a best practice because such columns can be added quickly.  In the case where you do need to set a value for the column, the NULL property also gives us the flexibility of de-coupling the update from the alter table statement.

 

An approach to accomplish this: 

  1. Add a char(10) column, NewZip
  2. Set piecemeal batch size
  3. Populate NewZip using an explicit conversion of ZipCode
  4. Drop old column ZipCode
  5. Rename NewZip to ZipCode (optional)

 

Step 1: Add new column

 

alter table MyTable

add NewZip char(10) NULL

Upon the fast completion of this alter table (about 1 second in my tests), NewZip contains NULLs for all rows in the existing table.  You may need to populate NewZip with values from another column or with other values.  In Step 2, SET ROWCOUNT is used to minimize the concurrency impact of write locks by limiting the number of rows that are affected at one time.  An alternative to SET ROWCOUNT is a WHERE clause that specifies key ranges, again limiting the number of rows affected at one time (a sketch of this alternative appears after Step 5).

 

Step 2: Set Batch Size for update

 

--- (2) set piecemeal batch size

set nocount on

SET ROWCOUNT 1000

 

Next, the piecemeal update is repeated until there are no NULL values left in NewZip.  This example sets all rows from the char(5) ZipCode column.  Alternatively, you can selectively set the value of an added column using a lookup or join.

 

Step 3: De-coupled Piecemeal Update

declare @rowcount int, @batchcount int, @rowsupdated int

select @rowcount = 1,@batchcount = 0, @rowsupdated = 0

while @rowcount > 0

begin

      begin tran

            update MyTable

            set NewZip = ZipCode

            where NewZip is NULL

            set @rowcount = @@rowcount

            select @rowsupdated = @rowsupdated + @rowcount

            if @rowcount > 0 select @batchcount = @batchcount + 1

      commit

end

SET ROWCOUNT 0   --- remove the piecemeal batch limit once the update completes

Step 4: Drop the char(5) column ZipCode (optional)

alter table MyTable

drop column ZipCode

go

Step 5: Rename the NewZip column to ZipCode (optional)

exec sp_rename 'MyTable.NewZip','ZipCode','COLUMN'

go

--- Observe the data type for ZipCode is now char(10)

sp_help MyTable
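
As noted in Step 1, a WHERE clause on key ranges is an alternative to SET ROWCOUNT for limiting the rows touched in each batch.  Below is a minimal sketch of that variation on Steps 2 and 3; it assumes MyTable has an integer key column, and the column name CustomerID is purely illustrative.

--- Alternative piecemeal update driven by key ranges instead of SET ROWCOUNT
--- (CustomerID is a hypothetical integer key column)
declare @minkey int, @maxkey int, @batchsize int
select @batchsize = 1000
select @minkey = min(CustomerID), @maxkey = max(CustomerID) from MyTable
while @minkey <= @maxkey
begin
      begin tran
            update MyTable
            set NewZip = ZipCode
            where CustomerID >= @minkey
              and CustomerID < @minkey + @batchsize
              and NewZip is NULL
      commit
      select @minkey = @minkey + @batchsize
end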

Trigger for ZipCode

What happens if a user changes a customer ZipCode AFTER the NewZip update?   If the user application updates ZipCode char(5) during the piecemeal update process, NewZip may have already been updated with the old zip code value, and thus is out of synch with ZipCode.  An update trigger can be used to keep ZipCode and NewZip in synch during the piecemeal update process.  If you intend to drop the old char(5) ZipCode (in Step 4), and then rename NewZip to ZipCode (in Step 5), the update trigger will no longer be needed after the piecemeal update.
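
A minimal sketch of such a trigger is shown below; it assumes MyTable has a key column for joining the inserted rows back to the table (CustomerID is a hypothetical name), and the trigger should be dropped once the migration is finished.

--- Keep NewZip in synch with ZipCode while the piecemeal update runs
--- (CustomerID is a hypothetical key column; drop this trigger after Step 5)
create trigger trg_MyTable_SyncZip on MyTable
after update
as
begin
      set nocount on
      if update(ZipCode)
            update t
            set NewZip = i.ZipCode
            from MyTable t
            join inserted i on i.CustomerID = t.CustomerID
end
go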

Explicit data type conversion

Some data type changes require explicit conversion.  That is, they cannot be implicitly converted using alter table alter column.   For example, an explicit conversion is required to change a datetime datatype to an int or numeric(8) YYYYMMDD format.  This scenario can be handled using a convert (or datepart) function in exactly the same manner as the piecemeal update scenario above.
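
A sketch of that pattern for the datetime-to-int case, using convert with style 112 (YYYYMMDD); the column names OrderDate and OrderDateInt are hypothetical.

--- Same add-column / piecemeal-update pattern, with an explicit conversion
--- (OrderDate and OrderDateInt are hypothetical column names)
alter table MyTable
add OrderDateInt int NULL
go

set nocount on
SET ROWCOUNT 1000

declare @rowcount int
select @rowcount = 1
while @rowcount > 0
begin
      begin tran
            update MyTable
            set OrderDateInt = convert(int, convert(char(8), OrderDate, 112))  --- YYYYMMDD
            where OrderDateInt is NULL
              and OrderDate is not NULL   --- avoid looping forever on NULL dates
            set @rowcount = @@rowcount
      commit
end
SET ROWCOUNT 0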

Dropping columns

Dropping a column is a schema-only (metadata) operation.  There are no concurrency issues unless you move the object to reclaim the space from the dropped column.
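
For example, one way to "move the object" is to rebuild the table's clustered index, which rewrites the rows and reclaims the space from the dropped column; this is itself an expensive operation best left to a maintenance window.  The index name PK_MyTable below is hypothetical.

--- Rebuild the (hypothetically named) clustered index to reclaim the space
--- formerly occupied by the dropped column; heavy operation on a large table
alter index PK_MyTable on MyTable rebuild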

 

Unique and Primary Key Constraints

Adding UNIQUE and PRIMARY KEY constraints involves reads, sorts, and writes.  By default, a UNIQUE constraint adds a nonclustered index, while a PRIMARY KEY adds a clustered index.  Obvious performance and concurrency issues occur when you add a PRIMARY KEY (and its default clustered index) to a large table that already has UNIQUE constraints (and their nonclustered indexes).  This is because adding the primary key (and clustered index) rebuilds the nonclustered indexes, replacing RIDs (row IDs) with the clustering key.

 

ALTER TABLE, CREATE INDEX, and ALTER INDEX include ONLINE options for managing unique and primary key constraints and indexes.  Where possible, for best performance and the highest concurrency and availability, the ONLINE options should be used for managing indexes.  Otherwise, it would be wise to postpone these operations to batch maintenance windows.
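
A sketch of the ONLINE option (available in SQL Server 2005 Enterprise Edition); the index and column names are illustrative, and a unique index is used here to provide the uniqueness enforcement.

--- Build a unique index without taking the table offline
--- (CustomerCode is a hypothetical alternate-key column)
create unique nonclustered index UQ_MyTable_CustomerCode
on MyTable (CustomerCode)
with (online = on, sort_in_tempdb = on)
go

--- Existing indexes can also be rebuilt online
alter index UQ_MyTable_CustomerCode on MyTable
rebuild with (online = on)
go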

 

Conclusions and best practices:

SQL Server 2005 does support schema changes; however, some schema changes are faster than others.  The fastest schema changes occur when:

  1. Adding column with NULL property
  2. Adding column with NULL property and DEFAULT
  3. Changing NOT NULL to NULL property
  4. Adding DEFAULT constraint
  5. Dropping CHECK or DEFAULT constraint
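
For reference, a sketch of what each of these looks like against a hypothetical MyTable (column and constraint names are illustrative); each completes as a metadata-only change regardless of table size.

--- (1) add a column with the NULL property
alter table MyTable add Note varchar(100) NULL

--- (2) add a NULL column with a DEFAULT (no WITH VALUES, so existing rows keep NULL)
alter table MyTable add Region char(2) NULL constraint DF_MyTable_Region default 'NA'

--- (3) relax an existing NOT NULL column to NULL (Status is assumed to be int NOT NULL)
alter table MyTable alter column Status int NULL

--- (4) add a DEFAULT constraint to an existing column
alter table MyTable add constraint DF_MyTable_Status default 0 for Status

--- (5) drop a CHECK or DEFAULT constraint
alter table MyTable drop constraint DF_MyTable_Status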

 

A schema change best practice for performance and concurrency on large tables is to add columns with the NULL property.  In the case where you need to set a value for the newly added column, starting with the NULL property provides the flexibility to de-couple the update from the alter table statement.  A piecemeal update, which limits the number of rows updated at one time, is far preferable to a batch update that can significantly affect concurrency on large tables.  One technique to limit the number of rows is SET ROWCOUNT; this greatly reduces (or effectively eliminates) the blocking impact of the piecemeal update.  If it is important to disallow NULL values after the piecemeal update completes, the application would have to enforce this.

 

Adding a CHECK constraint requires a read of the entire table to validate the constraint against existing rows.  The accompanying table scan can impact write activity on a large table.
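
For example, adding a simple format check on the ZipCode column (the constraint name and expression are illustrative) scans every existing row before the constraint is accepted.

--- The ADD CONSTRAINT statement reads the whole table to validate existing rows
alter table MyTable
add constraint CK_MyTable_ZipCode
check (ZipCode like '[0-9][0-9][0-9][0-9][0-9]%')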

 

Other schema changes present performance and concurrency challenges for large tables due to accompanying batch update operations.  Those include:

  1. Adding NULL column with DEFAULT and WITH VALUES clauses
  2. Adding NOT NULL column
  3. Changing NULL to NOT NULL property
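
For reference, the corresponding statements that do trigger a size-of-data update against a hypothetical MyTable (column and constraint names are illustrative).

--- (1) NULL column with DEFAULT and WITH VALUES: every existing row is written
alter table MyTable add CountryCode char(2) NULL
      constraint DF_MyTable_CountryCode default 'US' with values

--- (2) NOT NULL column (a DEFAULT is required, and every existing row is written)
alter table MyTable add RegionCode char(2) NOT NULL
      constraint DF_MyTable_RegionCode default 'NA'

--- (3) changing NULL to NOT NULL touches every row
alter table MyTable alter column ZipCode char(10) NOT NULL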

 

Using the Piecemeal Update best practice with NULL columns allows us to avoid the undesirable reads and writes that accompany some schema changes, enabling us to use the fastest, least intrusive techniques for high concurrency and availability.

 

Tom Davidson

SQL Server Customer Advisory Team

Comments

  • Anonymous
    April 03, 2006
    Itzik Ben-Gan is offering the following tip in his upcoming "Inside SQL Server 2005 - TSQL Programming" book (should be available in May 2006): "You can use a simple test to check whether a certain schema change requires physical access to the data and might take a while, or is merely a metadata change. Turn on the STATISTICS IO option in your session and perform the schema change against a small test table. If no IO is reported, you know that the change did not involve access to the base data, and therefore is fast." I would like only to add to it - don't forget to load some rows into your test table. If you are not sure if it makes a difference or not, use the same schema for your test table as you have on your production table.

  • Anonymous
    April 05, 2006
    You probably ought to include an index on your NewZip column - I suspect executing the batch a million times for a billion rows would benefit from this.

  • Anonymous
    May 17, 2006
    I want to add a new column at a particular position in table without droping/recreating table. How to do it?

  • Anonymous
    June 04, 2006
    Why would an index help on the newZip column? I actually think it would hurt since you are doing inserts to the table only during the piecemeal update, so an index would slow it down some, no?

  • Anonymous
    June 10, 2006
    We’ve done related types of changes such as this with great success on tables (one very wide) comprised of hundreds of millions of rows—yet we were never successful in implementing a mechanism which wasn’t intrusive.  Instead, our protocol involved creating a de novo table with I/U/D triggers on the original table which populated an auditing table to track changes.  This has the benefit of re-claiming unused space, yet your article suggests that additional thought might reveal less intrusive means.  This work was done on a SQL Server 2000 OLTP db.  

    I’m curious about your choice of the SET ROWCOUNT statement.  In my own experiments--admittedly not 100% comprehensive--creative use of TOP has been significantly more performant (again, this was SQL Server 2000).

  • Anonymous
    February 04, 2007
    In a previous post, Efficient schema changes, I had mentioned how time-consuming such a process can become.



  • Anonymous
    July 23, 2010
    More attention needs to be drawn to indexes when changing a data type or dropping a column.  If indexes are not scripted out beforehand, a schema change may take much longer than desired as the change propagates through the indexes.  A strategy for minimizing index operations should be completed before modifying the table.

  • Anonymous
    April 27, 2017
    What about column order? In reality, many applications are developed with INSERT statements that rely on the order of columns for inserting values. In that case, code would have to be updated.

    • Anonymous
      April 27, 2017
      Good point. We generally recommend adding columns at the 'end' of the table on top of all the other guidelines in this post.