
SQLCAT Blog: Managing Schema Changes (Part 2)

 

Question: What are best practices for managing schema changes in SQL Server 2005?  What are the fastest, least intrusive techniques for high availability? 

 

Answer:  In Part 1, we outlined the behavior of SQL Server 2005 schema changes.  In Part 2, we will look at best-practice techniques for de-coupling the batch updates required for data type changes and most length changes (see "Managing Schema Changes (Part 1)").  These techniques allow us to make schema changes quickly and with the least impact on performance and concurrency for large tables, such as the Billion row scenario.

 

Scenario:   Let's assume we have a 5-character ZipCode column that we want to increase to the 10-character format (e.g., 12345-0001).  We can change the ZipCode length from char(5) to char(10) using:

 

alter table MyTable alter column ZipCode char(10)

 

However, this will incur a concurrency-crippling batch update (as seen in Profiler) to change the column length in our Billion row table scenario.  If you'd like to capture this in Profiler, see "Managing Schema Changes (Part 1)".

Best Practice

Adding columns with the NULL property (excluding DEFAULT ... WITH VALUES, of course) is a best practice because such columns can be added quickly.  In the case where you do need to set a value for the column, the NULL property also gives us the flexibility of de-coupling the update from the alter table statement.

 

An approach to accomplish this: 

  1. Add a char(10) column, NewZip
  2. Set piecemeal batch size
  3. Populate NewZip using an explicit conversion of ZipCode
  4. Drop old column ZipCode
  5. Rename NewZip to ZipCode (optional)

 

Step 1: Add new column

 

alter table MyTable

add NewZip char(10) NULL

Upon the fast completion of this alter table (about 1 second in my tests), NewZip contains NULLs for all rows in the existing table.  You may need to populate NewZip with values from another column or with other values.  In Step 2, SET ROWCOUNT is used to minimize the concurrency impact of write locks by limiting the number of rows that are affected at one time.  An alternative to SET ROWCOUNT is a WHERE clause that specifies key ranges, again limiting the number of rows affected at one time (a sketch of this alternative appears after Step 5).

 

Step 2: Set Batch Size for update

 

--- (2) set piecemeal batch size

set nocount on

SET ROWCOUNT 1000

 

Next, the piecemeal update is repeated until there are no NULL values left in NewZip.  This example sets all rows from the char(5) ZipCode column.  Alternatively, you can selectively set the value of an added column using a lookup or join.

 

Step 3: De-coupled Piecemeal Update

declare @rowcount int, @batchcount int, @rowsupdated int

select @rowcount = 1,@batchcount = 0, @rowsupdated = 0

while @rowcount > 0

begin

      begin tran

            update MyTable

            set NewZip = ZipCode

            where NewZip is NULL

            set @rowcount = @@rowcount

            select @rowsupdated = @rowsupdated + @rowcount

            if @rowcount > 0 select @batchcount = @batchcount + 1

      commit

end

SET ROWCOUNT 0   --- remove the piecemeal batch limit once the update completes

Step 4: Drop the char(5) column ZipCode (optional)

alter table MyTable

drop column ZipCode

go

Step 5: Rename the NewZip column to ZipCode (optional)

exec sp_rename 'MyTable.NewZip','ZipCode','COLUMN'

go

--- Observe the data type for ZipCode is now char(10)

sp_help MyTable
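
As noted in Step 1, a WHERE clause on key ranges is an alternative to SET ROWCOUNT for limiting the rows touched in each batch.  Below is a minimal sketch of that variation on Steps 2 and 3; it assumes MyTable has an integer key column, and the column name CustomerID is purely illustrative.

--- Alternative piecemeal update driven by key ranges instead of SET ROWCOUNT
--- (CustomerID is a hypothetical integer key column)
declare @minkey int, @maxkey int, @batchsize int
select @batchsize = 1000
select @minkey = min(CustomerID), @maxkey = max(CustomerID) from MyTable
while @minkey <= @maxkey
begin
      begin tran
            update MyTable
            set NewZip = ZipCode
            where CustomerID >= @minkey
              and CustomerID < @minkey + @batchsize
              and NewZip is NULL
      commit
      select @minkey = @minkey + @batchsize
end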

Trigger for ZipCode

What happens if a user changes a customer ZipCode AFTER the NewZip update?   If the user application updates ZipCode char(5) during the piecemeal update process, NewZip may have already been updated with the old zip code value, and thus is out of synch with ZipCode.  An update trigger can be used to keep ZipCode and NewZip in synch during the piecemeal update process.  If you intend to drop the old char(5) ZipCode (in Step 4), and then rename NewZip to ZipCode (in Step 5), the update trigger will no longer be needed after the piecemeal update.
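
A minimal sketch of such a trigger is shown below; it assumes MyTable has a key column for joining the inserted rows back to the table (CustomerID is a hypothetical name), and the trigger should be dropped once the migration is finished.

--- Keep NewZip in synch with ZipCode while the piecemeal update runs
--- (CustomerID is a hypothetical key column; drop this trigger after Step 5)
create trigger trg_MyTable_SyncZip on MyTable
after update
as
begin
      set nocount on
      if update(ZipCode)
            update t
            set NewZip = i.ZipCode
            from MyTable t
            join inserted i on i.CustomerID = t.CustomerID
end
go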

Explicit data type conversion

Some data type changes require explicit conversion.  That is, they cannot be implicitly converted using alter table alter column.   For example, an explicit conversion is required to change a datetime datatype to an int or numeric(8) YYYYMMDD format.  This scenario can be handled using a convert (or datepart) function in exactly the same manner as the piecemeal update scenario above.
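
A sketch of that pattern for the datetime-to-int case, using convert with style 112 (YYYYMMDD); the column names OrderDate and OrderDateInt are hypothetical.

--- Same add-column / piecemeal-update pattern, with an explicit conversion
--- (OrderDate and OrderDateInt are hypothetical column names)
alter table MyTable
add OrderDateInt int NULL
go

set nocount on
SET ROWCOUNT 1000

declare @rowcount int
select @rowcount = 1
while @rowcount > 0
begin
      begin tran
            update MyTable
            set OrderDateInt = convert(int, convert(char(8), OrderDate, 112))  --- YYYYMMDD
            where OrderDateInt is NULL
              and OrderDate is not NULL   --- avoid looping forever on NULL dates
            set @rowcount = @@rowcount
      commit
end
SET ROWCOUNT 0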

Dropping columns

Dropping a column is a schema-only (metadata) operation.  There are no concurrency issues unless you move the object to reclaim the space from the dropped column.
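
For example, one way to "move the object" is to rebuild the table's clustered index, which rewrites the rows and reclaims the space from the dropped column; this is itself an expensive operation best left to a maintenance window.  The index name PK_MyTable below is hypothetical.

--- Rebuild the (hypothetically named) clustered index to reclaim the space
--- formerly occupied by the dropped column; heavy operation on a large table
alter index PK_MyTable on MyTable rebuild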

 

Unique and Primary Key Constraints

Adding UNIQUE and PRIMARY KEY constraints involves reads, sorts, and writes.  By default, a UNIQUE constraint adds a nonclustered index, while a PRIMARY KEY adds a clustered index.  Obvious performance and concurrency issues occur when you add a PRIMARY KEY (and its default clustered index) to a large table that already has UNIQUE constraints (and their nonclustered indexes).  This is because adding the primary key (and clustered index) rebuilds the nonclustered indexes, replacing RIDs (row IDs) with the clustering key.

 

ALTER TABLE, CREATE INDEX, and ALTER INDEX include ONLINE options for managing unique and primary key constraints and indexes.  Where possible, for best performance and the highest concurrency and availability, the ONLINE options should be used for managing indexes.  Otherwise, it would be wise to postpone these operations to batch maintenance windows.
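
A sketch of the ONLINE option (available in SQL Server 2005 Enterprise Edition); the index and column names are illustrative, and a unique index is used here to provide the uniqueness enforcement.

--- Build a unique index without taking the table offline
--- (CustomerCode is a hypothetical alternate-key column)
create unique nonclustered index UQ_MyTable_CustomerCode
on MyTable (CustomerCode)
with (online = on, sort_in_tempdb = on)
go

--- Existing indexes can also be rebuilt online
alter index UQ_MyTable_CustomerCode on MyTable
rebuild with (online = on)
go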

 

Conclusions and best practices:

SQL Server 2005 does support schema changes; however, some schema changes are faster than others.  The fastest schema changes occur when:

  1. Adding column with NULL property
  2. Adding column with NULL property and DEFAULT
  3. Changing NOT NULL to NULL property
  4. Adding DEFAULT constraint
  5. Dropping CHECK or DEFAULT constraint
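
For reference, a sketch of what each of these looks like against a hypothetical MyTable (column and constraint names are illustrative); each completes as a metadata-only change regardless of table size.

--- (1) add a column with the NULL property
alter table MyTable add Note varchar(100) NULL

--- (2) add a NULL column with a DEFAULT (no WITH VALUES, so existing rows keep NULL)
alter table MyTable add Region char(2) NULL constraint DF_MyTable_Region default 'NA'

--- (3) relax an existing NOT NULL column to NULL (Status is assumed to be int NOT NULL)
alter table MyTable alter column Status int NULL

--- (4) add a DEFAULT constraint to an existing column
alter table MyTable add constraint DF_MyTable_Status default 0 for Status

--- (5) drop a CHECK or DEFAULT constraint
alter table MyTable drop constraint DF_MyTable_Status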

 

A schema change best practice for performance and concurrency on large tables is to add columns with the NULL property.  In the case where you need to set a value for the newly added column, starting with the NULL property provides the flexibility to de-couple the update from the alter table statement.  A piecemeal update, which limits the number of rows updated at one time, is far preferable to a batch update that can significantly affect concurrency on large tables.  One technique to limit the number of rows is SET ROWCOUNT; this greatly reduces (or effectively eliminates) the blocking impact of the piecemeal update.  If it is important to disallow NULL values after the piecemeal update completes, the application would have to enforce this.

 

Adding a CHECK constraint requires a read of the entire table to validate the constraint against existing rows.  The accompanying table scan can impact write activity on a large table.
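
For example, adding a simple format check on the ZipCode column (the constraint name and expression are illustrative) scans every existing row before the constraint is accepted.

--- The ADD CONSTRAINT statement reads the whole table to validate existing rows
alter table MyTable
add constraint CK_MyTable_ZipCode
check (ZipCode like '[0-9][0-9][0-9][0-9][0-9]%')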

 

Other schema changes present performance and concurrency challenges for large tables due to accompanying batch update operations.  Those include:

  1. Adding NULL column with DEFAULT and WITH VALUES clauses
  2. Adding NOT NULL column
  3. Changing NULL to NOT NULL property
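
For reference, the corresponding statements that do trigger a size-of-data update against a hypothetical MyTable (column and constraint names are illustrative).

--- (1) NULL column with DEFAULT and WITH VALUES: every existing row is written
alter table MyTable add CountryCode char(2) NULL
      constraint DF_MyTable_CountryCode default 'US' with values

--- (2) NOT NULL column (a DEFAULT is required, and every existing row is written)
alter table MyTable add RegionCode char(2) NOT NULL
      constraint DF_MyTable_RegionCode default 'NA'

--- (3) changing NULL to NOT NULL touches every row
alter table MyTable alter column ZipCode char(10) NOT NULL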

 

Using the Piecemeal Update best practice with NULL columns allows us to avoid the undesirable reads and writes that accompany some schema changes, enabling us to use the fastest, least intrusive techniques for high concurrency and availability.

 

Tom Davidson

SQL Server Customer Advisory Team

Comments

  • Anonymous
    April 03, 2006
    Itzik Ben-Gan is offering the following tip in his upcoming "Inside SQL Server 2005 - TSQL Programming" book (should be available in May 2006): "You can use a simple test to check whether a certain schema change requires physical access to the data and might take a while, or is merely a metadata change. Turn on the STATISTICS IO option in your session and perform the schema change against a small test table. If no IO is reported, you know that the change did not involve access to the base data, and therefore is fast." I would like only to add to it - don't forget to load some rows into your test table. If you are not sure if it makes a difference or not, use the same schema for your test table as you have on your production table.

  • Anonymous
    April 05, 2006
    You probably ought to include an index on your NewZip column - I suspect executing the batch a million times for a billion rows would benefit from this.

  • Anonymous
    May 17, 2006
    I want to add a new column at a particular position in table without droping/recreating table. How to do it?

  • Anonymous
    June 04, 2006
    Why would an index help on the newZip column? I actually think it would hurt since you are doing inserts to the table only during the piecemeal update, so an index would slow it down some, no?

  • Anonymous
    June 10, 2006
    We’ve done related types of changes such as this with great success on tables (one very wide) comprised of hundreds of millions of rows—yet we were never successful in implementing a mechanism which wasn’t intrusive.  Instead, our protocol involved creating a de novo table with I/U/D triggers on the original table which populated an auditing table to track changes.  This has the benefit of re-claiming unused space, yet your article suggests that additional thought might reveal less intrusive means.  This work was done on a SQL Server 2000 OLTP db.  

    I’m curious about your choice of the SET ROWCOUNT statement.  In my own experiments--admittedly not 100% comprehensive--creative use of TOP has been significantly more performant (again, this was SQL Server 2000).

  • Anonymous
    February 04, 2007
    In a previous post, Efficient schema changes, I had mentioned how time-consuming such a process can become.



  • Anonymous
    July 23, 2010
    More attention needs to be drawn to indexes when changing a data type or dropping a column.  If indexes are not scripted out beforehand, a schema change may take much longer than desired as the change propagates through the indexes.  A strategy for minimizing index operations should be completed before modifying the table.

  • Anonymous
    April 27, 2017
    What about column order? In reality, many applications are developed with INSERT statements that rely on the order of columns for inserting values. In that case, code would have to be updated.

    • Anonymous
      April 27, 2017
      Good point. We generally recommend adding columns at the 'end' of the table on top of all the other guidelines in this post.