Table structure changes
I often receive questions about how we propagate changes to table structure in Visual Studio Team Edition for Database Professionals (VSDBPro)?
When you allow users to make arbitrary schema changes, the problem is that they expect miracles to happen when these need to be deployed. We are trying very hard to make life as easy as possible, but we are restricted in what we can do at deployment time based on the underlying support provided by the database. So lets go back to the basics for a moment. SQL Server, for the longest time, has been offering a small and very restrictive set of DDL operations when it comes to making physical table structure changes.
The basic 5 primitives:
- CREATE TABLE
- DROP TABLE
- ALTER TABLE ALTER COLUMN (for changing data types on a column)
- ALTER TABLE ALTER COLUMN ADD (for adding NOT NULL columns or NULL columns with a DEFAULT value (constraint), at the end of the table)
- ALTER TABLE ALTER COLUMN DROP (for removing a column)
As you can see the number of primitives we have to work with to deploy you changes is not very extensive. This is the reason why lots of tools fall back to the CREATE-COPY-DROP-RENAME (CCDR) strategy, for example Table Designer inside SQL Server Management Studio uses this method exclusively. This means they will:
- Create a new table based on the new table structure using a temporary name
- Copy all data from the old table in to the new table using an INSERT SELECT statement
- Drop the old table structure
- Rename the new table back to the old table name (that was dropped in step 3)
VSDBPro first tries to deploy the table changes using an ALTER TABLE statement, however when we cannot implement the changes using the 5 primitives mentioned above, we also have to fall back the CCDR strategy, because that is the only remaining option. VSDBPro does execute the CCDR (Create-Copy-Drop-Rename) inside a transaction to maintain schema consistency in case a failure would occur. But it is almost needless to say that the CCDR strategy is not the most efficient way of deploying schema changes in a production environment where you are dealing with large numbers of rows inside a table.
In case it is not obvious, here are some of the CCDR drawbacks:
- You need double the data space to make the table structure changes
- Causing additional T-LOG overhead
- And possible causes additional page fragmentation in table and index structures
- You lose the security context set on the original table
- Security is related to the object_id
- The new table has a new object_id
- You are to the old table name, you need to reapply the security context to the table when using a CCDR deployment strategy
So the question is how to prevent this?
The only real way to prevent this is to restrict the allowed changes on the table structure! However that does not go over very well with the consumers of the schema, most application developers want to see columns in certain order, with certain names etc. So how can we achieve this?
Best practices
The best practices therefore is to always abstract your physical structures. This means that you have VIEWs abstracting all tables. This results in a minimum of one VIEW per TABLE.
At that point, you restrict yourself to only making structural changes that can be implemented using the ALTER TABLE primitives. The rest of the changes are reflected in the VIEW.
The goal is to disambiguate the VIEW as much as possible, this means you need to specify a full column list. You are not allowed to use SELECT *, because you need to guarantee the ordinal position of each column inside the result set. Each column will have an ALIAS to reflect the column name. This abstracts you from name changes in the underlying table structure. The base view only reflects a single table, you can create more views that abstract tables, but at a minimum you should start with a 1-on-1 mapping from the table to the view.
An example view
The following shows an example of how the initial version of the view will look, given a table structure. The view and the table structure will evolve over time, with the goal being to always have the most efficient (and therefore fastest) way to deploy physical structure changes and isolate the application as much as possible from breaking changes.
Table definition
CREATE TABLE [Person].[Address]
(
[AddressID] int NOT NULL IDENTITY(1, 1) NOT FOR REPLICATION
,[AddressLine1] nvarchar(60) NOT NULL
,[AddressLine2] nvarchar(60) NULL
,[City] nvarchar(30) NOT NULL
,[StateProvinceID] int NOT NULL
,[PostalCode] nvarchar(15) NOT NULL
,[rowguid] uniqueidentifier NOT NULL ROWGUIDCOL
,[ModifiedDate] datetime NOT NULL
) ON [PRIMARY];
View definition
CREATE VIEW [Person].[vAddress]
AS
SELECT [PA].[AddressID] AS [AddressID]
,[PA].[AddressLine1] AS [AddressLine1]
,[PA].[AddressLine2] AS [AddressLine2]
,[PA].[City] AS [City]
,[PA].[StateProvinceID] AS [StateProvinceID]
,[PA].[PostalCode] AS [PostalCode]
,[PA].[rowguid] AS [rowguid]
,[PA].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Address] AS [PA];
Let walk through the various changes you want to be able to make and how to implement them:
Adding a column:
- Add the column to the TABLE. You always add the column at the end of the table structure, never in the middle or the beginning.
- You update the VIEW abstracting the table and placing the column at the ordinal required.
- Since you can only add NULLABLE columns or columns with a default value the VIEW can also be used to add additional logic that translates the column value of the added column by using functions.
Removing a column:
- First, remove the column from the VIEW.
- This enables you to make the application changes that are needed and then deploy these changes without having to change the underlying table structure.
- Alternatively, you could introduce an additional VIEW that reflects the old structure. This could be used to prevent applications from breaking. This is only applicable if you can provide values in the VIEW definition that are usable/expected by the applications that consumes the VIEW. It also requires you to start thinking about how to version and appropriately name these VIEWs.
- Then remove the column from the physical TABLE.
Changing a the data type of a column:
This type of change falls into two possible categories: the first is where the data type change is supported through an automatic implicit conversion by SQL Server and the second is where you need an explicit conversion to handle the conversion of the type. The implicit conversion matrix is documented as part of the SQL Server CAST and CONVERT statement.
- Assuming an implicit data type conversion is available
- First you change the VIEW, wrapping the column that you are changing with a CAST or CONVERT statement so the resulting type is the new data type.
- This allows you the make the application changes needed and deploy these changes without having to change the underlying table structure.
- Alternatively you could introduce an additional VIEW so you keep the one that reflects the old structure and create a new one with the new structure to prevent application breakage. This is only applicable if you can provide values in the VIEW definition that are usable/expected by the application consuming the VIEW.
- Next you change the table structure by using ALTER TABLE ALTER COLUMN and you have to remove the CAST or CONVERT statement that was introduced in the VIEW in step 1. You must do this because this CAST or CONVERT would now converts to the same type as the underlying column in the table.
- If you chose to introduce a compatibility view, you must make sure that this VIEW reflects the old data type by applying the reverse CAST or CONVERT to reflect the previous data type used.
- First you change the VIEW, wrapping the column that you are changing with a CAST or CONVERT statement so the resulting type is the new data type.
- If explicit data type conversion is required
- You must first add a column to the underlying table structure that uses the new data type.
- NOTE: A key question is if you will have enough available space in the row to add the new column and store all the data, before you can remove the old column! If you do not have enough space in the row to hold both the old and new copy of the column, you will need to introduce a helper table that has a 1:1 relationship with the table and holds the PK, old and new column data, then you can make all changes in that table first, remove the old column from the table before adding the new one and copy that data back from the helper table, this is tedious but most of the time more efficient then using a CCDR strategy and the VIEW allows you to temporally join with the helper table presenting a single table instead of two.
- Next you write some explicit conversion logic to UPDATE the new column value with the result of the explicit converted data.
- You update the VIEW to reflect the new data type by mapping the original column to the newly added column, so the old column is no longer referenced from the VIEW.
- Now you can remove the old data column from the underlying table.
- You must first add a column to the underlying table structure that uses the new data type.
Renaming a column:
- In order to rename a column you simply change the alias in the VIEW that wraps the table
Naming conventions
The next interesting question is how to name these views? This really depends on what you want and when you start using VIEWs that abstract your tables. When I have the freedom, I normally like to name the VIEWs that wrap my tables like I would normally name my tables and name the tables with some prefix. Some people choose to use different schemas to keep them apart but use the same name. Personally, I do not like this approach, but with all naming conventions, there is a lot of personal preference involved. The key is to have a standard that works for you. The method of abstracting tables using views, works regardless of the naming convention used, but you can make your life (and the lives of the people consuming the schema objects) a lot easier by having a consistent and easy to understand naming convention.
Summary
Abstracting your physical table structure using VIEWs can result in a more efficient and easier deployment of your schema changes without having to sacrifice any flexibility when it comes to making the requested and required schema changes that are based on the needs of your schema consumers.
-GertD
"DataDude" Software Architect
Comments
Anonymous
July 11, 2007
Good read! Using views does mean extra work and room for errors, which can be avoided with CCDR.Anonymous
July 12, 2007
Great post Gert. I hope to make this required reading for all my colleagues :)Anonymous
July 12, 2007
Gert Drapers has posted a really good article today which you can read here . In it he suggests a bestAnonymous
July 13, 2007
Interesting post Gert. I've always wanted to use views more than I have - someone always seems to complain that "views are eeeeeevil" (I'm paraphrasing). How would you respond in this case? Are there performance penalties? Also, when you said "best practice" did you mean in every case or were you thinking for only cases where the tables are large and the schema dynamic?Anonymous
July 17, 2007
Dave McKinstry on Part II - Build, Deploy and Database Connections. GertD on Table Structure Changes....Anonymous
February 18, 2008
The comment has been removedAnonymous
September 08, 2008
The comment has been removedAnonymous
March 30, 2009
Continuing with the VSTS: DB Best Practices post series, let’s take a look at managing data motion inAnonymous
March 01, 2010
I am curious about this situation: Say I have a table and I want to rename a field A to B. Then, this is all still in development, maybe a few weeks later I want to add a column A to that table, obviously a different purpose but the same data type and name. I have run into this scenario! When I go to deploy the changes to production, which doesn't have either change, I will end up with the data in column B in column A. Ideas?