SQL Server Schema Design
I have seen some strange schemas in my time which look like a good idea on paper but not on disk. A common scenario is the schema that is created by a tool controlled by a user and so we end up with columns like user21 in usertabel7 and so on. Then I got an e-mail about the poor performance of user defined functions (UDFs) following Simon Sabin's talk on the CLR and how good they could be for replacing T-SQL UDF's at the SQL Community event last Thursday .
The database involved in the problem had several tables where generic columns where used for different purposes. Taking customer as an example there was a separate table CustomerStrings that looked like this:
Column | Type |
CustomerID | int not null PK |
CustomerStringTypeID | int not null |
StringValue | varchar(50) |
CustomerID is foreign key to the main customer table and CustomerStringTypeID is another foreign key, this time to CustomerStringType which has an ID and a description, with values in the description such as 'Fax', 'Mobile', 'e-mail' etc.
So the CustomerString table would have multiple rows for the same customer for example a row for Fax, mobile, email and so on. The good thing about this is that rows would only be created where a given attribute for a customer was used and so would be efficient on space.
Getting back all the e-mail addresses for a customer wouldn't be too bad either depending on how this was implemented and this is where the developer tried to implement a generic solution with a UDF:
CREATE FUNCTION dbo.GetCustomerString(@customerID int, @customerStringTypeID int) RETURNS varchar(50) AS BEGIN
DECLARE @returnValue varchar(50);
SELECT @returnValue = CustomerStrings.StringValue
FROM CustomerStrings
WHERE CustomerStrings.CustomerID = @customerID
AND CustomerStrings.CustomerStringTypeID = @customerStringTypeID
RETURN @returnValue
END;
The problem arises when this function is used in-line on multiple occasions to get three attributes for every customer:
SELECT
cst.LastName,
dbo.GetCustomerString(Customers.CustomerID, 1) AS Fax,
dbo.GetCustomerString(Customers.CustomerID, 2) AS mobile,
dbo.GetCustomerString(Customers.CustomerID, 3) AS email FROM dbo.Customers cst;
So the UDF gets called three times for every row in the customer table which is always going to be much slower than using a set based operation for example:
SELECT
cst.LastName
fax.StringValue as Fax
mob.StringValue as Mobile
eml.StringValue as email
FROM dbo.Customers cst
LEFT JOIN dbo.CustomerStrings fax ON cst.CustomerID = fax.CustomerID AND fax.CustomerStringTypeID =1
LEFT JOIN dbo.CustomerStrings mob ON cst.CustomerID = mob.CustomerID AND fax.CustomerStringTypeID =2
LEFT JOIN dbo.CustomerStrings fax ON cst.CustomerID = eml.CustomerID AND eml.CustomerStringTypeID =3;,
However maintaining all the code for the possible combinations of attributes required is a big headache for this company so how can they get the speed of the second query without writing mountains of code. In my opinion the schema is the problem here. I would either create sufficient columns in the customer table for all the attributes required or using another group of tables with one row per customer to store contact details etc. Of course if there loads of columns required (as in SharePoint) many of which are usually empty then it's time to have a look at Sparse columns in SQL Server 2008 and that is what I will be posting about next..
Technorati Tags: SQL Server,User Defined Functions
Comments
Anonymous
March 03, 2008
I understand what the original database architect was trying to accomplish with this schema. I would have never designed anything like that myself however, until I began reading The Data Model Resource Book, Revised Edition, Volume 1, A Library of Universal Data Models For All Enterprises ISBN: 978-0-471-38023-8. This is similar to a modeling technique show initially in the 2nd chapter of this book in that there is a Value Column with a Type Column. The end goal is to create a very flexible data model. Consider going to the attribute approach you mentioned. It's easy to decide to make columns for Phone, Cell Phone, Pager, Email, Home Phone, Work Phone... etc. However, what happens when someone has multiple work phones or cell phones? The attribute approach quickly falls apart. I'm not directly advocating one approach over the other, but I'm very interested in exploring the approach from the original schema as a viable option. Exploring meaning implementing this from the physical data model to the webpage/form UI to reporting to importing data. Perhaps this is a discussion better suited for the forums of the book above, but regardless the parallels are worth noting.Anonymous
March 03, 2008
I understand what the original database architect was trying to accomplish with this schema. I would have never designed anything like that myself however, until I began reading The Data Model Resource Book, Revised Edition, Volume 1, A Library of Universal Data Models For All Enterprises ISBN: 978-0-471-38023-8. This is similar to a modeling technique show initially in the 2nd chapter of this book in that there is a Value Column with a Type Column. The end goal is to create a very flexible data model. Consider going to the attribute approach you mentioned. It's easy to decide to make columns for Phone, Cell Phone, Pager, Email, Home Phone, Work Phone... etc. However, what happens when someone has multiple work phones or cell phones? The attribute approach quickly falls apart. I'm not directly advocating one approach over the other, but I'm very interested in exploring the approach from the original schema as a viable option. Exploring meaning implementing this from the physical data model to the webpage/form UI to reporting to importing data. Perhaps this is a discussion better suited for the forums of the book above, but regardless the parallels are worth noting.Anonymous
March 03, 2008
Yep - this is a classic example of an EAV design, and "strange" is a kind adjective to use for it.