Loading data with foreign keys
Working on a customer project the other day, I needed to load some data into a database that had been scripted and re-created. The newly created database already had foreign keys in place which means for loading data (this was from SSIS), we had 2 options:
1) drop the foreign key constraints, load the data, re-apply the constraints
2) load the data in the correct order so that parent tables are loaded first, followed by children, followed by grandchildren, etc…
I decided to give option 2 a try. In our work we were dynamically generating an SSIS package in C# – so we had some flexibility in deciding how to load the data. So what I needed was a way to identify the correct order described in option #2.
The idea is fairly straightforward. We needed to find the parent tables that have references to them but don’t reference anything themselves. Then find all the objects that reference those tables, and so on… The system view sys.foreign_keys is our primary tool for this. We need to track changes at each level because a “child” table could reference tables at difference levels. In this case, we’re mostly interested in the lowest level reference to be sure all of it’s parent tables are loaded first. Finally, I found that with this method, self-referencing tables are problematic. In this version, I simply exclude those. If the data is loaded into that table in order, everything should be fine.
Here is the script I wrote for this – hope it helps:
DECLARE @FK_TABLE_ORDER TABLE ( FKLEVEL INT, OBJECTID INT ) DECLARE @CHILDCOUNT INT, @CURRLEVEL INT SELECT @CHILDCOUNT = 1, @CURRLEVEL = 0; WITH Top_Level_Tables AS ( SELECT t.object_id, t.schema_id, f.parent_object_id FROM sys.tables t INNER JOIN sys.foreign_keys f ON t.object_id = f.referenced_object_id ) INSERT INTO @FK_TABLE_ORDER SELECT DISTINCT 0, object_id from Top_Level_Tables WHERE object_id NOT IN (SELECT parent_object_id FROM Top_Level_Tables) SELECT @CHILDCOUNT = ( SELECT COUNT(*) FROM sys.foreign_keys WHERE referenced_object_id IN (SELECT OBJECTID FROM @FK_TABLE_ORDER WHERE FKLEVEL = @CURRLEVEL) ) WHILE (@CHILDCOUNT > 0) BEGIN SELECT @CURRLEVEL = @CURRLEVEL + 1; INSERT INTO @FK_TABLE_ORDER SELECT DISTINCT @CURRLEVEL, parent_object_id FROM sys.foreign_keys WHERE referenced_object_id IN (SELECT OBJECTID FROM @FK_TABLE_ORDER WHERE FKLEVEL = (@CURRLEVEL - 1) ) AND referenced_object_id IS NOT NULL AND referenced_object_id != parent_object_id SELECT @CHILDCOUNT = ( SELECT COUNT(*) FROM sys.foreign_keys WHERE referenced_object_id IN (SELECT OBJECTID FROM @FK_TABLE_ORDER WHERE FKLEVEL = @CURRLEVEL) ) END INSERT INTO @FK_TABLE_ORDER SELECT 0, t.object_id FROM sys.tables t WHERE t.object_id NOT IN (SELECT OBJECTID FROM @FK_TABLE_ORDER) SELECT OBJECT_NAME(OBJECTID) as 'Table Name', MAX(FKLEVEL) as 'Load Order' FROM @FK_TABLE_ORDER GROUP BY OBJECTID ORDER BY [Load Order] ASC
-Jay
Comments
- Anonymous
August 03, 2012
This is a very useful trick. I use it for deleting the contents of tables in a database ready for loading fresh test data. - Anonymous
August 09, 2012
This works great - thanks for sharing! I've been meaning to write a recursive lookup such as this for data loaded from 1 copy of a DB to another for a subset of data and this will be a great starter for that. Will also be good for what Phil Factor mentions...