Compartilhar via


Loading data with foreign keys

 

Working on a customer project the other day, I needed to load some data into a database that had been scripted and re-created.  The newly created database already had foreign keys in place which means for loading data (this was from SSIS), we had 2 options:

1) drop the foreign key constraints, load the data, re-apply the constraints
2) load the data in the correct order so that parent tables are loaded first, followed by children, followed by grandchildren, etc…

I decided to give option 2 a try.  In our work we were dynamically generating an SSIS package in C# – so we had some flexibility in deciding how to load the data.  So what I needed was a way to identify the correct order described in option #2. 

The idea is fairly straightforward.  We needed to find the parent tables that have references to them but don’t reference anything themselves.  Then find all the objects that reference those tables, and so on…  The system view sys.foreign_keys is our primary tool for this.  We need to track changes at each level because a “child” table could reference tables at difference levels.  In this case, we’re mostly interested in the lowest level reference to be sure all of it’s parent tables are loaded first.  Finally, I found that with this method, self-referencing tables are problematic.  In this version, I simply exclude those.  If the data is loaded into that table in order, everything should be fine.

Here is the script I wrote for this – hope it helps:

 

 DECLARE @FK_TABLE_ORDER TABLE
(
    FKLEVEL INT,
    OBJECTID INT
)

DECLARE @CHILDCOUNT INT, @CURRLEVEL INT
SELECT @CHILDCOUNT = 1, @CURRLEVEL = 0;

WITH Top_Level_Tables
AS
(
    SELECT
        t.object_id,
        t.schema_id,
        f.parent_object_id
    FROM sys.tables t
    INNER JOIN sys.foreign_keys f
    ON t.object_id = f.referenced_object_id
)
INSERT INTO @FK_TABLE_ORDER
SELECT DISTINCT 0, object_id from Top_Level_Tables
    WHERE object_id NOT IN (SELECT parent_object_id FROM Top_Level_Tables)

SELECT @CHILDCOUNT = ( SELECT COUNT(*) FROM sys.foreign_keys 
                        WHERE referenced_object_id IN 
                        (SELECT OBJECTID FROM @FK_TABLE_ORDER WHERE FKLEVEL = @CURRLEVEL)
                     )

WHILE (@CHILDCOUNT > 0)
 BEGIN
    
    SELECT @CURRLEVEL = @CURRLEVEL + 1;

    INSERT INTO @FK_TABLE_ORDER
    SELECT DISTINCT
        @CURRLEVEL,
        parent_object_id
    FROM sys.foreign_keys 
    WHERE referenced_object_id IN 
        (SELECT OBJECTID FROM @FK_TABLE_ORDER
            WHERE FKLEVEL = (@CURRLEVEL - 1)
        ) AND
        referenced_object_id IS NOT NULL AND
        referenced_object_id != parent_object_id

    SELECT @CHILDCOUNT = ( SELECT COUNT(*) FROM sys.foreign_keys    
                            WHERE referenced_object_id IN 
                            (SELECT OBJECTID FROM @FK_TABLE_ORDER WHERE FKLEVEL = @CURRLEVEL)
                         )

    
 END

 INSERT INTO @FK_TABLE_ORDER
 SELECT 0, t.object_id FROM sys.tables t
 WHERE t.object_id NOT IN (SELECT OBJECTID FROM @FK_TABLE_ORDER)

 SELECT OBJECT_NAME(OBJECTID) as 'Table Name', MAX(FKLEVEL) as 'Load Order'
     FROM @FK_TABLE_ORDER GROUP BY OBJECTID 
     ORDER BY [Load Order] ASC

-Jay

Comments

  • Anonymous
    August 03, 2012
    This is a very useful trick. I use it for deleting the contents of tables in a database ready for loading fresh test data.
  • Anonymous
    August 09, 2012
    This works great - thanks for sharing!  I've been meaning to write a recursive lookup such as this for data loaded from 1 copy of a DB to another for a subset of data and this will be a great starter for that.  Will also be good for what Phil Factor mentions...