Freigeben über


SYSK 60: How Does SqlBulkCopy Deal with Duplicate Records?

The other day, I was asked the following question “Does the SqlBulkCopy update records if they already exist or must it be used only for INSERTS only?”  So, I thought others might benefit from knowing the answer…

SqlBulkCopy is nothing more than a more efficient INSERT…  That means that if there is an identical row in the destination table, it will be repeated.  That is, if it’s possible given the table constrains.

Q:  What happens if you have an identity primary key that’s the same as a primary key of a record being inserted via SqlBulkCopy? 
A:  The primary key will be auto-incremented, and you’ll see a duplicate row with only difference being in the primary key value.

Q:  What happens if you have a non-identity primary key that’s the same as a primary key of a record being inserted via SqlBulkCopy? 
A:  The SqlBulkCopy method will throw an exception “Violation of PRIMARY KEY constraint 'Your_PK_Constrain_Name_Here'. Cannot insert duplicate key in object 'dbo.Test'.   The statement has been terminated.”

Now, since I believe in teaching to fish rather than giving a fish, here is how you could’ve figured this out:

1. Create a simple table (e.g. table Test with pk_id (identity), SomeValue, LastUpdatedTimestamp columns)
2. Add a row (e.g. Insert Into Test(SomeValue, LastUpdatedTimestamp) Values (123, getdate())
3. Create a simple .NET windows app with the following code under button click:
       SqlConnection cn = new SqlConnection("integrated security=SSPI;data source=(local);initial catalog=Test");
SqlDataAdapter da = new SqlDataAdapter("select top 1 * from Test", cn);
DataSet ds = new DataSet();
da.Fill(ds, "Test");

       System.Data.SqlClient.SqlBulkCopy bc = new SqlBulkCopy(cn);
bc.DestinationTableName= "Test"; // copy to self
cn.Open();
bc.WriteToServer(ds.Tables[0]);
cn.Dispose();
4. Run it
5. In Sql Management Studio, run the following query:  
       select * from Test
   You should now see two rows (pk_id 1 and 2)
6. Now change pk_id to not be an identity field
7. Run the C# program again – you should get an exception

Now you know!

Comments

  • Anonymous
    August 25, 2006
    thanks for the explanation.

  • Anonymous
    September 14, 2006
    The comment has been removed

  • Anonymous
    September 22, 2006
    You might want to set the BatchSize property to something that's a good compromise between getting the speed of bulk load and not having it be one transaction.  Each batch is a transaction, i.e. all or nothing.  For batches that fail, you have two options:
       1. Set batch size to 1, which then becomes a simple Insert
       2. Repeat the bulk copy into a so called working table, which has same columns as the destination table minus the unique constraint.  Then you filter out the duplicates and move the data from the working table to the destination.
    I have not profiled either of these solutions, so do your tests to see what solutions makes more sense and to determine the batch size…

  • Anonymous
    October 19, 2006
    if sqlbulkcopy throws an exception will it continue processing if that exception is caught? also how would I get the batch that failed to resubmit it?

  • Anonymous
    August 31, 2007
    but how will i copy the pk if i want to copy that using SqlBulkCopy.....???

  • Anonymous
    November 05, 2013
    Use a Trigger with INSTEAD OF INSERT and use inside MERGE statement. But don't forget add the parameter SqlBulkCopyOptions.FireTriggers in the SqlBulkCopy. This is my two cents.