SYSK 60: How Does SqlBulkCopy Deal with Duplicate Records?
The other day, I was asked the following question “Does the SqlBulkCopy update records if they already exist or must it be used only for INSERTS only?” So, I thought others might benefit from knowing the answer…
SqlBulkCopy is nothing more than a more efficient INSERT… That means that if there is an identical row in the destination table, it will be repeated. That is, if it’s possible given the table constrains.
Q: What happens if you have an identity primary key that’s the same as a primary key of a record being inserted via SqlBulkCopy?
A: The primary key will be auto-incremented, and you’ll see a duplicate row with only difference being in the primary key value.
Q: What happens if you have a non-identity primary key that’s the same as a primary key of a record being inserted via SqlBulkCopy?
A: The SqlBulkCopy method will throw an exception “Violation of PRIMARY KEY constraint 'Your_PK_Constrain_Name_Here'. Cannot insert duplicate key in object 'dbo.Test'. The statement has been terminated.”
Now, since I believe in teaching to fish rather than giving a fish, here is how you could’ve figured this out:
1. Create a simple table (e.g. table Test with pk_id (identity), SomeValue, LastUpdatedTimestamp columns)
2. Add a row (e.g. Insert Into Test(SomeValue, LastUpdatedTimestamp) Values (123, getdate())
3. Create a simple .NET windows app with the following code under button click:
SqlConnection cn = new SqlConnection("integrated security=SSPI;data source=(local);initial catalog=Test");
SqlDataAdapter da = new SqlDataAdapter("select top 1 * from Test", cn);
DataSet ds = new DataSet();
da.Fill(ds, "Test");System.Data.SqlClient.SqlBulkCopy bc = new SqlBulkCopy(cn);
bc.DestinationTableName= "Test"; // copy to self
cn.Open();
bc.WriteToServer(ds.Tables[0]);
cn.Dispose();
4. Run it
5. In Sql Management Studio, run the following query:
select * from Test
You should now see two rows (pk_id 1 and 2)
6. Now change pk_id to not be an identity field
7. Run the C# program again – you should get an exception
Now you know!
Comments
Anonymous
August 25, 2006
thanks for the explanation.Anonymous
September 14, 2006
The comment has been removedAnonymous
September 22, 2006
You might want to set the BatchSize property to something that's a good compromise between getting the speed of bulk load and not having it be one transaction. Each batch is a transaction, i.e. all or nothing. For batches that fail, you have two options:
1. Set batch size to 1, which then becomes a simple Insert
2. Repeat the bulk copy into a so called working table, which has same columns as the destination table minus the unique constraint. Then you filter out the duplicates and move the data from the working table to the destination.
I have not profiled either of these solutions, so do your tests to see what solutions makes more sense and to determine the batch size…Anonymous
October 19, 2006
if sqlbulkcopy throws an exception will it continue processing if that exception is caught? also how would I get the batch that failed to resubmit it?Anonymous
August 31, 2007
but how will i copy the pk if i want to copy that using SqlBulkCopy.....???Anonymous
November 05, 2013
Use a Trigger with INSTEAD OF INSERT and use inside MERGE statement. But don't forget add the parameter SqlBulkCopyOptions.FireTriggers in the SqlBulkCopy. This is my two cents.