Why is autogenerated UPDATE and DELETE statements complex?

Sometimes people ask why the code generated for INSERT/UPDATE/DELETE is so complex, or verbose if you like, why not just operate on the primary key?

To illustrate what we are talking about here, create a table in SQL Server and insert some rows:

create table Person (Id int primary key, FirstName nvarchar(50), LastName nvarchar(50), City nvarchar(50), Age int)

insert into Person values (1, 'John','Johnson','London', 20)

insert into Person values (2, 'Paul','Paulson','Paris', 30)

insert into Person values (3, 'Sven','Svenson','Stockholm', 40)

Then create a new C# console application and insert the following code:

    class Program

    {

        static void Main(string[] args)

        {

            string cs = @"Data Source=<your server>;Initial Catalog=<your database>;Integrated Security=SSPI";

            using (SqlConnection con = new SqlConnection(cs))

            {

                try

        {

                    con.Open();

                    SqlCommand cmd = con.CreateCommand();

                    cmd.CommandText = "SELECT Id, FirstName, LastName, City, Age FROM Person";

                    SqlDataAdapter adp = new SqlDataAdapter(cmd);

                    SqlCommandBuilder scb = new SqlCommandBuilder(adp);

                   

                    adp.InsertCommand = scb.GetInsertCommand();

                    adp.UpdateCommand = scb.GetUpdateCommand();

                    adp.DeleteCommand = scb.GetDeleteCommand();

                    Console.WriteLine("\n{0}", adp.InsertCommand.CommandText);

                    Console.WriteLine("\n{0}", adp.UpdateCommand.CommandText);

                    Console.WriteLine("\n{0}", adp.DeleteCommand.CommandText);

                    con.Close();                }

                catch (Exception ex)

                {

                    Console.WriteLine(ex);

                }

           }

        }

    }

And run it, this should give the following output:

INSERT INTO [Person] ([Id], [FirstName], [LastName], [City], [Age]) VALUES (@p1, @p2, @p3, @p4, @p5)

UPDATE [Person] SET [Id] = @p1, [FirstName] = @p2, [LastName] = @p3, [City] = @p4, [Age] = @p5 WHERE (([Id] = @p6) AND ((@p7 = 1 AND [FirstName] IS NULL)

OR ([FirstName] = @p8)) AND ((@p9 = 1 AND [LastName] IS NULL) OR ([LastName] = @p10)) AND ((@p11 = 1 AND [City] IS NULL) OR ([City] = @p12)) AND

((@p13 = 1 AND [Age] IS NULL) OR ([Age] = @p14)))

DELETE FROM [Person] WHERE (([Id] = @p1) AND ((@p2 = 1 AND [FirstName] IS NULL) OR ([FirstName] = @p3)) AND ((@p4 = 1 AND [LastName] IS NULL)

OR ([LastName] = @p5)) AND ((@p6 = 1 AND [City] IS NULL) OR ([City] = @p7)) AND ((@p8 = 1 AND [Age] IS NULL) OR ([Age] = @p9)))

So why this ‘complex’ SQL, why not only operate on the primary key? If we want to delete an entry, then “DELETE FROM Person WHERE id = @id” should do it, right?

Well, yes, that SQL would work, but ADO.Net is designed around a disconnected and optimistic concurrency approach.

For example, UserA retrieves the data from the Person table. This means that the code will open a connection, get the data, close the connection. Then UserA can edit, add, delete as much

as he or she likes. While UserA is doing this, the data remains unchanged on the server. When UserA is done operating on the data the connection opened, the changes are submitted (using the SQL above)

and the connection is closed. So far so good, but what if UserB also operates on the data at the same time. In other words;

UserA retrieves the data for Person with Id 2 to the client.

UserB retrieves the same data (Person with Id 2) to the client and edits it, for example, changes the City and Age.

UserB submits the changes back to the database from UserB client.

UserA deletes the Person with Id 2 and submits the changes back to the database.

If we were only operating on the primary key, it would mean that the edited Person (Id 2) would be deleted. UserB would not like this. Or at least be very surprised when the data is no longer there.

By using UPDATE and DELETE statements that operates on all columns in the table, we make sure that when UserA in the scenario above deletes a row, only the row that exactly matches the row

that UserA originally retrieved is deleted. If the exact matching fails (which it will if UserB has changed City and Age) the row is not deleted.

This is, for example, how LINQ operates in the background.

More info here:

"Optimistic Concurrency (ADO.NET)"

https://msdn.microsoft.com/en-us/library/aa0416cz.aspx

"Generating Commands with CommandBuilders (ADO.NET)"

https://msdn.microsoft.com/en-us/library/tf579hcz.aspx

Comments

  • Anonymous
    November 09, 2009
    Hi maspeng, Thanks for your explanation. I have a question. In auto generated UPDATE and DELETE statement, why does it generate where clauses like this: (@p7 = 1 AND [FirstName] IS NULL) OR ([FirstName] = @p8) If the purpose is to handle the case where parameter @p7 has a value of NULL, shouldn't it be: (@p7 IS NULL AND [FirstName] IS NULL) OR ([FirstName] = @p8) How do I make sense out of the fact that auto generated where clause compares @p7 with numeric value 1? where does 1 come from? Steven