T-SQL: Finding Difference in Columns in the Table
This article is going to discuss a common problem of finding difference in multiple columns of two tables and listing the new and old values in these columns along with the column names. This article is based on the following blog post String Manipulation to Find Difference in Different Columns in a Wide Table.
Problem Definition
Let's consider one table to explain the problem and a solution although this solution can be easily expanded for 2 copies of the same table. Let's use the following script to create the table which will be used to demonstrate the problem.
CREATE TABLE dbo.Test (
BusinessDate DATE ,
colPk VARCHAR (20),
col3 VARCHAR (10),
col4 VARCHAR (10),
col5 VARCHAR (10),
col6 VARCHAR (10),
col7 VARCHAR (10),
col8 VARCHAR (10),
col9 VARCHAR (10)
);
INSERT INTO Test
VALUES
( '20100628' , 'ABC01' , 'A' , 'B' , 'C' , 'D' , 'E' , 'F' , 'G' ),
( '20100628' , 'ABC02' , 'AA' , 'BB' , 'CC' , 'DD' , 'EE' , 'FF' , 'GG' ),
( '20100628' , 'ABC03' , 'AAA' , 'BBB' , 'CCC' , 'DDD' , 'EEE' , 'FFF' , 'GGG' ),
( '20100627' , 'ABC01' , 'A' , 'BB' , 'C' , 'D' , 'E' , 'F' , 'GG' ),
( '20100627' , 'ABC02' , 'A' , 'BB' , 'C' , 'DD' , 'E' , 'F' , 'GG' ),
( '20100627' , 'ABC03' , 'AAA' , 'BBB' , 'CC' , 'D' , 'EEE' , 'FF' , 'GGG' );
We also create the following table that will be used to log the changes:
CREATE TABLE dbo.logtable (
BusinessDate DATE,
colPk VARCHAR(20),
colName SYSNAME,
preVal VARCHAR(MAX),
curVal VARCHAR(MAX)
);
The task is to list all the columns changes along with the column's name for each primary key.
Solution
We will be using dynamic SQL to generate a script displaying the differences. Let's first generate a script that will just list columns when they contain different values and also add a column counting the differences. For that purpose let's consider the following script that uses meta-data of INFORMATION_SCHEMA.COLUMNS view:
SET NOCOUNT ON;
DECLARE @SQL nvarchar(max), @colsChanges nvarchar(max);
SELECT @colsChanges =
STUFF((SELECT '+ case when isnull(a.' + Quotename(c.COLUMN_NAME) + ',space(0)) <> isnull(b.' + Quotename(c.COLUMN_NAME) + ',space(0)) THEN ' + Char(39) + Quotename(column_name) + Char(39) + ' ELSE '''' END'
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_SCHEMA = 'dbo' and c.TABLE_NAME = 'test'
AND c.COLUMN_NAME NOT IN ('BusinessDate','ColPk')
ORDER BY c.COLUMN_NAME
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),1,2, '');
--print @colsChanges; -- to debug
SET @SQL = 'SELECT *, LEN(ColDiff) - LEN(replace(colDiff,'']'',space(0))) as CountDiff
FROM (
SELECT a.BusinessDate, a.ColPK, ' + @colsChanges + ' AS ColDiff
from dbo.test a
INNER JOIN dbo.test b on a.ColPK = b.ColPK and a.BusinessDate = dateadd(DAY,-1,b.BusinessDate)
) X';
--print @SQL; -- to debug
EXECUTE( @SQL);
We first generate the list of columns that may change using condition ISNULL(a.Column, '') <> ISNULL(b.Column,''). This condition works as the columns are all of character types. If the columns would be of different types we would use more complex expression of testing equality and NOT NULL for each column separately. As a result we will be listing the column's name if that column values changed. We're using primary key column and business date columns to compare rows in a table based on PK and one day difference.
I suggest to uncomment PRINT statements to see the generated SQL. We're using the common technique of XML PATH('') to concatenate rows in a table into a string.
Logging the Old and New Values of the Columns
Now let's return to the original definition of the problem of logging every column's difference in the log table with the previous and current value. For this purpose we'll make the following adjustments into the previous script:
SET NOCOUNT ON;
DECLARE @SQL nvarchar(max), @SQLFrom nvarchar(max);
SET @SQLFrom = ' FROM dbo.Test a
INNER JOIN dbo.Test b on a.ColPK = b.ColPK
and a.BusinessDate = dateadd(DAY,-1,b.BusinessDate)'
SELECT @SQL =
STUFF((SELECT ' UNION ALL
SELECT a.colPk, b.BusinessDate, ' + Quotename(c.COLUMN_NAME, '''') + ' as Column_Name, b.' +
quotename(c.COLUMN_NAME) + ' as PrevVal, a.' + quotename(c.COLUMN_NAME) + ' as CurVal' +
@SQLFrom +
' WHERE a.' + quotename(c.COLUMN_NAME) + '<> b.' + + quotename(c.COLUMN_NAME)
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_SCHEMA = 'dbo' and c.TABLE_NAME = 'test'
AND c.COLUMN_NAME NOT IN ('BusinessDate','ColPk')
ORDER BY c.COLUMN_NAME
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),1,12, '');
--print @SQL;
INSERT into dbo.LogTable (
ColPK, BusinessDate, ColName, CurVal, PreVal
)
EXECUTE sp_executeSQL @SQL;
select * from dbo.LogTable
ORDER BY colPk, BusinessDate;
If we run this new script we'll get the following results listing the date and PK along with the old and new column's values:
We can see all the changes. Obviously such script can take a long time to execute on a big table.
Conclusion
In this article we demonstrated that using SQL Server meta-data from INFORMATION_SCHEMA.COLUMNS view we can easily generate a script listing changes in a table or in the different tables. I hope the reader can see the different applications of this technique.
See Also
This article participated in the TechNet Guru Contributions for December 2015 and won the gold prize.