Compartilhar via


Large tuple uniqueness constraints in SQL Server

SQL Server 2008 limits unique constraints to 900 bytes of data per tuple. Here is a technique for enforcing uniqueness over larger tuple sizes. In many cases, the technique is more efficient than SQL Server’s unique constraints and can be used for performance critical cases.

Hash index based implementation

Essentially, the approach stores a hash of the columns in that need to be unique and uses this to minimize the number of rows that need to be checked when a row is inserted or updated. A trigger is used to enforce the constraint. A script with test code can be found attached.

As an example, suppose you want to add a unique constraint over Column1 and Column2 to the table:

create table [Schema].[Table]

(

  [Id] int not null constraint [PK_Table] primary key,

  [Column1] nvarchar(max) not null,

  [Column2] nvarchar(max) null

);

Firstly add a column to store the hash and an index over the hash column:

create table [Schema].[Table]

(

  [Id] int not null constraint [PK_Table] primary key,

  [Column1] nvarchar(max) not null,

  [Column2] nvarchar(max) null,

  [Internal_Hash_Column1_Column2] as checksum([Column1], [Column2]) persisted not null

);

create index [Unique_Hash_Table_Column1_Column2] on [Schema].[Table] ([Internal_Hash_Column1_Column2]) include ([Column1], [Column2]);

If the columns are typically large then you may get better overall performance by removing the included columns from the index. Secondly, add a trigger to enforce the constraint:

create trigger [ConstraintInsertTrigger_Table] on [Schema].[Table]

  after insert, update as

begin

       if (update([Column1]) or update([Column2])) and

       exists (select *

                     from [Schema].[Table] as T

                     inner join inserted as I on I.[Internal_Hash_Column1_Column2] = T. [Internal_Hash_Column1_Column2] and

                     I.[Column1] = T.[Column1] and

                     (I.[Column2] = T.[Column2] or (I.[Column2] is null and T.[Column2] is null))

                     group by I.[Internal_Hash_Column1_Column2], I.[Column1], I.[Column2]

                     having count(*) > 1)

       begin

       if @@trancount > 0

       begin

              rollback transaction;

       end;

       raiserror(N'Error: Values in the columns [Schema].[Table] ([Column1], [Column1]) must be unique.', 16, 0);

       end;

end;

Care is needed in handling nullable columns like Column2.

Below are some simple test cases. The insert statement:

insert into [Schema].[Table] ([Id], [Column1], [Column2])

  values (1, N'1A', N'2A'),

         (2, N'1B', N'2B');

results in the table:

Id

Column1

Column2

Internal_Hash_Column1_Column2

1

1A

2A

585901004

2

1B

2B

-1712577588

The insert statement:

insert into [Schema].[Table] ([Id], [Column1], [Column2])

  values (3, N'1A', N'2A'),

         (4, N'1C', N'2C');

fails because the tuple (1A, 2A) is already in the table. The insert statement:

insert into [Schema].[Table] ([Id], [Column1], [Column2])

  values (3, N'1A', null),

         (4, N'1A', null);

fails because the tuple (1A, null) is duplicated. The insert statement:

insert into [Schema].[Table] ([Id], [Column1], [Column2])

  values (3, N'1a', N'2a'),

         (4, N'1C', N'2C');

fails because the collation is case insensitive so the tuple (1a, 2a) is matches the tuple (1A, 2A) which is already in the table. Finally to test for the case where two different tuples have the same hash value, the insert statement:

insert into [Schema].[Table] ([Id], [Column1], [Column2])

  values (3, N'1a', N'2a'),

         (4, N'1C', N'2C');

results in the table:

Id

Column1

Column2

Internal_Hash_Column1_Column2

1

1A

2A

585901004

2

1B

2B

-1712577588

5

aaabab

2E

250031328

6

aaaaam

2E

250031328

Hashbytes based implimentation

There is another technique that almost always works and is very fast. Create a new persistent column using hashbytes and put a unique constraint on it. There is a very small chance that a valid tuple will violate the constraint. For some algorithms, a malicious user can generate collisions fairly easily.

BigUniqueConstraint.sql

Comments