Testing strings for equality counting trailing spaces

The SQL standard requires that string comparisons, effectively, pad the shorter string with space characters. This leads to the surprising result that N'' ≠ N' ' (the empty string equals a string of one or more space characters) and more generally any string equals another string if they differ only by trailing spaces. This can be a problem in some contexts. Unicode combining marks further complicate the issue because for many collations two distinct byte strings should compare as equal. For example, \u212b (Å) should be equal to \u0041 (A) + \u030a (combining ring above).

There are a number of approaches to the problem that people have suggested but they all have flaws:

1. Convert the strings to varbinary values and then test for equality. This does not handle combining marks correctly.

2. Replace all the space characters with an unlikely character like the unit separator (\u001f) and then test for equality. This can cause two strings to test equal if they match except for space characters corresponding to the unlikely character.

3. Append a character that is not treated oddly by any collation like ‘X’ to each string and compare them. This could cause an error if the string is already of maximum size.

4. Use the like operator to do the equality check. This may cause two strings to test equal if they match because of wildcard characters in the second string.

5. Use XQuery’s string comparison. This does not handle combining marks correctly.

6. Test for equality normally and check that the data lengths match. This does not handle combining marks correctly.

Overall I like 3 the best as it has lowest risk.

Here is a simple test script for testing various approaches:

declare @test table

(

   string1 nvarchar(100) collate Latin1_General_100_CI_AS_KS_WS,

   string2 nvarchar(100) collate Latin1_General_100_CI_AS_KS_WS,

   equal bit

);

insert into @test (string1, string2, equal)

  values (N' ', N' ', 1),

         (N' ', N' ', 0),

         (nchar(0x212b), nchar(0x0041) + nchar(0x030a), 1),

         (N'a', N'%', 0),

         (N' ', nchar(0x0001f), 0);

        

       

select convert(varbinary(10), T.string1) as [String1], convert(varbinary(10), T.string2) as [String2],

       case when case when T.string1 = T.string2 then 1 else 0 end = T.equal then N'OK'

            else N'wrong'

       end as [=],

       case when case when convert(varbinary(max), T.string1) = convert(varbinary(max), T.string2) then 1 else 0 end = T.equal then N'OK'

            else N'wrong'

       end as [varbinary],

       case when case when replace(T.string1, N' ', nchar(0x001f)) = replace(T.string2, N' ', nchar(0x0001f)) then 1 else 0 end = T.equal then N'OK'

            else N'wrong'

       end as [replace],

       case when case when T.string1 + N'X' = T.string2 + N'X' then 1 else 0 end = T.equal then N'OK'

            else N'wrong'

       end as [append],

       case when case when T.string1 like T.string2 then 1 else 0 end = T.equal then N'OK'

            else N'wrong'

       end as [like],

       case when convert(xml, N'').exist(N'/.[sql:column("T.string1") eq sql:column("T.string2")]') = T.equal then N'OK'

            else N'wrong'

       end as [XQuery],

       case when case when T.string1 = T.string2 and datalength(T.string1) = datalength(T.string2) then 1 else 0 end = T.equal then N'OK'

            else N'wrong'

       end as [datalength]

from @test as T;

This results in the table:

String1

String2

=

varbinary

replace

append

like

XQuery

datalength

0x2000

0x2000

OK

OK

OK

OK

OK

OK

OK

0x2000

0x20002000

wrong

OK

OK

OK

OK

OK

OK

0x2B21

0x41000A03

OK

wrong

OK

OK

OK

wrong

wrong

0x6100

0x2500

OK

OK

OK

OK

wrong

OK

OK

0x2000

0x1F00

OK

OK

wrong

OK

OK

OK

OK

Comments