다음을 통해 공유


SQL Server Unicode Madness: N'' is Null, but not Quite, Null is not Null, and N'' is N'Ǹ' ... What?!?!

Recently a customer asked me funny things. He came across with the following queries:

declare @tmp table (charvalue nvarchar(10), othervalue int);
insert  into @tmp values (N'', 4326);

So far, so good. But querying this tiny table variable in a certain way can be very surprising:

select * from @tmp where charvalue = N'Ǹ';

Note the N has an "accent grave".
We expect an empty result set, because an N with whatever accent on its top is very much different from an empty string, right?

This came out:

charvalue  othervalue


           4326

So, I thought this could need some further investigation. I wanted to find out what the unicode value for "charvalue" would be:

select unicode(charvalue),othervalue from @tmp

Surprising result:

            othervalue


NULL        4326

So, according to this result, N'' is null. My professor always told me it's not.
Ok, let's check that:


select * from @tmp where charvalue is null;

This returns

charvalue  othervalue


Empty record set... Oops... this means that NULL is not NULL!
Confused? So am I.

Anyways, my customer asked me for a solution as he did not want the above record in his result set when he was filtering for N``'Ǹ'.

I proposed this type of query to him:

select * from @tmp where charvalue = N'Ǹ' and charvalue !=N'';

This works.
Well, if you don't like it try this which also works fine:

select * from @tmp where charvalue = N'Ǹ' collate  SQL_Latin1_General_CP850_BIN2 

Both, SQL Server 2008R2 and 2012 have the same opinion on this matter.
Now, is it a bug or a feature?

======================== Update

I filed a connect item and received the following answer:

Quote:

"Thanks for submitting this feedback.
The behavior you are seeing is in fact by design if you are using the default collation for most locales (SQL_Latin1_general_CP1_CI_AS).

In general, I would recommend using a Windows collation at the instance or database level, for example Latin1_general_100_CI_AS_KS_WS.

You can also specify the collation in the query. For example, the following will return an empty result set.

declare @tmp table (charvalue nvarchar(10), othervalue int);
insert into @tmp values (N'', 4326);
select * from @tmp where charvalue collate Latin1_General_100_CI_AS_KS_WS = N'Ǹ';

--
Jos de Bruijn - SQL Server PM "

:Endquote

see: Unicode string comparison in WHERE clause fails with certain characters