Jaa


SQL Server 2008 : new binary – hex string conversion functionality can dramatically improve related query performance by orders of magnitude.

In previous SQL Server releases it wasn’t possible to convert binary data to string characters in hex format directly, because SQL Server did not have a built-in Transact-SQL command for converting binary data to a hexadecimal string. The Transact-SQL CONVERT command converted binary data to character data in a one byte to one character fashion. SQL Server would take each byte of the source binary data, convert it to an integer value, and then uses that integer value as the ASCII value for the destination character data. This behavior applied to the binary, varbinary, and timestamp datatypes.

The only workarounds were to use either a stored procedure as described in a Knowledge Base Article: "INFO: Converting Binary Data to Hexadecimal String" ( https://support.microsoft.com/kb/104829 ) or by writing a CLR function.

An ISV I work with doesn’t support CLR and therefore they implemented their own version of a custom convert function in form of a stored procedure. This one was even faster than everything else they found on the Internet.

NEW – IN SQL SERVER 2008 the convert function was extended to support binary data – hex string conversion. It looks like a tiny improvement almost not worth mentioning.

However, for the ISV it was a big step forward as some critical queries need this functionality. Besides the fact that they no longer have to ship and maintain their own stored procedure, a simple repro showed a tremendous performance improvement.

Repro:

=====

I transformed the procedure described in the KB article mentioned above into a simple function. The stored procedure below will create a simple test table with one varbinary column and insert some test rows in 10K packages ( e.g. nr_rows = 100 -> 1 million rows in the table ).

The repro shows two different test cases:

1. insert 0x0 two million times

2. insert 0x0123456789A12345 two million times

Depending on the length of the value the disadvantage of the stored procedure solution will be even bigger. On my test machine the results of the test queries below were:

(both tests were done with the same SQL Server 2008 instance - no change of any settings)

1. two million times value 0x0

a, using stored procedure : about 3460 logical reads, no disk IO, ~52 secs elapsed time

b, using new convert feature : about 5200 logical reads, no disk IO, < 1 sec elapsed time

2. two million times value 0x0123456789A12345

a, using stored procedure : about 3460 logical reads, no disk IO, ~157 secs elapsed time

b, using new convert feature : about 5200 logical reads, no disk IO, < 1 sec elapsed time

Repro Script:

========

create function sp_hexadecimal ( @binvalue varbinary(255) )

returns varchar(255)

as

begin

declare @charvalue varchar(255)

declare @i int

declare @length int

declare @hexstring char(16)

select @charvalue = '0x'

select @i = 1

select @length = datalength(@binvalue)

select @hexstring = '0123456789abcdef'

while (@i <= @length)

begin

declare @tempint int

declare @firstint int

declare @secondint int

select @tempint = convert(int, substring(@binvalue,@i,1))

select @firstint = floor(@tempint/16)

select @secondint = @tempint - (@firstint*16)

select @charvalue = @charvalue +

substring(@hexstring, @firstint+1, 1) +

substring(@hexstring, @secondint+1, 1)

select @i = @i + 1

end

return ( @charvalue )

end

create procedure cr_conv_test_table ( @value varbinary(16), @nr_rows int )

as

begin

declare @exist int

declare @counter int

set NOCOUNT ON

set statistics time off

set statistics io off

set statistics profile off

set @exist = ( select count(*) from sys.objects

where name = 'conv_test_table' and

type = 'U' )

if( @exist = 1 )

drop table conv_test_table

set @exist = ( select count(*) from sys.objects

where name = 'conv_test_table_temp' and

type = 'U' )

if( @exist = 1 )

drop table conv_test_table_temp

create table conv_test_table ( varbincol varbinary(16) )

create table conv_test_table_temp ( varbincol varbinary(16) )

set @counter = 10000

while @counter > 0

begin

insert into conv_test_table_temp values ( @value )

set @counter = @counter - 1

end

set @counter = @nr_rows

while @counter > 0

begin

insert into conv_test_table select * from conv_test_table_temp

set @counter = @counter - 1

end

end

-- create 2 million test rows

execute cr_conv_test_table 0x0, 200

set statistics time on

set statistics io on

-- compare runtime of stored procedure with new convert feature

select count(*) from conv_test_table

where dbo.sp_hexadecimal(varbincol) = '0x00'

select count(*) from conv_test_table

where CONVERT(varchar(255),varbincol,1) = '0x00'

-- create 2 million test rows

execute cr_conv_test_table 0x0123456789A12345, 200

set statistics time on

set statistics io on

-- compare runtime of stored procedure with new convert feature

select count(*) from conv_test_table

where dbo.sp_hexadecimal(varbincol) = '0x0123456789A12345'

select count(*) from conv_test_table

where CONVERT(varchar(255),varbincol,1) = '0x0123456789A12345'

Comments

  • Anonymous
    October 31, 2008
    I like that sql 2008 has support for converting binary into hex strings, that's great !!!, however i still wish it would also support operations on varbinary columns like BITWISE AND/OR/XOR, do you know if that is in the plan for future? Also the last 2 queries in your post got me thinking a little bit. Looks like the "WHERE" clause in the query was first converting a varbinary field into a string then doing the comparison, and that conversion is ging to be done for every row in the table. I think it would be more performant to convert once the string to match into a varbinary variable, then use the varbinary variable to use in the select statement to compare direclty varbinary with varbinary, or is it not possible to do that with varbinary fields?