SQL Server 2008 : new binary – hex string conversion functionality can dramatically improve related query performance by orders of magnitude.
In previous SQL Server releases it wasn’t possible to convert binary data to string characters in hex format directly, because SQL Server did not have a built-in Transact-SQL command for converting binary data to a hexadecimal string. The Transact-SQL CONVERT command converted binary data to character data in a one byte to one character fashion. SQL Server would take each byte of the source binary data, convert it to an integer value, and then uses that integer value as the ASCII value for the destination character data. This behavior applied to the binary, varbinary, and timestamp datatypes.
The only workarounds were to use either a stored procedure as described in a Knowledge Base Article: "INFO: Converting Binary Data to Hexadecimal String" ( https://support.microsoft.com/kb/104829 ) or by writing a CLR function.
An ISV I work with doesn’t support CLR and therefore they implemented their own version of a custom convert function in form of a stored procedure. This one was even faster than everything else they found on the Internet.
NEW – IN SQL SERVER 2008 the convert function was extended to support binary data – hex string conversion. It looks like a tiny improvement almost not worth mentioning.
However, for the ISV it was a big step forward as some critical queries need this functionality. Besides the fact that they no longer have to ship and maintain their own stored procedure, a simple repro showed a tremendous performance improvement.
Repro:
=====
I transformed the procedure described in the KB article mentioned above into a simple function. The stored procedure below will create a simple test table with one varbinary column and insert some test rows in 10K packages ( e.g. nr_rows = 100 -> 1 million rows in the table ).
The repro shows two different test cases:
1. insert 0x0 two million times
2. insert 0x0123456789A12345 two million times
Depending on the length of the value the disadvantage of the stored procedure solution will be even bigger. On my test machine the results of the test queries below were:
(both tests were done with the same SQL Server 2008 instance - no change of any settings)
1. two million times value 0x0
a, using stored procedure : about 3460 logical reads, no disk IO, ~52 secs elapsed time
b, using new convert feature : about 5200 logical reads, no disk IO, < 1 sec elapsed time
2. two million times value 0x0123456789A12345
a, using stored procedure : about 3460 logical reads, no disk IO, ~157 secs elapsed time
b, using new convert feature : about 5200 logical reads, no disk IO, < 1 sec elapsed time
Repro Script:
========
create function sp_hexadecimal ( @binvalue varbinary(255) )
returns varchar(255)
as
begin
declare @charvalue varchar(255)
declare @i int
declare @length int
declare @hexstring char(16)
select @charvalue = '0x'
select @i = 1
select @length = datalength(@binvalue)
select @hexstring = '0123456789abcdef'
while (@i <= @length)
begin
declare @tempint int
declare @firstint int
declare @secondint int
select @tempint = convert(int, substring(@binvalue,@i,1))
select @firstint = floor(@tempint/16)
select @secondint = @tempint - (@firstint*16)
select @charvalue = @charvalue +
substring(@hexstring, @firstint+1, 1) +
substring(@hexstring, @secondint+1, 1)
select @i = @i + 1
end
return ( @charvalue )
end
create procedure cr_conv_test_table ( @value varbinary(16), @nr_rows int )
as
begin
declare @exist int
declare @counter int
set NOCOUNT ON
set statistics time off
set statistics io off
set statistics profile off
set @exist = ( select count(*) from sys.objects
where name = 'conv_test_table' and
type = 'U' )
if( @exist = 1 )
drop table conv_test_table
set @exist = ( select count(*) from sys.objects
where name = 'conv_test_table_temp' and
type = 'U' )
if( @exist = 1 )
drop table conv_test_table_temp
create table conv_test_table ( varbincol varbinary(16) )
create table conv_test_table_temp ( varbincol varbinary(16) )
set @counter = 10000
while @counter > 0
begin
insert into conv_test_table_temp values ( @value )
set @counter = @counter - 1
end
set @counter = @nr_rows
while @counter > 0
begin
insert into conv_test_table select * from conv_test_table_temp
set @counter = @counter - 1
end
end
-- create 2 million test rows
execute cr_conv_test_table 0x0, 200
set statistics time on
set statistics io on
-- compare runtime of stored procedure with new convert feature
select count(*) from conv_test_table
where dbo.sp_hexadecimal(varbincol) = '0x00'
select count(*) from conv_test_table
where CONVERT(varchar(255),varbincol,1) = '0x00'
-- create 2 million test rows
execute cr_conv_test_table 0x0123456789A12345, 200
set statistics time on
set statistics io on
-- compare runtime of stored procedure with new convert feature
select count(*) from conv_test_table
where dbo.sp_hexadecimal(varbincol) = '0x0123456789A12345'
select count(*) from conv_test_table
where CONVERT(varchar(255),varbincol,1) = '0x0123456789A12345'
Comments
- Anonymous
October 31, 2008
I like that sql 2008 has support for converting binary into hex strings, that's great !!!, however i still wish it would also support operations on varbinary columns like BITWISE AND/OR/XOR, do you know if that is in the plan for future? Also the last 2 queries in your post got me thinking a little bit. Looks like the "WHERE" clause in the query was first converting a varbinary field into a string then doing the comparison, and that conversion is ging to be done for every row in the table. I think it would be more performant to convert once the string to match into a varbinary variable, then use the varbinary variable to use in the select statement to compare direclty varbinary with varbinary, or is it not possible to do that with varbinary fields?