Share via


T-SQL: Safely Concatenate Values Into a Scalar Variable

Introduction

It is often far too easy to experience some behavior in SQL Server, and then extrapolate your belief from that. One example that is seen frequently on forums is the simplistic case of concatenating values into a scalar variable, however the logic that is commonly assumed is false.

Problem

Take the example

use tempdb
go
 
if object_id('CharsTable','U') is  not null  drop table  CharsTable
 
create table  CharsTable (letter char(1))
 
insert into  CharsTable select  'a'
insert into  CharsTable select  'b'
insert into  CharsTable select  'c'
insert into  CharsTable select  'd'
insert into  CharsTable select  'e'
insert into  CharsTable select  'f'
 
declare @string2 as varchar(1000)
--query 1
set @string2 = ''
select @string2 = @string2 + letter from CharsTable
select @string2

this returns


abcdef

(1 row(s) affected)

and from that you might believe that concatenation of multiple values is that simple.  We defined a scalar variable, and SQL looped over the values returned by the select and 'built' the string up.

You can even change the order in which the string gets 'built'.

--query 2
set @string2 = ''
select @string2 = @string2 + letter from CharsTable order by  letter desc
select @string2

returns


fedcba

(1 row(s) affected)

All of which goes towards strengthening the belief that this is a good way of concatenating strings. But unfortunately it is not.

Try this :

--query 3
set @string2 = ''
select @string2 = @string2 + ltrim(letter) from CharsTable order by  ltrim(letter)
select @string2

almost identical query but with an additional check to make sure that we are stripping out any leading spaces.  This now gives us


f

(1 row(s) affected)

What has happened? Why is this now just returning a single value? Where has the concatenation gone?


Explanation

To understand the behaviour we can take a look at the execution plans for each of these queries.

The first query is a scan on the table and a Compute Scalar. 

The expression in the Compute Scalar is

CONVERT_IMPLICIT(varchar(1000),[@string2]+[tempdb].[dbo].[CharsTable].[letter],0)

and this is the 'concatenation' happening.

The second query, with the ORDER BY has a slightly different execution plan with a Sort operator as you would expect

and the Compute Scalar has the same expression value as previously.

CONVERT_IMPLICIT(varchar(1000),[@string2]+[tempdb].[dbo].[CharsTable].[letter],0)

Now for the third query, where everything went seemingly awry.

We have the same operators as are being used in the second query, but importantly being executed in a different order.  The Compute Scalar is now being executed before the sort.  Also the Compute Scalar now has 2 expressions.

ltrim([tempdb].[dbo].[CharsTable].[letter])
CONVERT_IMPLICIT(varchar(1000),[@string2]+ltrim([tempdb].[dbo].[CharsTable].[letter]),0)

The first being the function ltrim() on the values from CharsTable, the second being the string concatenation.

This explains (in this case) why we only get a single value as the result.  In fact this is the result we should have been expecting as we know that SQL does not loop over the rows, it processes in a set based fashion.  Think of the steps as being:

  1. Initialize the @String variable to ''
  2. select all the rows from CharsTable and apply the ltrim() function to the values
  3. string concatenate the 6 rows to the variable

It is this third step that we need to think about - how do you add a string to a set of rows?  You may expect SQL to throw an error, but it does not, it handles it gracefully and simply sets the string to be the last value in the row set, which in the case where we have an order by, is 'f'.

The behaviour we see in queries 1 and 2 is simply a trick.  There is no guarantee that the output will always be the ordered, concatenated string.  Given enough rows in the base table and you will likely see all kinds of differing output.  By adding the function to the query, we changed the way SQL built the execution plan, and actually forced it down the route that we would expect.

This behaviour is by design, see Reference section

The quote from the Connect item which is there in Reference section  sums this up perfectly:

".. we do not guarantee that @var = @var + will produce the concatenated value for any statement that affects multiple rows. The right-hand side of the expression can be evaluated either once or multiple times during query execution ..."


Resolution

So how do we get the values concatenated in query 3?  If order is not needed then the safest way is to use

set @string2 = ''
select @string2 = coalesce(@string2, ltrim(letter)) + letter from CharsTable
select @string2

and if you need the order preserved, then use the well known XML concatenation 'trick'

set @string2 = ''
select @string2 = (select '' + ltrim(letter) from CharsTable order by ltrim(letter) asc for xml path ( '' ) )
select @string2


 

Reference