T-SQL: Safely Concatenate Values Into a Scalar Variable
Introduction
It is often far too easy to experience some behavior in SQL Server, and then extrapolate your belief from that. One example that is seen frequently on forums is the simplistic case of concatenating values into a scalar variable, however the logic that is commonly assumed is false.
Problem
Take the example
use tempdb
go
if object_id('CharsTable','U') is not null drop table CharsTable
create table CharsTable (letter char(1))
insert into CharsTable select 'a'
insert into CharsTable select 'b'
insert into CharsTable select 'c'
insert into CharsTable select 'd'
insert into CharsTable select 'e'
insert into CharsTable select 'f'
declare @string2 as varchar(1000)
--query 1
set @string2 = ''
select @string2 = @string2 + letter from CharsTable
select @string2
this returns
abcdef
(1 row(s) affected)
and from that you might believe that concatenation of multiple values is that simple. We defined a scalar variable, and SQL looped over the values returned by the select and 'built' the string up.
You can even change the order in which the string gets 'built'.
--query 2
set @string2 = ''
select @string2 = @string2 + letter from CharsTable order by letter desc
select @string2
returns
fedcba
(1 row(s) affected)
All of which goes towards strengthening the belief that this is a good way of concatenating strings. But unfortunately it is not.
Try this :
--query 3
set @string2 = ''
select @string2 = @string2 + ltrim(letter) from CharsTable order by ltrim(letter)
select @string2
almost identical query but with an additional check to make sure that we are stripping out any leading spaces. This now gives us
f
(1 row(s) affected)
What has happened? Why is this now just returning a single value? Where has the concatenation gone?
Explanation
To understand the behaviour we can take a look at the execution plans for each of these queries.
The first query is a scan on the table and a Compute Scalar.
The expression in the Compute Scalar is
CONVERT_IMPLICIT(varchar(1000),[@string2]+[tempdb].[dbo].[CharsTable].[letter],0)
and this is the 'concatenation' happening.
The second query, with the ORDER BY has a slightly different execution plan with a Sort operator as you would expect
and the Compute Scalar has the same expression value as previously.
CONVERT_IMPLICIT(varchar(1000),[@string2]+[tempdb].[dbo].[CharsTable].[letter],0)
Now for the third query, where everything went seemingly awry.
We have the same operators as are being used in the second query, but importantly being executed in a different order. The Compute Scalar is now being executed before the sort. Also the Compute Scalar now has 2 expressions.
ltrim([tempdb].[dbo].[CharsTable].[letter])
CONVERT_IMPLICIT(varchar(1000),[@string2]+ltrim([tempdb].[dbo].[CharsTable].[letter]),0)
The first being the function ltrim() on the values from CharsTable, the second being the string concatenation.
This explains (in this case) why we only get a single value as the result. In fact this is the result we should have been expecting as we know that SQL does not loop over the rows, it processes in a set based fashion. Think of the steps as being:
- Initialize the @String variable to ''
- select all the rows from CharsTable and apply the ltrim() function to the values
- string concatenate the 6 rows to the variable
It is this third step that we need to think about - how do you add a string to a set of rows? You may expect SQL to throw an error, but it does not, it handles it gracefully and simply sets the string to be the last value in the row set, which in the case where we have an order by, is 'f'.
The behaviour we see in queries 1 and 2 is simply a trick. There is no guarantee that the output will always be the ordered, concatenated string. Given enough rows in the base table and you will likely see all kinds of differing output. By adding the function to the query, we changed the way SQL built the execution plan, and actually forced it down the route that we would expect.
This behaviour is by design, see Reference section
The quote from the Connect item which is there in Reference section sums this up perfectly:
".. we do not guarantee that @var = @var + will produce the concatenated value for any statement that affects multiple rows. The right-hand side of the expression can be evaluated either once or multiple times during query execution ..."
Resolution
So how do we get the values concatenated in query 3? If order is not needed then the safest way is to use
set @string2 = ''
select @string2 = coalesce(@string2, ltrim(letter)) + letter from CharsTable
select @string2
and if you need the order preserved, then use the well known XML concatenation 'trick'
set @string2 = ''
select @string2 = (select '' + ltrim(letter) from CharsTable order by ltrim(letter) asc for xml path ( '' ) )
select @string2