Convert Comma Separated String to Table : 4 different approaches
Recently, I came across a piece of TSQL code that would take a comma separated string as an input and parse it to return a single column table from it.
Lets’ call this function as Split1. The code is as follows:
CREATE FUNCTION [dbo].Split1(@input AS Varchar(4000) )
RETURNS
@Result TABLE(Value BIGINT)
AS
BEGIN
DECLARE @str VARCHAR(20)
DECLARE @ind Int
IF(@input is not null)
BEGIN
SET @ind = CharIndex(',',@input)
WHILE @ind > 0
BEGIN
SET @str = SUBSTRING(@input,1,@ind-1)
SET @input = SUBSTRING(@input,@ind+1,LEN(@input)-@ind)
INSERT INTO @Result values (@str)
SET @ind = CharIndex(',',@input)
END
SET @str = @input
INSERT INTO @Result values (@str)
END
RETURN
END
This is a very old fashioned (but still effective enough) script which does a loop over a string to cut out all possible string values that are separated by a comma.
Let’s see now, how the same could be achieved in modern day TSQL languages (such as SQL 2005 or SQL 2008).
Approach 1: Common Table Expression (CTE)
Lets call this function as Split2. here we are using
CREATE FUNCTION dbo.Split2 ( @strString varchar(4000))
RETURNS @Result TABLE(Value BIGINT)
AS
begin
WITH StrCTE(start, stop) AS
(
SELECT 1, CHARINDEX(',' , @strString )
UNION ALL
SELECT stop + 1, CHARINDEX(',' ,@strString , stop + 1)
FROM StrCTE
WHERE stop > 0
)
insert into @Result
SELECT SUBSTRING(@strString , start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS stringValue
FROM StrCTE
return
end
GO
Approach 2: XML (surprise)
XML could be applied to do some type of string parsing (see this) Let’s call this function as Split3.
CREATE FUNCTION dbo.Split3 ( @strString varchar(4000))
RETURNS @Result TABLE(Value BIGINT)
AS
BEGIN
DECLARE @x XML
SELECT @x = CAST('<A>'+ REPLACE(@strString,',','</A><A>')+ '</A>' AS XML)
INSERT INTO @Result
SELECT t.value('.', 'int') AS inVal
FROM @x.nodes('/A') AS x(t)
RETURN
END
GO
Approach 4: Classic TSQL Way
I got this approach from SQL Server Central site. This approach is slightly unusual but very effective. this needs you to create a table of sequential numbers called a Tally Table.
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
Lets index the table for better performance.
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
WITH FILLFACTOR = 100
Finally out Split4 function.
CREATE FUNCTION dbo.Split4 ( @strString varchar(4000))
RETURNS @Result TABLE(Value BIGINT)
AS
BEGIN
SET @strString = ','+@strString +','
INSERT INTO @t (Value)
SELECT SUBSTRING(@strString,N+1,CHARINDEX(',',@strString,N+1)-N-1)
FROM dbo.Tally
WHERE N < LEN(@strString)
AND SUBSTRING(@strString,N,1) = ',' --Notice how we find the comma
RETURN
END
GO
Now, what about the most crucial question: Performance. so lets put all 4 functions to test.
please note that I am running this on SQL Server 2008. you may need to modify it for SQL 2005.
DECLARE @str VARCHAR(4000) = '6,7,7,8,10,12,13,14,16,44,46,47,394,396,417,488,714,717,718,719,722,725,811,818,832,833,836,837,846,913,914,919,922,923,924,925,926,927,927,928,929,929,930,931,932,934,935,1029,1072,1187,1188,1192,1196,1197,1199,1199,1199,1199,1200,1201,1202,1203,1204,1205,1206,1207,1208,1209,1366,1367,1387,1388,1666,1759,1870,2042,2045,2163,2261,2374,2445,2550,2676,2879,2880,2881,2892,2893,2894'
SELECT * FROM dbo.split1 ( @str )
SELECT * FROM dbo.split2 ( @str )
SELECT * FROM dbo.split3 ( @str )
SELECT * FROM dbo.split3 ( @str )
On my laptop, I saw following numbers. I expected the XML to be the fastest.
so I changed the code slightly to test out XML function directly.
DECLARE @str VARCHAR(4000)
= '6,7,7,8,10,12,13,14,16,44,46,47,394,396,417,488,714,717,718,719,722,725,811,818,832,833,836,837,846,913,914,919,922,923,924,925,926,927,927,928,929,929,930,931,932,934,935,1029,1072,1187,1188,1192,1196,1197,1199,1199,1199,1199,1200,1201,1202,1203,1204,1205,1206,1207,1208,1209,1366,1367,1387,1388,1666,1759,1870,2042,2045,2163,2261,2374,2445,2550,2676,2879,2880,2881,2892,2893,2894'
Declare @x XML
select @x = cast('<A>'+ replace(@str,',','</A><A>')+ '</A>' as xml)
select t.value('.', 'int') as inVal
from @x.nodes('/A') as x(t)
Hmm. so just as I thought, XML is faster on its’ own.
there are some more options that I did not consider such as CLR functions. but we will get to those some other time.
Comments
Anonymous
December 26, 2009
Thanks for the reference, Amit.--Jeff ModenAnonymous
February 12, 2012
Heh... I run across this thread about once every 2 years or so.Testing with a single row is ok but it doesn't give the big picture. For example, it looks like the While Loop will beat a Tally table solution. Please see the following article for a bit more comprehensive testing of many types of splitters.www.sqlservercentral.com/.../72993Anonymous
June 03, 2012
great article.. thanks so much for thisAnonymous
July 15, 2012
Great JOB with the timing comparison!THANK YOUAnonymous
January 11, 2013
Amit... I just placed an adaptation of your XML technique in my collection of all-time T-SQL greats (i simply changed the Ints to varchar(max) for my purposes). I never knew that the XML datatype offered such power. THANK YOU SO MUCH !!Anonymous
October 24, 2013
Any Simple ways of doing this ?Anonymous
January 20, 2015
Although the article is great but every technique needs some explanation. Why is this and why is that... :)Anonymous
August 22, 2015
I used your XML surprise solution by changing the INTs with varchar. And it works great. Thank you very much.Anonymous
August 26, 2015
Thanks for this article the xml method is by far the best in my humble opinion. Simple and effective.Anonymous
January 21, 2016
Thanks. Time comparison gives edge over other competitor blogs. Wish you good luck!