SQL Remove Duplicates Function (T-SQL vs CLR)
Pinal Dave did a post of an example of a function to remove duplicates from a comma delimited string.
The code looks similar to this
CREATE FUNCTION dbo.fnDistinctList2
(
@List VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Delim CHAR = ','
DECLARE @ParsedList TABLE
(
Item VARCHAR(MAX)
)
DECLARE @list1 VARCHAR(MAX), @Pos INT, @rList VARCHAR(MAX)
SET @list = LTRIM(RTRIM(@list)) + @Delim
SET @pos = CHARINDEX(@delim, @list, 1)
WHILE @pos > 0
BEGIN
SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))
IF @list1 <> ''
INSERT INTO @ParsedList VALUES (CAST(@list1 AS VARCHAR(MAX)))
SET @list = SUBSTRING(@list, @pos+1, LEN(@list))
SET @pos = CHARINDEX(@delim, @list, 1)
END
SELECT @rlist = COALESCE(@rlist+',','') + item
FROM (SELECT DISTINCT Item FROM @ParsedList) t
RETURN @rlist
END
which got me thinking that this is really one of those examples where I would expect managed CLR code to perform significantly better so I decided to quickly write an equivalent function in C# to see the difference in result from a performance perspective.
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString fnDistinctList2(SqlString inputlist)
{
List<SqlString> InternalList = new List<SqlString>();
char[] Delimiters = {','};
string[] InputStrings ;
SqlString OutputList = new SqlString();
OutputList= "";
SqlString PreviousValue = "";
InputStrings = inputlist.ToString().Split(Delimiters,StringSplitOptions.None);
foreach (string tempString in InputStrings)
{
InternalList.Add(tempString);
}
InternalList.Sort();
foreach (string TempString in InternalList)
{
if (TempString != PreviousValue)
if (PreviousValue == "")
OutputList = TempString;
else
OutputList = string.Concat(OutputList, "," ,TempString);
PreviousValue = TempString;
}
return OutputList;
}
and then created a simple TSQL loop to execute both of these a 1000 times and print the average execution time for each function’s execution in milliseconds.
DECLARE @Counter INT = 1000
DECLARE @Output varchar(100)
DECLARE @StartTime DateTime = GetDate()
WHILE @Counter > 0
BEGIN
SELECT @Output = dbo.fnDistinctList2('342,34,456,34,3454,456,bb,cc,aa')
SET @Counter = @Counter - 1
END
PRINT @Output
PRINT 1.0*DATEDIFF(MS,@StartTime,GetDate())/1000
and the results are ….
-- C# fnDistinctList2 -- 34,342,3454,456,aa,bb,cc -- 0.826000 -- TSQL fnDistinctList2 -- 34,342,3454,456,aa,bb,cc -- 3.003000
The C# code executed in less than a millisecond consistently whilst the TSQL code took in excess of 3ms to execute.
This is another one of those real life example of where CLR code can provide significant performance benefit.
CLR WINS !!!
<Updated 29th January 2009 to reformat code>
<Gary>