Partilhar via


SQL Remove Duplicates Function (T-SQL vs CLR)

Pinal Dave did a post of an example of a function to remove duplicates from a comma delimited string.

The code looks similar to this

 CREATE FUNCTION dbo.fnDistinctList2
(
    @List VARCHAR(MAX)
)
    RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @Delim CHAR  = ','
    DECLARE @ParsedList TABLE
    (
    Item VARCHAR(MAX)
    )
    DECLARE @list1 VARCHAR(MAX), @Pos INT, @rList VARCHAR(MAX)
    SET @list = LTRIM(RTRIM(@list)) + @Delim
    SET @pos = CHARINDEX(@delim, @list, 1)
    WHILE @pos > 0
    BEGIN
    SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))
    IF @list1 <> ''
    INSERT INTO @ParsedList VALUES (CAST(@list1 AS VARCHAR(MAX)))
    SET @list = SUBSTRING(@list, @pos+1, LEN(@list))
    SET @pos = CHARINDEX(@delim, @list, 1)
    END
    SELECT @rlist = COALESCE(@rlist+',','') + item
    FROM (SELECT DISTINCT Item FROM @ParsedList) t
    RETURN @rlist
END

which got me thinking that this is really one of those examples where I would expect managed CLR code to perform significantly better so I decided to quickly write an equivalent function in C# to see the difference in result from a performance perspective.

 [Microsoft.SqlServer.Server.SqlFunction]
public static SqlString fnDistinctList2(SqlString inputlist)
{
    List&lt;SqlString&gt; InternalList = new List&lt;SqlString&gt;();
    char[] Delimiters = {','};
    string[] InputStrings ;
    SqlString OutputList = new SqlString();
    OutputList= "";
    SqlString PreviousValue = "";
    InputStrings = inputlist.ToString().Split(Delimiters,StringSplitOptions.None);
    foreach (string tempString in InputStrings) 
    {
        InternalList.Add(tempString);
    }
    InternalList.Sort();
    foreach (string TempString in InternalList)
    {
        if (TempString != PreviousValue)
            if (PreviousValue == "")
                OutputList = TempString; 
            else
                OutputList = string.Concat(OutputList, "," ,TempString); 
        PreviousValue = TempString;
    }
    return OutputList;
}

and then created a simple TSQL loop to execute both of these a 1000 times and print the average execution time for each function’s execution in milliseconds.

 DECLARE @Counter INT = 1000
DECLARE @Output varchar(100)
DECLARE @StartTime DateTime = GetDate()
WHILE @Counter > 0 
BEGIN 
    SELECT @Output = dbo.fnDistinctList2('342,34,456,34,3454,456,bb,cc,aa') 
    SET @Counter = @Counter  - 1
END
PRINT @Output
PRINT 1.0*DATEDIFF(MS,@StartTime,GetDate())/1000

and the results are ….

 -- C# fnDistinctList2
-- 34,342,3454,456,aa,bb,cc
-- 0.826000

-- TSQL fnDistinctList2 
-- 34,342,3454,456,aa,bb,cc
-- 3.003000

The C# code executed in less than a millisecond consistently whilst the TSQL code took in excess of 3ms to execute.

This is another one of those real life example of where CLR code can provide significant performance benefit.

CLR WINS !!!

<Updated 29th January 2009 to reformat code>

<Gary>