C# 2.0: Fast string comparison with string-interning

<Also read the later posting on the same topic>

Frequently in code we do string comparisons which is culture-agnostic. While we should follow the string comparison guidelines, there is a much faster way of getting it done using string interning.

As a sample lets take method that accepts a string and does some action based on the string.

 static void ExecuteCommand(string command){    if (command == "START")        Console.WriteLine("Starting Build...");    else if (command == "STOP")        Console.WriteLine("Stopping Build...");    else if (command == "DELETE")        Console.WriteLine("Deleting Build...");    else        Console.WriteLine("Invalid command...");}

The problem with this code is that this actually does a full string comparison using string.Equals(command, "START", StringComparison.Ordinal); which results in iterating through each of the bytes of the two strings and comparing them. In case the strings are long and there are a lot of strings to compare, this becomes a slow process. However if we are sure that the string command is an interned string we can make this much faster. Using the following code

 static void ExecuteCommand(string command){    if (Object.ReferenceEquals(command,"START"))        Console.WriteLine("Starting Build...");    else if (Object.ReferenceEquals(command, "STOP"))        Console.WriteLine("Stopping Build...");    else if (Object.ReferenceEquals(command, "DELETE"))        Console.WriteLine("Deleting Build...");    else        Console.WriteLine("Invalid command...");}

This uses just the reference comparison (memory address comparison) and is much faster. However, the catch is that the command has to be an interned string. In case the command is not a literal string and is either generated or accepted from the user as a command line argument then this will not be an interned string and the comparisons will always fail. However we can intern it using the following

 string command = string .Intern(args[0].ToUpperInvariant());ExecuteCommand(command);

Here we have taken care of both the case handling as well as ensured that the command is interned. Another generic way of doing this could be

 string internedCmd = string.IsInterned(command)  ??  string.Intern(command);

Here IsInterned returns the interned string if the string is interned of else null. Intern interns the string and returns a reference to it. Interestingly this expression also uses the new ?? C# 2.0 operator.

Comments

  • Anonymous
    November 01, 2005
    I forgot to add that as always code for clarity and easy maintainability. Profile to see perf-bottlenecks and only then use optmizations as these....
  • Anonymous
    November 01, 2005
    The comment has been removed
  • Anonymous
    November 01, 2005
    Isn't the whole performance idea lost the moment you call string.intern and ToUpper() ?
  • Anonymous
    November 01, 2005
    This works mainly for interned string.
    Call to string.intern does have a performance hit. This'll work only if you need to intern the string say once and use comparisons multiple times with long strings. As with most optimization you need to profile and see if this is making a good difference to your perf and then use it.
  • Anonymous
    November 01, 2005
    I knew that intern has a perf hit; I was wondering whether IsInterned has the same perf hit or whether that could be done cheaply. If IsInterned is cheap, it would be possible to use it inside the standard equality operator for System.String so that interned strings compare fast with "==" while non-interned strings fall back to the slower path.

    If IsInterned is expensive of course that trick doesn't work...
  • Anonymous
    November 01, 2005
    The comment has been removed
  • Anonymous
    November 02, 2005
    The comment has been removed
  • Anonymous
    November 05, 2005
    The comment has been removed
  • Anonymous
    December 07, 2006
    The comment has been removed