C# 2.0: Fast string comparison with string-interning
<Also read the later posting on the same topic>
Frequently in code we do string comparisons which is culture-agnostic. While we should follow the string comparison guidelines, there is a much faster way of getting it done using string interning.
As a sample lets take method that accepts a string and does some action based on the string.
static void ExecuteCommand(string command){ if (command == "START") Console.WriteLine("Starting Build..."); else if (command == "STOP") Console.WriteLine("Stopping Build..."); else if (command == "DELETE") Console.WriteLine("Deleting Build..."); else Console.WriteLine("Invalid command...");}
The problem with this code is that this actually does a full string comparison using string.Equals(command, "START", StringComparison.Ordinal); which results in iterating through each of the bytes of the two strings and comparing them. In case the strings are long and there are a lot of strings to compare, this becomes a slow process. However if we are sure that the string command is an interned string we can make this much faster. Using the following code
static void ExecuteCommand(string command){ if (Object.ReferenceEquals(command,"START")) Console.WriteLine("Starting Build..."); else if (Object.ReferenceEquals(command, "STOP")) Console.WriteLine("Stopping Build..."); else if (Object.ReferenceEquals(command, "DELETE")) Console.WriteLine("Deleting Build..."); else Console.WriteLine("Invalid command...");}
This uses just the reference comparison (memory address comparison) and is much faster. However, the catch is that the command has to be an interned string. In case the command is not a literal string and is either generated or accepted from the user as a command line argument then this will not be an interned string and the comparisons will always fail. However we can intern it using the following
string command = string .Intern(args[0].ToUpperInvariant());ExecuteCommand(command);
Here we have taken care of both the case handling as well as ensured that the command is interned. Another generic way of doing this could be
string internedCmd = string.IsInterned(command) ?? string.Intern(command);
Here IsInterned returns the interned string if the string is interned of else null. Intern interns the string and returns a reference to it. Interestingly this expression also uses the new ?? C# 2.0 operator.
Comments
- Anonymous
November 01, 2005
I forgot to add that as always code for clarity and easy maintainability. Profile to see perf-bottlenecks and only then use optmizations as these.... - Anonymous
November 01, 2005
The comment has been removed - Anonymous
November 01, 2005
Isn't the whole performance idea lost the moment you call string.intern and ToUpper() ? - Anonymous
November 01, 2005
This works mainly for interned string.
Call to string.intern does have a performance hit. This'll work only if you need to intern the string say once and use comparisons multiple times with long strings. As with most optimization you need to profile and see if this is making a good difference to your perf and then use it. - Anonymous
November 01, 2005
I knew that intern has a perf hit; I was wondering whether IsInterned has the same perf hit or whether that could be done cheaply. If IsInterned is cheap, it would be possible to use it inside the standard equality operator for System.String so that interned strings compare fast with "==" while non-interned strings fall back to the slower path.
If IsInterned is expensive of course that trick doesn't work... - Anonymous
November 01, 2005
The comment has been removed - Anonymous
November 02, 2005
The comment has been removed - Anonymous
November 05, 2005
The comment has been removed - Anonymous
December 07, 2006
The comment has been removed