C#: Fast string comparison optimization
Sometime back I had blogged about using reference comparison to speed up ordinal string comparisons. Essentially this means that if you know for sure that your string is interned and you want to do an ordinal string comparison you can use the following optimization
if (command == "START") Console.WriteLine("Starting Build...");else if (command == "STOP") Console.WriteLine("Stopping Build..."); // Optimizationif (Object.ReferenceEquals(command,"START")) Console.WriteLine("Starting Build...");else if (Object.ReferenceEquals(command, "STOP")) Console.WriteLine("Stopping Build...");
This works only if the string command is an interned string. If its not (e.g. its accepted from command lines or is constructed) you can intern it using something like string internedCmd = string.IsInterned(command) ?? string.Intern(command); and then use the interned version of the same string.
However an email to the internal CSharp alias made me look again. The email stated that the static method String.Equals(a,b) is more efficient than instance method String.Equals(a). I looked in into the code and it appears that for static Equals we already do the reference comparison first, while for instance Equals we jump right to string comparison.
// Static Equals called from overloaded operator ==
public static bool Equals(string a, string b)
{
if ((object)a == (object)b)
{
return true;
}
if ((a != null) && (b != null))
{
return string.EqualsHelper(a, b);
}
return false;
}
This might get fixed in future .NET version when both Equals will do the reference comparison first. However, the optimization I suggested does not hold because its already been done in the framework right now.
So currently in the following snippet the second version is faster than the first
string a = ...;
string b = ...;
a.Equals(b);
string.Equals(a, b); // faster
Comments
Anonymous
January 06, 2006
Honestly, why do you care for this?
I run a billion string comparision in 1.5 seconds.
This performance difference is so small that I don't even think about it until I start doing profiling, and that optimization is way down the list.Anonymous
January 06, 2006
what about the switch statement? or has that just the same performance as the else if?
Ayende has a point, but well, its always nice to know :-)Anonymous
January 06, 2006
Frank take a look into http://blogs.msdn.com/abhinaba/archive/2005/10/28/486173.aspx. In this blog I had posted about common string comparisons and how they work including switch. in Switch the overloded == operator is called so the final comparison occurs in String.Equal(string, string)
As you said its nice to know. In case your application does lot of string comparison the info might be helpful... Anyways for framework development its always nice to know where time is being taken in your API... But as usual optimize only based on profiling and only when the data shows you are not meeting your perf criteria. For all other purpose elegance and maintainability wins over optimizationAnonymous
January 06, 2006
Does the framework's shortcut still require the strings be interned, then?Anonymous
January 06, 2006
The reference match succeeds only for interned string. The point is I had said in the earlier post that to take the benefit of interned string explicitly do a Object reference match. With the new info I got you are equaly good if you do a string.Equals(a,b) kind of comparison. However, both strings have to be interned....Anonymous
January 07, 2006
I think the ReferenceEquals method is still faster if the strings are interned because when the strings are not equal, the == operator and string.Equals still perform string comparision, however ReferenceEquals can just return false.Anonymous
January 07, 2006
BTW, Why interning requere you to change reference value in your code ?
Why Runtime unable to change reference for you automaticaly ? It's allowed to move pointers all over heap as long as they are not pined.
I.e. my suggestion is to make string.Intern() returning void.
This can be smart move - once string.Equals operator (or ==) find two exact strings but at different locations - then reference to one of them will be changed to another; second unreferenced and become eligible for garbage collection (even more - all others references to second string will be noticed during GC and changed).Anonymous
June 29, 2007
I recently had an app that I used the VSTS Profiler on to find that a significant portion of time wasAnonymous
January 29, 2008
Well, i made some tests for 2 strings: string a = "ASDFGHJKL"; string b = "ASDFGHJ"; In my measurements a.Equals(b)(average: 864 ms) performed a lot faster than string.Equals(a,b) (average: 2395). I used Framework 2.0. So if you could enlighten me i would appreciate.Anonymous
January 29, 2008
Do the same in a loop for say 100,000 times and then take the average....Anonymous
January 30, 2008
Hi! Here is my code: using System; using System.Diagnostics; namespace ConsoleApplication3 { class Program { static void Main(string[] args) { string a = "ASDFGHJKL"; string b = "ASDFGHJ"; Stopwatch stopper = new Stopwatch(); stopper.Start(); for (long i = 0; i < 200000000; i++) { //string.Equals(a, b); a.Equals(b); } stopper.Stop(); Console.WriteLine("Time: {0}", stopper.ElapsedMilliseconds); } } } a.Equals(b) - takes around 1650 ms. string.Equals(a,b) - takes around 4900 ms. So what is the catch?Anonymous
April 10, 2008
Elz is correct on my Dual Xeon procesor, a.Equals(b) - 1409 ms string.Equals(a,b) - 1610 ms a == b - 1610 msAnonymous
April 10, 2008
Also, c# reference says string == operator checks for values not reference. http://msdn2.microsoft.com/en-us/library/53k8ybth.aspxAnonymous
May 08, 2008
Here is what I found... If a=b, string.Equals(a, b) is fastest. If a <> b or a.Length <> b.Length, a.Equals(b) is fastest. The example Elz gave, a and b have different lengths.Anonymous
June 29, 2008
When looking on this results think that it is not better to use static Equals() - is fastest only if strings are same - or is good only if we just testing and expecting that string equals, but always - it plays role only in case of huge usage of string comparisons. This are meassures from testing code:m1 = first method a.Equals(b)
m2 = second method string.Equals(a, b) (using == operator is almost same then m2, but always very little slower) CNT = 20.000.000
- different length : m1=199 m2=216
- same length but different: m1=473 m2=499
- same strings : m1=485 m2=118 CNT = 2.000.000.000
- different length : m1=18708 m2=21120
- same length but different: m1=47607 m2=49944
- same strings : m1=48626 m2=11262
Anonymous
July 20, 2008
PingBack from http://akutz.wordpress.com/2008/07/20/stringing-along-in-net/Anonymous
July 20, 2008
I did some research on the topic at http://akutz.wordpress.com/2008/07/20/stringing-along-in-net/Anonymous
July 20, 2008
I did some research on the topic at http://akutz.wordpress.com/2008/07/20/stringing-along-in-net/Anonymous
July 29, 2008
I think the string comparison optimizatioin is not so useful. I wound rather to use the "a.Equals(b)". I think the optimization of string contains() or indexof() is more useful.Anonymous
October 21, 2008
The reason is, that both String.Equals(Object) and String.Equals(String) are virtual, thus cannot be inlined, whilst static String.Equals(String,String) can.Anonymous
June 23, 2011
why not use: string.Compare(...) ?Anonymous
August 01, 2011
I haven't seen anyone mention that you need to run the tests with BOTH values that are equal and values that are not. Some methods are faster when the strings are equal but slower than others when the strings are different. My metrics using Vance Morrison's MeasureIt app (by extending it), shows that string.Compare() is approx. 50x SLOWER than using either static string.Equals(a,b) or a.Equals(b) if the values are the the SAME. When the values are DIFFERENT, the string.Compare() is approx. 50x SLOWER than using either static string.Equals(a,b) or a.Equals(b) is approx. 12x SLOWER. You should use either the == operator or the Equals method for testing string value equality - string.Compare wasn't really meant for equality.Anonymous
August 01, 2011
Sorry, copy/paste error in my post... I meant to say if the values are different, string.Compare() is approximate 12x slower.