C#: Fast string comparison optimization

Sometime back I had blogged about using reference comparison to speed up ordinal string comparisons. Essentially this means that if you know for sure that your string is interned and you want to do an ordinal string comparison you can use the following optimization

 if (command == "START")    Console.WriteLine("Starting Build...");else if (command == "STOP")    Console.WriteLine("Stopping Build...");  // Optimizationif (Object.ReferenceEquals(command,"START"))    Console.WriteLine("Starting Build...");else if (Object.ReferenceEquals(command, "STOP"))    Console.WriteLine("Stopping Build...");

This works only if the string command is an interned string. If its not (e.g. its accepted from command lines or is constructed) you can intern it using something like string internedCmd = string.IsInterned(command) ?? string.Intern(command); and then use the interned version of the same string.

However an email to the internal CSharp alias made me look again. The email stated that the static method String.Equals(a,b) is more efficient than instance method String.Equals(a). I looked in into the code and it appears that for static Equals we already do the reference comparison first, while for instance Equals we jump right to string comparison.

 // Static Equals called from overloaded operator ==
public static bool Equals(string a, string b)
{
      if ((object)a == (object)b) 
      {
          return true;
      }
      if ((a != null) && (b != null))
      {
          return string.EqualsHelper(a, b);
      }
      return false;
}

This might get fixed in future .NET version when both Equals will do the reference comparison first. However, the optimization I suggested does not hold because its already been done in the framework right now.

So currently in the following snippet the second version is faster than the first

 string a = ...;
string b = ...;




a.Equals(b);
string.Equals(a, b); // faster

Comments

  • Anonymous
    January 06, 2006
    Honestly, why do you care for this?
    I run a billion string comparision in 1.5 seconds.

    This performance difference is so small that I don't even think about it until I start doing profiling, and that optimization is way down the list.

  • Anonymous
    January 06, 2006
    what about the switch statement? or has that just the same performance as the else if?
    Ayende has a point, but well, its always nice to know :-)

  • Anonymous
    January 06, 2006
    Frank take a look into http://blogs.msdn.com/abhinaba/archive/2005/10/28/486173.aspx. In this blog I had posted about common string comparisons and how they work including switch. in Switch the overloded == operator is called so the final comparison occurs in String.Equal(string, string)

    As you said its nice to know. In case your application does lot of string comparison the info might be helpful... Anyways for framework development its always nice to know where time is being taken in your API... But as usual optimize only based on profiling and only when the data shows you are not meeting your perf criteria. For all other purpose elegance and maintainability wins over optimization

  • Anonymous
    January 06, 2006
    Does the framework's shortcut still require the strings be interned, then?

  • Anonymous
    January 06, 2006
    The reference match succeeds only for interned string. The point is I had said in the earlier post that to take the benefit of interned string explicitly do a Object reference match. With the new info I got you are equaly good if you do a string.Equals(a,b) kind of comparison. However, both strings have to be interned....

  • Anonymous
    January 07, 2006
    I think the ReferenceEquals method is still faster if the strings are interned because when the strings are not equal, the == operator and string.Equals still perform string comparision, however ReferenceEquals can just return false.

  • Anonymous
    January 07, 2006
    BTW, Why interning requere you to change reference value in your code ?
    Why Runtime unable to change reference for you automaticaly ? It's allowed to move pointers all over heap as long as they are not pined.

    I.e. my suggestion is to make string.Intern() returning void.
    This can be smart move - once string.Equals operator (or ==) find two exact strings but at different locations - then reference to one of them will be changed to another; second unreferenced and become eligible for garbage collection (even more - all others references to second string will be noticed during GC and changed).

  • Anonymous
    June 29, 2007
    I recently had an app that I used the VSTS Profiler on to find that a significant portion of time was

  • Anonymous
    January 29, 2008
    Well, i made some tests for 2 strings: string a = "ASDFGHJKL"; string b = "ASDFGHJ"; In my measurements a.Equals(b)(average: 864 ms) performed a lot faster than string.Equals(a,b) (average: 2395). I used Framework 2.0. So if you could enlighten me i would appreciate.

  • Anonymous
    January 29, 2008
    Do the same in a loop for say 100,000 times and then take the average....

  • Anonymous
    January 30, 2008
    Hi! Here is my code: using System; using System.Diagnostics; namespace ConsoleApplication3 {    class Program    {        static void Main(string[] args)        {            string a = "ASDFGHJKL";            string b = "ASDFGHJ";            Stopwatch stopper = new Stopwatch();            stopper.Start();            for (long i = 0; i < 200000000; i++)                        {                //string.Equals(a, b);                a.Equals(b);            }            stopper.Stop();            Console.WriteLine("Time: {0}", stopper.ElapsedMilliseconds);        }                  }     } a.Equals(b) - takes around 1650 ms. string.Equals(a,b) - takes around 4900 ms. So what is the catch?

  • Anonymous
    April 10, 2008
    Elz is correct on my Dual Xeon procesor, a.Equals(b) - 1409 ms string.Equals(a,b) - 1610 ms a == b - 1610 ms

  • Anonymous
    April 10, 2008
    Also, c# reference says string == operator checks for values not reference. http://msdn2.microsoft.com/en-us/library/53k8ybth.aspx

  • Anonymous
    May 08, 2008
    Here is what I found... If a=b, string.Equals(a, b) is fastest.  If a <> b or a.Length <> b.Length, a.Equals(b) is fastest.   The example Elz gave, a and b have different lengths.

  • Anonymous
    June 29, 2008
    When looking on this results think that it is not better to use static Equals() - is fastest only if strings are same - or is good only if we just testing and expecting that string equals, but always - it plays role only in case of huge usage of string comparisons. This are meassures from testing code:

  • m1 = first method a.Equals(b)

  • m2 = second method string.Equals(a, b) (using == operator is almost same then m2, but always very little slower) CNT = 20.000.000

  1. different length : m1=199  m2=216
  2. same length but different: m1=473  m2=499
  3. same strings : m1=485  m2=118 CNT = 2.000.000.000
  4. different length : m1=18708  m2=21120
  5. same length but different: m1=47607 m2=49944
  6. same strings : m1=48626  m2=11262
  • Anonymous
    July 20, 2008
    PingBack from http://akutz.wordpress.com/2008/07/20/stringing-along-in-net/

  • Anonymous
    July 20, 2008
    I did some research on the topic at http://akutz.wordpress.com/2008/07/20/stringing-along-in-net/

  • Anonymous
    July 20, 2008
    I did some research on the topic at http://akutz.wordpress.com/2008/07/20/stringing-along-in-net/

  • Anonymous
    July 29, 2008
    I think the string comparison optimizatioin is not so useful. I wound rather to use the "a.Equals(b)". I think the optimization of string contains() or indexof() is more useful.

  • Anonymous
    October 21, 2008
    The reason is, that both String.Equals(Object) and String.Equals(String) are virtual, thus cannot be inlined, whilst static String.Equals(String,String) can.

  • Anonymous
    June 23, 2011
    why not use: string.Compare(...) ?

  • Anonymous
    August 01, 2011
    I haven't seen anyone mention that you need to run the tests with BOTH values that are equal and values that are not.  Some methods are faster when the strings are equal but slower than others when the strings are different. My metrics using Vance Morrison's MeasureIt app (by extending it), shows that string.Compare() is approx. 50x SLOWER than using either static string.Equals(a,b) or a.Equals(b) if the values are the the SAME. When the values are DIFFERENT, the string.Compare() is approx. 50x SLOWER than using either static string.Equals(a,b) or a.Equals(b) is approx. 12x SLOWER. You should use either the == operator or the Equals method for testing string value equality - string.Compare wasn't really meant for equality.

  • Anonymous
    August 01, 2011
    Sorry,  copy/paste error in my post... I meant to say if the values are different, string.Compare() is approximate 12x slower.