SYSK 390: String.Format vs. concatenation

Since .NET 1.0, we’ve all been told that strings are immutable, so for best performance we should avoid large numbers of string concatenations and use StringBuilder instead.  OK, but how large is “large”, and what about string.Format?
If you open it in your favorite .NET decompiler, you’ll see that string.Format does quite a bit of work, and it would be logical to expect to pay for that work in resource utilization.  And how bad is string concatenation, really?  For example, if I just need to put an address together from its parts into one string, should I use StringBuilder?
Here are some numbers from some tests on my laptop:
- String concatenation (e.g. result += x) doesn’t appear to be a problem until I deal with more than 10,000 strings.  Beyond that, performance gets significantly worse:

Number of concatenations    Time taken (ms)
-------------------------------------------
2,000                       2
5,000                       12
10,000                      46
100,000                     14,631

- Reading the strings to append from an array gives about the same performance for small counts (up to 10,000), but at 100,000 concatenations it is almost 70% slower.

- Doing 10,000 concatenations with string.Format, using a format string that takes 1,000 elements at a time and a loop that executes it 10 times (so we still end up with 10,000 concatenations), takes 18 ms (compare to 46 ms when appending one string at a time via traditional += concatenation).

- When the format string takes only 100 elements and the loop runs 100 times, the time taken for 10,000 concatenations drops to under 3 ms.

The performance numbers change slightly, but the ratios stay about the same when I add a GC.Collect call.
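The test code behind these numbers isn't shown in the post, so the following is only a guess at its shape, not the original harness. It sketches the two approaches compared in the bullets: appending one string at a time with +=, and appending in batches through a string.Format call whose format string has one placeholder per element. The class name ConcatTimings and all method names are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTimings
{
    // One new string per item: every += copies everything built so far.
    public static string AppendOneAtATime(string[] items)
    {
        string result = string.Empty;
        foreach (string item in items)
        {
            result += item;
        }
        return result;
    }

    // Builds "{0}{1}...{batchSize-1}" once, then appends one formatted batch
    // per iteration, so the outer loop runs items.Length / batchSize times.
    // Assumption: items.Length is a multiple of batchSize.
    public static string AppendInFormatBatches(string[] items, int batchSize)
    {
        var formatBuilder = new StringBuilder();
        for (int i = 0; i < batchSize; i++)
        {
            formatBuilder.Append('{').Append(i).Append('}');
        }
        string format = formatBuilder.ToString();

        object[] batch = new object[batchSize];
        string result = string.Empty;
        for (int offset = 0; offset < items.Length; offset += batchSize)
        {
            Array.Copy(items, offset, batch, 0, batchSize);
            result += string.Format(format, batch);
        }
        return result;
    }

    // Returns the wall-clock time of a single run in milliseconds.
    public static long TimeMs(Func<string> build)
    {
        Stopwatch watch = Stopwatch.StartNew();
        build();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

With a 10,000-element array named items, ConcatTimings.TimeMs(() => ConcatTimings.AppendInFormatBatches(items, 1000)) would correspond to the "1,000 elements, 10 loops" case. The batched version still uses += on the running result, but only once per batch, which is presumably where the savings come from.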
OK, so that’s for a large number of strings…  What about just a few concatenations, like putting together an address from its parts?
Running the Concat function vs. the Format function below 10,000 times resulted in roughly the same performance: about 3-4 ms total time taken.   Increasing the loop count to 10,000,000 resulted in 2,831 ms for 6 strings concatenated via the Concat function vs. 3,380 ms for the same number of concatenations via Format.
Interestingly, when concatenating only two or three strings (e.g. a person’s first and last name), concatenation was almost 3 times faster than the Format function: 546 ms for 10,000,000 concatenations of 2 strings with Concat vs. 1,501 ms via Format.
// Builds a six-part address with a single chain of + operators.
private static string Concat(string s1, string s2, string s3, string s4, string s5, string s6)
{
    return s1 + "\r\n" + s2 + "\r\n" + s3 + "\r\n" + s4 + ", " + s5 + " " + s6;
}

// Builds the same six-part address through a composite format string.
private static string Format(string s1, string s2, string s3, string s4, string s5, string s6)
{
    return string.Format("{0}\r\n{1}\r\n{2}\r\n{3}, {4} {5}", s1, s2, s3, s4, s5, s6);
}
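Some context on why the Concat function above holds up so well: the C# compiler typically turns a single chain of + operators into one string.Concat call, so the result is built in one step and not piece by piece. The sketch below writes both forms out side by side. The class and method names are hypothetical, and the exact Concat overload the compiler picks can differ between compiler versions.

```csharp
public static class ConcatLowering
{
    // The six-part expression from the post, written with the + operator.
    public static string WithPlus(string s1, string s2, string s3, string s4, string s5, string s6)
    {
        return s1 + "\r\n" + s2 + "\r\n" + s3 + "\r\n" + s4 + ", " + s5 + " " + s6;
    }

    // Roughly equivalent hand-written form: one Concat call over all eleven parts.
    public static string WithExplicitConcat(string s1, string s2, string s3, string s4, string s5, string s6)
    {
        return string.Concat(new string[]
        {
            s1, "\r\n", s2, "\r\n", s3, "\r\n", s4, ", ", s5, " ", s6
        });
    }
}
```

This is different from result += x in a loop, where every iteration is its own Concat call and allocates its own intermediate string.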

So, what’s the significance of all of this?   In my opinion, when putting together a small number of strings (e.g. a person’s name, an address, etc.), even if you’re dealing with dozens or a few hundred strings, the difference is negligible from a performance point of view.  I used to favor string.Format because of the type conversion, but if the format string references more arguments than you pass in, you’ll end up with a runtime error (a FormatException).  Nowadays, the framework properly handles concatenating a number, a string and a null, e.g.
int number = 5;
string nullString = null;
string result = number + nullString + "abc";

will give you "5abc".   So, string concatenations (in reasonable numbers) don’t appear to be such an evil…  However, with a large number of concatenations you should evaluate your case and consider whether the performance hit, as well as the CPU utilization, warrants a different way of handling your data.
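For the large-number case, here is a minimal sketch of two such "different ways", using StringBuilder and the string.Concat overload that accepts a whole sequence. The post did not measure either of them, so no timings are claimed here, and the BulkAppend class is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BulkAppend
{
    // Appends every part to a growable buffer and creates the final
    // string once at the end. Null parts are skipped by Append.
    public static string WithStringBuilder(IEnumerable<string> parts)
    {
        var builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder.ToString();
    }

    // Lets the framework concatenate the whole sequence in one call.
    // Null parts are treated as empty strings.
    public static string WithConcat(IEnumerable<string> parts)
    {
        return string.Concat(parts);
    }
}
```

Both methods take the parts as a sequence, so either can replace a result += x loop without changing how the data is produced.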

Comments

  • Anonymous
    July 06, 2014
    Isn't the performance difference related to string allocation? How does a large number of string allocations influence the overall memory allocation, and garbage collection, of the application? The recommendation to not use string concatenation is not to avoid one call to string.Concat with many arguments but to avoid many calls to string.Concat with two arguments.

  • Anonymous
    July 06, 2014
    Worth noting is that the Concat method has a different performance profile than concatenating with str += "newStr"; in a loop. s1 + "\r\n" + s2 + "\r\n" + s3 + "\r\n" + s4 + ", " + s5 + " " + s6 is compiled the same as string.Concat(s1, "\r\n", s2, "\r\n", s3, "\r\n", s4, ", ", s5, " ", s6). That way the strings are concatenated all at once with only one memory allocation. With += in a loop you get lots of memory allocations, and that is quite expensive when you overdo it. msdn.microsoft.com/.../ms228504.aspx

  • Anonymous
    July 07, 2014
    Also note that the compiler is quite clever about concatenating strings like "abc" + myVar + "efg": it generates code that gets the length of each of the three strings and allocates one string with the total length before copying each substring into the new string. But doing concatenation in a loop will result in a new string being allocated for each iteration of the loop. Here a StringBuilder should be used.

  • Anonymous
    July 09, 2014
    Nice findings. I personally always go with String.Format(). Using + to concatenate strings has never looked right...

  • Anonymous
    July 20, 2014
    Reply to Paulo Morgado:  My tests included GC.Collect to account for object allocation/deallocation.