Comparing RegEx.Replace, String.Replace and StringBuilder.Replace – Which has better performance?
A few days ago I was with Frank Taglianetti (no links here, he doesn’t have a blog yet), a PFE from my team whom I met for the first time that day while running a Lab for one of our customers. By Lab I mean stress testing and troubleshooting a customer’s application in our laboratory.
At some point we were reviewing a snippet of C# code that was the culprit for the slow performance. After reviewing it we started asking ourselves which would be the better approach: String.Replace(), RegEx.Replace() or StringBuilder.Replace().
We didn’t care about case-sensitivity because the application had to replace special characters.
Then the fun began…
At that point, without doing any tests, we each guessed which Replace() call would be the best and which the worst.
Frank was able to come up with a theory to justify his guess for the worst performer, and it proved to be right!
Whenever I mention a co-worker, I like to add some words about him or her. During the Lab I was impressed with Frank (we call him Tag, based on his last name); he is a great developer and a great debugger, proving something I often say: these two skills go hand in hand. Besides, the guy has a lot of experience! It’s always a pleasure to work with people like him because I always learn something new!
Back to the Lab… To find the fastest way to replace one character with another in a large string, I decided to use PowerShell to test the three approaches.
Below I present the scripts and the results… They are by no means full stress tests; however, they are useful to give us a baseline when processing large text files. You may be surprised by the results. We were!
Note: The tests don’t consider the regular expression syntax that is part of the PowerShell language, since it cannot be reused from VB.Net or C#.
If you are curious about it, just create another function that uses -match and -imatch.
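For readers who want to try the operator-based form, here is a minimal sketch. It uses -replace and -creplace, the replacement counterparts of the matching operators mentioned above; the sample string and variable names are made up for illustration and are not part of the tests in this article.

```powershell
# Hypothetical sample text; "`n" is PowerShell's escape sequence for a line feed.
$sample = "line one`nline two`nline three"

# -replace treats its first argument as a regular expression (case-insensitive).
$withOperator = $sample -replace "`n", ""

# -creplace is the case-sensitive variant of the same operator.
$caseSensitive = $sample -creplace "`n", ""

# The equivalent .NET call, as used by the RegEx.Replace test script.
$withDotNet = [regex]::Replace($sample, "`n", "")

# All three produce the same text for this input.
$withOperator
```

Since a line feed has no upper or lower case, the two operators behave identically here; the distinction only matters once letters are part of the pattern.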
RegEx.Replace()
StringBuilder.Replace()
String.Replace()
Source code for RegEx.Replace:
#########################################################
## This is a sample test to measure the performance of RegEx.Replace against
## StringBuilder.Replace and String.Replace
##
## RegEx.Replace is case-sensitive unless RegexOptions.IgnoreCase is specified.
#########################################################
param(
    [string] $fileName = $(throw "Error! You must provide the file name.")
)
set-psdebug -strict
$ErrorActionPreference = "stop"
trap {"Error message: $_"}
write-Host "Starting RegEx.Replace..." -foreground Green -background Black
# Attention! If you use [string] $str, the variable is not going to be a generic Object, but System.String.
# Doing that the performance improves a lot, although it was not enough to beat String.Replace.
# Get file content.
$str = get-Content $fileName
# For testing purposes, let's repeat the operation "n" times. ((?<value>(\n)))
for($i = 0; $i -le 200; $i++)
{
    [regex]::Replace($str, "`n", "");
}
write-Host "End!" -foreground Green -background Black
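The remark about [string] in the script above matters because Get-Content returns one string per line rather than a single string. Below is a minimal sketch of the difference. It uses a throwaway temporary file created only for the demonstration; the file contents and variable names are assumptions, not the test file used in the Lab.

```powershell
# Create a small throwaway file with three lines (illustrative data only).
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value "a", "b", "c"

# Untyped: Get-Content yields an array with one element per line.
$lines = Get-Content $path
$lines.GetType().Name          # Object[]

# Typed: the array is joined into one System.String, separated by $OFS
# (a single space by default), so the line breaks are not part of the string.
[string] $text = Get-Content $path
$text                          # a b c

# To keep the line breaks, read the whole file as a single string instead.
$raw = [System.IO.File]::ReadAllText($path)
$raw.Contains("`n")            # True

Remove-Item $path
```

When the goal is to measure how fast line feeds are replaced, reading the file with ReadAllText is probably the safer choice, because the joined $text no longer contains any line feed to replace.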
Source code for StringBuilder.Replace:
#########################################################
## This is a sample test to measure the performance of StringBuilder.Replace against
## RegEx.Replace
##
## According to MSDN: The strings to replace are checked on an ordinal basis; that is,
## the replacement is not culture-aware. If newValue is a null reference
## (Nothing in Visual Basic), all occurrences of oldValue are removed. This method is case-sensitive.
#########################################################
param(
    [string] $fileName = $(throw "Error! You must provide the file name.")
)
set-psdebug -strict
$ErrorActionPreference = "stop"
trap {"Error message: $_"}
write-Host "Starting StringBuilder.Replace..." -foreground Green -background Black
$builder = New-Object System.Text.StringBuilder
$fileContent = New-Object System.Text.StringBuilder
$value = ""
# Assign the content to our local variable.
[System.String] $str = get-Content $fileName
$fileContent.Append($str)
# For testing purposes, let's repeat the operation "n" times.
for($i = 0; $i -le 200; $i++)
{
    $builder = $fileContent.Replace("`n", "")
}
write-Host "End!" -foreground Green -background Black
Source code for String.Replace:
#########################################################
## This is a sample test to measure the performance of StringBuilder.Replace against
## RegEx.Replace and String.Replace
##
## According to MSDN: The strings to replace are checked on an ordinal basis; that is,
## the replacement is not culture-aware. If newValue is a null reference
## (Nothing in Visual Basic), all occurrences of oldValue are removed. This method is case-sensitive.
#########################################################
param(
    [string] $fileName = $(throw "Error! You must provide the file name.")
)
set-psdebug -strict
$ErrorActionPreference = "stop"
trap {"Error message: $_"}
write-Host "Starting String.Replace..." -foreground Green -background Black
[System.String] $builder = ""
[System.String] $fileContent = ""
$value = ""
# Assign the content to our local variable.
$fileContent = get-Content $fileName
# For testing purposes, let's repeat the operation "n" times.
for($i = 0; $i -le 200; $i++)
{
    $builder = $fileContent.Replace("`n", "")
}
write-Host "End!" -foreground Green -background Black
From MSDN we have:
https://msdn2.microsoft.com/en-us/library/aa289509.aspx
Here’s another article that comes to the same conclusion:
Based on this simple test, RegEx.Replace() has the worst performance and the award goes to…drum roll, please… String.Replace()!
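If you want to repeat the comparison in a single script, the sketch below is one possible way to do it. It is not the script used in the Lab: the in-memory test string, the run count, and the Measure-Replace helper are all assumptions made for illustration, and the numbers it prints will vary by machine and input.

```powershell
# Build an in-memory test string (hypothetical data; not the Lab's test file).
$text = "some text with a line break`n" * 10000
$runs = 20

# Helper (the name is an assumption): time a script block over $runs iterations.
function Measure-Replace([string] $label, [scriptblock] $action) {
    $watch = [System.Diagnostics.Stopwatch]::StartNew()
    for ($i = 0; $i -lt $runs; $i++) { $null = & $action }
    $watch.Stop()
    "{0,-22} {1,8:N1} ms" -f $label, $watch.Elapsed.TotalMilliseconds
}

Measure-Replace "Regex.Replace" { [regex]::Replace($text, "`n", "") }
Measure-Replace "StringBuilder.Replace" {
    # A fresh builder per run, because Replace modifies the builder in place.
    (New-Object System.Text.StringBuilder $text).Replace("`n", "").ToString()
}
Measure-Replace "String.Replace" { $text.Replace("`n", "") }
```

Creating the StringBuilder inside the timed block charges it for copying the input on every run; whether that is fair depends on whether your real code already holds its data in a StringBuilder.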
Comments
Anonymous
April 02, 2008
The comment has been removed
Anonymous
April 02, 2008
Have you tried running it with the RegexOptions.Compiled flag? You should get better performance.
Anonymous
April 02, 2008
The comment has been removed
Anonymous
April 02, 2008
The comment has been removed
Anonymous
April 02, 2008
Thanks for the correction! After testing with other special chars I made this mistake when rolling back to newline again. Anyway, after fixing it the proportion didn't change. As you can see below, String.Replace is the fastest approach. We just replaced special characters, not words or regular characters, because that was the requirement for a particular function we were investigating. I totally agree with you that it's not a full stress test, and I mentioned this in the article. Here are the results:

RegEx:
PS C:\development\My Tools> measure-command { .\RegExReplacePerformance.ps1 test.txt }
Starting RegEx.Replace...
End!
Days              : 0
Hours             : 0
Minutes           : 3
Seconds           : 59   <-- Here!
Milliseconds      : 907
Ticks             : 2399070288
TotalDays         : 0.00277670172222222
TotalHours        : 0.0666408413333333
TotalMinutes      : 3.99845048
TotalSeconds      : 239.9070288
TotalMilliseconds : 239907.0288

StringBuilder:
PS C:\development\My Tools> measure-command { .\StringBuilderReplacePerformance.ps1 test.txt }
Starting StringBuilder.Replace...
End!
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 34   <-- Here!
Milliseconds      : 570
Ticks             : 345702214
TotalDays         : 0.000400118303240741
TotalHours        : 0.00960283927777778
TotalMinutes      : 0.576170356666667
TotalSeconds      : 34.5702214
TotalMilliseconds : 34570.2214

String:
PS C:\development\My Tools> measure-command { .\StringReplacePerformance.ps1 test.txt }
Starting String.Replace...
End!
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 20   <-- Here!
Milliseconds      : 583
Ticks             : 205835077
TotalDays         : 0.000238235042824074
TotalHours        : 0.00571764102777778
TotalMinutes      : 0.343058461666667
TotalSeconds      : 20.5835077
TotalMilliseconds : 20583.5077

Thanks
Anonymous
April 02, 2008
It's me again ;-) there's still something wrong with that, because there's just no way the regex takes 8 times as long... If the code you're using is exactly what you pasted, the problem is that you're passing an array of strings to [regex]::replace instead of a string. (To be fair, you should also be assigning the output to a variable, although it won't make much difference if you use Measure-Command.) Try this:
$lines = gc $fileName
[string]$text = gc $fileName
and then run your [regex]::replace on $lines and on $text ... the array takes MUCH longer (depending mostly on what you replace). My results (using the Get-PerformanceHistory script from PowerShellCentral.com/scripts):
Duration  Average  Command
13.01577  0.13016  1..100 | ForEach { $out1 = [regex]::replace($lines,"`n","") }
 3.21073  0.03211  1..100 | ForEach { $out2 = [regex]::replace($text,"`n","") }
 2.81232  0.02812  1..100 | ForEach { $out3 = $text.replace("`n","") }
Anonymous
April 02, 2008
Incidentally ... I think it's clear by now that my corrections aren't meant to disprove the basic premise -- that string.replace is fastest -- it obviously is. The only reason I bother with the correction is that the difference is very minor, not a multiple. My point is just to make sure it's clear that under normal circumstances, you should be able to just use whatever format your string is already in -- because the time (and memory) it takes to convert from one to the other (and back?) is too big of an offset ;-)
Anonymous
April 02, 2008
The comment has been removed
Anonymous
April 02, 2008
StringBuilder.Replace returns 'this', not a new instance (as String.Replace does). So in this place
$fileContent.Append($str)
for($i = 0; $i -le 200; $i++)
{
    $builder = $fileContent.Replace("`n", "")
}
for $i > 0, $fileContent does not contain "`n": there is nothing to replace, only searching. IMHO, this fact can affect the results. But I've done the same thing in C# (.NET 2.0); here is the code:
static void Main(string[] args)
{
    const int Runs = 200;
    string fileData = null;
    string result = null;
    using (StreamReader reader = new StreamReader("data.txt", Encoding.GetEncoding(1251)))
    {
        fileData = reader.ReadToEnd();
    }
    Stopwatch timer = new Stopwatch();

    /* RegEx Replace */
    for (int run = 0; run < Runs; run++)
    {
        timer.Start();
        result = Regex.Replace(fileData, "\n", " ");
        timer.Stop();
    }
    Console.WriteLine("Regex.Replace - {0} ms", timer.ElapsedMilliseconds);
    timer.Reset();

    /* StringBuilder.Replace */
    StringBuilder builder = new StringBuilder();
    for (int run = 0; run < Runs; run++)
    {
        builder.Append(fileData);
        timer.Start();
        builder.Replace("\n", "");
        timer.Stop();
        builder.Remove(0, builder.Length);
    }
    Console.WriteLine("StringBuilder.Replace - {0} ms", timer.ElapsedMilliseconds);
    timer.Reset();

    /* String.Replace */
    for (int run = 0; run < Runs; run++)
    {
        timer.Start();
        result = fileData.Replace("\n", "");
        timer.Stop();
    }
    Console.WriteLine("String.Replace - {0} ms", timer.ElapsedMilliseconds);
    timer.Reset();
    Console.ReadKey();
}
I've got the following results:
Regex.Replace - 3984 ms
StringBuilder.Replace - 1691 ms
String.Replace - 2108 ms
Something's wrong?
Anonymous
April 03, 2008
Interesting... your results are different from mine, from Joel's, and from http://www.codeproject.com/KB/cs/StringBuilder_vs_String.aspx?fid=326464&df=90&mpp=25&noise=3&sort=Position&view=Quick&fr=26, which uses C#. (I haven't tested your code on my machine.) I wonder if someone else has also tested the three approaches in PowerShell or C#/VB.NET. I'm curious to see which approach is the fastest and if there is a consistent winner. :)
Anonymous
April 03, 2008
Did you try replacing longer strings? IIRC Regex uses Boyer-Moore for matching, which should be more efficient for longer patterns. Of course, this only affects the time for searching, not the time needed for replacing the string, so it might also depend on how often the pattern has to be replaced. And since you're using the default locale, the results might be completely different (think factor 10!) on a machine running with a different locale.
Anonymous
April 03, 2008
I didn't try to replace longer strings. I haven't investigated the locale, but thanks for pointing it out. I'm wondering whether the locale would affect not the raw numbers from the tests, but which approach is the fastest. My guess (and this is just a wild guess :) ) is that String.Replace should continue to be the fastest approach, because all times would be equally impacted after changing the locale.
Anonymous
June 16, 2008
Have you run the CLR profiler on these three methods? String objects seem to hang around, clogging up the works, much longer than StringBuilder objects.
Anonymous
June 16, 2008
No, I haven't. Did you get different results?
Anonymous
August 05, 2009
The comment has been removed
Anonymous
August 05, 2009
Regex.Replace - 1148 ms
StringBuilder.Replace - 248 ms
String.Replace - 263 ms
Same results as Lox
Anonymous
July 18, 2012
Did you initialize the StringBuilder with length * 2?
Anonymous
July 21, 2014
I totally agree with rafarah that String.Replace() has the best performance among String.Replace(), StringBuilder.Replace() and Regex.Replace(). String is very LIGHT weight compared to the HEAVY weight classes StringBuilder and Regex. Much of the time in the heavy weight classes is spent instantiating the new object and then building the resultant string after the replacement operations. Below is a proof of concept in C# to support the above.
using System;
using System.Text;
using System.Text.RegularExpressions;

namespace ReplacePOC
{
    public class Utils
    {
        public static string ReplaceSpecialCharactersWithString(string stringWithSpecialCharacters)
        {
            return string.IsNullOrEmpty(stringWithSpecialCharacters)
                ? string.Empty
                : (((stringWithSpecialCharacters.Replace(Environment.NewLine, Program.SingleSpace))
                    .Replace(Program.LineFeed, Program.SingleSpace))
                    .Replace(Program.CarriageReturn, Program.SingleSpace))
                    .Replace(Program.TabCharacter, Program.SingleSpace);
        }

        public static string ReplaceSpecialCharactersWithStringBuilder(string stringWithSpecialCharacters)
        {
            if (string.IsNullOrEmpty(stringWithSpecialCharacters))
            {
                return string.Empty;
            }
            StringBuilder replaceBuilder = new StringBuilder(stringWithSpecialCharacters, stringWithSpecialCharacters.Length);
            replaceBuilder.Replace(Environment.NewLine, Program.SingleSpace);
            replaceBuilder.Replace(Program.LineFeed, Program.SingleSpace);
            replaceBuilder.Replace(Program.CarriageReturn, Program.SingleSpace);
            replaceBuilder.Replace(Program.TabCharacter, Program.SingleSpace);
            return replaceBuilder.ToString();
        }

        public static string ReplaceSpecialCharactersWithRegEx(string stringWithSpecialCharacters)
        {
            if (string.IsNullOrEmpty(stringWithSpecialCharacters))
            {
                return string.Empty;
            }
            return Regex.Replace(
                Regex.Replace(
                    Regex.Replace(
                        Regex.Replace(stringWithSpecialCharacters, Environment.NewLine, Program.SingleSpace),
                        Program.LineFeed, Program.SingleSpace),
                    Program.CarriageReturn, Program.SingleSpace),
                Program.TabCharacter, Program.SingleSpace);
        }
    }
}
Finally the results and String.Replace() WINS!!!!!!!!!!!!!!!
C:WorkPOCReplacePOC>ReplacePOC.exe C:WorkPOCReplacePOCTestFile.txt
Test File: C:WorkPOCReplacePOCTestFile.txt.
Total items to test: 65536.
Total TimeTaken for ReplaceSpecialCharactersWithString in Mills: 194.8743.
Total TimeTaken for ReplaceSpecialCharactersWithStringBuilder in Mills: 301.6635.
Total TimeTaken for ReplaceSpecialCharactersWithRegEx in Mills: 1009.9984.
Press Enter to exit.
Anonymous
July 21, 2014
Here is the TEST code for the above results.
using System;
using System.Text;
using System.IO;
using System.Diagnostics;

namespace ReplacePOC
{
    class Program
    {
        public const string CarriageReturn = "\r";
        public const string LineFeed = "\n";
        public const string TabCharacter = "\t";
        public const string SingleSpace = " ";

        static void Main(string[] args)
        {
            string testFile = string.Empty;
            if (args.Length <= 0)
            {
                Console.WriteLine("Enter a valid test file as argument.");
                Console.ReadLine();
                return;
            }
            testFile = args[0];
            if(!File.Exists(testFile))
            {
                Console.WriteLine("Invalid test file.");
                Console.ReadLine();
                return;
            }
            Console.WriteLine(string.Format("Test File: {0}.", testFile));
            var fullFileLines = File.ReadAllLines(testFile);//File.ReadAllText(testFile).Split(new char[] { '|' });
            Console.WriteLine(string.Format("Total items to test: {0}.", fullFileLines.Length));

            ////////////////////////////////////////////////////////////
            var stopWatchString = Stopwatch.StartNew();
            foreach (var fileLine in fullFileLines)
            {
                string replacedFileLine = Utils.ReplaceSpecialCharactersWithString(fileLine);
            }
            stopWatchString.Stop();
            Console.WriteLine(string.Format("Total TimeTaken for ReplaceSpecialCharactersWithString in Mills: {0}.", stopWatchString.Elapsed.TotalMilliseconds));

            ////////////////////////////////////////////////////////////
            var stopWatchStringBuilder = Stopwatch.StartNew();
            foreach (var fileLine in fullFileLines)
            {
                string replacedFileLine = Utils.ReplaceSpecialCharactersWithStringBuilder(fileLine);
            }
            stopWatchStringBuilder.Stop();
            Console.WriteLine(string.Format("Total TimeTaken for ReplaceSpecialCharactersWithStringBuilder in Mills: {0}.", stopWatchStringBuilder.Elapsed.TotalMilliseconds));

            ////////////////////////////////////////////////////////////
            var stopWatchRegEx = Stopwatch.StartNew();
            foreach (var fileLine in fullFileLines)
            {
                string replacedFileLine = Utils.ReplaceSpecialCharactersWithRegEx(fileLine);
            }
            stopWatchRegEx.Stop();
            Console.WriteLine(string.Format("Total TimeTaken for ReplaceSpecialCharactersWithRegEx in Mills: {0}.", stopWatchRegEx.Elapsed.TotalMilliseconds));

            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
        }
    }
}
Anonymous
August 26, 2014
I may be late to this conversation, but this issue may have to do with how the CLR handles Regex in 64-bit. MS has themselves admitted that XSLT & Regex utilize 4x more memory than they did in 32-bit. connect.microsoft.com/.../508748 It's been an issue for 4+ years and is still not resolved in the 4.5 dev preview.