Performance Quiz #3
The funny thing about managed code is that sometimes people forget that just because we offer an easy/convenient way to do things if you're doing a quick RAD application doesn't mean that the RAD way is in fact the right way. In fact, the good old fashioned way is very likely still available and may be an excellent choice. So that brings us to Performance Quiz #3.
The question: Characterize the difference in space and speed between RadParser and ClassicParser, consider the given test input to be typical.
I'll post an analysis in a few days. Don't peek at the feedback until you're done with your own work :)
--------- radparser.cs ---------------
using System;
using System.IO;
class RadParser
{
static void Main(String[] args)
{
StreamReader stream = new StreamReader(args[0]);
String line;
Char[] delims = { ' ', '\t' };
int sum = 0;
while ((line = stream.ReadLine()) != null)
{
String[] fields = line.Split(delims);
foreach (String field in fields)
{
sum += Int32.Parse(field);
}
}
Console.WriteLine("The total of the ints is: {0,8:n0}", sum);
}
}
---- testfile.txt (2500 lines of this)
234 2348 -7295 27 52932 -2352 2352 252532 1 34 452 52 345 2 245 4525
--------- classicparser.cs ---------------
using System;
using System.IO;
class ClassicParser
{
static void Main(String[] args)
{
FileStream fs = new FileStream(args[0], FileMode.Open, FileAccess.Read);
Byte[] bytes = new Byte[4096];
int sum = 0;
int tmp = 0;
bool negative = false;
bool digits = false;
for (;;)
{
int count = fs.Read(bytes, 0, bytes.Length);
if (count <= 0)
{
if (negative)
sum -= tmp;
else
sum += tmp;
break;
}
for (int i = 0; i < count; i++)
{
switch ((char)bytes[i])
{
case ' ': case '\t': case '\r': case '\n':
if (negative)
sum -= tmp;
else
sum += tmp;
negative = false;
digits = false;
tmp = 0;
break;
case '-':
if (negative)
throw new FormatException("two negatives");
if (digits)
throw new FormatException("negative after digits");
negative = true;
break;
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
digits = true;
tmp = tmp*10 + bytes[i] - (Byte)'0';
break;
default:
throw new FormatException("invalid character");
}
}
}
Console.WriteLine("The total of the ints is: {0,8:n0}", sum);
}
}
Comments
- Anonymous
June 25, 2004
The comment has been removed - Anonymous
June 25, 2004
Some people have pointed out that the two parsers don't parse exactly the same thing. This is of course correct -- the 2nd parser is somewhat simplified compared to the first. I did this largely in the interest of keeping the source code small and somewhat understandable.
One of the things I plan to discuss in the analysis is the complexity/correctness part of the trade-off.
However it is worth noting that tightening or widening the accepted strings of the 2nd parser would not materially affect its performance characteristics so for purposes of this analysis what is there is fairly representantive. - Anonymous
June 28, 2004
It's also worth mentioning that the second approach is almost free of function calls.
One of the important performance risks of calling into an API (particularly a closed-source one) is that you have no idea what it does to get your results, but you can have some assurance that it's not specially optimized for what you're doing.
Since the whole "problem" is closed, the JIT could (theoretically) optimize all the extra stuff away, but it'll be a long time before we see a JIT that powerful :)