Timing your code
One of the fundamental aspects of performance testing is identifying how long the code you’re writing takes to execute. There are a few ways you can measure this: measuring time elapsed while your code executes, computing how many instructions it may take and extrapolating time from there (based on target hardware), or, I suppose, guessing. I can’t recommend the last option, however.
Hopefully, you’re reading Rico. He has a lot of information to share, but he qualifies it by saying you should almost always follow it, except when you shouldn’t. I’m even less firm than that – all I wish to offer at this point is one method for accomplishing this task. It is by no means the “correct” way, nor is it necessarily the correct way for you.
A useful way to measure the product as it’s going along is to write micro benchmarks which exercise the hot code paths. A micro benchmark, essentially, is the distillation of the meat of a piece of functionality, stripped down to its core. With the application separated in this fashion, it’s easy to identify what’s slowing down an overall scenario. Once you’re done writing it, spend a little time thinking about what kind of results you expect to see. Depending on your requirements, good goals for a micro benchmark might be “we need to be able to execute ShoppingCart::Add 300 times a second” or “we need to be 5% faster than the last release.”
With that in mind, let’s first discuss some common pitfalls of attempting to measure managed code execution speed.
Warm vs. Cold code paths
The first time a managed method is called, the call goes through a JIT prestub: the method is JIT-compiled, and the prestub call is replaced with the address of the freshly jitted code. Calling your method once outside the timed loop ensures you are not timing JIT compilation or the prestub replacement.
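To see this effect directly, you can time the very first call to a method separately from a later call. Here’s a minimal sketch of that idea; it assumes the SimpleTimer class from the bottom of this post and a hypothetical DoWork method standing in for the code path you care about.

using Perf; // The namespace of our timer class, SimpleTimer

public class WarmVsCold
{
    static void DoWork() { /* hypothetical stand-in for your code path */ }

    public static void Main()
    {
        SimpleTimer timer = new SimpleTimer();

        timer.StartTimer();
        DoWork(); // first call: pays for JIT compilation via the prestub
        timer.StopTimer();
        timer.Result(1);

        timer.StartTimer();
        DoWork(); // second call: jumps straight to the jitted code
        timer.StopTimer();
        timer.Result(1);
    }
}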
Run your code path more than once
It’s a good idea, when designing a micro benchmark, to run many, many iterations. The point, essentially, is to answer the question “How many times can I do this, in how long a time span?” As you can’t really control the thread scheduler and other aspects of the system, a small amount of variance can creep into your measurement. The solution is to run each benchmark for around the same amount of time, increasing or decreasing the number of iterations as necessary. On the CLR Performance test team, we target five-second runs for most of our micro benchmark tests. We then track how many iterations it takes to accomplish that, and we can use iterations per second as the benchmark number.
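If you want a rough way to find an iteration count that yields a run of the desired length, you can calibrate it up front. Here’s a minimal sketch of that, assuming the SimpleTimer class from the bottom of this post, a hypothetical ElapsedSeconds helper you could add alongside Result, and a hypothetical DoWork method standing in for your code path.

using System;
using Perf; // The namespace of our timer class, SimpleTimer

public class Calibrate
{
    static void DoWork() { /* hypothetical stand-in for your code path */ }

    public static void Main()
    {
        SimpleTimer timer = new SimpleTimer();
        long nIters = 1000;
        double seconds = 0.0;

        // Keep doubling the iteration count until a run takes roughly five seconds
        while (seconds < 5.0)
        {
            nIters *= 2;
            timer.StartTimer();
            for (long i = 0; i < nIters; i++)
            {
                DoWork();
            }
            timer.StopTimer();
            seconds = timer.ElapsedSeconds(); // hypothetical helper returning elapsed seconds
        }
        Console.WriteLine("Use {0} iterations (roughly {1:F1} seconds per run)", nIters, seconds);
    }
}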
Ensure the environment is right
There are plenty of things you can do to the machine to ensure your performance measurement won’t be impacted. It’s preferable, for example, to start from a clean state if possible. This can be accomplished through the standard means of imaging, using good uninstall / reinstall steps, or what have you. It’s also a good idea to disable things like the screen saver, anti-virus software (carefully! Make sure you’re in a safe environment), and any scheduled tasks that might run in the background. It can also be useful to disable any services that might start using the CPU or disk unexpectedly, if possible.
Tools, tools, tools
Of course, you need a way to time code. Classically, the method of choice in the unmanaged world has been QueryPerformanceCounter. However, no facility in the released versions of the framework exposes this functionality, so the solution is to use interop. At the least, you need a start, a stop, and a result-reporting method. I’ve written a simple example of this, called SimpleTimer, for which the source is at the bottom of this post.
Writing a benchmark
Now, let’s spend some time thinking about our benchmark. I’m going to demonstrate a fairly contrived example here. Let’s suppose we’re implementing the ShoppingCart::Add method I’ve mentioned a couple times. This is a fairly perf critical place in our application, and we want to make sure that changes to it in the future don’t impact it heavily. Pretend with me that someone’s given us a DLL containing a ShoppingCart class, which we can test.
Our benchmark code might go something like this:
using System;
using Perf; // The namespace of our timer class, SimpleTimer

public class Test
{
    public static void Main(string[] args)
    {
        SimpleTimer timer = new SimpleTimer();
        long nIters = Int64.Parse(args[0]);
        ShoppingCart sc = new ShoppingCart();

        sc.Add(); // Warm up that code path

        timer.StartTimer();
        for (long i = 0; i < nIters; i++)
        {
            sc.Add();
        }
        timer.StopTimer();

        timer.Result(nIters); // the Result method prints the results of the test to the screen
    }
}
Essentially, once this is compiled to an executable, you’d run it passing in the number of iterations desired. timer.Result then prints out the results of the test: number of iterations, time elapsed, and iterations/sec.
That’s a pretty good start. There’s a potential problem to think about, though: the overhead of the for loop might be more costly than what you’re attempting to time. In that case, the trick is to execute your code path multiple times within the loop, as sketched below.
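For instance, here’s how the timed portion of the example above might look with the call unrolled ten times per pass. This is only a sketch, reusing the timer, sc, and nIters variables from the benchmark above.

timer.StartTimer();
for (long i = 0; i < nIters; i++)
{
    // Ten calls per pass, so the loop overhead is amortized across more work
    sc.Add(); sc.Add(); sc.Add(); sc.Add(); sc.Add();
    sc.Add(); sc.Add(); sc.Add(); sc.Add(); sc.Add();
}
timer.StopTimer();
timer.Result(nIters * 10); // report the true number of Add calls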
What to do with the results?
Congrats! You now have a fully functional and useful benchmark for your project. A final tip: run your benchmark a few times and average the results together. This will go a long way toward eliminating any further noise that may be creeping into your results.
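One way to do that is to pull the timed loop into a method and average its iterations-per-second figure over several runs. The sketch below assumes a hypothetical RunOnce method – essentially the benchmark body from above, changed to return iterations per second instead of printing it.

// RunOnce(nIters) is hypothetical: the timed loop from the benchmark above,
// returning iterations per second instead of printing it
const int runs = 5;
double total = 0.0;
for (int run = 0; run < runs; run++)
{
    total += RunOnce(nIters);
}
Console.WriteLine("Average over {0} runs: {1:F3} iterations/sec", runs, total / runs);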
Wrapping up
This is getting a bit long, so I think I’ll leave it there for now. The key takeaways are:
Identify and isolate your hot code paths – these can become your micro benchmarks.
Write your benchmarks with what you’re timing in mind – ensure you’re not timing your loop or other extraneous code.
Warm up the code paths you wish to test.
A uniform environment is key to repeatable results.
Report the result of multiple runs of a benchmark.
SimpleTimer source
using System;
using System.Runtime.InteropServices;
using System.Security;

namespace Perf
{
    [StructLayout(LayoutKind.Sequential, Pack = 8)]
    public class large_int
    {
        [MarshalAs(System.Runtime.InteropServices.UnmanagedType.I8)]
        public long value;
    }

    [SuppressUnmanagedCodeSecurity()] // This is to eliminate the security check calling down to QueryPerformanceCounter. Be careful where you use this!
    public class SimpleTimer
    {
        [DllImport("kernel32.dll")]
        public static extern bool QueryPerformanceCounter(large_int count);

        [DllImport("kernel32.dll")]
        public static extern bool QueryPerformanceFrequency(large_int freq);

        private large_int m_Freq = new large_int();
        private large_int m_Start = new large_int();
        private large_int m_Stop = new large_int();

        public SimpleTimer()
        {
            // It's a good idea to run your timer code path once to warm it up
            StartTimer();
            StopTimer();
        }

        public void StartTimer()
        {
            QueryPerformanceCounter(m_Start);
        }

        public void StopTimer()
        {
            QueryPerformanceCounter(m_Stop);
        }

        public void Result(long nIters)
        {
            QueryPerformanceFrequency(m_Freq);
            double nResult = ((double)m_Stop.value - (double)m_Start.value) / (double)m_Freq.value;
            double nItersSec = nIters / nResult;
            Console.WriteLine("{0} iterations took {1} seconds, resulting in {2} iterations per second", nIters.ToString(), nResult.ToString("F3"), nItersSec.ToString("F3"));
        }
    }
}
[This post is provided AS IS, implying no warranties and conferring no rights.]