C# Parallel Processing Case Study – Simple One Dimensional Integration
In this very short article we use the both of the parallel processing facilities found in C# .NET Framework 4.5.2, namely multitasking and parallel loops. We use the one dimensional trapezoidal integration technique with or without error correction. This is the simplest all basic numerical integration formulas and due to the slow convergence of the technique we can give our sequential and non-sequential methods time to do some work.
First a word about our computer system. All experiments mentioned in this article were run on a relatively new (late October 2015) Dell XPS 8900 computer with an Intel Core i7-6700K CPU operating at a clock frequency of 4.00 GHz with 16 GB DDR4 memory and a 2 TB hard-drive with a 32 GB SSD buffer. The code was compiled and ran in Debug mode within the Visual Studio 2015 Community version.
The trapezoidal rule is the first of the class of integration formulas known as the closed Newton-Cotes formulas and is given by the following equation:
Looking at the formula there are two obvious ways of introducing parallel processing to the integration algorithm: breaking the summation into a number of concurrent summing tasks or a parallel loop over the summation. We implement both of these suggestions in a C# application. If the derivative of integrand is easily calculable then an approximate error correcting term can be added to the preceding formula:
Our integration code base consists of six versions of the trapezoidal integration algorithm:
Sequential without error correction
Sequential with error correction
Parallel loop without error correction
Parallel loop with error correction
Multitasking without error correction
Multitasking with error correction
The code for the sequential versions are given next:
public double SeqTrapezoidalRule(
double a, double b, int n,
Func<double, double> f)
// a and b are the endpoints
// n is the number of steps
// f is f(x)
{
double h = (b - a) / (n + 1);
double sum = 0.5 * (f(a) + f(b));
for (int i = 1; i <= n; i++)
{
sum += f(a + i * h);
}
return h * sum;
}
public double SeqTrapezoidalRuleWithCorrection(
double a, double b, int n,
Func<double, double> f, Func<double, double> fp)
// a and b are the endpoints
// n is the number of steps
// f is f(x) and fp is the derivative of f f'(x)
{
double h = (b - a) / (n + 1);
double sum = 0.5 * (f(a) + f(b));
for (int i = 1; i <= n; i++)
{
sum += f(a + i * h);
}
return h * (sum + h * (fp(a) - fp(b)) / 12.0);
}
Now we display the parallel loop variants:
public double ParTrapezoidalRule(
double a, double b, int n,
Func<double, double> f)
// a and b are the endpoints
// n is the number of steps
// f is f(x)
{
double h = (b - a) / (n + 1);
double sum = 0.5 * (f(a) + f(b));
double[] innerSum = new double[n + 1];
Parallel.For(1, n + 1, i =>
{
innerSum[i] = f(a + i * h);
});
for (int j = 1; j <= n; j++)
sum += innerSum[j];
return h * sum;
}
public double ParTrapezoidalRuleWithCorrection(
double a, double b, int n,
Func<double, double> f, Func<double, double> fp)
// a and b are the endpoints
// n is the number of steps
// f is f(x) and fp is the derivative of f f'(x)
{
double h = (b - a) / (n + 1);
double sum = 0.5 * (f(a) + f(b));
double[] innerSum = new double[n + 1];
Parallel.For(1, n + 1, i =>
{
innerSum[i] = f(a + i * h);
});
for (int j = 1; j <= n; j++)
sum += innerSum[j];
return h * (sum + h * (fp(a) - fp(b)) / 12.0);
}
And finally we show the multitasking methods:
public struct State
{
public double a, h;
public int steps;
public Func<double, double> f;
public List<double> sum;
public State(double a, double h, int steps, Func<double, double> f)
{
this.a = a;
this.h = h;
this.steps = steps;
this.f = f;
sum = new List<double>();
}
}
public void SumTask(object o)
{
State s = (State)o;
double sum = 0;
for (int i = 1; i <= s.steps; i++)
{
sum += s.f(s.a + i * s.h);
}
s.sum.Add(sum);
}
public double TasTrapezoidalRule(
double a, double b, int cores, int n,
Func<double, double> f)
// a and b are the endpoints
// cores is the number of physical processors
// n is the number of steps must be a multiple of cores
// f is f(x)
{
double h = (b - a) / (n + 1);
double sum = 0.5 * (f(a) + f(b));
int steps = n / cores;
State[] states = new State[cores];
Task[] tasks = new Task[cores];
for (int i = 0; i < cores; i++)
{
states[i] = new State(a + i * h * steps, h, steps, f);
tasks[i] = new Task(SumTask, states[i]);
tasks[i].Start();
}
Task.WaitAll(tasks);
for (int i = 0; i < cores; i++)
{
sum += states[i].sum[0];
}
return h * sum;
}
public double TasTrapezoidalRuleWithCorrection(
double a, double b, int cores, int n,
Func<double, double> f, Func<double, double> fp)
// a and b are the endpoints
// cores is the number of physical processors
// n is the number of steps must be a multiple of cores
// f is f(x) and fp is the derivative of f f'(x)
{
double h = (b - a) / (n + 1);
double sum = 0.5 * (f(a) + f(b));
int steps = n / cores;
State[] states = new State[cores];
Task[] tasks = new Task[cores];
for (int i = 0; i < cores; i++)
{
states[i] = new State(a + i * h * steps, h, steps, f);
tasks[i] = new Task(SumTask, states[i]);
tasks[i].Start();
}
Task.WaitAll(tasks);
for (int i = 0; i < cores; i++)
{
sum += states[i].sum[0];
}
return h * (sum + h * (fp(a) - fp(b)) / 12.0);
}
The “cores” parameter is four on our computer’s architecture. At this point we must mention the four test definite integrals used in our numerical computations:
The numerical values of the first and second were found using the Wolfram online definite integral calculator. The third integral can be easily calculated using integration by parts. The last integral is well known to every competent student of elementary calculus. We only require 15 decimal point accuracy of our application.
The following tables summarize our results on the four preceding definite integrals:
Method on Integral 1 |
n |
Digits Accuracy |
Milliseconds |
Sequential |
1200 |
7 |
0 |
Sequential with Correction |
1200 |
15 |
0 |
Parallel Loop |
1200 |
7 |
0 |
Parallel Loop With Correction |
1200 |
15 |
0 |
Multitasking |
1200 |
7 |
0 |
Multitasking with Correction |
1200 |
14 |
0 |
Method on Integral 2 |
n |
Digits Accuracy |
Milliseconds |
Sequential |
4700 |
7 |
0 |
Sequential with Correction |
4700 |
15 |
0 |
Parallel Loop |
4700 |
7 |
0 |
Parallel Loop With Correction |
4700 |
15 |
1 |
Multitasking |
4700 |
7 |
1 |
Multitasking with Correction |
4700 |
14 |
1 |
Method on Integral 3 |
n |
Digits Accuracy |
Milliseconds |
Sequential |
4000 |
7 |
0 |
Sequential with Correction |
4000 |
13 |
0 |
Parallel Loop |
4000 |
7 |
1 |
Parallel Loop With Correction |
4000 |
13 |
32 |
Multitasking |
4000 |
7 |
32 |
Multitasking with Correction |
4000 |
15 |
32 |
Method on Integral 4 |
n |
Digits Accuracy |
Milliseconds |
Sequential |
2300 |
6 |
0 |
Sequential with Correction |
2300 |
14 |
0 |
Parallel Loop |
2300 |
6 |
3 |
Parallel Loop With Correction |
2300 |
14 |
4 |
Multitasking |
2300 |
6 |
6 |
Multitasking with Correction |
2300 |
15 |
7 |
So it seems that the multitasking version has the best rounding characteristics in most cases although at a runtime cost. It would be interesting to see which method is best for a definite integral computationally intense program.