C#: Multitasking Case Study - Quadratic Sieve

Article
1/17/2024

Microsoft C# .NET Framework 4.5.2 has two excellent means of adding parallel processing to an application, namely, multitasking and parallel loops. In this case study we concentrate on multitasking.

All experiments mentioned in this article were run on a relatively new (late October 2015) Dell XPS 8900 computer with an Intel Core i7-6700K CPU operating at a clock frequency of 4.00 GHz with 16 GB DDR4 memory and a 2 TB hard-drive with a 32 GB SSD buffer. The code was compiled and run in Debug mode within the Visual Studio 2015 Community version.

The quadratic sieve is a means of factoring relatively large integers. Although the algorithm has been largely supplanted by the multiple polynomial quadratic sieve and the special and general number field sieve, it is still an interesting and useful introduction to modern factoring methods from a pedagogical point of view. The algorithm has four computationally intense phases:

Creation of the prime factor base consisting of t factors
Finding smooth numbers using the polynomial q(x) = (x + m)^2 - n, where m = Sqrt(n), and n is the number to be factored, we need t1 = t + 1 of these numbers
Finding the kernel of the exponent matrix (modulo 2) found in step 2
Finding the factors of the original number using the vectors of step 3 and numbers of step 2

Steps 2 and 4 dominate the runtime of the whole application. Therefore, these are the two parts of the algorithm that could benefit from parallel processing. The sequential version of this pair of test programs uses 3.21 Algorithm Quadratic sieve algorithm for factoring integers from Handbook of Applied Cryptography by Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone.

For phase 4 of the applications we use Algorithm 2.3.1 (Kernel of a Matrix) found in A Course in Computational Algebraic Number Theory by Henri Cohen. We tried in vain to speed up this algorithm by the introduction of two parallel loops. Adding a parallel loop to the vector initialization loop did not statistically improve the algorithm's runtime. The addition of a second parallel loop actually slowed down the code so we abandoned attempts to rev up this important phase.

Now onto phase 2. The code for this phase in the sequential variant is illustrated next:

for (int i = 1; i <= maxI; i++)
{
 if (i == 1)
 {
 x = 0;
 }
 
 else if  (i % 2 == 0)
 {
 x = +i / 2;
 }
 
 else
 {
 x = -i / 2;
 }
 
 BigInteger xm = x + m, b = xm * xm - n;
 List<PrimeExpon> lpe = new  List<PrimeExpon>();
 
 if (TrialDivision(b, p, lpe))
 {
 for (int j = 0; j < t; j++)
 e[count, j] = v[count, j] = 0;
 
 for (int j = 0; j < t; j++)
 {
 long q = p[j];
 
 for (int k = 0; k < lpe.Count; k++)
 {
 PrimeExpon pe = lpe[k];
 
 if (q == pe.prime)
 {
 e[count, j] = (sbyte)pe.expon;
 v[count, j] = (sbyte)(pe.expon % 2);
 break;
 }
 }
 }
 
 ai.Add(xm);
 bi.Add(b);
 
 count++;

Since the test-bed CPU has four cores and eight logical processors we transform the preceding code into four concurrent tasks. First we define the Action's state object.

public struct  State
{
 public int  i0, i1, number;
 public BigInteger m, n;
 public List<int> p;
 public List<BigInteger> la, lb;
 public List<List<PrimeExpon>> lpel;
 
 public State(
 int i0, int i1, int number,
 BigInteger m, BigInteger n,
 List<int> p)
 {
 this.i0 = i0;
 this.i1 = i1;
 this.number = number;
 this.m = m;
 this.n = n;
 this.p = p;
 la = new  List<BigInteger>();
 lb =  new  List<BigInteger>();
 lpel = new  List<List<PrimeExpon>>();
 }
}

Next we introduce the Action that carries out both a loop and the trial division method of the sequential version:

public void  FactorTask(object o)
{
 int num = 0;
 BigInteger x = 0;
 HCSRAlgorithm hc = new  HCSRAlgorithm(1);
 State s = (State)o;
 
 for (int i = s.i0; i <= s.i1; i++)
 {
 if (i == 1)
 {
 x = 0;
 }
 
 else if  (i % 2 == 0)
 {
 x = +i / 2;
 }
 
 else
 {
 x = -i / 2;
 }
 
 BigInteger xm = x + s.m, b = xm * xm - s.n;
 List<PrimeExpon> lpe = new  List<PrimeExpon>();
 int result = -1;
 BigInteger tb = b, tsb;
 PrimeExpon pe = new  PrimeExpon();
 
 if (b == 0 || b == 1 || b == -1)
 result = -1;
 
 else
 {
 if (tb < 0)
 {
 tb = -tb;
 pe.prime = -1;
 pe.expon = +1;
 lpe.Add(pe);
 }
 
 tsb = Sqrt(tb);
 
 for (int j = 1; result == -1 && j < s.p.Count; j++)
 {
 int exp = 0;
 BigInteger q = s.p[j];
 
 if (q > tsb)
 {
 result = 0;
 }
 
 else if  (tb % q == 0)
 {
 do
 {
 exp++;
 tb /= q;
 } while  (tb % q == 0);
 
 pe.prime = (int)q;
 pe.expon = exp;
 lpe.Add(pe);
 
 if (tb == 1 || !hc.Composite(tb, 20))
 {
 result = 1;
 }
 }
 }
 }
 
 if (result == 1)
 {
 s.la.Add(xm);
 s.lb.Add(b);
 s.lpel.Add(lpe);
 num++;
 
 if (num == s.number)
 {
 return;
 }
 }
 }
}

The final step is to initialize the four tasks state objects and to launch the tasks to concurrently carry out phase 2 of the algorithm:

BigInteger m = Sqrt(n), x = 0;
 
states[0] =  new  State(  1, 10000000, number, m, n, p);
states[1] =  new  State(10000001, 20000000, number, m, n, p);
states[2] =  new  State(20000001, 30000000, number, m, n, p);
states[3] =  new  State(30000001, 40000000, number, m, n, p);
sw.Restart();
 
for (int j = 0; j < 4; j++)
{
 tasks[j] = new  Task(FactorTask, states[j]);
 tasks[j].Start();
}
 
Task.WaitAll(tasks);
 
// merge data from the four tasks
 
for (int j = 0; count < t1 && j < 4; j++)
{
 State s = states[j];
 
 if (s.la.Count != number)
 return -1;
 
 for (int k = 0; count < t1 && k < s.number; k++)
 {
 la.Add(s.la[k]);
 lb.Add(s.lb[k]);
 
 for (int l = 0; l < t; l++)
 e[count, l] = v[count, l] = 0;
 
 List<PrimeExpon> lpe = s.lpel[k];
 
 for (int l = 0; l < t; l++)
 {
 int q = s.p[l];
 
 for (int u = 0; u < lpe.Count; u++)
 {
 PrimeExpon pe = lpe[u];
 
 if (q == pe.prime)
 {
 e[count, l] = (sbyte)pe.expon;
 v[count, l] = (sbyte)(pe.expon % 2);
 break;
 }
 }
 }
 
 count++;
 }
}
 
sw.Stop();
times.Add(sw.ElapsedMilliseconds);

The results are pretty astonishing. The parallel also known as multitasking version dramatically cuts down the runtime of the factoring of our test case number. We attempt to completely factor the large Mersenne number, M(300) = 2 ^ 300 - 1. We chose this number since it is rich in a factors less than 1000000. These factors are: 3^2, 5^3, 7, 11, 13, 31, 41, 61, 101, 151, 251, 331, 601, 1201, 1321, 1801, 4051, 8101, 63901, 100801, and 268501. Two additional factors are: 10567201, 13334701, and a 22 decimal digit composite number. Below is a table of runtimes in seconds for the sequential variant varying t = the number of prime numbers in the factor base:

t	Time (s)	Factors Found
1000	51.898	11
2000	169.596	15
3000	375.588	14
4000	680.609	13
5000	1100.274	16

Now we show the multitasking program's analogous results:

t	Time (s)	Factors Found
1000	17.277	12
2000	60.537	14
3000	142.134	15
4000	273.330	14
5000	467.359	16

Graphs of the preceding table data are displayed below:

The phases required the runtimes in milliseconds are shown in the table below for the parallel version:

t	Phase 1	Phase 2	Phase 3	Phase 4
1000	22	14988	2245	15
2000	76	42761	17659	20
3000	69	85091	56942	29
4000	110	140494	132671	42
5000	122	212010	255138	57

In conclusion we see that studying an application using the Stopwatch object of the Diagnostics facility that a software engineer and/or programmer can easily see what parts of an application need optimization. A judicious usage of parallel loops and/or concurrent tasks can dramatically change the runtime performance of an executable.

Share via

C#: Multitasking Case Study - Quadratic Sieve

Additional resources