Prime Number Sieve
Introduction
Sometimes we write code just for the fun of it. If you have ever run across something interesting and thought, “I’d like to give that a try!” you know what I mean, and who knows, you (I) might learn something along the way. That is how this came about. It started with this:
“…the information about prime or not prime was stored for 30 integers in each byte…”
Thirty integers in one byte, what? Research of that statement led to the link, “Black Key Sieve” (BKS) by Larry Deering (Deering). Mr. Deering’s work is based on Euler’s totient function, φ(n) with n = 30. This is also interesting and fun because it requires bits to be twiddled.
What follows is an implementation of Larry Deering’s BKS. It improves processing time and memory requirements of the classic Sieve of Eratosthenes, the first algorithm for calculating prime numbers. Future modifications to SieveBase30, such as segmentation, should result in even better performance.
Basics
A Sieve of Eratosthenes works in the following manner:
- Create a list of consecutive integers from 2 to n: (2, 3, 4, ..., n). This is the sieve.
- Initially, let p equal 2, the first prime number.
- Starting from p, count up in increments of p and mark each of these numbers greater than p itself in the list. These will be multiples of p: 2p, 3p, 4p, etc.; note that some of them may have already been marked.
- Find the first number greater than p in the list that is not marked. If there was no such number, stop. Otherwise, let p now equal this number (which is the next prime), and repeat from step 3.
All the numbers not marked in the list are prime.
How it works
BKS (SieveBase30) differs in what numbers are stored in the sieve, and how numbers are marked. Before examining how elements are struck from the sieve it will be useful to see how the end product is used to determine if a number is prime or not.
Check if Number is Prime
This implementation of the BKS stores primes as zero-bits and non-primes as one-bits in an array of bytes. All odd numbers greater than five are represented in the sieve. BASE is a constant in the class with a value of 30.
The BKS is a base 30 (2 × 3 × 5) sieve. This means that that none of the prime factors of 30 (2, 3, 5) are stored in the sieve and must be handled by the method that determines primality of a given number. The .IsPrime method of the SieveBase30 class performs that function and the code excerpt for it follows:
Const BASE As Long = 30L
Public ReadOnly BaseFactors As New List(Of Long) From {2L, 3L, 5L} '2 * 3 * 5 = 30
Public Function IsPrime(PrimeQ As Long, Optional TryFactors As Boolean = True) As Nullable(Of Boolean)
If PrimeQ < 2 Then Return False 'less than 2 is not prime
For Each pn As Long In Me.BaseFactors 'check base factors 2, 3, 5
If PrimeQ = pn Then Return True 'one of the factors of 30
If PrimeQ Mod pn = 0L Then Return False 'number to check is a mod of the factors of 30
Next
Dim unknown As New Nullable(Of Boolean)
If PrimeQ > Me.Max Then 'check if number is greater than max
If TryFactors Then
'try factoring
Dim afactor As Long = Me.FindaFactor(PrimeQ)
If afactor = 0L Then
Return unknown 'no value
ElseIf afactor < 0 Then
Return True
Else
Return False
End If
Else
Return unknown 'no value
End If
Else
Dim numStruct As New OffsetRemainder(PrimeQ) 'get the offset and remainder
If Me.theRmndrs.IndexOf(numStruct.rmndr) >= 0 Then 'is remainder in theRmndrs then
'it might be prime
'check the bit
Dim bit As Byte = Me.theBits(Me.theRmndrs.IndexOf(numStruct.rmndr))
'0 is prime, 1 is not
If (Me.Sieve(numStruct.offs) And bit) = PrimeChk Then Return True Else Return False
Else 'not one of theRmndrs
Return False
End If
End If
End Function
Private Structure OffsetRemainder
Public offs As Integer 'offset
Public rmndr As Integer 'remainder
Public Sub New(num As Long)
offs = CInt(num \ BASE) 'calculate offset into sieve
rmndr = CInt(num Mod BASE) 'and remainder
End Sub
End Structure
.IsPrime performs some required initial checks. They are:
- Check to see if the number is less than the first prime (2), and if so it is not prime.
- Then each of the base factors (2, 3, 5), stored in the List ‘BaseFactors’, are checked. This is required because these factors are not stored in the sieve.
- if the number is one of these base factors the number is prime.
- if the number is evenly divisible by the base factor the number is not prime.
- If the number is greater than the maximum number represented by the sieve a last resort method (.FindaFactor) can be tried at the user’s discretion (optional argument TryFactors). The .FindaFactor method will be discussed later.
Up to this point the .IsPrime method has not referenced the sieve and the base factors have been accounted for. Other types of sieves that store only odd numbers use similar methodology to account for even numbers.
The next part of .IsPrime gets to the heart of the algorithm. What is important is that all prime numbers, not in the base factors, mod 30 will have a remainder from this set of eight numbers:
{1, 7, 11, 13, 17, 19, 23, 29}
(Any Prime>5) Mod 30 ∈{1,7,11,13,19,23,29}
(you can prove this by performing a Mod 30 function on any known prime greater than five. Here is a list of some primes you can use to test this.)
How convenient; eight values, and eight bits in a byte! This means that the sieve is thirty times smaller.
.IsPrime now creates a new OffsetRemainder which takes the number being checked and returns a structure with two members. As can be seen, the members of the Structure are offs and rmndr, which are the offset into the sieve and the remainder. .IsPrime uses the remainder to access two Lists, theRmndrs and theBits. theRmndrs contains the eight remainders listed above, and theBits contain a binary mask. Table 1 shows what the lists look like and defines the relationship between bit and remainder:
Index |
theRmndrs |
theBits (in binary) |
0 |
1 |
00000001 |
1 |
7 |
00000010 |
2 |
11 |
00000100 |
3 |
13 |
00001000 |
4 |
17 |
00010000 |
5 |
19 |
00100000 |
6 |
23 |
01000000 |
7 |
29 |
10000000 |
Table 1
Using the Structure returned .IsPrime performs additional checks. They are:
- if the remainder is not in the list of remainders the number is not prime.
- if the remainder is in the list it uses the index of the remainder to retrieve a byte from theBits. Then a binary ‘And’ operation (AND) of the sieve offset and the byte field retrieved from theBits is performed.
- if the result of that AND is zero the number is prime
- if the result of the AND is not zero the number is not prime.
Our examination of .IsPrime shows how, “…the information about prime or not prime was stored for 30 integers in each byte…” is true. It also shows how the number being checked is broken down into a sieve offset and a remainder with corresponding bit mapping. Initialization of the sieve uses this same methodology.
Let’s take a look at an actual sieve. This sieve is for numbers up to 570. A binary dump of the 19 bytes comprising the entire sieve is shown.
Sieve size in bytes: 19
Maximum number: 570
Number of Primes: 104
Last prime in sieve: 569
Prime Percentage: 18.25 %
<<--------------0---------1---------2---------3----<>---4---------5---------6---------7--->>
0 > 00000001 00100000 00010000 10000001 01001001 00100100 11000010 00000110
8 > 00101010 10110000 11100001 00001100 00010101 01011001 00010010 01100001
16 > 00011001 11110011 00101100
Let’s look at an example. Using offset four (01001001) you can see that there are five prime numbers (0 bits) and three non-primes (1 bits). So what are the numbers? To start with multiply the offset by the BASE giving 120. Then, using Table 1 , add the eight remainders to that number. Finally AND each value in theBits with the byte at offset four to produce a value. Table 2 shows these operations.
Index |
theRmndrs |
+ 120 |
theBits |
AND 01001001 |
0 |
1 |
121 |
00000001 |
00000001 |
1 |
7 |
127 |
00000010 |
00000000 |
2 |
11 |
131 |
00000100 |
00000000 |
3 |
13 |
133 |
00001000 |
00001000 |
4 |
17 |
137 |
00010000 |
00000000 |
5 |
19 |
139 |
00100000 |
00000000 |
6 |
23 |
143 |
01000000 |
01000000 |
7 |
29 |
149 |
10000000 |
00000000 |
Table 2
For any offset, this procedure can be used to determine the numbers represented by the offset, and whether or not they are prime. Prime numbers are shown in red in the example.
For any sieve offset, call it O, the byte at O will represent odd numbers from O × 30 through ((O + 1) × 30) – 1. Using our example, offset four, this is 4 × 30 through ((4 + 1) × 30) – 1 = 120 – 149.
The last resort method for checking primality, .FindaFactor
The .FindaFactor method checks the number against all of the primes stored in the sieve. If the number is evenly divisible by the prime the number is not prime. Only primes less than the square root of the number being checked are tried. Why square root of some number N? This explanation from mathforum.org explains it:
*“We are seeking to find out whether N is prime. We have tested all primes less than the square root of N, and found that none is a factor of N. We can stop here, because we know there can be no prime factor of N that is greater than the square root of N. Why? Because, if there were, then there would also be a prime factor less than the square root of N, and we have already found out that there is none.” *(Rick)
If the number of factors for the number being checked is less than the largest prime then the primality is known certain. In other words the .IsPrime method knows primality for numbers up to (.LastPrime-1)2 .
Initialize the Sieve - marking non-primes
In all sieves the goal is to mark numbers that are not prime. SieveBase30 starts marking non-primes in a loop that is similar, if not identical, to other sieves. Starting at some prime it marks products of that prime and then continues with the next prime number. The loop is terminated when the next number is greater than the square root of the maximum number in the sieve. (Rick)
SieveBase30 has a method (.markProducts) to mark numbers not prime. This method uses a set of numbers, {7, 11, 13, 17, 19, 23, 29, 31}, the next eight primes that follow the base factors (2, 3, 5) to accomplish the marking. The first thing that happens in .markProducts is:
- in turn, multiply a number in the set times the prime, giving a product
- the product is divided by the BASE to create an offset into the sieve (note that this is integer division)
- store the offset into an array
Let’s look at that using 11 as an example.
set |
product |
offset |
7 |
77 |
2 |
11 |
121 |
4 |
13 |
143 |
4 |
17 |
187 |
6 |
19 |
209 |
6 |
23 |
253 |
8 |
29 |
319 |
10 |
31 |
341 |
11 |
Table 3
The section of code that performs this is:
Dim factors() As Long = New Long() {7L, 11L, 13L, 17L, 19L, 23L, 29L, 31L}
Dim offsSV(7) As Integer 'offsets into sieve
'calculate offsets
For i As Integer = 0 To factors.Length - 1
Dim prod As Long = thePrime * factors(i) 'product
offsSV(i) = CInt(prod \ BASE) 'calc offset
Next
One additional step is needed before marking of non-primes can begin, a bit must be found for the offset. The product in each step could be MOD’ed by the BASE to produce a remainder, and that remainder could be used to look up a bit. This is the same process that was discussed earlier when determining if the number was prime. Instead the method has an array of pre-computed bytes representing bits. The code for that is:
Dim preComputedChords()() As Byte = {New Byte() {1, 64, 32, 4, 2, 128, 16, 8},
New Byte() {2, 4, 8, 16, 32, 64, 128, 1},
New Byte() {4, 8, 128, 1, 16, 32, 2, 64},
New Byte() {8, 128, 2, 64, 1, 16, 4, 32},
New Byte() {16, 1, 64, 2, 128, 8, 32, 4},
New Byte() {32, 16, 1, 128, 8, 4, 64, 2},
New Byte() {64, 32, 16, 8, 4, 2, 1, 128},
New Byte() {128, 2, 4, 32, 64, 1, 8, 16}}
Dim preComputedChordOffs As Integer = CInt((7L * thePrime) Mod BASE) 'calculate first remainder
'get bits to turn on (not prime) based on remainder
Dim Chords() As Byte = preComputedChords(Me.theRmndrs.IndexOf(preComputedChordOffs))
preComputedChordOffs is the remainder of the first product times the prime MOD’ed with the base. The remainder is used to find an index in .theRmndrs. That index is used as an index into preComputedChords. Now that is a mouthful. Let’s look at our example again. . The index of 17 in .theRmndrs is 4 (see Table 1 ). preComputedChords(4) is the array that starts with 16. So Chrods() = {16, 1, 64, 2, 128, 8, 32, 4} in this example. If you do the math for each product you will find that the pre-computed chords are correct. Any prime number passed to the method will have its own set of offsets and one of the pre-computed byte arrays.
At this point we have two arrays with eight elements each. The arrays are offSV for offsets, and Chords for the bits. The name ‘Chords’ comes from BKS and was used here for clarity when comparing BKS to this implementation. Now we are ready to mark all products of the prime which the following piece of code does:
For svB As Integer = 0 To sv.Length - 1 Step primeNum
Dim realoffs As Integer = 0 'used as the index into the sieve
For i As Integer = 0 To Chords.Length - 1
'the following statement is required because the offsets are not linear
'e.g. offsets for 11 are 2, 4, 4, 6, 6, 8, 10, 11
realoffs = offsSV(i) + svB 'calculate offset into sieve
If realoffs < sv.Length Then 'within array bounds?
sv(realoffs) = sv(realoffs) Or Chords(i) 'turn the bits on to indicate not prime
Else
Exit For
End If
Next
Next
These eleven lines of code account for over 99% of the processor utilization during sieve initialization.
The outside for loop iterates the sieve in increments of the prime number being marked.
The inner for loop iterates Chords and in the process calculates an index into the sieve for the iteration. In the example we have been using, prime number 11, the first time through the offsets would be those shown in Table 3 . Every iteration the offset is increased by 11 until the end of the sieve is reached. As you may have guessed the smaller the prime the more times the loops execute.
To speed the process up the four smallest primes (7, 11, 13, and 17) are marked by creating an array of 17,017 (7 x 11 x 13 x17) bytes and cancelling them within that. Then this smaller array is copied to the sieve in blocks of 17,017 bytes. When complete the main loop can start from 19. This method is most helpful for larger sieves.
Now that you have seen how it works, let’s look at how it can be used.
Using the SieveBase30 Class
To illustrate the use of the SieveBase30 class a small sample VB 2012 project (TNSB30) was created. It is available here: https://skydrive.live.com/redir?resid=95E9849B11BDE22F!169&authkey=!AE83f8e4Tt7ZOTA&ithint=file%2c.zip .
If you want just the class SieveBase30 (SB30), it is available here: https://skydrive.live.com/redir?resid=95E9849B11BDE22F!171&authkey=!ANrVw-Biikrdp0s&ithint=file%2c.zip .
The first declaration seen in the form is the instantiation of a SieveBase30 class. The New method can take up to two arguments. The first argument is the limit, and tells the class to include all primes <= that number in the sieve. In this case it is 1,000,000. The New method accepts a second argument that determines if the method should return before the sieve is initialized. The default is to wait for sieve initialization to complete before returning. Sieve initialization is performed on a background thread. By setting this argument to false the initialization code can be started when the form is loaded. Loading of the form can proceed while the sieve is being initialized. Methods are provided to indicate if sieve initialization is complete, and progress if not. **If you choose NOT to wait the Ready method should be checked before the sieve is used. **
The Shown event is next. It initializes a ComboBox with some sieve sizes that will be used. You will note that there is a Debug.WriteLineIf statement in the event handler. If you have a multi-core processor you will likely not see this message. When the handler is completed the sieve is ready for use.
Next are two utility methods. The ShowStats method shows statistics about the sieve. The ShowProgress method shows the progress of sieve initialization. Depending upon the size of the sieve, number of cores, and speed of your system, you may or may not see the progress. Note that a TrackBar was intentionally used instead of a ProgressBar.
At this point the sieve is ready to use. Type numbers into TextBox1 and you will see the primality of the number in Label1 and the prime factorization of the number in TextBox2, if possible. The TextBox1.TextChanged event handler is used to accomplish this. If a number cannot be factored it will have a term with a negative exponent. The negative exponent should not be interpreted mathematically. The term with the negative exponent, disregarding the exponent, is what is left to be factored. The primality of this term is unknown.
ComboBox1 allows you to change the maximum number that the sieve contains, prime numbers less than or equal to the number selected. If you are wondering why 7,919 is in the list, it is because 7,919 is the 1,000th prime.
Button1 shows the first 1,100 primes if there is that many. It uses the IEnumerable implementation in the class to do this.
Button2 generates numbers randomly and shows the factorization of the number. Try this with different size sieves to see the difference.
Button 3, on the Encryption tab, shows simple RSA encryption. It is simple because it limits the size of the prime numbers being used, but it gives insight into the algorithm.
The SieveBase30 class has methods to .Save / .Load the sieve to / from disk. The default location is a file on the desktop with a filename of "primesieve.txt". To change the filename / location pass a path to the method or set the Path property. The Path property will contain the last path used to save or load.
The last method in the class is a self-check. It was used during the development of the class to ensure that changes to the code were viable.
The SieveBase30 class has many other methods and properties that you can explore on your own.
Summary
The SieveBase30 class improves on the performance of the classic Sieve of Eratosthenes. It also improves on memory requirements by storing the sieve in a compressed format. Your comments and suggestions are welcomed.
Those that helped
I’d like to thank Mr. Deering for his ideas and MSDN Visual Basic forum participants Reed Kimble and Crazypennie for their help and inspiration.
Appendix A - Counting Bits
The .CountPrimes method of the SieveBase30 class uses a bit twiddling hack that has been around for a number of years (Contributors and Anderson). The code in the method came from the following extension.
'Imports System.Runtime.CompilerServices
Module bitcount
Private Function CountBits(num As ULong, Optional sign As Boolean = False) As Integer
' based on http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
' m1 = 0x5555555555555555; //binary: 0101...
' m2 = 0x3333333333333333; //binary: 00110011..
' m4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones ...
Dim i As ULong = num
i -= ((i >> 1) And &H5555555555555555UL) 'm1 ops = 3
i = (i And &H3333333333333333UL) + ((i >> 2) And &H3333333333333333UL) 'm2 ops = 4
i = (i + (i >> 4)) And &HF0F0F0F0F0F0F0FUL 'm4 ops = 3
i += i >> 8 'ops = 2
i += i >> 16 'ops = 2
i += i >> 32 'ops = 2
i = i And &H7FUL 'ops = 1
If sign Then i += 1UL 'ops = 1
Return CInt(i) 'total ops = 18
End Function
'unsigned
<Extension()>
Public Function CountBits(num As Byte) As Integer
Return CountBits(CULng(num), False)
End Function
<Extension()>
Public Function CountBits(num As UShort) As Integer
Return CountBits(CULng(num), False)
End Function
<Extension()>
Public Function CountBits(num As UInteger) As Integer
Return CountBits(CULng(num), False)
End Function
<Extension()>
Public Function CountBits(num As ULong) As Integer
Return CountBits(CULng(num), False)
End Function
'signed
<Extension()>
Public Function CountBits(num As SByte) As Integer
Return CountBits(CULng(num And &H7F), num < 0)
End Function
<Extension()>
Public Function CountBits(num As Short) As Integer
Return CountBits(CULng(num And &H7FFF), num < 0)
End Function
<Extension()>
Public Function CountBits(num As Integer) As Integer
Return CountBits(CULng(num And &H7FFFFFFF), num < 0)
End Function
<Extension()>
Public Function CountBits(num As Long) As Integer
Return CountBits(CULng(num And &H7FFFFFFFFFFFFFFF), num < 0)
End Function
End Module
Works Cited
Contributors and Sean Eron Anderson. Bit Twiddling Hacks. n.d. <http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel>.
Deering, Larry. The Black Key Sieve. 1998. <http://www.qsl.net/w2gl/blackkey.html>.
Rick, Doctor. Ask Dr. Math. 13 June 2001. 13 Jube 2001 <http://mathforum.org/library/drmath/view/56001.html>.