Generating Combinations
Overview
This article presents a class which allows a DotNet application to process all possible combinations of items selected from a set. The class is coded in Visual Basic, but it can of course be easily translated into C# if desired. The class also demonstrates the use of Iterator Functions and implementation of the IEnumerable(Of T) interface (actually the derived interface “IReadOnlyCollection(Of T)”).
Introduction
I have an application that needs to test all possible combinations of a set of numbers, and I decided to write a class that would do this.
For those who don’t remember them from maths class, combinations are the different ways you can pick a given number of items from a larger set of items. The order that the items are picked doesn’t matter (if it did matter, we would be talking about permutations rather than combinations). There are two types of combination:
- Where each item can only be picked once. This is like picking numbered balls for a lottery drawing. You might have 50 balls numbered 1 through 50; you pick out 6 balls and they will each have a different number.
- Where each item can be picked multiple times. You might have 16 teams in a football league; you ask 6 people which team they support and you may find that more than one person supports a particular team.
There is more maths to cover when we look at calculating how many possible combinations there are, but I’ll leave that for later.
Framing the problem
So how should I frame the problem in a way that a class can solve it? I need the user of the class to provide:
- The number of items to pick (an Integer)
- The number of items to pick from (an Integer)
- Whether repetitions are allowed (a Boolean)
I can have the user supply these values to the constructor (Sub New) of the class and then save them as class level fields. Here is the class with its constructor:
Public Class Combinations
Private choose, total As Integer, repetition As Boolean
Sub New(pick As Integer, pickFrom As Integer, allowRepetition As Boolean)
If pick <= 0 Then Throw New ArgumentException( _
"Number of items to pick must be a positive integer", "pick")
If pickFrom <= 0 Then Throw New ArgumentException( _
"Number of items to pick from must be a positive integer", "pickFrom")
If Not allowRepetition And pick > pickFrom Then Throw New ArgumentException( _
"Number of items to pick must not exceed the number of items to pick from", "pick")
repetition = allowRepetition
choose = pick
total = pickFrom
End Sub
End Class
Note that the constructor validates the two integers. They must both be greater than zero, and if repetition is not allowed, the number of items to pick can’t be greater than the number of items to pick from. If those conditions aren't met, an ArgumentException is thrown.
Presenting the result
A combination can be presented as an array of Integer. If I am picking three items from a set of five then the array will have three elements with each element being a number from zero through four (five possible values). What if the items I want to pick from are not a set of consecutive numbers starting at zero? No problem, I can always store the items to pick from in an array or a list; each item can be referred to using its index and the index numbers are a set of consecutive numbers starting at zero.
I need to return not just one combination, but all possible combinations. I could create a list or an array of arrays to do that, but an alternative approach is suggested by how the class will be used. The user needs to process all possible combinations and so is almost certain to use a For loop to iterate through the combinations. Any class that implements the IEnumerable interface can be used in a For Each loop, so my first thought was to implement IEnumerable(Of Integer()).
After studying the interfaces in System.Collections.Generic I decided that ReadOnlyCollection(Of Integer()) was more appropriate. It is derived from IEnumerable(Of T) and it adds a Count property to return the number of items in the collection. It is possible to calculate the number of possible combinations, and it may be useful to provide a Count property to return that value. There is an extension method of IEnumerable called Count that does the same thing, but it does so by iterating through the entire collection to count the number of items and that could take much longer than just calculating it.
Note that IReadOnlyCollection(Of T) was added in version 4.5 of the .Net Framework. Projects targeting earlier versions (version 2.0 or later) can use IEnumerable(Of T) instead. The only difference is that the Count property will not be automatically added to the class and must be added manually (without the Implements IReadOnlyCollection(Of Integer()).Count).
If I add the statement “Implements IReadOnlyCollection(Of Integer())” immediately after my Class statement, Visual Studio adds three new methods to the class so that it now looks like this:
Public Class Combinations
Implements IReadOnlyCollection(Of Integer())
Private choose, total As Integer, repetition As Boolean
Sub New(pick As Integer, pickFrom As Integer, allowRepetition As Boolean)
If pick <= 0 Then Throw New ArgumentException( _
"Number of items to pick must be a positive integer", "pick")
If pickFrom <= 0 Then Throw New ArgumentException( _
"Number of items to pick from must be a positive integer", "pickFrom")
If Not allowRepetition And pick > pickFrom Then Throw New ArgumentException( _
"Number of items to pick must not exceed the number of items to pick from", "pick")
repetition = allowRepetition
choose = pick
total = pickFrom
End Sub
Public Function GetEnumerator() As IEnumerator(Of Integer()) _
Implements IEnumerable(Of Integer()).GetEnumerator
End Function
Public ReadOnly Property Count As Integer Implements IReadOnlyCollection(Of Integer()).Count
Get
End Get
End Property
Public Function GetEnumerator1() As IEnumerator Implements IEnumerable.GetEnumerator
End Function
End Class
Adding the code
The three methods that were added when I Implemented IReadOnlyCollection(Of Integer()) are the three methods that the IReadOnlyCollection interface requires. I’ll cover each one in turn.
The Count property
This returns an integer that contains the total number of items in the collection, in this case the total number of combinations. These numbers can get very big, so it is quite possible that the number will be larger than the maximum value of an Integer (about 2 billion). I would like to be able to return the number of items in a Long, but Count must return an Integer. The way I handle this is to have a LongCount property that returns the number as a Long as well as a Count property that returns the number as an Integer (or throws an exception if the number is too large).
The formula for the number of combinations of r items chosen without repetition from a set of n items is
n! / (r! * (n-r)!)
The “!” indicates a factorial (3! = 1 * 2 * 3 = 6; 5! = 1 * 2 * 3 * 4 * 5 = 120). Factorials can get very large very quickly; 20! Is that largest factorial that will fit in a Long and 22! Is the largest factorial that can fit in a Double without losing precision. Fortunately it is possible to simplify the calculation to something that is easier to code:
(r+1) \ 1 * (r+2) \ 2 * (r+3) \ 3 ..……. n \ (n-r)
The formula for the number of combinations of r items chosen with repetition from a set of n items is
(n+r-1)! / (r! * (n-1)!)
This can be simplified for coding purposes to:
(r+1) \ 1 * (r+2) \ 2 * (r+3) \ 3 ..……. (n+r-1) \ (n-1)
The interesting thing about these sequences is that there is never a remainder in any of the division operations, so I can use integer division (the “\ operator) without losing precision. Here is the code for the Count and LongCount properties.
Public ReadOnly Property Count As Integer Implements IReadOnlyCollection(Of Integer).Count
Get
Return CInt(LongCount)
End Get
End Property
Public ReadOnly Property LongCount As Long
Get
Dim result As Long = 1
For iter As Long = 1 To total - If(repetition, 1, choose)
result *= choose + iter
result \= iter
Next
Return result
End Get
End Property
It is possible to further optimise this code. If you write out the calculations that are being done, you will see that there can be long sequences of number that appear in both the numerator and the denominator and those sequences could be skipped to save time. I have chosen not to do so in order to avoid making the code more complex. It is likely that the optimisation would save at most a few milliseconds on an operation that will typically do done only once.
The GetEnumerator1 function
In addition to the generic function IEnumerable(Of T).GetEnumerator, we are also required (for compatibility) to implement the non-generic function IEnumerable.GetEnumerator. Visual Studio added GetEnumerator1 to implement the non-generic version, and all we need to do is have it call the generic version.
Private Function GetEnumerator1() As IEnumerator Implements IEnumerable.GetEnumerator
Return GetEnumerator()
End Function
Note that as I have no need to ever call this function, I have changed it from Public to Private.
The GetEnumerator function
This is where the real work is done. GetEnumerator must return an object that implements IEnumerator and is able to sequentially select each item in the collection. I don’t need to create my own class that implements IEnumerator, there are two easy ways to create such an object.
- Create every item in the collection and store them in an array (or anything that implements IEnumerable), then call GetEnumerator on the array. I want to avoid trying to store the entire collection in memory as it might be too big, so I chose not to use this method.
- Use an Iterator function. This is a special type of function used for iterating through a collection. Each time you call it, it returns the next item. The Yield statement is used to return the item and save the current state of the function; the next time the function is called, processing resumes immediately after the Yield statement. When the function exits, that signals that the iteration is complete. This is the method I have used, as it only requires one item to be in memory at a time.
Here is the code for GetEnumerator:
Public Iterator Function GetEnumerator() As IEnumerator(Of Integer()) _
Implements IEnumerable(Of Integer()).GetEnumerator
Dim value(choose - 1) As Integer
If Not repetition Then value = Enumerable.Range(0, choose).ToArray
Dim index As Integer = choose - 1
Do
Yield value
'Exit if iteration is complete
If value(0) = total - If(repetition, 1, choose) Then Exit Do
'Find the rightmost element that can be incremented and increment it
Dim oldIndex As Integer = index
Do While value(index) = total - If(repetition, 1, choose - index)
index -= 1
Loop
value(index) += 1
'If index changed, reset all elements to the right of the new index
If index <> oldIndex Then
Do
index += 1
value(index) = value(index - 1) + If(repetition, 0, 1)
Loop Until index = choose - 1
End If
Loop
End Function
The local variable “value” is the current item, it is an array of Integer with one element for each number that is to be picked. If repetition is allowed, value starts out with all elements equal to zero. If repetition is not allowed, value starts out with elements in the sequence 0, 1, 2, 3,…. (provided by the Enumerable.Range method). The local variable “index” is the index (into the value array) of the element that is to be changed next; it is initialised to the last element.
I am going to provide the items in numerical sequence. For example, when choosing three items from a set of four without repetition, I will return:
0, 1, 2
0, 1, 3
0, 2, 3
1, 2, 3
And when choosing two items from a set of three with repetition, I will return:
0, 0
0, 1
0, 2
1, 1
1, 2
2, 2
The rest of the function is an infinite loop (there is an Exit Do statement to provide a way out) and the first thing the loop does is to yield the current “value”. The Iterator function pauses at this point and continues at the next statement when the next value is required.
The statement after the Yield checks whether the last item has been returned and if so it exits the main loop and the iteration is compete. When using repetition, the last item is the only one where the first element of the array is the maximum possible number (total – 1). When not using repetition, the last item is the only one where the first element of the array is the difference between the number of items to select from and the number of items to be selected (total – choose).
The next paragraph finds the rightmost element that is not at its maximum value and increments it. When repetition is allowed, the maximum value is (total – 1), when repetition is not allowed, the maximum value is (total -1) for the rightmost element, (total – 2) for the second from the right, and so on.
The final paragraph checks if the current index was changed (because the current element had reached its maximum value) and if so resets all element to the right of the new index. When repetition is allowed, the elements are reset to the same value as the current one, When repetition is not allowed, each element is one more than the one on its left.
Note that although the Count and LongCount properties can’t return numbers higher than Integer.MaxValue and Long.MaxValue respectively, there is no limit on the total number of items that can be returned (other than the fact that very large numbers will take a long time to process).
The complete code
Here is the complete code for the class.
Option Explicit On
Option Infer Off
Option Strict On
''' <summary>Represents a set of possible combinations of integers</summary>
Public Class Combinations
Implements IReadOnlyCollection(Of Integer())
Private choose, total As Integer, repetition As Boolean
''' <summary>Create a new Combinations object</summary>
''' <param name="pick">Number of items to be picked</param>
''' <param name="pickfrom">Number of items to pick from (0 to pick - 1)</param>
''' <param name="allowRepetition">True if items may be picked more than once</param>
Sub New(pick As Integer, pickFrom As Integer, allowRepetition As Boolean)
If pick <= 0 Then Throw New ArgumentException( _
"Number of items to pick must be a positive integer", "pick")
If pickFrom <= 0 Then Throw New ArgumentException( _
"Number of items to pick from must be a positive integer", "pickFrom")
If Not allowRepetition And pick > pickFrom Then Throw New ArgumentException( _
"Number of items to pick must not exceed the number of items to pick from", "pick")
repetition = allowRepetition
choose = pick
total = pickFrom
End Sub
''' <summary>Returns the number of possible values</summary>
''' <returns>Calculated number of possible values</returns>
''' <exception cref="OverflowException">The number of possible values exceeds Integer.MaxValue</exception>
Public ReadOnly Property Count As Integer Implements IReadOnlyCollection(Of Integer()).Count
Get
Return CInt(LongCount)
End Get
End Property
''' <summary>Returns the number of possible values</summary>
''' <returns>Calculated number of possible values</returns>
''' <exception cref="OverflowException">The number of possible values exceeds or approaches Long.MaxValue</exception>
''' <remarks>This replaces the LongCount extension method of IEnumerable which iterates through the entire collection</remarks>
Public ReadOnly Property LongCount As Long
Get
Dim result As Long = 1
For iter As Long = 1 To total - If(repetition, 1, choose)
result *= choose + iter
result \= iter
Next
Return result
End Get
End Property
''' <summary>Gets the enumerator that can be used to select each possible value</summary>
Public Iterator Function GetEnumerator() As IEnumerator(Of Integer()) _
Implements IEnumerable(Of Integer()).GetEnumerator
Dim value(choose - 1) As Integer
If Not repetition Then value = Enumerable.Range(0, choose).ToArray
Dim index As Integer = choose - 1
Do
Yield value
'Exit if iteration is complete
If value(0) = total - If(repetition, 1, choose) Then Exit Do
'Find the rightmost element that can be incremented and increment it
Dim oldIndex As Integer = index
Do While value(index) = total - If(repetition, 1, choose - index)
index -= 1
Loop
value(index) += 1
'If index changed, reset all elements to the right of the new index
If index <> oldIndex Then
Do
index += 1
value(index) = value(index - 1) + If(repetition, 0, 1)
Loop Until index = choose - 1
End If
Loop
End Function
''' <summary>Non-generic method required for compatibility</summary>
Private Function GetEnumerator1() As IEnumerator Implements IEnumerable.GetEnumerator
Return GetEnumerator()
End Function
End Class
Example of use
Here is some code that illustrates how the Combinations class might be used. It simply displays all the possible combinations of three colours chosen without repetition from a palette of Red, Blue, Black, Pink and Green.
Dim colours As New List(Of String) From {"Red", "Blue", "Black", "Pink", "Green"}
Dim display As New List(Of String)
For Each combo In New Combinations(3, 5, False)
Dim line(combo.Length - 1) As String
For i As Integer = 0 To combo.Length - 1
line(i) = colours(combo(i))
Next
display.Add(String.join(", ", line))
Next
MessageBox.Show(String.Join(vbCrLf, display))
The resulting MessageBox displays:
Red, Blue, Black
Red, Blue, Pink
Red, Blue, Green
Red, Black, Pink
Red, Black, Green
Red, Pink, Green
Blue Black, Pink
Blue, Black, Green
Blue Pink, Green
Black, Pink, Green