Introduction to Grouping with LINQ/Lambda (VB.NET) Part 1
Introduction
Grouping information from a variety of data sources in a Visual Basic project for the purpose of categorizing data by one or more key elements outside of a reporting component will be discussed utilizing Language-Integrated Query LINQ and lambda (technically, this is called an “expression lambda”) using GroupBy clause and Enumerable.GroupBy method. Both anonymous and strong typed result sets examples used along with how to transform an anonymous result set when the results are created in a function needed in another method. Also covered, when to use LINQ verses lambda as there are advantages and disadvantages to both depending on a task to perform.
There are countless ways to group data while examples within are basic to intermediate level form the base for advance level grouping. While learning how to group information what should not be done is attempting to fit code to a task without understanding both the code and task, otherwise simply copying and pasting code more likely than not will fail.
In the various code samples which follow different data sources and scenarios are used for exposure to real world task along with providing a broader scope of what can be done with grouping data.
Part 2 LINQ/Lambda Group by multiple properties (VB.NET)
Requirements
- NET Framework 3.5 or higher
- SQL-Server - not for all samples
- SSMS (suggested)
- Entity Framework 6 code first with existing database
- Basic programming concepts working with Visual Basic
- Reading Microsoft Group By Clause documentation for a base on grouping
Basic LINQ Group by
Important For proper learning starting out simple allows understanding how parts of a group by query statement works. |
The first group by task counts the occurrences of each character in a given string in a console application where results are written to the console output. The code is in a procedure called by the main procedure.
Module Module1
Sub Main()
BasicExample()
Console.ReadLine()
End Sub
Sub BasicExample()
Dim InputString As String = "T0*A1?0*23aTA3 4T4\+a4 ?407#?A*6T+"
Dim characterGroup =
(
From character In InputString.ToCharArray()
Group characterElement = character By charElement = character Into Group
Select New With {.Character = charElement, .Occurrences = Group.Count}
).
ToList.
OrderBy(Function(anonymous) anonymous.Character)
Console.WriteLine($"Char Occurrences")
For Each currentItem In characterGroup
Console.WriteLine($"[{currentItem.Character}] {currentItem.Occurrences,5}")
Next
End Sub
End Module
The variable containing results, characterGroup is an anonymous type which is only available within the procedure, to use the results a container such as a concrete class or DataTable with matching properties is needed. When working with anonymous type variables generally developers will use single character variable names and/or meaningless names for variable which when working with a simple set of code which is fine until weeks, months or years later maintenance is required to code were the LINQ anonymous statement is within complex code, with meaningless variable names its difficult to follow code logic and one character variables in a LINQ query is hard to hover over these variables during a debug session.
Here is an example that would be hard to maintain while the code sample above variable names are descriptive.
Dim InputString As String = "T0*A1?0*23aTA3 4T4\+a4 ?407#?A*6T+"
Dim results = (From c In InputString.ToCharArray() Group c By c Into Group
Select New With {.Item = c, .Occurrences = Group.Count}).ToList.OrderBy(Function(x) x.Item)
Caveat, the query works as expected other than if the input string does not know how to differentiate between casing of letters which will be addressed in another example using IEqualityComparer Interface.
Continuing with this task, the next step is to make the result set available outside of the current code block, in this case a procedure which will move to creating a public container class. Although the result is anonymous the properties are easy enough to figure out, hover the mouse over character in the From clause reveals a Char or hover over various parts to see the types as shown below.
Although this works for a simple query this may not when dealing with complex queries. An easy way to reveal various types is to run the query from a console project as per below where in the For Each statement instead of displaying currentItem.Character and currentItem.Occurrences just the object currentItem is shown.
Dim InputString As String = "T0*A1?0*23aTA3 4T4\+a4 ?407#?A*6T+"
Dim characterGroup =
(
From character In InputString.ToCharArray()
Group characterElement = character By charElement = character Into Group
Select New With {.Character = charElement, .Occurrences = Group.Count}
).
ToList.
OrderBy(Function(anonymous) anonymous.Character)
Console.WriteLine($"Char Occurrences")
For Each currentItem In characterGroup
Console.WriteLine($"{currentItem} - {currentItem.Character.GetType()} - {currentItem.Occurrences.GetType()}")
Next
The results show both expected results which in this case are the default for the ToString property followed by showing types of each property.
Knowing the types a class can be used which matches up to the types learned above.
Public Class OccurrenceContainer
Property Character() As Char
Property Occurrences() As Integer
Public Overrides Function ToString() As String
Return $"{Character} {Occurrences}"
End Function
End Class
To use the class, replace
Select New With {.Character = charElement, .Occurrences = Group.Count}
With
Select New OccurrenceContainer With {.Character = charElement, .Occurrences = Group.Count}
Which ends up in a function returning type IOrderedEnumerable(Of OccurrenceContainer)
Function BasicExampleStrongTyped() As IOrderedEnumerable(Of OccurrenceContainer)
Dim InputString As String = "T0*A1?0*23aTA3 4T4\+a4 ?407#?A*6T+"
Dim characterGroup As IOrderedEnumerable(Of OccurrenceContainer) =
(
From character In InputString.ToCharArray()
Group characterElement = character By charElement = character Into Group
Select New OccurrenceContainer With {.Character = charElement, .Occurrences = Group.Count}
).
ToList.
OrderBy(Function(anonymous) anonymous.Character)
Console.WriteLine($"Char Occurrences")
For Each currentItem As OccurrenceContainer In characterGroup
Console.WriteLine(
$"{currentItem} - {currentItem.Character.GetType()} - {currentItem.Occurrences.GetType()}")
Next
Return characterGroup
End Function
Suppose a similar task to obtain words rather than characters, little change is needed. Here a list is used, variable names changed to reflect words rather than characters
Sub BasicWordExample()
Dim wordList = New List(Of String) From {"Hello", "Goodnight", "Hello", "Good", "Last"}
Dim wordGroup =
(
From word In wordList
Group wordElement = word By token = word Into Group
Select New With {.Word = token, .Occurrences = Group.Count}
).
ToList.
OrderBy(Function(anonymous) anonymous.Word)
For Each currentWord In wordGroup
Console.WriteLine($"{currentWord.Word}, {currentWord.Occurrences}")
Next
End Sub
Just like working with characters the above result is anonymous, only available within the procedure. To fix this a class scoped properly is needed e.g.
Public Class WordOccurrenceContainer
Property Word() As String
Property Occurrences() As Integer
Public Overrides Function ToString() As String
Return $"{Word} {Occurrences}"
End Function
End Class
Now the results are available using a function.
Function WordExampleStrongTyped() As IOrderedEnumerable(Of WordOccurrenceContainer)
Dim wordList = New List(Of String) From {"Hello", "Goodnight", "Hello", "Good", "Last"}
Dim wordGroup As IOrderedEnumerable(Of WordOccurrenceContainer) =
(
From word In wordList
Group wordElement = word By token = word Into Group
Select New WordOccurrenceContainer With {.Word = token, .Occurrences = Group.Count}
).
ToList.
OrderBy(Function(anonymous) anonymous.Word)
Return wordGroup
End Function
Group by single property
Anonymous types Allow you to select arbitrary fields into data structures that are strongly typed later on in your code |
Collections of a class were statistics are required often are done best using a Group By in a LINQ/Lambda query for example, in a school a report is required for listing students by last name. Walking through obtaining this is done projecting using a group by clause and optionally filtering by a count.
Definition of a student starts with a Person class which the Student class inherits.
Public Class Person
Public Property Id() As Integer
Public Property FirstName() As String
Public Property LastName As String
Public Property City As String
Public Property Country As String
Public Overrides Function ToString() As String
Return $"{FirstName} {LastName}"
End Function
End Class
Student class
Public Class Student
Inherits Person
Public Property Year() As Integer
Public ExamScores As List(Of Integer)
End Class
For a data source a mocked list is used rather than a database table were using a database table is fine but requires more attention while a mocked list or a delimited text file works better as there are less moving parts.
Namespace Classes
Public Class Students
Public Shared Function List() As List(Of Student)
Dim studentList As New List(Of Student)
studentList.Add(New Student() With {
.FirstName = "Mary",
.LastName = "Lee",
.Year = 2019,
.ExamScores = New List(Of Integer) From {97, 72, 81, 60}})
studentList.Add(New Student() With {
.FirstName = "Jerry",
.LastName = "Foster",
.Year = 2019,
.ExamScores = New List(Of Integer) From {75, 84, 91, 39}})
studentList.Add(New Student() With {
.FirstName = "Chris",
.LastName = "Scott",
.Year = 2020,
.ExamScores = New List(Of Integer) From {88, 94, 65, 85}})
studentList.Add(New Student() With {
.FirstName = "Jason",
.LastName = "Wong",
.Year = 2019, .ExamScores = New List(Of Integer) From {97, 89, 85, 82}})
studentList.Add(New Student() With {
.FirstName = "James",
.LastName = "Lang",
.Year = 2020,
.ExamScores = New List(Of Integer) From {35, 72, 91, 70}})
studentList.Add(New Student() With {
.FirstName = "Ming",
.LastName = "Lee",
.Year = 2019,
.ExamScores = New List(Of Integer) From {55, 62, 95, 75}})
studentList.Add(New Student() With {
.FirstName = "Karen",
.LastName = "Scott",
.Year = 2020, .ExamScores = New List(Of Integer) From {45, 82, 81, 80}})
studentList.Add(New Student() With {
.FirstName = "Edward",
.LastName = "Lee",
.Year = 2019,
.ExamScores = New List(Of Integer) From {55, 72, 81, 90}})
studentList.Add(New Student() With {
.FirstName = "Mary",
.LastName = "Smith",
.Year = 2019, .ExamScores = New List(Of Integer) From {75, 77, 91, 80}})
Return studentList
End Function
End Class
End Namespace
Starting off using an anonymous LINQ query which does not use the classes above, only the mocked data.
Dim students = PersonLibrary.Classes.Students.List()
Dim queryLastNames =
From student In students
Group By LastName = student.LastName Into studentGroup = Group
Where studentGroup.Count() > 1
Order By LastName
For Each currentItem In queryLastNames
Console.WriteLine(currentItem.LastName)
For Each student As Student In currentItem.studentGroup
Console.WriteLine($" {student}")
Next
Next
Results in a console application
If the intent is to use this data outside of the current procedure the query may be altered a strong typed query as done in the character and string examples, simply hover over the query to see the types.
Rather than modify the query above simply use it as shown below which create a strong type result.
Container
Public Class StudentContainer1
Public Property LastName() As String
Public Property List() As IEnumerable(Of Student)
End Class
Additional query
Dim students = PersonLibrary.Classes.Students.List()
Dim queryLastNames =
From student In students
Group By LastName = student.LastName Into studentGroup = Group
Where studentGroup.Count() > 1
Order By LastName
Dim EnumerationOfStudentContainer As IEnumerable(Of StudentContainer1) =
queryLastNames.Select(Function(item) New StudentContainer1 With {.LastName = item.LastName, .List = item.studentGroup})
For Each currentItem As StudentContainer1 In EnumerationOfStudentContainer
Console.WriteLine($"{currentItem.LastName}")
For Each student As Student In currentItem.List
Console.WriteLine($" {student}")
Next
Next
Caveat: a LINQ query can have conditions appended onto a query similarly to what was done above. For example, the following query reads xml data from a file then allows where conditions to be added to the original query.
What should not be done is to merge the two queries together, for instance this is perfect, two queries that are easy to maintain.
Dim students = PersonLibrary.Classes.Students.List()
Dim queryLastNames =
From student In students
Group By LastName = student.LastName Into studentGroup = Group
Where studentGroup.Count() > 1
Order By LastName
Dim EnumerationOfStudentContainer As IEnumerable(Of StudentContainer1) =
queryLastNames.Select(Function(item) New StudentContainer1 With {
.LastName = item.LastName,
.List = item.studentGroup})
While combing the two queries into one is going to be harder to alter and maintain as shown below. At first this may seem cool but wait a week or two and revisit the code and see if its still cool.
Dim EnumerationOfStudentContainer As IEnumerable(Of StudentContainer1) =
(
From student In students
Group By LastName = student.LastName Into studentGroup = Group
Where studentGroup.Count() > 1 Order By LastName).
Select(Function(item) New StudentContainer1 With {
.LastName = item.LastName,
.List = item.studentGroup})
Group by with min, max, count, average
A common requirement is to obtain count, totals, min, max and averages from a group by. The following example gets max orders for customer orders using Entity Framework where Orders is considered a navigation property of customers.
Public Sub MaxOrdersByCountry(context As NorthWindContext)
Dim maximumOrdersByCountry =
From customer In context.Customers
Group By customer.Country
Into MaxOrders = Max(customer.Orders.Count)
Dim results = maximumOrdersByCountry.ToList()
For Each item In results
Console.WriteLine($"{item.Country.Name} - {item.MaxOrders}")
Next
End Sub
Results shown provide count of orders per country.
There is a problem with the query above, using Visual Studio local window the following shows the entire country object is included.
To only bring across country name and max order a select clause is needed.
Dim maximumOrdersByCountry =
From customer In context.Customers
Group customer By customer.Country Into grouping = Group
Select Country.Name, MaxOrders = grouping.Max(Function(x) x.Orders.Count)
To allow the results to be used by the caller create a class to represent the above properties, Country name and max orders.
Public Class MaxOrder
Public Property Country() As String
Public Property MaxOrders() As Integer
End Class
Implement in the query
Dim maximumOrdersByCountry =
From customer In context.Customers
Group customer By customer.Country Into grouping = Group
Select New MaxOrder With {.Country = Country.Name, .MaxOrders = grouping.Max(Function(x) x.Orders.Count)}
Dim results As List(Of MaxOrder) = maximumOrdersByCountry.ToList()
Entity Framework 6
Grouping is written no different from prior examples using LINQ, Lambda or a combination of the two. The the following example grouping is on Microsoft NorthWind database grouping products by categories.
The following container class is used to return data to a calling form from a class performing reading/grouping of the data. When first writing a lambda query which follows below to know what properties are required use the same trick discussed at the beginning of this article or set a breakpoint after an anonymous query has executed, inspect the result set using the local window in Visual Studio while in a debug session. After writing several statements it will become apparent what properties and their types are needed.
Imports CategoryProductEntitiesLibrary.Models
Namespace ContainerClasses
Public Class ProductByCategory
Public Property Category() As Category
Public Property GroupCategoryProducts() As IGrouping(Of Category, Product)
End Class
End Namespace
Backend code to perform the group by operation. Note the code is executed asynchrony using lambda syntax.
- The Task performing the query does not have a useful name although this is inferred by the method return type.
- The .Select selector uses 'group' variable, not a useful name although this is know from product.Category in the GroupBy above.
Public Async Function GroupProductByCategoryTask() As Task(Of List(Of ProductByCategory))
Using context As New ProductContext
Dim results = Await Task.Run(
Function()
Return context.Products.GroupBy(Function(product) product.Category).
Select(Function(group) New ProductByCategory With {
.Category = group.Key,
.GroupCategoryProducts = group
}).
ToListAsync()
End Function)
Return results
End Using
End Function
The Key keyword used above [.Category = group.Key] enables you to specify behavior for properties of anonymous types.
In the form calling the code above uses the results in productByCategories into a ListView.
Private Async Sub GroupProductByCategoryButton_Click(sender As Object, e As EventArgs) _
Handles GroupProductByCategoryButton.Click
ListView1.Items.Clear()
Dim productByCategories As List(Of ProductByCategory) = Await operations.GroupProductByCategoryTask()
For Each group As ProductByCategory In productByCategories
ListView1.Items.Add(New ListViewItem(group.Category.CategoryName))
For Each product As Product In group.GroupCategoryProducts
ListView1.Items.Add(New ListViewItem(New String() {"", product.ProductName}))
Next
Await Task.Delay(1)
Next
TidyupListView()
End Sub
Private Sub TidyupListView()
ListView1.AutoResizeColumns(ColumnHeaderAutoResizeStyle.HeaderSize)
ListView1.EndUpdate()
ListView1.FocusedItem = ListView1.Items(0)
ListView1.Items(0).Selected = True
ActiveControl = ListView1
End Sub
Suppose rather than A-Z sort on category Z-A is desired, the following uses OrderByDecending which results a different container for returning results. Note the use of Key which is a keyword specifying the property of the group.
Public Function GroupProductSortCategoryDescending() As List(Of GroupSortProduct)
Dim results As IOrderedQueryable(Of GroupSortProduct)
Using context As New ProductContext
'
' Here we are grouping by CategoryName under Products
' then creating a strong type for the select and within the select
' sorting the products by product name followed
' by sorting category, the key for the group in descending order.
'
results =
context.Products.GroupBy(Function(product) product.Category.CategoryName).
Select(Function(group) New GroupSortProduct With {
.CategoryName = group.Key,
.Products = group.OrderBy(Function(prod) prod.ProductName)
}).
OrderByDescending(Function(group) group.CategoryName)
Return results.ToList()
End Using
End Function
Container for the above method.
Imports CategoryProductEntitiesLibrary.Models
Namespace ContainerClasses
Public Class GroupSortProduct
Public Property CategoryName() As String
Public Property Products() As IOrderedEnumerable(Of Product)
End Class
End Namespace
Often calculations are required e.g. sum, max, min of values in items within each group. In the following example the task is to group products by category, same as above with the total/sum of units of products.
Container class
Namespace ContainerClasses
Public Class ProductsGroupedSummed
Public Property Category() As String
Public Property TotalUnitsInStock() As Integer?
End Class
End Namespace
Code to perform the group with sumon the property UnitsInStock. The same code logic can be used in other queries on different tables.
Public Function ProductsGroupedSummed() As List(Of ProductsGroupedSummed)
Using context As New ProductContext
Dim productsGroupedSummedResults As New List(Of ProductsGroupedSummed)
Dim categories = context.Products.GroupBy(Function(product) product.Category).
Select(Function(productGroup) New With {
Key .Category = productGroup.Key,
Key .TotalUnitsInStock = productGroup.Sum(Function(p) p.UnitsInStock)
})
For Each cat In categories
productsGroupedSummedResults.Add(New ProductsGroupedSummed With {
.Category = cat.Category.CategoryName,
.TotalUnitsInStock = cat.TotalUnitsInStock
})
Next
Return productsGroupedSummedResults
End Using
End Function
Working with DataTable containers
The main difference between working with classes and DataTable containers is syntactical sugar. When working with classes values are accessed using class properties while with DataTable containers values are access using DataRow.Field(Of T)("Field name") extension method along with .AsEnumerable() method of the DataTable container.
First a group by using classes
Sub AnonymousToStrongTypedVersion()
Dim operations = New DataOperations
Dim customers = operations.CustomerList()
Dim baseQuery =
From customer In customers Order By customer.City
Group By CountryName = customer.Country Into regionalCustomers = Group
Order By CountryName
Dim customersByCountry As IEnumerable(Of CountryCompanyContainer) =
baseQuery.Select(Function(customer) New CountryCompanyContainer With {
.CountryName = customer.CountryName,
.Customers = customer.regionalCustomers,
.Count = customer.regionalCustomers.Count()
})
For Each TopGroup In customersByCountry
Console.WriteLine($"{TopGroup.CountryName} ({TopGroup.Count})")
For Each innerGroup In TopGroup.Customers
Console.WriteLine($"{innerGroup.CustomerIdentifier,5} {innerGroup.CompanyName} ({innerGroup.City})")
Next
Next
End Sub
The same operation performed using a DataTable
Private Sub CustomersDataTableGrouping()
Dim operations = New DataOperations
Dim customers = operations.CustomerDataTable()
Dim baseQuery =
From customer In customers.AsEnumerable()
Order By customer.Field(Of String)("City")
Group By CountryName = customer.Field(Of String)("Country") Into regionalCustomers = Group, Count()
Order By CountryName
Dim customersByCountry =
baseQuery.Select(Function(customer) New CountryCompanyDataTableContainer With {
.CountryName = customer.CountryName,
.Customers = customer.regionalCustomers,
.Count = customer.regionalCustomers.Count()
})
For Each container As CountryCompanyDataTableContainer In customersByCountry
Console.WriteLine($"{container.CountryName} ({container.Count})")
For Each dataRow As DataRow In container.Customers
Console.WriteLine($" {dataRow.Values()}")
Next
Next
End Sub
To shortcut displaying data row values the following extension method provides output.
Imports System.Runtime.CompilerServices
Namespace Classes
Public Module DataExtensions
<Extension>
Public Function Values(sender As DataRow) As String
Return String.Join(",", sender.ItemArray)
End Function
End Module
End NameSpace
Summary
Information provided to write basic LINQ/Lambda Group by query using classes and other containers to assist with writing queries effectively in Visual Studio solutions is a starting point. In the next part focus will be on grouping by multiple properties both with classes and other data containers.
See also
LINQ/Lambda Group by multiple properties (VB.NET)
VB.NET Writing better code recommendations
Source code
The following GitHub repository contains all code samples presented along with extra code samples all guaranteed to work.