LINQ/Lambda Group by multiple properties (VB.NET)
Introduction
There are often cases when data needs to be grouped by multiple properties for reporting, this part of the series presents simple examples for grouping using both LINQ and lambda syntax style of writing group statements along with going from anonymous to strong typed result sets. More examples will be added over time from the existing examples in the GitHub repository which contains source for both this and part 1 of this series.
Many of the code samples presented are demonstrated using classes with mocked data or with Entity Framework 6 code first with an existing database. Data scripts are located in the root of the Visual Studio solution available in a GitHub repository with code samples for part 1 of the series.
Requirements
- NET Framework 3.5 or higher
- SQL-Server - not for all samples
- SSMS
- Entity Framework 6 code first with existing database
- Basic programming concepts working with Visual Basic
- Microsoft Group By Clause documentation for a base on grouping
Suggested reading
- Microsoft Query clauses documentation
- Microsoft Create a nested group for basics on nested grouping
- Microsoft GroupJoin documentation
Query vs. Method Syntax in LINQ
There are two different types of syntax that you can use to query with LINQ: query syntax and method syntax. In the code samples both types are used, as one becomes familiar with with writing queries one style lends itself better than the other.
Group by one property and multiple properties
The difference between grouping one property verses multiple properties is specifying multiple properties in an anonymous array as shown in the examples below. In figure 1 (using LINQ), a single property defines the group while in figure 2 (using Lambda) a anonymous array specifies which properties to group by.
The following screenshot shows two Lambda style queries, the first a single property, City while the second groups on City and Country.
Besides grouping a count is set as a property .PersonCount which is many time needed in reports. A similar requirement may be to not include groups with one object, in this case a person object which can be done with a where predicate e.g.
Dim groupResults = personList.
GroupBy(Function(person) New With {Key person.City, Key person.Country}).
Where(Function(grp) grp.Count() > 1).ToList()
Then, as in the prior examples shown in the screenshots adding a select clause.
Dim groupResults = personList.GroupBy(Function(person) New With {Key person.City, Key person.Country}).
Where(Function(grp) grp.Count() > 1).
Select(Function(grp) New PersonGroup With {
.City = grp.Key.City,
.Country = grp.Key.Country, .List = grp.ToList()
}).ToList()
Additionally, the last query written in LINQ syntax which some developers favor although it may also be dependent on the actual query complexity.
Dim groupResults = (
From person In personList
Group By personCityCountry = New With {
Key .City = person.City,
Key .Country = person.Country} Into group = Group
Where group.Count() > 1
Select New PersonGroup With {
.City = personCityCountry.City,
.Country = personCityCountry.Country,
.List = group.ToList()
}).ToList()
Both of the above examples use mocked data, nothing read from a database or text file.
GroupJoin
A GroupJoin is in SQL what is called a “Left Outer JOIN” while a Join in SQL refer to “Inner Join” (see join definition). In short, a GroupJoin will do a link between 2 entities even if the right side of the link has nothing to link to. In contrast, the Join will link 2 entities only if both entities contain a link between them.
The GroupJoin operator performs the same task as Join operator except that GroupJoin returns a result in group based on specified group key. The GroupJoin operator joins two sequences based on key and groups the result by matching key and then returns the collection of grouped result and key. The following example performs a group join on customer/order relations by customer identifier which has been setup in the SQL-Server database.
Results are placed into the following container.
Public Class CustomerOrder
Public Property Customer As Customer
Public Property Order As IEnumerable(Of Order)
End Class
Code
Public Function GroupJoinCustomersWithCompanyNameStartsWith(
context As NorthWindContext,
startsWithValue As String) As List(Of CustomerOrder)
context.Configuration.LazyLoadingEnabled = True
'
' Note to use Include the following Import is needed
' Imports System.Data.Entity
'
Dim results As List(Of CustomerOrder) = context.CustomerIncludes().
Where(Function(customer) customer.CompanyName.StartsWith(startsWithValue)).
GroupJoin(context.Orders, Function(c) c.CustomerIdentifier,
Function(order) order.CustomerIdentifier,
Function(customer, order) New With {
Key .Customer = customer,
Key .Order = order}).
Select(Function(customerData) New CustomerOrder With {
.Customer = customerData.Customer,
.Order = customerData.Order}).
ToList()
Return results
End Function
Using the following to run.
Imports NorthWindEntityLibrary
Imports NorthWindEntityLibrary.Classes
Imports NorthWindEntityLibrary.Models
Module Module1
Private operation As New NorthOperations
Sub Main()
Dim standardColor = Console.ForegroundColor
Using context = New NorthWindContext
Dim results As List(Of CustomerOrder) =
operation.GroupJoinCustomersWithCompanyNameStartsWith(context, "A")
For Each current As CustomerOrder In results
Console.ForegroundColor = ConsoleColor.Green
Console.WriteLine($"{current.Customer.CompanyName} - {current.Customer.Contact.FullName}")
Console.ForegroundColor = standardColor
For Each order As Order In current.Order
Console.ForegroundColor = ConsoleColor.White
Console.WriteLine($" {order.OrderID}, {order.OrderDate.Value.ToShortDateString()}")
Console.ForegroundColor = standardColor
For Each orderDetail As OrderDetail In order.OrderDetails
Console.WriteLine($" {orderDetail.Product.ProductName}, {orderDetail.Quantity}")
Next
Next
Console.WriteLine()
Next
End Using
End Sub
End Module
Working with readonly properties
There is a possibility that Entity Framework will throw an exception because the readonly property is not part of the model, in these cases an Expression builder may be needed or use .ToList off the DbSet the query is executed againsts. This will not happen when working with a Plain Old CLR Objects (POCO) class.
Statistics
If statistics are required e.g. minimum, maximum, count, sum etc then setup predicates in the GroupBy, in the following example min, max and count are returned by age of a person. The following example reads data from a SQL-Server database using Entity Framework 6.
Sub MinMaxCountForAgeProperty()
Using context As New PeopleContext
Dim people = context.People.ToList()
Dim query = people.GroupBy(
Function(person) Math.Floor(person.Age),
Function(person) person.Age,
Function(baseAge, ages) New With {
.Key = baseAge,
.Count = ages.Count(),
.Min = ages.Min(),
.Max = ages.Max()}
)
Dim output As New System.Text.StringBuilder
' Iterate over each anonymous type.
For Each result In query
output.AppendLine(vbCrLf & "Age group: " & result.Key)
output.AppendLine("Number of people in this age group: " & result.Count)
output.AppendLine("Minimum age: " & result.Min)
output.AppendLine("Maximum age: " & result.Max)
Next
' Display the output.
Console.WriteLine(output.ToString)
End Using
End Sub
Multiple tables multiple property grouping
Using the what has been presented so far the following performs a group by on three tables, Customer, Orders and Order Details. The new aspect is using three From clauses, one for each table.
To ensure there are proper relations e.g. order for each customer and customer for an order in the From a Where condition uses DefaultIfEmpty method.
Important: In the group sum there can be no null values on either (in this case) UnitPrice or Quantity properties. Since Entity Framework code first is used in the provided database, Order details table its possible to have a null value for UnitPrice so the the type was changed from Decimal to a nullable Decimal e.g. Decimal?
The query will look extremely complex (do not worry there is another one more complex than this) if time has not been spent learning the basics and practicing. It's extremely important as a query becomes complex to have decent names for property names and variables within a query as even an hour later the query can be difficult to figure out let along months later.
Public Sub ThreeTablesGrouping(north As NorthWindContext)
Dim query = From order In north.Orders
From customer In north.Customers.
Where(Function(c) CBool(c.CustomerIdentifier = order.CustomerIdentifier)).DefaultIfEmpty()
From orderDetail In north.OrderDetails.Where(Function(d) d.OrderID = order.OrderID).DefaultIfEmpty()
Group New With {
Key .order = order,
Key .customer = customer,
Key .details = orderDetail
} By GroupKey = New With {Key order.OrderDate.Value.Year, Key customer.CompanyName
} Into group = Group
Select New With {
Key .Company = GroupKey.CompanyName,
Key .OrderYear = GroupKey.Year,
Key .Amount = group.Sum(Function(e) e.details.UnitPrice * e.details.Quantity)
}
Dim queryResults = query.ToList()
Console.WriteLine()
End Sub
As with any anonymous result set this one can be done strongly typed.
IEqualityComparer / IEnumerable(Of IGrouping(Of String, Product))
When a distinct results is needed for grouping, IEqualityComparer(Of In T) Interface against a list can solve this. The following example has a list of products which has duplicate entries by Make and Model.
Namespace Classes
Public Class Products
Public Shared Function List() As List(Of Product)
Dim products As New List(Of Product) From {
New Product With {.Id = 1, .Make = "Samsung", .Model = "Galaxy S3"},
New Product With {.Id = 2, .Make = "Samsung", .Model = "Galaxy S4"},
New Product With {.Id = 3, .Make = "Samsung", .Model = "Galaxy S5"},
New Product With {.Id = 4, .Make = "Apple", .Model = "iPhone 5"},
New Product With {.Id = 5, .Make = "Apple", .Model = "iPhone 6"},
New Product With {.Id = 6, .Make = "Apple", .Model = "iPhone 6"},
New Product With {.Id = 7, .Make = "Apple", .Model = "iPhone 6"},
New Product With {.Id = 8, .Make = "HTC", .Model = "Sensation"},
New Product With {.Id = 9, .Make = "HTC", .Model = "Desire"},
New Product With {.Id = 11, .Make = "HTC", .Model = "Desire"},
New Product With {.Id = 12, .Make = "Nokia", .Model = "Lumia 735"},
New Product With {.Id = 13, .Make = "Nokia", .Model = "Lumia 930"},
New Product With {.Id = 14, .Make = "Nokia", .Model = "Lumia 930"},
New Product With {.Id = 15, .Make = "Sony", .Model = "Xperia Z3"}
}
Return products
End Function
End Class
End Namespace
ProductComparer class provides methods that carry out value comparisons which allow greater control of results in a consistent manner.
Namespace Classes
''' <summary>
''' Comparer on Make and Model properties
''' </summary>
Public Class ProductComparer
Implements IEqualityComparer(Of Product)
Public Shadows Function Equals(p1 As Product, p2 As Product) As Boolean _
Implements IEqualityComparer(Of Product).Equals
If ReferenceEquals(p1, p2) Then
Return True
End If
If ReferenceEquals(p1, Nothing) OrElse ReferenceEquals(p2, Nothing) Then
Return False
End If
Return p1.Make = p2.Make AndAlso p1.Model = p2.Model
End Function
Public Shadows Function GetHashCode(product As Product) As Integer _
Implements IEqualityComparer(Of Product).GetHashCode
If ReferenceEquals(product, Nothing) Then
Return 0
End If
Dim hashProductName As Integer = If(product.Make Is Nothing, 0, product.Make.GetHashCode())
Dim hashProductCode As Integer = product.Model.GetHashCode()
Return hashProductName Xor hashProductCode
End Function
End Class
End Namespace
In the following code the base source list is presented followed by results from the comparer then a lambda group by is used to group products.
Module Module1
''' <summary>
''' Example to first get distinct products in one query
''' followed by grouping on make property in a second query.
''' </summary>
Sub Main()
Dim productList = Products.List()
Console.WriteLine("Original list")
productList.ForEach(Sub(product) Console.WriteLine(product))
Console.WriteLine()
Console.WriteLine("ProductComparer results")
Dim productsQuery As IEnumerable(Of Product) =
productList.Distinct(New ProductComparer).
OrderBy(Function(product) product.Make)
For Each product As Product In productsQuery
Console.WriteLine(product)
Next
Console.WriteLine()
Console.WriteLine("Group by make")
Dim grouped As IEnumerable(Of IGrouping(Of String, Product)) =
productsQuery.GroupBy(Function(product) product.Make)
For Each grouping As IGrouping(Of String, Product) In grouped
Console.WriteLine(grouping.Key)
For Each product As Product In grouping
Console.WriteLine($" {product}")
Next
Next
Console.ReadLine()
End Sub
End Module
Extremely complex group by queries
Extremely complex grouping for business requirements can also be done once the basics have been mastered and there are other options dependent on willingness to think outside of LINQ and Lambda, especially when the work can be performed with conventional SQL.
Example, a database table is know to have duplicate rows of data where the duplicates are by multiple columns in a table which requires a business user to make decisions on which data is valid and which data is not valid along with the possibilities to merge data then delete extra rows.
In the following example a customer table has duplicates based on the following columns, CompanyName, ContactName, ContactTitle, Address, City and postal code. To get duplicates the following SELECT will work were the data may be stored in a DataTable or List(Of T).
Note that in a proper database schema there would be
- Proper constraints to guard against duplicates
- Unit testing done on applications to interact with the database
- The data should be normalized, it's not to keep the exercise easy to follow.
SELECT A.*
FROM Customers A
INNER JOIN
(
SELECT
CompanyName,
ContactName,
ContactTitle,
Address,
City,
PostalCode
FROM
Customers
GROUP BY
CompanyName,
ContactName,
ContactTitle,
Address,
City,
PostalCode
HAVING COUNT(*) > 1
) B
ON
A.CompanyName = B.CompanyName AND
A.ContactName = B.ContactName AND
A.ContactTitle = B.ContactTitle AND
A.Address = B.Address AND
A.City = B.City AND
A.PostalCode = B.PostalCode
ORDER BY
A.CompanyName
Data presented (less controls to perform deletes and merging).
To perform the same operation with grouping is a two step process to keep code easy to read.
Using a DataTable for returning results to a control e.g. DataGridView in this case.
Step 1
Read data from the database followed by grouping data on the exact same column as per above. The variable duplicates is an anonymous type.
Step 2
Iterate results from step 1 into a pre defined DataTable.
Returning duplicates in a List of a class which inherits from a base class to represent a single customer.
Namespace Classes
''' <summary>
''' Provides a column to mark a row in a DataGridView
''' to act upon
''' </summary>
Public Class CustomerRigger
Inherits Customer
Public Property Process() As Boolean
End Class
End Namespace
Customer class
Namespace Classes
Public Class Customer
Public Property Identifier As Integer
Public Property CompanyName As String
Public Property ContactName As String
Public Property ContactTitle As String
Public Property Address As String
Public Property City As String
Public Property PostalCode As String
Public Property Exists As Boolean
Public ReadOnly Property ItemArray() As String()
Get
Return {CStr(Identifier).PadLeft(3, "0"c), CompanyName, ContactName, ContactTitle, Address, City, PostalCode}
End Get
End Property
Public Overrides Function ToString() As String
Return $"'{CompanyName}' '{ContactName}' '{ContactTitle}' '{Address}' '{City}' '{PostalCode}'"
End Function
End Class
End Namespace
The following code differs from the first example with a DataTable is to use a LINQ statement to create the strong type results.
Public Function GetCustomerDuplicatesAsList() As List(Of CustomerRigger)
Dim dataOperations As New DataOperations
Dim dt As DataTable = dataOperations.ReadCustomersFromDatabase()
Dim duplicates As IEnumerable(Of IEnumerable(Of CustomerRigger)) = From dataRow In dt.AsEnumerable() Select item = New With {
Key .Identifier = dataRow.Field(Of Integer)("Identifier"),
Key .CompanyName = dataRow.Field(Of String)("CompanyName"),
Key .ContactName = dataRow.Field(Of String)("ContactName"),
Key .ContactTitle = dataRow.Field(Of String)("ContactTitle"),
Key .Street = dataRow.Field(Of String)("Address"),
Key .City = dataRow.Field(Of String)("City"),
Key .PostalCode = dataRow.Field(Of String)("PostalCode")}
group temp = item By Key = New With {
Key .CompanyName = item.CompanyName,
Key .ContactName = item.ContactName,
Key .ContactTitle = item.ContactTitle}
Into Group Where Group.Count() > 1 Select Group.Select(Function(g) New CustomerRigger With {
.Identifier = g.Identifier,
.CompanyName = g.CompanyName,
.ContactName = g.ContactName,
.ContactTitle = g.ContactTitle,
.Address = g.Street,
.City = g.City,
.PostalCode = g.PostalCode
})
Return (From item In duplicates From row In item Select row).ToList()
End Function
Special note (from Microsoft documentation)
The Key keyword enables you to specify behavior for properties of anonymous types. Only properties you designate as key properties participate in tests of equality between anonymous type instances, or calculation of hash code values. The values of key properties cannot be changed.
Tips
Analysers recommendations
When a piece of code is greyed out this is an indication that the code is never coded or utilized. Most of the time this is true although there are some cases Visual Studio does not analyze code correctly. This may be frustrating when writing Group By logic. In the following snippet note the shadow properties which works as expected.
Removing the anonymous properties
Clearly Visual Studio did not make the correct recommendation so when using standard analysers and third analysers don't always trust their recommendations.
Code folding
Code folding is taking a single line of code and "folding" a long line.
Without folding as shown in figure 1 below is inherently harder to read then in figure 2. Consider writing figure 1 today then need to modify figure 1 six or more months down the road verses the code in figure 2.
Figure 1
Dim groupResults = (
From person In personList Group By personCityCountry = New With {Key .City = person.City, Key .Country = person.Country} Into group = Group
Where group.Count() > 1 Select New PersonGroup With {.City = personCityCountry.City, .Country = personCityCountry.Country, .List = group.ToList()}).ToList()
Figure 2
Dim groupResults = (
From person In personList
Group By personCityCountry = New With {
Key .City = person.City,
Key .Country = person.Country} Into group = Group
Where group.Count() > 1
Select New PersonGroup With {
.City = personCityCountry.City,
.Country = personCityCountry.Country,
.List = group.ToList()
}).ToList()
Know complete results of queries
A common mistake when writing any code which interacts with any data is knowing the results beforehand. When working with LINQ and Lambda against a database first write SQL statements, review the results then write the code quires, run the queries in a unit test method and validate results against SQL. By first writing SQL a deeper knowledge of data will be when there are unexpected results in code. This also ties into optimization of queries, write a bad SQL statement is no different than writing a bad Entity Framework query that when viewing a statement looks great yet Entity Framework takes the code and writes good or bad underlying SQL dependent on a developers understanding of their data.
Debugging
Take time to learn how to properly debug code which includes using Visual Studio breakpoints, breakpoint conditions, data tips, local window, watch and quick watch windows. Depending on the edition of Visual Studio include learning to work with IntelliTrace which should only be turned on when needed.
Summary
Basic to advance grouping methods have been presented although not ever possible situation have been covered. By using what has been presented in this and part 1 of the series provides what is needed to write statements for most situations. One thing to be cognitive of is that if a query is written poorly the performance will suffer. There are countless post on the Internet about grouping poor performance where 95 percent of the time the query needs optimization and since each query is different so will be what needs to be done performance wise. Learn to use IntelliTrace in a debug session to assist with optimizing queries.
See also
Introduction to Grouping with LINQ/Lambda (VB.NET)
Source code
The following GitHub repository contains all code samples presented along with extra code samples all guaranteed to work.