Join Operations in LINQ
A join of two data sources is the association of objects in one data source with objects that share a common attribute in another data source.
Important
These samples use a System.Collections.Generic.IEnumerable<T> data source. Data sources based on System.Linq.IQueryProvider use System.Linq.IQueryable<T> data sources and expression trees. Expression trees have limitations on the allowed C# syntax. Furthermore, each IQueryProvider
data source, such as EF Core, might impose more restrictions. Check the documentation for your data source.
Joining is an important operation in queries that target data sources whose relationships to each other can't be followed directly. In object-oriented programming, joining could mean a correlation between objects that isn't modeled, such as the backwards direction of a one-way relationship. An example of a one-way relationship is a Student
class that has a property of type Department
that represents the major, but the Department
class doesn't have a property that is a collection of Student
objects. If you have a list of Department
objects and you want to find all the students in each department, you could use a join operation to find them.
The join methods provided in the LINQ framework are Join and GroupJoin. These methods perform equijoins, or joins that match two data sources based on equality of their keys. (For comparison, Transact-SQL supports join operators other than equals
, for example the less than
operator.) In relational database terms, Join implements an inner join, a type of join in which only those objects that have a match in the other data set are returned. The GroupJoin method has no direct equivalent in relational database terms, but it implements a superset of inner joins and left outer joins. A left outer join is a join that returns each element of the first (left) data source, even if it has no correlated elements in the other data source.
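The difference between an equijoin and other join conditions can be sketched as follows. The sample arrays and the names thresholds, scores, equalPairs, and belowPairs are hypothetical and aren't part of this article's data set. Because Join compares keys only for equality, the sketch approximates a less-than join with two from clauses filtered by a where clause. This is one possible workaround, not a dedicated join operator.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, not the article's data set.
var thresholds = new[] { 60, 80 };
var scores = new[] { 55, 75, 95 };

// Equijoin: a pair is produced only when both keys are equal.
// No threshold equals any score here, so the result is empty.
var equalPairs = thresholds
    .Join(scores, t => t, s => s, (t, s) => (Threshold: t, Score: s))
    .ToList();

// Non-equijoin sketch: a cross join (two from clauses) filtered by a
// less-than predicate. Join and GroupJoin can't express this condition.
var belowPairs = (from t in thresholds
                  from s in scores
                  where s < t
                  select (Threshold: t, Score: s)).ToList();

Console.WriteLine($"Equijoin matches: {equalPairs.Count}");
foreach (var pair in belowPairs)
{
    Console.WriteLine($"{pair.Score} < {pair.Threshold}");
}
```

The cross join evaluates every combination of elements, so it's likely to be slower than Join on large sequences.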
The following illustration shows a conceptual view of two sets and the elements within those sets that are included in either an inner join or a left outer join.
Methods
Method Name | Description | C# Query Expression Syntax | More Information |
---|---|---|---|
Join | Joins two sequences based on key selector functions and extracts pairs of values. | join … in … on … equals … | Enumerable.Join, Queryable.Join |
GroupJoin | Joins two sequences based on key selector functions and groups the resulting matches for each element. | join … in … on … equals … into … | Enumerable.GroupJoin, Queryable.GroupJoin |
Note
The following examples in this article use the common data sources for this area.
Each Student
has a grade level, a primary department, and a series of scores. A Teacher
also has a City
property that identifies the campus where the teacher holds classes. A Department
has a name, and a reference to a Teacher
who serves as the department head.
You can find the example data set in the source repo.
public enum GradeLevel
{
FirstYear = 1,
SecondYear,
ThirdYear,
FourthYear
};
public class Student
{
public required string FirstName { get; init; }
public required string LastName { get; init; }
public required int ID { get; init; }
public required GradeLevel Year { get; init; }
public required List<int> Scores { get; init; }
public required int DepartmentID { get; init; }
}
public class Teacher
{
public required string First { get; init; }
public required string Last { get; init; }
public required int ID { get; init; }
public required string City { get; init; }
}
public class Department
{
public required string Name { get; init; }
public int ID { get; init; }
public required int TeacherID { get; init; }
}
The following example uses the join … in … on … equals …
clause to join two sequences based on a specific value:
var query = from student in students
join department in departments on student.DepartmentID equals department.ID
select new { Name = $"{student.FirstName} {student.LastName}", DepartmentName = department.Name };
foreach (var item in query)
{
Console.WriteLine($"{item.Name} - {item.DepartmentName}");
}
The preceding query can be expressed using method syntax as shown in the following code:
var query = students.Join(departments,
student => student.DepartmentID, department => department.ID,
(student, department) => new { Name = $"{student.FirstName} {student.LastName}", DepartmentName = department.Name });
foreach (var item in query)
{
Console.WriteLine($"{item.Name} - {item.DepartmentName}");
}
The following example uses the join … in … on … equals … into …
clause to join two sequences based on a specific value and to group the resulting matches for each element:
IEnumerable<IEnumerable<Student>> studentGroups = from department in departments
join student in students on department.ID equals student.DepartmentID into studentGroup
select studentGroup;
foreach (IEnumerable<Student> studentGroup in studentGroups)
{
Console.WriteLine("Group");
foreach (Student student in studentGroup)
{
Console.WriteLine($" - {student.FirstName}, {student.LastName}");
}
}
The preceding query can be expressed using method syntax as shown in the following example:
// Join departments and students on the department ID and group the matching students per department
IEnumerable<IEnumerable<Student>> studentGroups = departments.GroupJoin(students,
department => department.ID, student => student.DepartmentID,
(department, studentGroup) => studentGroup);
foreach (IEnumerable<Student> studentGroup in studentGroups)
{
Console.WriteLine("Group");
foreach (Student student in studentGroup)
{
Console.WriteLine($" - {student.FirstName}, {student.LastName}");
}
}
Perform inner joins
In relational database terms, an inner join produces a result set in which each element of the first collection appears one time for every matching element in the second collection. If an element in the first collection has no matching elements, it doesn't appear in the result set. The Join method, which is called by the join
clause in C#, implements an inner join. The following examples show you how to perform four variations of an inner join:
- A simple inner join that correlates elements from two data sources based on a simple key.
- An inner join that correlates elements from two data sources based on a composite key. A composite key, which is a key that consists of more than one value, enables you to correlate elements based on more than one property.
- A multiple join in which successive join operations are appended to each other.
- An inner join that is implemented by using a group join.
Single key join
The following example matches Teacher
objects with Department
objects whose TeacherID
matches the ID of that Teacher
object. The select
clause in C# defines how the resulting objects look. In the following example, the resulting objects are anonymous types that consist of the department name and the name of the teacher that leads the department.
var query = from department in departments
join teacher in teachers on department.TeacherID equals teacher.ID
select new
{
DepartmentName = department.Name,
TeacherName = $"{teacher.First} {teacher.Last}"
};
foreach (var departmentAndTeacher in query)
{
Console.WriteLine($"{departmentAndTeacher.DepartmentName} is managed by {departmentAndTeacher.TeacherName}");
}
You achieve the same results using the Join method syntax:
var query = teachers
.Join(departments, teacher => teacher.ID, department => department.TeacherID,
(teacher, department) =>
new { DepartmentName = department.Name, TeacherName = $"{teacher.First} {teacher.Last}" });
foreach (var departmentAndTeacher in query)
{
Console.WriteLine($"{departmentAndTeacher.DepartmentName} is managed by {departmentAndTeacher.TeacherName}");
}
The teachers who aren't department heads don't appear in the final results.
Composite key join
Instead of correlating elements based on just one property, you can use a composite key to compare elements based on multiple properties. Specify the key selector function for each collection to return an anonymous type that consists of the properties you want to compare. If you label the properties, they must have the same label in each key's anonymous type. The properties must also appear in the same order.
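The naming and ordering requirement follows from how C# compares anonymous types. Here is a minimal sketch, using hypothetical names and values that aren't part of this article's data set. Two anonymous objects are instances of the same compiler-generated type, and so compare by value, only when their property names, types, and order all agree.

```csharp
using System;

// Hypothetical keys, built the way a join key selector would build them.
var teacherKey = new { FirstName = "Ada", LastName = "Lovelace" };
var studentKey = new { FirstName = "Ada", LastName = "Lovelace" };

// Same values, but the properties are declared in a different order,
// so the compiler generates a different anonymous type.
var reorderedKey = new { LastName = "Lovelace", FirstName = "Ada" };

// Same names, types, and order: Equals compares the property values.
bool sameShape = teacherKey.Equals(studentKey);

// Different property order: the types differ, so the keys never match.
bool differentShape = teacherKey.Equals(reorderedKey);

Console.WriteLine(sameShape);      // True
Console.WriteLine(differentShape); // False
```

In a join clause, mismatched key types are reported by the compiler instead of silently producing no matches, because both key selectors must return the same type.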
The following example uses a list of Teacher
objects and a list of Student
objects to determine which teachers are also students. Both of these types have properties that represent the first and family name of each person. The functions that create the join keys from each list's elements return an anonymous type that consists of the properties. The join operation compares these composite keys for equality and returns pairs of objects from each list where both the first name and the family name match.
// Join the two data sources based on a composite key consisting of first and last name,
// to determine which teachers are also students.
IEnumerable<string> query =
from teacher in teachers
join student in students on new
{
FirstName = teacher.First,
LastName = teacher.Last
} equals new
{
student.FirstName,
student.LastName
}
select teacher.First + " " + teacher.Last;
string result = "The following people are both teachers and students:\r\n";
foreach (string name in query)
{
result += $"{name}\r\n";
}
Console.Write(result);
You can use the Join method, as shown in the following example:
IEnumerable<string> query = teachers
.Join(students,
teacher => new { FirstName = teacher.First, LastName = teacher.Last },
student => new { student.FirstName, student.LastName },
(teacher, student) => $"{teacher.First} {teacher.Last}"
);
Console.WriteLine("The following people are both teachers and students:");
foreach (string name in query)
{
Console.WriteLine(name);
}
Multiple join
Any number of join operations can be appended to each other to perform a multiple join. Each join
clause in C# correlates a specified data source with the results of the previous join.
The first join
clause matches students and departments based on a Student
object's DepartmentID
matching a Department
object's ID
. It returns a sequence of anonymous types that contain the Student
object and Department
object.
The second join
clause correlates the anonymous types returned by the first join with Teacher
objects based on that teacher's ID matching the department head ID. It returns a sequence of anonymous types that contain the student's name, the department name, and the department leader's name. Because this operation is an inner join, only those objects from the first data source that have a match in the second data source are returned.
// The first join matches students and departments on Student.DepartmentID and
// Department.ID. The second join matches each department with the teacher who
// leads it, pairing that teacher with the students studying in the department.
var query = from student in students
join department in departments on student.DepartmentID equals department.ID
join teacher in teachers on department.TeacherID equals teacher.ID
select new {
StudentName = $"{student.FirstName} {student.LastName}",
DepartmentName = department.Name,
TeacherName = $"{teacher.First} {teacher.Last}"
};
foreach (var obj in query)
{
Console.WriteLine($"""The student "{obj.StudentName}" studies in the department run by "{obj.TeacherName}".""");
}
The equivalent query that chains multiple Join method calls uses the same approach, carrying the intermediate results in an anonymous type:
var query = students
.Join(departments, student => student.DepartmentID, department => department.ID,
(student, department) => new { student, department })
.Join(teachers, commonDepartment => commonDepartment.department.TeacherID, teacher => teacher.ID,
(commonDepartment, teacher) => new
{
StudentName = $"{commonDepartment.student.FirstName} {commonDepartment.student.LastName}",
DepartmentName = commonDepartment.department.Name,
TeacherName = $"{teacher.First} {teacher.Last}"
});
foreach (var obj in query)
{
Console.WriteLine($"""The student "{obj.StudentName}" studies in the department run by "{obj.TeacherName}".""");
}
Inner join by using grouped join
The following example shows you how to implement an inner join by using a group join. The list of Department
objects is group-joined to the list of Student
objects based on the Department.ID
matching the Student.DepartmentID
property. The group join creates a collection of intermediate groups, where each group consists of a Department
object and a sequence of matching Student
objects. The second from
clause combines (or flattens) this sequence of sequences into one longer sequence. The select
clause specifies the type of elements in the final sequence. That type is an anonymous type that consists of the student's name and the matching department name.
var query1 =
from department in departments
join student in students on department.ID equals student.DepartmentID into gj
from subStudent in gj
select new
{
DepartmentName = department.Name,
StudentName = $"{subStudent.FirstName} {subStudent.LastName}"
};
Console.WriteLine("Inner join using GroupJoin():");
foreach (var v in query1)
{
Console.WriteLine($"{v.DepartmentName} - {v.StudentName}");
}
The same results can be achieved by using the GroupJoin method, as follows:
var queryMethod1 = departments
.GroupJoin(students, department => department.ID, student => student.DepartmentID,
(department, gj) => new { department, gj })
.SelectMany(departmentAndStudent => departmentAndStudent.gj,
(departmentAndStudent, subStudent) => new
{
DepartmentName = departmentAndStudent.department.Name,
StudentName = $"{subStudent.FirstName} {subStudent.LastName}"
});
Console.WriteLine("Inner join using GroupJoin():");
foreach (var v in queryMethod1)
{
Console.WriteLine($"{v.DepartmentName} - {v.StudentName}");
}
The result is equivalent to the result set obtained by using the join
clause without the into
clause to perform an inner join. The following code demonstrates this equivalent query:
var query2 = from department in departments
join student in students on department.ID equals student.DepartmentID
select new
{
DepartmentName = department.Name,
StudentName = $"{student.FirstName} {student.LastName}"
};
Console.WriteLine("The equivalent operation using Join():");
foreach (var v in query2)
{
Console.WriteLine($"{v.DepartmentName} - {v.StudentName}");
}
To avoid chaining GroupJoin and SelectMany, you can use a single Join method call, as shown here:
var queryMethod2 = departments.Join(students, department => department.ID, student => student.DepartmentID,
(department, student) => new
{
DepartmentName = department.Name,
StudentName = $"{student.FirstName} {student.LastName}"
});
Console.WriteLine("The equivalent operation using Join():");
foreach (var v in queryMethod2)
{
Console.WriteLine($"{v.DepartmentName} - {v.StudentName}");
}
Perform grouped joins
The group join is useful for producing hierarchical data structures. It pairs each element from the first collection with a set of correlated elements from the second collection.
Note
Each element of the first collection appears in the result set of a group join regardless of whether correlated elements are found in the second collection. In the case where no correlated elements are found, the sequence of correlated elements for that element is empty. The result selector therefore has access to every element of the first collection. This differs from the result selector in a non-group join, which cannot access elements from the first collection that have no match in the second collection.
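The behavior described in this note can be sketched with a small example. The tuples and the names grouped and inner are hypothetical stand-ins for the article's Department and Student classes. The "Art" department has no students, so GroupJoin passes its result selector an empty sequence, while Join omits it.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: "Art" has no students.
var departments = new[] { (ID: 1, Name: "Math"), (ID: 2, Name: "Art") };
var students = new[] { (Name: "Kim", DepartmentID: 1), (Name: "Lee", DepartmentID: 1) };

// Group join: the result selector runs once per department,
// even when the sequence of matches is empty.
var grouped = departments
    .GroupJoin(students, d => d.ID, s => s.DepartmentID,
        (d, matches) => (d.Name, Count: matches.Count()))
    .ToList();

// Non-group join: the result selector runs only for matched pairs,
// so "Art" never reaches it.
var inner = departments
    .Join(students, d => d.ID, s => s.DepartmentID, (d, s) => d.Name)
    .Distinct()
    .ToList();

foreach (var g in grouped)
{
    Console.WriteLine($"{g.Name}: {g.Count}");   // Math: 2, then Art: 0
}
Console.WriteLine(string.Join(", ", inner));     // Math
```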
Warning
Enumerable.GroupJoin has no direct equivalent in traditional relational database terms. However, this method does implement a superset of inner joins and left outer joins. Both of these operations can be written in terms of a grouped join. For more information, see Entity Framework Core, GroupJoin.
The first example in this article shows you how to perform a group join. The second example shows you how to use a group join to create XML elements.
Group join
The following example performs a group join of objects of type Department
and Student
based on the Department.ID
matching the Student.DepartmentID
property. Unlike a non-group join, which produces a pair of elements for each match, the group join produces only one resulting object for each element of the first collection, which in this example is a Department
object. The corresponding elements from the second collection, which in this example are Student
objects, are grouped into a collection. Finally, the result selector function creates an anonymous type for each department that consists of Department.Name
and a collection of Student
objects.
var query = from department in departments
join student in students on department.ID equals student.DepartmentID into studentGroup
select new
{
DepartmentName = department.Name,
Students = studentGroup
};
foreach (var v in query)
{
// Output the department's name.
Console.WriteLine($"{v.DepartmentName}:");
// Output each of the students in that department.
foreach (Student? student in v.Students)
{
Console.WriteLine($" {student.FirstName} {student.LastName}");
}
}
In the preceding example, the query
variable contains a query that produces a sequence in which each element is an anonymous type that contains the department's name and a collection of the students who study in that department.
The equivalent query using method syntax is shown in the following code:
var query = departments.GroupJoin(students, department => department.ID, student => student.DepartmentID,
(department, Students) => new { DepartmentName = department.Name, Students });
foreach (var v in query)
{
// Output the department's name.
Console.WriteLine($"{v.DepartmentName}:");
// Output each of the students in that department.
foreach (Student? student in v.Students)
{
Console.WriteLine($" {student.FirstName} {student.LastName}");
}
}
Group join to create XML
Group joins are ideal for creating XML by using LINQ to XML. The following example is similar to the previous example except that instead of creating anonymous types, the result selector function creates XML elements that represent the joined objects.
XElement departmentsAndStudents = new("DepartmentEnrollment",
from department in departments
join student in students on department.ID equals student.DepartmentID into studentGroup
select new XElement("Department",
new XAttribute("Name", department.Name),
from student in studentGroup
select new XElement("Student",
new XAttribute("FirstName", student.FirstName),
new XAttribute("LastName", student.LastName)
)
)
);
Console.WriteLine(departmentsAndStudents);
The equivalent query using method syntax is shown in the following code:
XElement departmentsAndStudents = new("DepartmentEnrollment",
departments.GroupJoin(students, department => department.ID, student => student.DepartmentID,
(department, Students) => new XElement("Department",
new XAttribute("Name", department.Name),
from student in Students
select new XElement("Student",
new XAttribute("FirstName", student.FirstName),
new XAttribute("LastName", student.LastName)
)
)
)
);
Console.WriteLine(departmentsAndStudents);
Perform left outer joins
A left outer join is a join in which each element of the first collection is returned, regardless of whether it has any correlated elements in the second collection. You can use LINQ to perform a left outer join by calling the DefaultIfEmpty method on the results of a group join.
The following example demonstrates how to use the DefaultIfEmpty method on the results of a group join to perform a left outer join.
The first step in producing a left outer join of two collections is to perform a group join. (See Perform inner joins for an explanation of this process.) In this example, the list of Student
objects is group-joined to the list of Department
objects based on the student's DepartmentID
matching a Department
object's ID
.
The second step is to include each element of the first (left) collection in the result set even if that element has no matches in the right collection. This is accomplished by calling DefaultIfEmpty on each sequence of matching elements from the group join. In this example, DefaultIfEmpty is called on each sequence of matching Department
objects. The method returns a collection that contains a single, default value if the sequence of matching Department
objects is empty for any Student
object, ensuring that each Student
object is represented in the result collection.
Note
The default value for a reference type is null
; therefore, the example checks for a null reference before accessing the Name
property of each matched Department
object.
var query =
from student in students
join department in departments on student.DepartmentID equals department.ID into gj
from subgroup in gj.DefaultIfEmpty()
select new
{
student.FirstName,
student.LastName,
Department = subgroup?.Name ?? string.Empty
};
foreach (var v in query)
{
Console.WriteLine($"{v.FirstName,-15} {v.LastName,-15}: {v.Department}");
}
The equivalent query using method syntax is shown in the following code:
var query = students
.GroupJoin(
departments,
student => student.DepartmentID,
department => department.ID,
(student, departmentList) => new { student, subgroup = departmentList })
.SelectMany(
joinedSet => joinedSet.subgroup.DefaultIfEmpty(),
(student, department) => new
{
student.student.FirstName,
student.student.LastName,
Department = department?.Name ?? string.Empty
});
foreach (var v in query)
{
Console.WriteLine($"{v.FirstName,-15} {v.LastName,-15}: {v.Department}");
}
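To see why the left outer join needs a null check, consider what DefaultIfEmpty returns on its own. The arrays and variable names in the following sketch are hypothetical and aren't part of this article's data set.

```csharp
using System;
using System.Linq;

// Hypothetical inputs: one empty sequence and one with a single element.
string[] empty = Array.Empty<string>();
string[] one = { "Physics" };

// An empty sequence becomes a one-element sequence holding default(string), which is null.
var fromEmpty = empty.DefaultIfEmpty().ToList();

// A non-empty sequence is returned unchanged.
var fromOne = one.DefaultIfEmpty().ToList();

// The overload that takes an argument substitutes a caller-supplied value instead of null.
var withFallback = empty.DefaultIfEmpty("(none)").ToList();

Console.WriteLine(fromEmpty.Count);        // 1
Console.WriteLine(fromEmpty[0] is null);   // True
Console.WriteLine(fromOne[0]);             // Physics
Console.WriteLine(withFallback[0]);        // (none)
```

When a suitable placeholder object exists, the overload with an explicit default value might remove the need for the null check in the result selector.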