How to Avoid Cartesian Explosion while using EF Core

The cartesian explosion is a data loading side effect that can be a serious performance issue. Let's see how to avoid it.

Cartesian Product

Cartesian product is a mathematical description of a multiplication of two sets. When the first set consists of two elements and the second one consists of three elements, the cartesian product result set consists of 6 elements. Each element of the first set is combined with each element of the second set.

{ a1, a2 } x { b1, b2, b3 } = { a1b1, a1b2, a1b3, a2b1, a2b2, a2b3 }

Cartesian Product in EF Core

Let's have a table for Schools, Students, and Courses and create some data for them.

School harvard = new() { Name = "Harvard" };

Course math = new() { Name = "Math" };
Course english = new() { Name = "English" };

harvard.Courses.Add(math);
harvard.Courses.Add(english);

Student student1 = new() { Initials = new Initials("Daniel", "Rusnok") };
harvard.AddStudent(student1);

Student student2 = new() { Initials = new Initials("Rostislav", "Prvok") };
harvard.AddStudent(student2);

Student student3 = new() { Initials = new Initials("Radomír", "Pivoňka") };  harvard.AddStudent(student3);

context.Add(harvard);
await context.SaveChangesAsync();

As you can see, the entity School contains two collections - Students and Courses. By querying DbSet Schools with Includes over navigation collections Students and Courses, we can introduce unwanted multiplication of query results.

await context.Schools.Include(x => x.Students).Include(x => x.Courses).ToListAsync();

image.png

Entity Framework Core translated used Includes simply into a LEFT JOINs. Now imagine that school can contain 100 students and 5 courses, and we want to load it all in one query. Logically, the number of needed entities is 106. One for school, a hundred for students, and five for courses, but joining tables causes the resulting data set to consist of 500 rows.

SELECT [s].[Id], [s].[CreatedDate], [s].[Name], [s].[SchoolId], [s0].[Id], [s0].[SchoolId], [s0].[FirstName], [s0].[LastName], [c].[Id], [c].[Description], [c].[Name], [c].[SchoolId]
FROM [Schools] AS [s]
LEFT JOIN [Students] AS [s0] ON [s].[Id] = [s0].[SchoolId]
LEFT JOIN [Courses] AS [c] ON [s].[Id] = [c].[SchoolId]
ORDER BY [s].[Id], [s0].[Id]

This is called Cartesian Explosion and it can be a serious performance issue. Let's see how to avoid it.

Split Queries

Entity Framework Core 5 came with the feature called Split Queries. EF allows you to specify that a given LINQ query should be split into multiple SQL queries. Instead of JOINs, split queries generate an additional SQL query for each included collection navigation.

await context.Schools
    .Include(x => x.Students)
    .Include(x => x.Courses)
    .AsSplitQuery()
    .ToListAsync();
SELECT [s].[Id], [s].[CreatedDate], [s].[Name], [s].[SchoolId]
FROM [Schools] AS [s]
ORDER BY [s].[Id]
go
SELECT [s0].[Id], [s0].[SchoolId], [s0].[FirstName], [s0].[LastName], [s].[Id]
FROM [Schools] AS [s]
INNER JOIN [Students] AS [s0] ON [s].[Id] = [s0].[SchoolId]
ORDER BY [s].[Id]
go
SELECT [c].[Id], [c].[Description], [c].[Name], [c].[SchoolId], [s].[Id]
FROM [Schools] AS [s]
INNER JOIN [Courses] AS [c] ON [s].[Id] = [c].[SchoolId]
ORDER BY [s].[Id]
go

The resulting data sets look like this:

image.png

Explicit Loading

Or you can always write more code and load data explicitly.

var school = await context.Schools.FirstAsync();
context.Entry(school).Collection(x => x.Students).Load();
context.Entry(school).Collection(x => x.Courses).Load();

Given code results into different SQL statements where EF is using its procedure sp_executesql, but results of those statements are the same as the ones when we used Split Queries.

SELECT TOP(1) [s].[Id], [s].[CreatedDate], [s].[Name], [s].[SchoolId]
FROM [Schools] AS [s]

exec sp_executesql N'SELECT [c].[Id], [c].[Description], [c].[Name], [c].[SchoolId]
FROM [Courses] AS [c]
WHERE [c].[SchoolId] = @__get_Item_0',N'@__get_Item_0 uniqueidentifier',@__get_Item_0='E68755AF-160D-4157-87F0-08DA1EB2CF90'

exec sp_executesql N'SELECT [s].[Id], [s].[SchoolId], [s].[FirstName], [s].[LastName]
FROM [Students] AS [s]
WHERE [s].[SchoolId] = @__get_Item_0',N'@__get_Item_0 uniqueidentifier',@__get_Item_0='E68755AF-160D-4157-87F0-08DA1EB2CF90'

The resulting data sets look like this:

image.png

Summary

Try to avoid blindly using Includes in your queries. It can lead to performance issues. Take advantage of EF Core features like Split Queries or Explicit Loading.

If you want a more real-world scenario, take a look at Josh Darnel's blog post.

Sources

itixo-logo-blue.png