Performance Considerations when using TPT (Table per Type) Inheritance in the Entity Framework
The information in this post is out of date.
Visit msdn.com/data/ef for the latest information on current and past releases of EF.
For Performance Considerations see https://msdn.com/data/hh949853
The Entity Framework offers very rich inheritance mapping schemes for building your conceptual model. The inheritance capabilities in EF include:
· TPH (Table per Hierarchy): This is the inheritance scheme where there is a single table in the database representing the hierarchy, but the conceptual model includes multiple entities.
· TPT (Table per Type): In this scheme, each type in the inheritance hierarchy is stored in separate tables in the database
· TPC (Table per Concrete type): In this scheme, each concrete type in the inheritance hierarchy is stored as a separate database table, without any tables in the database for the abstract types. TPC is currently not supported by the Entity Designer in Visual Studio.
All of these inheritance schemes have their own strengths and weaknesses. One thing in particular to consider is that queries done on entities in a TPC or TPT mapped hierarchy are naturally more complicated to represent and translate to the store because of the fact that multiple tables may be involved, which usually require joins or unions. This can introduce performance problems that stem from the complexity of queries generated by EF. On the other hand, TPH fares relatively well when it comes to query complexity.
Take an example of a simple TPT inheritance hierarchy, as shown below:
Now, let’s take an example of a rather simple query that projects some properties out of the base type Guest, without ever querying for anything from any of the subtypes.
var query = from g in db.Guests
select new { Id = g.GuestId, Name = g.Name };
This results in a query that is shown below. This scenario really should only generate a query that involves the base Guests table – however, EF generates a query that includes a join over the union of two of the tables for the sub-types as well.
SELECT
[Extent1].[GuestId] AS [GuestId],
[Extent1].[Name] AS [Name]
FROM [dbo].[Guests] AS [Extent1]
LEFT OUTER JOIN (SELECT
[Extent2].[GuestId] AS [GuestId]
FROM [dbo].[Guests_USGuest] AS [Extent2]
UNION ALL
SELECT
[Extent3].[GuestId] AS [GuestId]
FROM [dbo].[Guests_UKGuest] AS [Extent3]) AS [UnionAll1] ON [Extent1]
.[GuestId] = [UnionAll1].[GuestId]
Here’s another example.
var query = db.Guests;
This LINQ query seems even more simple than the previous one – but the resulting query is not as simple as you might expect, because what we are effectively doing is requesting a polymorphic result set that includes multiple types.
The scenario itself is naturally more complicated than it seems – however EF is doing extra work to ensure that when returning flattened table that includes the data for all the types, that null values are returned for columns that may be irrelevant for a particular type. Technically, this extra validation is not necessary and the highlighted parts of the query can be safely ommitted from the SQL we generate.
SELECT
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN '0X' WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1],
[Extent1].[GuestId] AS [GuestId],
[Extent1].[Name] AS [Name],
[Extent1].[Address] AS [Address],
[Extent1].[City] AS [City],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[State] END AS [C2],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[Zip] END AS [C3],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[PhoneNumber] END AS [C4],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [UnionAll1].[C1] END AS [C5],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [UnionAll1].[C2] END AS [C6]
FROM [dbo].[Guests] AS [Extent1]
LEFT OUTER JOIN (SELECT
[Extent2].[GuestId] AS [GuestId],
[Extent2].[State] AS [State],
[Extent2].[Zip] AS [Zip],
[Extent2].[PhoneNumber] AS [PhoneNumber],
CAST(NULL AS varchar(1)) AS [C1],
CAST(NULL AS varchar(1)) AS [C2],
cast(1 as bit) AS [C3],
cast(0 as bit) AS [C4]
FROM [dbo].[Guests_USGuest] AS [Extent2]
UNION ALL
SELECT
[Extent3].[GuestId] AS [GuestId],
CAST(NULL AS varchar(1)) AS [C1],
CAST(NULL AS varchar(1)) AS [C2],
CAST(NULL AS varchar(1)) AS [C3],
[Extent3].[PostalCode] AS [PostalCode],
[Extent3].[PhoneNumber] AS [PhoneNumber],
cast(0 as bit) AS [C4],
cast(1 as bit) AS [C5]
FROM [dbo].[Guests_UKGuest] AS [Extent3]) AS [UnionAll1] ON [Extent1].[GuestId] = [UnionAll1].[GuestId]
These are two very common cases that you may have run into when using TPT with the Entity Framework. The good news is that we are working on these issues so that EF no longer generates unnecessary SQL. The bad news is that it will take some time before the fix is delivered in a future release.
So, until we are able to get these fixes in place, there are a few things you can do to make sure you don’t get affected by some of these issues:
Pay special attention when using Model-First with EF
Model-First is a fantastic capability of the Entity Designer to allow you to focus on your conceptual model rather than the storage model when starting to build your application. The thing to keep in mind here is that Model-First uses TPT as the default inheritance mapping strategy when you build inheritance hierarchies.
Consider using TPH inheritance (along with Entity Designer Database Generation Power Pack)
TPH support is great from a performance perspective – and in many ways, TPH can help keep things simple as far as storage and query translation goes. TPH is an option within Model-First if you download and install the Entity Designer Database Generation Power Pack. A screenshot of the designer with the Entity Designer Power Pack installed are shown below:
We understand that there are scenarios where TPT is required and it is a goal of EF to provide the best possible performance when using all the inheritance mapping strategies supported out of the box. While the story is not perfect today with TPT, we hope that you are able to use some of the information mentioned above until we are able to release fixes for these in a future release.
Faisal Mohamood
Program Manager, Entity Framework
Comments
Anonymous
August 17, 2010
Would a lot of these performance implications arise when using queries that target specific higher level types? For example, if you were targeting "UKGuest" instead of just "Guest"?Anonymous
August 17, 2010
I apologize for the followup - but I felt I was unclear. For instance if you took the above example, and did a query.. var ukGuests = dbo.Guests.OfType<UKGuests>(); I'm uncertain if this is accurate syntax, I am going on memory. This would take only the objects that matched the type UKGuests from the collection. Because it selects the guests in general, and then filters them, are we looking at the same level of overhead? I am currently invested in a project that cannot be expressed with a TPH method for several objects. This is very important to me, because the flexibility of TPT was the crux of what made this function appropriately. Is this unique to the Entity Framework, or do all ORM systems (nhibernate, etc) suffer from this issue as well?Anonymous
August 18, 2010
Back in January, I blogged very extensively about the performance catastrophe that is EF TPT inheritance. samscode.com/.../the-entity-framework-v1-and-v4-deal-breaker-tpt-inheritance I did some benchmarks, and showed that with 30 subclasses in TPT, it takes almost 2 minutes for EF to generate the 8000 lines of sql, send the query over, and get results back. And this is when the tables are empty, and no data comes back. I don't know if there's some noble or idealogical purist reason to generate the SQL with the absurd amount of joins and subqueries that are currently being used, but NHibernate doesn't not suffer from this same problem. You might want to take a look at the simple flat set of LEFT JOINs they use to accomplish the same thing. There's no reason for the complexity that EF uses. As it stands, TPT is simply not an option in any production environment. Even with as few as 5 or 6 subclasses, it slows to a crawl.Anonymous
August 18, 2010
@Derek - in the case that you mention, you will actually be fine. Here's the query that would be generated: SELECT '0X0X' AS [C1], [Extent1].[GuestId] AS [GuestId], [Extent2].[Name] AS [Name], [Extent2].[Address] AS [Address], [Extent2].[City] AS [City], [Extent1].[PostalCode] AS [PostalCode], [Extent1].[PhoneNumber] AS [PhoneNumber] FROM [dbo].[Guests_UKGuest] AS [Extent1] INNER JOIN [dbo].[Guests] AS [Extent2] ON [Extent1].[GuestId] = [Extent2].[GuestId] Hope this helps. Faisal Mohamood | Program Manager | Entity FrameworkAnonymous
August 25, 2010
Well then I've only wasted one day creating an EF model. I guess I can live with that. I'm going to go write my database structure in SQL now.. at least I still have Linq2SqlAnonymous
August 27, 2010
G S, This issue is just part of ORM. You will even face it when constructing your queries in LinqToSql too, if your relational storage path is to break the data up this way. This blog is just a friendly way of showing that EF has the tools to allow you to make the decision, not make the decision for you. There are huge wins in some areas to have a direct correlation between domain objects and table schema. In my experience with WCF and ASP.NET MVC, the blanket Domain Mapping works 90% of the time. When you get into "the querys are to bulky" or "the dba wouldn't have designed it that way in sql", then just extend the framework types to add your specialized case. Either way, with the EF 4 Code-First feature rolling in the near future... You should have all the room you need to make great decisions about your data. ToddAnonymous
August 27, 2010
G S, This issue is just part of ORM. You will even face it when constructing your queries in LinqToSql too, if your relational storage path is to break the data up this way. This blog is just a friendly way of showing that EF has the tools to allow you to make the decision, not make the decision for you. There are huge wins in some areas to have a direct correlation between domain objects and table schema. In my experience with WCF and ASP.NET MVC, the blanket Domain Mapping works 90% of the time. When you get into "the querys are to bulky" or "the dba wouldn't have designed it that way in sql", then just extend the framework types to add your specialized case. Either way, with the EF 4 Code-First feature rolling in the near future... You should have all the room you need to make great decisions about your data. ToddAnonymous
August 28, 2010
Is there any simple way to see the SQL being generated? I've been looking pretty hard, and am having a hard time with this. Since I cannot afford full SQL (Only SQL Express) I cannot get the Profiler tool, and none of the Frameworks I have discovered seem to be readily compatible with EF CodeFirst's DbContext.Anonymous
August 28, 2010
Actually, I figured that part out. In running some tests, I really didn't get very comparable differences when using TPT and TPH. Let me explain my structure... class Attribute
- Id
- Name (string) class Unit
- Id
- Attribute class Measure : Unit
- Value class Sheet
- Id
- ICollection<Unit> Units var s = new Sheet { Units = new List<Unit> { new Measure { Attribute = db.Attribute.Name == "Etc", Value = 15 } } } I still see a ton of code generated, but no less if I skip the sub classing entirely.
Anonymous
August 29, 2010
@Todd Morrison, this issue is not systemic to ORMs in general. It's an EF specific problem that LLBLGen and NHibernate do not suffer from. They generate flatter, more compact SQL that is almost as performant as what you would write by hand. I don't know specifically about LINQ2SQL though, you'd have to test that to know one way or the other.Anonymous
September 21, 2010
Can we please have some kind of timeframe for this to be fixed? Will this be a hotfix coming relitivly quickly or are we likely to be waiting for a major release?Anonymous
October 01, 2010
Does the choice of workflow affect the conceptual model in any way?Anonymous
November 24, 2010
I would concur with @Ross --- we need a time frame for this. At the moment I have seemly benign looking Linq queries that translate into THOUSANDS of lines of T-SQL code, which of course is a performance DISASTER. TPT with inheritance simply is not ready for prime time, with this kind of resolution from Linq to SQL.Anonymous
November 26, 2010
I would also like to know when this is going to be addressed. We need to know if we can leave what work we have already done, or we have got to go back and rework these elements.Anonymous
November 28, 2010
I'm in the same situation as Jamie. I really need to know if I can expect a solution before I'm going into production, or if I need to rework my model. So please give us a time frame for this.Anonymous
November 30, 2010
I'm i n the same situation also, and would like a time frame. I also need to know if I must rework my model before going into production or not.Anonymous
December 08, 2010
Chaps, It appears they do not care. Move on from EF. Use nhib or llblgen etc.Anonymous
December 16, 2010
Hello, will a fix be included in the next EF version (4.5?) to be released Q1/2011 ?? It is vital for us to have the info as we have a scenario with about 15 classes inheriting from a single base class. If we're unsure we'll have to either review our design or use another ORM. Thanks.Anonymous
January 13, 2011
I am running into a problem using stored procedures for Inserts with TPT inheritance. I am trying to save a new record into the inherited table. When I dont use stored procedures, this works correctly, but when I have stored procedures in place, Entity Framework calls the inserts in the wrong order - attempts to insert the inherited table before the base table - resulting in a foreign key error. Is there a way to get around this, or a fix for it? ThanksAnonymous
February 27, 2011
The first example with projection is obviously bug. There is no reason to run union over all subtypes when you are doing projection and you don't use any column from subquery - that joined subquery with union is never used.Anonymous
April 12, 2011
Is there any hope of seeing this issue resolved in 2011? I concur with @Ladislav, that this is obviously a bug with TPT, which should have been fixed with 4.0.Anonymous
June 22, 2011
Please address the performance of TPT inheritance queries. My team is in a world of pain.Anonymous
November 24, 2011
Are there any further updates on improving the performance of TPT? I have 19 tables derived from a base, and Entity Framework is producing a query that is 3100+ lines long, with 360,000 characters in the query. When this query is saved to disk, it is 700KB in size!!!!!Anonymous
December 02, 2011
haishibai.blogspot.com/.../entity-framework-tpt-performance-quick.html