Improvements to Generated SQL in .NET 4.0
The information in this post is out of date.
Visit msdn.com/data/ef for the latest information on current and past releases of EF.
In a previous post (https://blogs.msdn.com/adonet/archive/2009/08/05/improvements-to-the-generated-sql-in-net-4-0-beta1.aspx), we talked about the improvements to the quality and the readability of the SQL generated when querying using the Entity Framework that we made in .NET 4.0 Beta 1. We continued working on that task and here I talk about the additional improvements provided with .Net 4.0.
Again, most of the improvements we have made are in the Entity Framework query pipeline as opposed to specific SqlClient changes, thus affecting the SQL generated by any backend, not only by our SQL Server provider.
Unless otherwise noted:
· The examples are based on the NorthwindEF Entity Data Model provided with the ADO.NET Entity Framework Query Samples (https://code.msdn.microsoft.com/EFQuerySamples).
· The “Before” generated SQL is produced by the Entity Framework in .NET 3.5, and the “After” by the Entity Framework in .NET 4.0.
Also, in all the examples below, the parts of the SQL query highlighted in red have been removed and those in yellow have been added.
1. Elimination of internal constants used as null sentinels
In the query pipeline we introduce internal constants to serve as null sentinels, i.e. based on whether they come out as null we make decisions whether a record should be null /collection should be empty. In some cases, the introduction of the new column can cause a new level of nesting to be introduced. The improvement is instead of introducing new constants to try reusing other columns for the same purpose. This in many cases leads to simpler queries with less nesting.
Example:
var query = from c in context.Categories
select new { c.CategoryID, Count = c.Products.Count }
Before
SELECT
1 AS [C1],
[Project1].[CategoryID] AS [CategoryID],
[Project1].[C1] AS [C2]
FROM ( SELECT
[Extent1].[CategoryID] AS [CategoryID],
(SELECT
COUNT(cast(1 as bit)) AS [A1]
FROM [dbo].[Products] AS [Extent2]
WHERE [Extent1].[CategoryID] = [Extent2].[CategoryID]) AS [C1]
FROM [dbo].[Categories] AS [Extent1]
) AS [Project1]
After
SELECT
[Extent1].[CategoryID] AS [CategoryID],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[Products] AS [Extent2]
WHERE [Extent1].[CategoryID] = [Extent2].[CategoryID]) AS [C1]
FROM [dbo].[Categories] AS [Extent1]
2. Translating String.StartsWith, String.EndsWith and String.Contains to LIKE in LINQ to Entities
In .NET 3.5, in LINQ to Entities we provided support for the String.StartsWith, String.EndsWith and String.Contains and we translated them to canonical functions and ultimately to SQL in the following way:
CLR Method |
Translation in terms of canonical functions |
Resulting translation for SQL Server |
Boolean String.Contains(String value) |
IndexOf(this, value) > 0 |
CHARINDEX(value, this) > 0 |
Boolean String.EndsWith(String value) |
Right(this, length(value)) = value |
(RIGHT(this, LEN(value)) = value |
Boolean String.StartsWith(String value) |
IndexOf(this, value) = 1 |
CHARINDEX(value, this) = 1 |
These translations resulted in inability to utilize indexes on the tested string and were not as easy to read. Therefore, in .NET 4.0, when the given value is a constant or a parameter and the provider support translation to LIKE, we translate these functions into a DbLikeExpression, which ultimately results in a “LIKE” on SQL Server.
Example:
var query = from p in context.ProductSet
where p.ProductName.Contains("va")
select p.ProductID;
Before
SELECT
[Extent1].[ProductID] AS [ProductID]
FROM [dbo].[Products] AS [Extent1]
WHERE (CAST(CHARINDEX(N'va', [Extent1].[ProductName]) AS int)) > 0
After
SELECT
[Extent1].[ProductID] AS [ProductID]
FROM [dbo].[Products] AS [Extent1]
WHERE [Extent1].[ProductName] LIKE '%va%'
3. Improve the translation of the canonical functions StartsWith, Contains and EndsWith to use LIKE
In .NET 4.0 Beta 1 we introduced new canonical functions, among which the following string functions along with their translations for SQL Server:
Canonical Function |
Translation for SQL Server |
Boolean String.Contains(String value) |
CHARINDEX(value, this) > 0 |
Boolean String.EndsWith(String value) |
CHARINDEX(REVERSE(arg1), REVERSE(arg0)) = 1 |
Boolean String.StartsWith(String value) |
CHARINDEX(value, this) = 1 |
We have now changed the translations of these functions to use LIKE when the value is a constant.
Example:
Note: The ‘Before’ SQL in this example is produced by Entity Framework in .NET 4.0 Beta1
string entitySql = "contains('abcd', 'bc')";
Before
SELECT
CASE WHEN (CHARINDEX( 'bc', 'abcd') > 0) THEN cast(1 as bit) WHEN ( NOT (CHARINDEX( 'bc', 'abcd') > 0)) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
After
SELECT
CASE WHEN ('abcd' LIKE '%bc%') THEN cast(1 as bit) WHEN ( NOT ('abcd' LIKE '%bc%')) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
4. Additional Join Elimination when there is more than one navigation property from one entity type to another
In some cases, then there is more than one navigation property from one entity type to another, and both naviation properties were accessed mulitple times we were generating an output quey with redundant joins.
Example :
Note: This example is based on a slightly modified schema from the one provided in the ADO.NET Entity Framework Query Samples. In particular, a new Association Products_SecondaryCategory has been added identical to the Products_Category association along with the navigation properties. Also, referential constrains between Categories and Products for both associations have been added to the SSDL.
var query = from p in context.ProductSet select new {
p.ProductID,
category = p.Category.CategoryName,
categoryDescription = p.Category.Description,
secondaryCategory = p.SecondaryCategory.CategoryName,
secondaryCategoryDescription = p.SecondaryCategory.Description
};
Before
SELECT
1 AS [C1],
[Extent1].[ProductID] AS [ProductID],
[Extent2].[CategoryName] AS [CategoryName],
[Extent3].[Description] AS [Description],
[Extent4].[CategoryName] AS [CategoryName1],
[Extent5].[Description] AS [Description1]
FROM [dbo].[Products] AS [Extent1]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent2] ON [Extent1].[CategoryID] = [Extent2].[CategoryID]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent3] ON [Extent1].[CategoryID] = [Extent3].[CategoryID]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent4] ON [Extent1].[SecondaryCategoryID] = [Extent4].[CategoryID]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent5] ON [Extent1].[SecondaryCategoryID] = [Extent5].[CategoryID]
After
SELECT
[Extent1].[ProductID] AS [ProductID],
[Extent2].[CategoryName] AS [CategoryName],
[Extent2].[Description] AS [Description],
[Extent3].[CategoryName] AS [CategoryName1],
[Extent3].[Description] AS [Description1]
FROM [dbo].[Products] AS [Extent1]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent2] ON [Extent1].[CategoryID] = [Extent2].[CategoryID]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent3] ON [Extent1].[SecondaryCategoryID] = [Extent3].[CategoryID]
5. Provide mechanism for efficient queries on non-Unicode columns
In .NET 3.5, whenever a constant or a parameter was used in LINQ to Entities query, we treated it as being Unicode. As a result, when comparing a constant to a property stored in a non-unicode column on SQL Server, if there was an index on that column, it was not being used.
To address the issue, we now generate non-unicode constants and parameters when these are used in LINQ to Entities queries in comparisons with non-unicode columns.
Example:
Note: This example is based on a slightly modified schema from the one provided in the ADO.NET Entity Framework Query Samples. In particular, the SSDL has been modified, to specify the ProductName column as non-unicode.
var query = from p in context.ProductSet
where p.ProductName == "Chocolade"
select p.ProductID;
Before
SELECT
[Extent1].[ProductID] AS [ProductID]
FROM [dbo].[Products] AS [Extent1]
WHERE N'Chocolade' = [Extent1].[ProductName]
After
SELECT
[Extent1].[ProductID] AS [ProductID]
FROM [dbo].[Products] AS [Extent1]
WHERE 'Chocolade' = [Extent1].[ProductName]
6. Collapse multiple Case statements into one
Queries with deeply nested conditional statements could fail due a SQL Server’s limitation on the level of nesting of CASE statements. This scenario is especially interesting with LINQ to Entities queries like the one in the example, which are natural to write. To provide a better support for queries like this, we now collapse multiple cascading case statements into one.
Example:
var query = from p in context.ProductSet
select
(p.ProductID == 1) ? "first" :
(p.ProductID == 2) ? "second" :
(p.ProductID == 3) ? "third" :
(p.ProductID == 4) ? "fourth" :
"other";
Before
SELECT
CASE WHEN (1 = [Extent1].[ProductID]) THEN N'first' ELSE
CASE WHEN (2 = [Extent1].[ProductID]) THEN N'second' ELSE
CASE WHEN (3 = [Extent1].[ProductID]) THEN N'third' ELSE
CASE WHEN (4 = [Extent1].[ProductID]) THEN N'fourth' ELSE N'other'
END END END END AS [C1]
FROM [dbo].[Products] AS [Extent1]
After
SELECT
CASE WHEN (1 = [Extent1].[ProductID]) THEN N'first'
WHEN (2 = [Extent1].[ProductID]) THEN N'second'
WHEN (3 = [Extent1].[ProductID]) THEN N'third'
WHEN (4 = [Extent1].[ProductID]) THEN N'fourth' ELSE N'other' END AS [C1]
FROM [dbo].[Products] AS [Extent1]
Thank you!
Kati Iceva,
Developer, Entity Framework
Comments
Anonymous
May 10, 2010
that's amaaaaazing! thanks ado.net teamAnonymous
May 12, 2010
The comment has been removedAnonymous
May 25, 2010
Any improvments to how Included entities are loaded? I'd really like to see includes handled as seperate select statements, rather than joins.Anonymous
May 26, 2010
@Steve J, What version of Entity Framework are you using? I'm pretty sure we did make a change to help with parameter name consistency that should be in the RTM version of EF 4, but I'd like to confirm that you are using something earlier than that. If you are seeing this with EF 4 RTM, then I'd like to follow-up. @Mark, Unfortunately we haven't made any changes to include in EF4. The reason Include uses joins is because that's the most straightforward way to guarantee a consistent snapshot of results. If you make separate statements (unless you have a ridiculously restrictive isolation level), then you run the risk of something changing between the first select statement and the second one which could cause your data to be inconsistent. Additionally, the thought is that if you are willing to take that risk, then you can always write two or more queries yourself without using include. That said, more flexible fetch plans is one of the things on our backlog for future improvements to include. I'm just not sure yet when it will hit the top of the list.
- Danny
Anonymous
October 25, 2010
Actually, point 5 isn't true, it still doesn't work, as I reported on the data-access mailing list. It would be great if someone from the team could update this post to enlight that issue.Anonymous
October 25, 2010
One question on TPH: I have a scenario where I have a base abstract entity from which two types derive. One of these types is concrete while the other is abstract. From this second abstract type another concrete type is derived leaving me with this structure: AbsT1 |--- ConcT1 |--- AbsT2 |--- ConcT2 This conceptual model is then mapped to a single table in the storage model (TPH). This table has a column to identify the type of record being stored. In these mappings, conditions are added to both concrete types to get this column fixed values for each case. Under this condition I always get mapping errors stating that ConcT1 and ConcT2 are mapped to the same rows in the table which in my oppinion shouldn't happen. Additionally if I derive ConcT1 from AbsT2 instead, keeping everything else the same, all mapping errors are gone and everything works as expected. Another way of making things 'work' is to make both abstract types concrete instead. I've already posted this scenario on Connect but no solution as of yet. Can you shed some light on this scenario? Thanks in advanceAnonymous
October 25, 2010
Sorry, this comment was on done on the wrong post, please ignoreAnonymous
November 01, 2010
Re point 5 above. Seems to work correctly with string constants, variables of type string, not so much. Can someone point me toward a solution or at least some additional discussion on the matter. Thanks.Anonymous
February 01, 2011
We have an application which has the entity model created using vs2008 and at some point the project was upgraded to vs2010. However reading this blog, I observerd that the end SQL being generated by the application is still using the older version of .NET 3.5. For ex: StartsWith operator used by the Ado.net dataservices still translates to the ncharindex sql call. Is there anything specifically that needs to be done so that the application uses the EF4 version and generate the much improved sql?Anonymous
November 27, 2011
The bug with comparing ASCII strings with unicode mentioned in point 5 still exists when LINQ to Entities produces an IN clause. For example if I put into my query: (status == "A" || status == "B") where status is a non-unicode string, Linq-to-Entities produces the SQL query: where status in ( N'A', N'B') which causes serious performance issues.Anonymous
November 28, 2011
@Mike Emerson: Thanks for bringing this up. A fix for this issue will be included in the update for the EF core libraries in .NET 4.5.