Udostępnij za pośrednictwem


Improvements to Generated SQL in .NET 4.0


The information in this post is out of date.

Visit msdn.com/data/ef for the latest information on current and past releases of EF.


In a previous post (https://blogs.msdn.com/adonet/archive/2009/08/05/improvements-to-the-generated-sql-in-net-4-0-beta1.aspx), we talked about the improvements to the quality and the readability of the SQL generated when querying using the Entity Framework that we made in .NET 4.0 Beta 1. We continued working on that task and here I talk about the additional improvements provided with .Net 4.0.

 

Again, most of the improvements we have made are in the Entity Framework query pipeline as opposed to specific SqlClient changes, thus affecting the SQL generated by any backend, not only by our SQL Server provider.

 

Unless otherwise noted:

· The examples are based on the NorthwindEF Entity Data Model provided with the ADO.NET Entity Framework Query Samples (https://code.msdn.microsoft.com/EFQuerySamples).

· The “Before” generated SQL is produced by the Entity Framework in .NET 3.5, and the “After” by the Entity Framework in .NET 4.0.

 

Also, in all the examples below, the parts of the SQL query highlighted in red have been removed and those in yellow have been added.

 

1. Elimination of internal constants used as null sentinels

In the query pipeline we introduce internal constants to serve as null sentinels, i.e. based on whether they come out as null we make decisions whether a record should be null /collection should be empty. In some cases, the introduction of the new column can cause a new level of nesting to be introduced. The improvement is instead of introducing new constants to try reusing other columns for the same purpose. This in many cases leads to simpler queries with less nesting.

Example:

var query = from c in context.Categories

select new { c.CategoryID, Count = c.Products.Count }

Before

SELECT

1 AS [C1],

[Project1].[CategoryID] AS [CategoryID],

[Project1].[C1] AS [C2]

FROM ( SELECT

      [Extent1].[CategoryID] AS [CategoryID],

      (SELECT

            COUNT(cast(1 as bit)) AS [A1]

            FROM [dbo].[Products] AS [Extent2]

            WHERE [Extent1].[CategoryID] = [Extent2].[CategoryID]) AS [C1]

      FROM [dbo].[Categories] AS [Extent1]

)  AS [Project1]

After

 

SELECT

[Extent1].[CategoryID] AS [CategoryID],

(SELECT

      COUNT(1) AS [A1]

      FROM [dbo].[Products] AS [Extent2]

      WHERE [Extent1].[CategoryID] = [Extent2].[CategoryID]) AS [C1]

FROM [dbo].[Categories] AS [Extent1]

 

2. Translating String.StartsWith, String.EndsWith and String.Contains to LIKE in LINQ to Entities

In .NET 3.5, in LINQ to Entities we provided support for the String.StartsWith, String.EndsWith and String.Contains and we translated them to canonical functions and ultimately to SQL in the following way:

CLR Method

Translation in terms of canonical functions

Resulting translation for SQL Server

Boolean String.Contains(String value)

IndexOf(this, value) > 0

CHARINDEX(value, this) > 0

Boolean String.EndsWith(String value)

Right(this, length(value)) = value

(RIGHT(this, LEN(value)) = value

Boolean String.StartsWith(String value)

IndexOf(this, value) = 1

CHARINDEX(value, this) = 1

 

These translations resulted in inability to utilize indexes on the tested string and were not as easy to read. Therefore, in .NET 4.0, when the given value is a constant or a parameter and the provider support translation to LIKE, we translate these functions into a DbLikeExpression, which ultimately results in a “LIKE” on SQL Server.

Example:

var query = from p in context.ProductSet

            where p.ProductName.Contains("va")

            select p.ProductID;

Before

SELECT

[Extent1].[ProductID] AS [ProductID]

FROM [dbo].[Products] AS [Extent1]

WHERE (CAST(CHARINDEX(N'va', [Extent1].[ProductName]) AS int)) > 0

After

 

SELECT

[Extent1].[ProductID] AS [ProductID]

FROM [dbo].[Products] AS [Extent1]

WHERE [Extent1].[ProductName] LIKE '%va%'

3. Improve the translation of the canonical functions StartsWith, Contains and EndsWith to use LIKE

 

In .NET 4.0 Beta 1 we introduced new canonical functions, among which the following string functions along with their translations for SQL Server:

Canonical Function

Translation for SQL Server

Boolean String.Contains(String value)

CHARINDEX(value, this) > 0

Boolean String.EndsWith(String value)

CHARINDEX(REVERSE(arg1), REVERSE(arg0)) = 1

Boolean String.StartsWith(String value)

CHARINDEX(value, this) = 1

 

We have now changed the translations of these functions to use LIKE when the value is a constant.

Example:

Note: The ‘Before’ SQL in this example is produced by Entity Framework in .NET 4.0 Beta1

string entitySql = "contains('abcd', 'bc')";

Before

SELECT

CASE WHEN (CHARINDEX( 'bc', 'abcd') > 0) THEN cast(1 as bit) WHEN ( NOT (CHARINDEX( 'bc', 'abcd') > 0)) THEN cast(0 as bit) END AS [C1]

FROM ( SELECT 1 AS X ) AS [SingleRowTable1]

After

SELECT

CASE WHEN ('abcd' LIKE '%bc%') THEN cast(1 as bit) WHEN ( NOT ('abcd' LIKE '%bc%')) THEN cast(0 as bit) END AS [C1]

FROM ( SELECT 1 AS X ) AS [SingleRowTable1]

 

4. Additional Join Elimination when there is more than one navigation property from one entity type to another

In some cases, then there is more than one navigation property from one entity type to another, and both naviation properties were accessed mulitple times we were generating an output quey with redundant joins.

Example :

Note: This example is based on a slightly modified schema from the one provided in the ADO.NET Entity Framework Query Samples. In particular, a new Association Products_SecondaryCategory has been added identical to the Products_Category association along with the navigation properties. Also, referential constrains between Categories and Products for both associations have been added to the SSDL.

var query = from p in context.ProductSet select new {

          p.ProductID,

         category = p.Category.CategoryName,

         categoryDescription = p.Category.Description,

           secondaryCategory = p.SecondaryCategory.CategoryName,

         secondaryCategoryDescription = p.SecondaryCategory.Description

            };

Before

SELECT

1 AS [C1],

[Extent1].[ProductID] AS [ProductID],

[Extent2].[CategoryName] AS [CategoryName],

[Extent3].[Description] AS [Description],

[Extent4].[CategoryName] AS [CategoryName1],

[Extent5].[Description] AS [Description1]

FROM [dbo].[Products] AS [Extent1]

LEFT OUTER JOIN [dbo].[Categories] AS [Extent2] ON [Extent1].[CategoryID] = [Extent2].[CategoryID]

LEFT OUTER JOIN [dbo].[Categories] AS [Extent3] ON [Extent1].[CategoryID] = [Extent3].[CategoryID]

LEFT OUTER JOIN [dbo].[Categories] AS [Extent4] ON [Extent1].[SecondaryCategoryID] = [Extent4].[CategoryID]

LEFT OUTER JOIN [dbo].[Categories] AS [Extent5] ON [Extent1].[SecondaryCategoryID] = [Extent5].[CategoryID]

After

 

SELECT

[Extent1].[ProductID] AS [ProductID],

[Extent2].[CategoryName] AS [CategoryName],

[Extent2].[Description] AS [Description],

[Extent3].[CategoryName] AS [CategoryName1],

[Extent3].[Description] AS [Description1]

FROM [dbo].[Products] AS [Extent1]

LEFT OUTER JOIN [dbo].[Categories] AS [Extent2] ON [Extent1].[CategoryID] = [Extent2].[CategoryID]

LEFT OUTER JOIN [dbo].[Categories] AS [Extent3] ON [Extent1].[SecondaryCategoryID] = [Extent3].[CategoryID]

 

5. Provide mechanism for efficient queries on non-Unicode columns

In .NET 3.5, whenever a constant or a parameter was used in LINQ to Entities query, we treated it as being Unicode. As a result, when comparing a constant to a property stored in a non-unicode column on SQL Server, if there was an index on that column, it was not being used.

To address the issue, we now generate non-unicode constants and parameters when these are used in LINQ to Entities queries in comparisons with non-unicode columns.

Example:

Note: This example is based on a slightly modified schema from the one provided in the ADO.NET Entity Framework Query Samples. In particular, the SSDL has been modified, to specify the ProductName column as non-unicode.

 var query = from p in context.ProductSet

where p.ProductName == "Chocolade"

select p.ProductID;

Before

SELECT

[Extent1].[ProductID] AS [ProductID]

FROM [dbo].[Products] AS [Extent1]

WHERE N'Chocolade' = [Extent1].[ProductName]

 

After

 

SELECT

[Extent1].[ProductID] AS [ProductID]

FROM [dbo].[Products] AS [Extent1]

WHERE 'Chocolade' = [Extent1].[ProductName]

 

6. Collapse multiple Case statements into one

Queries with deeply nested conditional statements could fail due a SQL Server’s limitation on the level of nesting of CASE statements. This scenario is especially interesting with LINQ to Entities queries like the one in the example, which are natural to write. To provide a better support for queries like this, we now collapse multiple cascading case statements into one.

Example:

var query = from p in context.ProductSet

select

(p.ProductID == 1) ? "first" :

                 (p.ProductID == 2) ? "second" :

                (p.ProductID == 3) ? "third" :

                 (p.ProductID == 4) ? "fourth" :

      "other";

 

Before

SELECT

CASE WHEN (1 = [Extent1].[ProductID]) THEN N'first' ELSE

  CASE WHEN (2 = [Extent1].[ProductID]) THEN N'second' ELSE

    CASE WHEN (3 = [Extent1].[ProductID]) THEN N'third' ELSE

      CASE WHEN (4 = [Extent1].[ProductID]) THEN N'fourth' ELSE N'other'

END END END END AS [C1]

FROM [dbo].[Products] AS [Extent1]

After

 

SELECT

CASE WHEN (1 = [Extent1].[ProductID]) THEN N'first'

     WHEN (2 = [Extent1].[ProductID]) THEN N'second'

     WHEN (3 = [Extent1].[ProductID]) THEN N'third'

     WHEN (4 = [Extent1].[ProductID]) THEN N'fourth' ELSE N'other' END AS [C1]

FROM [dbo].[Products] AS [Extent1]

 

Thank you!

Kati Iceva,

Developer, Entity Framework

Comments

  • Anonymous
    May 10, 2010
    that's amaaaaazing! thanks ado.net team

  • Anonymous
    May 12, 2010
    The comment has been removed

  • Anonymous
    May 25, 2010
    Any improvments to how Included entities are loaded? I'd really like to see includes handled as seperate select statements, rather than joins.

  • Anonymous
    May 26, 2010
    @Steve J, What version of Entity Framework are you using?  I'm pretty sure we did make a change to help with parameter name consistency that should be in the RTM version of EF 4, but I'd like to confirm that you are using something earlier than that.  If you are seeing this with EF 4 RTM, then I'd like to follow-up. @Mark, Unfortunately we haven't made any changes to include in EF4.  The reason Include uses joins is because that's the most straightforward way to guarantee a consistent snapshot of results.  If you make separate statements (unless you have a ridiculously restrictive isolation level), then you run the risk of something changing between the first select statement and the second one which could cause your data to be inconsistent.  Additionally, the thought is that if you are willing to take that risk, then you can always write two or more queries yourself without using include. That said, more flexible fetch plans is one of the things on our backlog for future improvements to include.  I'm just not sure yet when it will hit the top of the list.

  • Danny
  • Anonymous
    October 25, 2010
    Actually, point 5 isn't true, it still doesn't work, as I reported on the data-access mailing list. It would be great if someone from the team could update this post to enlight that issue.

  • Anonymous
    October 25, 2010
    One question on TPH: I have a scenario where I have a base abstract entity from which two types derive. One of these types is concrete while the other is abstract. From this second abstract type another concrete type is derived leaving me with this structure: AbsT1 |--- ConcT1 |--- AbsT2         |--- ConcT2 This conceptual model is then mapped to a single table in the storage model (TPH). This table has a column to identify the type of record being stored. In these mappings, conditions are added to both concrete types to get this column fixed values for each case. Under this condition I always get mapping errors stating that ConcT1 and ConcT2 are mapped to the same rows in the table which in my oppinion shouldn't happen. Additionally if I derive ConcT1 from AbsT2 instead, keeping everything else the same, all mapping errors are gone and everything works as expected. Another way of making things 'work' is to make both abstract types concrete instead. I've already posted this scenario on Connect but no solution as of yet. Can you shed some light on this scenario? Thanks in advance

  • Anonymous
    October 25, 2010
    Sorry, this comment was on done on the wrong post, please ignore

  • Anonymous
    November 01, 2010
    Re point 5 above.  Seems to work correctly with string constants, variables of type string, not so much.  Can someone point me toward a solution or at least some additional discussion on the matter. Thanks.

  • Anonymous
    February 01, 2011
    We have an application which has the entity model created using vs2008 and at some point the project was upgraded to vs2010. However reading this blog, I observerd that the end SQL being generated by the application is still using the older version of .NET 3.5. For ex: StartsWith operator used by the Ado.net dataservices still translates to the ncharindex sql call. Is there anything specifically that needs to be done so that the application uses the EF4 version and generate the much improved sql?

  • Anonymous
    November 27, 2011
    The bug with comparing ASCII strings with unicode mentioned in point 5 still exists when LINQ to Entities produces an IN clause.  For example if I put into my query: (status == "A" || status == "B") where status is a non-unicode string, Linq-to-Entities produces the SQL query: where status in ( N'A', N'B') which causes serious performance issues.  

  • Anonymous
    November 28, 2011
    @Mike Emerson: Thanks for bringing this up. A fix for this issue will be included in the update for the EF core libraries in .NET 4.5.