Entity SQL

ADO.NET’s Entity Client comes with a client-side query engine that provides a consistent entity-level platform over different store providers. The input to that query engine is an entity-level query written in a language naturally called Entity SQL. The purpose of Entity SQL is to provide a common, entity-level, query language independent of any data store, so that queries written in it are the same across different store implementations. People might expect Entity SQL to be a small subset of SQL constructs that is common for most data stores. Those will be pleasantly surprised - Entity SQL is a very rich functional language whose expressiveness supersedes conventional SQL dialects.

 

Expressions

The building block of Entity SQL is expression. The language is about composing complex expressions. Thus the simplest Entity SQL query looks like this:

 

-- Returns one scalar item

1;

 

The domain of expression types is multidimensional. With regard to multiplicity, expressions fall into two categories: items and collections. The difference between those two is easy to guess – collections are subject to relational operations while items are not. Here are some valid collection expressions:

 

-- Returns a collection of one element

{1};

 

 

-- Returns a collection of three elements

{1, 2, 3};

 

With regard to source, expressions may be literals, parameters, or nested expressions. That means anywhere a value is expected, it could be parameterized or composed of some other expression. Parameterization has both security- and performance benefits. By passing user-supplied input as parameters, one avoids the risk of SQL injection. And by using a constant query text, a frequently used query is likely to stay in the query plan cache and won’t waste time to compile for each execution.

 

Composition means that queries are no longer constrained to the tabular shape of conventional SQL. A nested expression may be placed anywhere a value of that type would be accepted. For instance:

 

-- Returns a hierarchical collection of three elements at top-level.

-- x must be passed in the parameter collection.

ROW(@x, {@x}, {@x, 4, 5}, {@x, 7, 8, 9})

 

-- Returns a hierarchical collection of one element at top-level

-- x must be passed in the parameter collection.

{{{@x}}};

 

If a collection represents the vertical axis of a conventional SQL row set, then what represents the horizontal axis, i.e. how can we make a composite item? In addition to the scalar items that we’ve seen already, there are rows and entities. An Entity SQL row is identical to the conventional SQL row – it’s a set of items eventually of different types. A row is constructed through the ROW() constructor.

 

An entity is an instance of an entity type from the entity model. An entity could be flattened to a row through the VALUE keyword. The opposite is also true – an entity may be constructed from a row using the entity type constructor as long as all the properties are supplied in the exact order as they are defined in the entity model.

 

Now we are ready for some samples with relational operations. All the samples bellow use the Northwind sample model:

 

-- Returns one row with three elements

ROW(1, ‘abc’);

 

 

-- Returns a collection of five identical rows with three elements each

SELECT ROW(1, ‘abc’)

FROM {11, 12, 13, 14, 15};

 

 

-- Returns a collection of two rows with two elements each

{ROW(1, ‘abc’)}

UNION ALL

{ROW(2, ‘xyz’)};

 

 

-- Returns a collection of all the entities from the Categories set

SELECT c

FROM Northwind.Categories AS c;

 

 

-- Returns a collection of rows where each row represents a flattened entity from the Categories set

SELECT VALUE c

FROM Northwind.Categories AS c;

 

An entity model has no notion of normalization as that is only applicable to store implementations. Therefore entity-level queries against a well-designed model should rarely need to use JOIN. Instead, those queries should use navigation properties. Those are properties that represent relationships in an entity model. For instance:

 

-- Returns a collection of rows where each row contains a Product entity and a reference to its corresponding Category entity

SELECT p, p.Categories

FROM Northwind.Products AS p;

 

Navigating the same relationship in the opposite direction is even more interesting:

 

-- Returns a collection of rows where each row contains a Category entity and a collection of references to its corresponding Product entities

SELECT c, c.Products

FROM Northwind.Categories AS c;

 

References and Entities

References and entities in Entity SQL differ with regard to entity properties - properties are directly applicable only on entities, and not on references. A reference must be “dereferenced” to an entity first. For instance, to query all products and the name of each product’s category, we need the following query:

 

-- Returns a collection of rows where each row contains a Product entity and the name of its corresponding Category

SELECT p, DEREF(NAVIGATE(p, Northwind.FK_Products_Categories)).CategoryName

FROM Northwind.Products AS p;

 

Nested Queries

As it was mentioned above, a nested query may be used anywhere an expression of the type it returns is accepted. Furthermore, since Entity SQL is not limited to a tabular result set as conventional SQL dialects are, one can place a nested query even in a projection clause:

 

-- Returns a collection of rows where each row contains a Category entity and a collection of references to its corresponding Product entities

SELECT c, (SELECT DEREF(p)

                FROM NAVIGATE(c, Northwind.FK_Products_Categories) AS p) AS Products

FROM Northwind.Categories AS c;

 

Paging

Paging is a hot feature that web applications demand from data providers. Roughly speaking paging is extracting a range/page of a base collection starting at a given offset and proceeding for a given length/size. The benefit of paging is reducing the cost of building and transporting the result on the backend and processing it on the frontend.

 

Entity SQL supports physical paging. That’s what the major data store providers support today. Physical paging means both the offset and the size of the requested page are given in number of items. For instance, if an application is rendering products from the Northwind model in groups of 10, on page 1 it will render items 1 through 10, on page 2 – items 11 through 20, and so forth. Physical paging has some deficiencies with regard to changes in the base collection:

· For each item that is removed from a lower-index page, the rest of the collection will shift towards the beginning, and one item will not be shown at all.

· For each item that is added to a lower-index page, the rest of the collection will shift towards the end, and one item will be shown more than once.

Despite of its semantic deficiencies physical paging could be implemented efficiently by data providers. It is also easy to comprehend and to adopt in applications.

 

Paging implies some sort order. Although in standard SQL one can query a range without specifying any ordering, Entity SQL doesn’t allow that exception – paging constructs are invalid without an ORDER BY clause. The clauses to specify page offset and page size are SKIP and LIMIT respectively. The following example would retrieve products from position 21 through position 30 ordered by name:

 

-- Returns a collection of rows where each row contains a Category entity and a collection of references to its corresponding Product entities

SELECT p

FROM Northwind.Products AS p

ORDER BY p.ProductName

SKIP 20

LIMIT 10;

 

Each of SKIP and LIMIT is optional. When LIMIT is given alone, it is equivalent to TOP in the projection clause. However, mixing SKIP and TOP is not allowed.

 

Differences from Standard SQL

Although Entity SQL shares a set of operations and syntax constructs with standard SQL, it should not be approached as another “SQL dialect”. Instead, it should be regarded as a new, collection-oriented, query language with its own merits.

 

The syntax of standard SQL is driven by the tabular shape of the result sets that data stores return. There are some shortcuts that are not available in Entity SQL since Entity SQL does not imply any particular shape of the collections it operates on. Following are examples of how some standard SQL statements map into Entity SQL:

 

-- Pseudo-T-SQL

SELECT

FROM

UNION ALL

SELECT

FROM …;

 

 

Writing such a statement in Entity SQL is ambiguous since the operands of the UNION ALL operator are unclear. It would work over named collections like this:

 

-- Pseudo-Entity SQL

c1

UNION ALL

c2;

 

 

The rule in Entity SQL is that nested queries must always be enclosed in parentheses:

 

-- Pseudo-Entity SQL

( SELECT

FROM … )

UNION ALL

( SELECT

FROM … );

 

Another convenient shortcut in T-SQL is that an ORDER BY clause may be specified at the end of the last query of a UNION ALL operator:

 

-- Pseudo-T-SQL

SELECT

FROM

UNION ALL

SELECT

FROM

ORDER BY …;

 

Entity SQL’s UNION ALL operator doesn’t have an ORDER BY clause at all. Again, nesting should be used:

 

-- Pseudo-Entity SQL

SELECT

FROM

    (

        ( SELECT

           FROM … )

          UNION ALL

        ( SELECT

          FROM … )

    )

ORDER BY …;

 

Overall our belief is that having stricter rules and less exceptions is better for the adoption of a programming language. Once a developer abstracts herself from the tabular constraint of conventional result sets, Entity SQL will become much easier to comprehend than standard SQL.

 

 

A beta version of the Entity SQL language reference is available as part of the June CTP distribution. Take a look and try out some queries. Don’t hesitate to post questions and ask for help at the Entity Framework forum. Feedback is welcome.

 

 

Zlatko Michailov

Program Manager, Data Programmability Runtime

Microsoft Corp.

Comments