Queries in ADO.NET vNext

Artigo
08/21/2006

Hi folks,

There’s been a lot of interest in the ADO.NET CTP that we posted a few days back, and lots of questions about the internal details. I wanted to briefly describe how we handle queries in the ADO.NET CTP – and hopefully address some of your questions in this area.

The Lay of the Land

The ADO.NET VNext stack provides for two broad programming surfaces – the Object Services layer and the Map Provider layer – in addition to the existing provider programming surface. The Map Provider is a regular .NET data provider that accepts connections, commands, and returns DataReaders. The Map Provider allows you to describe your store data in terms of a conceptual model (the EDM) and allows you to specify client-side views over your store tables. The query language for the Map Provider is Entity SQL (eSQL).

The Object Services layer provides a strongly typed object abstraction, and allows you to program and reason in terms of programming language objects. The Object Services layer provides you a cache of objects, allows you to perform CRUD operations on these objects, allows you to query your data via LINQ or eSQL. The Object Services layer is stacked on top of the Map Provider, and gives you rich objects on top of the client views described by the conceptual model over store tables.

How do I express queries?

At the Object Services layer, you have a choice of expressing queries either via LINQ or via eSQL. At the Map Provider, you need to express queries in terms of eSQL. We may also allow – in a future version – queries to be specified in terms of canonical query trees (CQTs).

Ok, what are Canonical Query Trees (CQTs)?

We use Canonical Query Trees (CQTs) as a declarative representation for queries in ADO.NET. We use this tree representation for multiple purposes – as a common representation for all queries (regardless of how they were expressed); as a provider-neutral mechanism of communicating queries to stores (since every store has its own dialect of SQL); and in the future, as a means of programmatically constructing queries, especially in tooling and query-editing scenarios. More details on CQTs in a follow-on post

The Query Flow

A user query isalways converted into a CQT. If the query is expressed in eSQL, the eSQL parser is invoked, the query is validated against the conceptual model, and then converted into a CQT. If the query is expressed in LINQ terms, the LINQ expression tree is converted into a CQT. All further processing of the query happens on the CQT.

We first “expand” (unfold) all references to client views (aka Conceptual Entity Sets) in the query. The client views themselves are simply queries against one or more Store-level Entity Sets (aka tables) and, even possibly other client views; and the view expansion process is identical to what databases do with database views. These views are generated by a mapping compiler (a subject for another blog post – but suffice it to say the mapping compiler is very powerful, and handles a number of different mappings in a very clean fashion) based on a user-supplied mapping specification. The query now targets store tables, but may have references to one or more conceptual types (entity types, complex types etc.).

The query is then run through a number of transformations to produce one or more queries that can be executed at the store. Two of the most important pieces here are the elimination of EDM type constructs (entity types, row types etc.) and the elimination of nesting operations from the query – these require some significant tweaking of the original queries – if you see some funny constants (‘0X1X2X’ etc.) in the generated SQL queries, this is where they come from. We need these pieces of information – as well as some information that we maintain outside the query - to tell us how to reassemble results later. At the end of this stage, we have queries that are entirely in terms of what the store providers support (ie) flat relational queries. We perform a few simplification operations on the query to produce “better” and more compact SQL. In the bits that you’re seeing now, some of this may still be a bit raw, but we’re working on them, and by RTM, we’ll be much better in terms of the queries we generate.

We now have one or more queries (in the form of CQTs) that can be executed by the underlying store provider. Each of these is then sent to the provider to convert into its native SQL dialect (more precisely, the provider provides a mechanism to produce a native DbCommand object from the query tree). We had a ProviderFest earlier this month, where we had provider writers for a number of databases come in to Redmond for a couple of days, and work with us on how to produce their native dialects of SQL from the CQTs – and all of them were able to get their apps up and running very quickly against their respective databases via ADO.NET. We anticipate that by RTM, many databases would have upgraded their providers to plug in easily into ADO.NET VNext.

The queries are then executed, and the results from the different readers are then reassembled (using information that’s been gathered earlier) to produce a single data reader. If the query originated at the Object Services layer, then there’s one additional step. The records from the DataReader are transformed into objects – we call this object materialization – based on the mapping information maintained by our metadata system.

Note that query execution happens entirely at the store. The client side performs result assembly (shaping operations, really), and object materialization; but the real guts of query execution is entirely in the store.

So why eSQL?

SQL has been around for a number of years, and there’s a vast number of folks that are familiar with it. SQL’s great for relational systems, but starts getting a little hairier for more advanced concepts. Large parts of SQL are still interesting though; so we designed Entity SQL (aka eSQL) to be an extension of SQL – very SQL-like in most cases, but more appropriate for the conceptual model (EDM) we’re building. We borrowed heavily from SQL-99, OQL, and earlier internal Microsoft incarnations like WinQL. You’ll see most of your friendly SQL constructs (select, from, where), some new EDM-specific constructs (type constructors, multiset constructors, relationship navigation etc.). More eSQL details in a follow-on post.

I mentioned earlier that every database has a slightly different variant of SQL that it supports. On similar lines, every database supports a slightly different set of builtin functions, and a different set of builtin types. We tried hard to keep the eSQL language agnostic about the builtin functions supported by a database. In fact, eSQL has no builtin functions – it has a very small number of builtin operators (things like +, -, <, >, = etc.). Instead, we rely on a Provider Manifest (I’ll talk about this in a follow-on post) to define the set of builtin functions and builtin types supported by the provider, and these are automatically available to you when you write eSQL queries. We also have the notion of a small set of canonical functions (I’ll describe this in a later post as well) that every provider is expected to support; and you can use these to write provider-independent queries.

When do I use LINQ? When do I use eSQL?

If you’re using the Map Provider, you will need to use eSQL. If you’re using the Object Services layer (Query<T>), you have a choice of using eSQL or LINQ. In a future release, we may also allow queries to be specified in terms of (publicly constructible) CQTs.

Some queries are easier to express in eSQL than in LINQ; some queries are easier to express in LINQ. LINQ queries give you stronger compile-time checking and IntelliSense. On the other hand, eSQL may be more appropriate for dynamic queries. Some constructs of the EDM, and some provider-specific constructs can only be exercised via eSQL. We expect to add a lot more capabilities to eSQL as part of future releases – including query-based DMLs, analytic functions, better aggregation (cube/rollup) support, native span support etc. At the end of the day, however, use the approach that is most suitable for your application. As I described earlier, both eSQL and LINQ queries are first converted into a common tree representation (the CQT) - from that point on, all processing is identical. You can mix eSQL and LINQ queries in your application logic without any trouble.

I hope that gave you all a somewhat reasonable understanding of the way we handle queries in ADO.NET. We’d love to hear what you think. What are we doing right? What parts can we do better? Where are we just plain wrong ?

We also want to get your thoughts on some of these questions. When would you use eSQL, and when would you use LINQ? What kind of constructs do you think are missing from eSQL? Would you like to program against CQTs directly (ie) would you want to build queries directly in terms of CQTs?

Things I will talk about in subsequent posts

· eSQL

· CQTs

· Provider Manifests

· More details on the query pipeline

Thanks

Murali

Comments

Anonymous
August 21, 2006
We shipped our CTP last week (ADO.NET August CTP). Pretty cool. I guess I should blog some on things...
Anonymous
August 21, 2006
Fantastic post! Extremely informative and definitely clears up a lot of confusion surrounding the query mechanism differences between eSQL, LINQ to Entities, etc
Anonymous
August 21, 2006

If custom SQL or stored procedures are to be supported for entity loading, how would they fit into this pipeline? If somehow I was able to map an entity to a stored procedure would the linq-to-entity statement look like:

var orders = from order in myOrders
where order.PlacedDate <= oldOrdersDate
select order

where @PlacedDate is a stored proc parameter?

This would allow compile-time typing. Joins could be prohibited at compile-time (as a ‘non-joinable’ entity perhaps). Also could there be a compile-time error if required parameters are not supplied?

2) If this is possible, would it be possible to use a stored procedure which returns multiple resultsets as separate entities, e.g. perhaps:

var products, orders = from myresult in myProductsAndOrders
where myresult.PlacedDate <= oldOrdersDate
select product
select order

Or should I be looking at Link-for-SQL, or bypassing Object Services and going directly to the Map Provider to load the resulting entities?

Cheers,
-Mat

Anonymous
August 21, 2006
The comment has been removed
Anonymous
August 21, 2006
PS: in above replace:
where order.PlacedDate <= oldOrdersDate
with
where order.PlacedDate = oldOrdersDate
(as I am thinking stored proc param here)!

A follow-on question from 2) is:
3) How might fetch plans be supported in the CQT (for eSQL and Linq to support), e.g. to eager-fetch master to detail entities.

Thanks,
-Mat
Anonymous
August 21, 2006
Thanks for the informative post!
Anonymous
August 23, 2006
Hi Matthew,

The way we're thinking of store-procs support for fetching data is more along the lines of defining well-known, fixed entry points that map to stored-procedures in the store. You wouldn't be able to formulate queries over them, you would only be able to invoke them using some API or syntax construct.

While it's tempting to do something along the lines of what you suggest, that would mean that you'd have to indicate with metadata whether a given parameter represents a filter, a value to be projected out, a value used for internal computation, etc. Additionally, stored-procedures don't compose well, which causes no end of pain in a language like SQL where composition of queries is so important.

Regarding resultsets, we'll probably only support stored-procs with a single result-set, at least on the upcoming release. Supporting multiple result-sets is tricky in various aspects, from mapping to the language support in LINQ, so it's probably something we'll not look at now.

There is also the aspect of stored-procs support for update handling. In that case what we're thinking is to allow you to indicate which stored-procs we should use instead of the generated SQL for each given table.

All this is based on some eaerly thinking, we'll see where we land as we make progress in this area.

-pablo
Anonymous
August 23, 2006
You've been kicked (a good thing) - Trackback from DotNetKicks.com
Anonymous
August 25, 2006
I'm not sure I like the idea of yet another query language. Can you give me an example of a query that can't be executed in LINQ? It's difficult for me to imagine provider-specific feature that couldn't be accommodated with LINQ simply by taking advantage of purely symbolic, empty methods calls like DLINQ's "Including" method. I would think that by calling an empty method in the query you should be able to build DSL query languages to accomodate most scenarios, no?
Anonymous
August 25, 2006
OK - this sounds like:
eSQL <-> HQL
LINQ <-> Criteria API (with syntactic sugar supported by C#3 language features)

Is this basically accurate? If so - thanks! Also if so - its too bad your marketers couldn't frame this new development accurately as providing ORM (with the not insignificant bonus of ease of use provided by the C#3 language features)
Anonymous
August 28, 2006
I don't see why we need eSQL.
Anonymous
September 02, 2006
the ideia of a writing queries in eSql (or LINQ) thinking in terms of a ER model that is provider independent is really awesome!

it seems that eSql is a natural evolution when you think that traditional SQL cannot deal with higher abstractions such as an ER model and is not provider independent.
Anonymous
September 02, 2006
Sometimes getting tired of all this TLA only for a CTP

and the beta brainwash saga continues !
Anonymous
September 14, 2006

Hi folks,My name is Atul Adya and I am a software architect in the ADO.NET team. After working on persistent...
Anonymous
September 22, 2006
Excellent post! Great food for the mind.

I am very grateful that you took the burden of "equalizing" the database engines for us.

Also, I think it is a very power idea to give developers the chance to create and manipulate CQTs directly. It could replace eSQL for dynamic queries in many scenarios. Maybe someday database stores will talk in term of CQTs, and we will get rid of all the layers of eSQL/SQL parsers. This would be very a important evolution for SQLCLR.

My understanding is that any LINQ query that talks to a IQueryable store gets translated to a tree representation internally. Is that always a CQT? So, how does a CQT look like?

Thank you.
Anonymous
September 25, 2006
PingBack from http://bryanandnoel.hintonweb.com/blog/2006/08/29/adonet-vnext-linq-esql-microsofts-take-on-data-access-in-the-future/
Anonymous
February 12, 2007
Well done! [url=http://vtiywemf.com/fngr/haht.html]My homepage[/url] | [url=http://hduxqccl.com/thjn/sfwl.html]Cool site[/url]
Anonymous
February 12, 2007
Thank you! <a href="http://vtiywemf.com/fngr/haht.html">My homepage</a> | <a href="http://txirkgsa.com/vnmz/boee.html">Please visit</a>
Anonymous
February 12, 2007
Well done! http://vtiywemf.com/fngr/haht.html | http://skiqbkuk.com/gpjo/hsqk.html
Anonymous
May 05, 2007
PingBack from http://laribee.com/blog/2006/09/17/canonical-query-trees/
Anonymous
June 15, 2009
PingBack from http://debtsolutionsnow.info/story.php?id=3774

Partilhar via

Queries in ADO.NET vNext

Comments

Recursos adicionais