LINQ to SQL: Objects all the way down

There are a lot of different opinions on just what LINQ to SQL is or is not. Some of them are actually based on investigation by people who have kicked the tires and taken a test drive. Being its architect, I realize that I am the foremost expert on what’s under the hood, yet that alone does not give me the omniscience to know how the product measures up to everyone’s expectations. What it does give me is the ability to share with you the impetus behind the design and hope that gives you enough insight to make up your own mind.

It’s probably no secret to anyone that’s been following along that the high-order bit with LINQ to SQL has always been LINQ; though it might not be so obvious just how deep that truth goes. LINQ to SQL is not just another LINQ provider. It has been from the beginning and still is the quintessential LINQ provider, shaping and being shaped by the same process that designed LINQ. So to understand the impetus behind LINQ to SQL you need to understand the impetus behind LINQ.

LINQ is more than just new query syntax added to existing programming languages. LINQ is the recognition that objects (in our case CLR objects) are the core of everything you as a programmer do. In the past, you had multiple disciplines to master, including databases, transforms and other scripts. Yet at the center of all your work was the mainstream programming language and runtime that you did everything else with. Instead of using objects to naturally represent this diverse work, you used coarse-grained APIs; instead of built-in language constructs supported by your development tools, you used domain-specific languages wedged into the system as unrecognizable text. Using these APIs was adequate but not ideal. You got your work done, but it was like poking at your data through keyholes, and when some of your data did get sucked back through it was basically dead on arrival, since none of your tools for manipulating it carried forward across the divide. Believe me, I know. I built many of these APIs.

So for LINQ the high-order bit was not only making it possible to represent your domain information as objects but also making the ability to manipulate those objects in domain-specific ways first class. LINQ to SQL was the poster child. The important area of manipulation was the query, and since query was pervasive throughout most other highly used domains, it was obvious that query would need to become first class in the language and runtime. LINQ to SQL's primary job became the representation of relational database data as objects and the translation of language-integrated queries into remotely executing database queries.

Of course, since this coincides with the territory of ORM systems, it should come as no surprise that LINQ to SQL has taken on that role as well, enabling a degree of mapping between the objects you use in the language and the shape of data in the relational database. We took from the experience of customers the most valued features of such systems and laid out a roadmap for delivering those features. Yet, as with any shipping product, reality eventually crept in, so priorities were set and unfortunately a lot that we would have loved to do did not make the cut for the first version. But this is no apology. LINQ to SQL has amassed a set of features that will be compelling for a large part of the market, and over time it will only get better.

The truly interesting thing to understand about LINQ to SQL is just how deep the rabbit hole goes.

One of our primary tenets from the get-go was to enable plain-old-CLR-object (POCO) development. We received enough feedback from earlier prototypes of ObjectSpaces to know that customers really cared about this and what specifically about it mattered the most to them. And yet while we found reason to offer specialized classes such as EntityRef and EntitySet, we never strayed from the objective, since use of these classes has always been optional. You certainly can build an object model with plain-old object references and collections such as lists or arrays; you just don’t get deferred loading or bi-directional pointer fix up. And although some would have preferred for us to invent a cheap interception solution that would have allowed these behaviors without needing to use specialized classes, no such solution was on the horizon and the use of these types could easily be disguised behind properties.

It’s also worth pointing out that these specialized classes don’t actually cause you to mingle data access logic with your business objects. EntityRef and EntitySet don’t actually tie back to the database or LINQ to SQL infrastructure at all. They were designed to be completely free of any such entanglements, having no references back to contexts, connections or other abstractions. Deferred loading is based on assigning each a simple IEnumerable source, such as a LINQ query. That’s it. You are free to reuse them for whatever diabolical purpose you can imagine.
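To make the idea concrete, here is a minimal, language-neutral sketch in Python of a reference that defers loading until first use. It is not the EntityRef API: the names DeferredRef and find_customer are hypothetical, and the only thing it is meant to show is that such a class needs nothing more than an enumerable source, with no context or connection in sight.

```python
from typing import Generic, Iterable, Optional, TypeVar

T = TypeVar("T")


class DeferredRef(Generic[T]):
    """Hypothetical stand-in for a deferred reference (not EntityRef itself)."""

    def __init__(self, source: Optional[Iterable[T]] = None) -> None:
        self._source = source            # any iterable; nothing is read yet
        self._loaded = source is None    # no source means nothing to load
        self._value: Optional[T] = None

    @property
    def value(self) -> Optional[T]:
        if not self._loaded:
            # First access: pull at most one item from the source, then drop it.
            self._value = next(iter(self._source), None)
            self._source = None
            self._loaded = True
        return self._value

    @value.setter
    def value(self, item: Optional[T]) -> None:
        # Assigning a value directly discards any pending source.
        self._value = item
        self._source = None
        self._loaded = True


calls = []  # records when the "query" actually runs


def find_customer(customer_id):
    # A generator: its body does not run until something iterates it,
    # which stands in for a query that executes only when enumerated.
    calls.append(customer_id)
    yield {"id": customer_id, "name": "Contoso"}


ref = DeferredRef(find_customer(42))
ran_before_access = list(calls)   # snapshot before the first access
first = ref.value                 # triggers the one and only load
second = ref.value                # served from the cached value
```

The source here is a plain generator, but any iterable would do, which is the point: the reference neither knows nor cares where its items come from.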

But it does not stop there. The use of objects, your objects specifically, and the CLR is pervasive throughout the design of LINQ to SQL. You see it in the way that behavior overrides for Insert, Update and Delete are enabled. Instead of offering a mapping-based solution to connect these submit actions to database stored procedures, the solution is designed to take advantage of objects and the runtime: the ability to add code to a system by defining a method. You define an InsertCustomer method as part of your DataContext and the LINQ to SQL runtime calls it, letting you override how an insert is performed. You can do anything you want in this method: logging, tracing, executing any SQL you prefer, or no SQL at all. Of course, all this is wired up for you when you use the designer to connect a submit action to a stored procedure. But the beauty of it lies in the simplicity of using the runtime and the basic extensibility mechanism of the language to enable any custom solution you require.
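The general shape of "override a submit action by defining a method" can be sketched in a few lines of Python. This is only an illustration of the idea, not how LINQ to SQL discovers or calls your methods: ChangeSubmitter, ShopContext, the insert_<type> naming convention and the strings standing in for SQL are all assumptions made for the example.

```python
class ChangeSubmitter:
    """Hypothetical base class standing in for the runtime that submits changes."""

    def __init__(self):
        self.executed = []  # records what "ran", in order

    def default_insert(self, entity):
        # Default behavior: pretend to generate an INSERT for the entity type.
        self.executed.append(f"INSERT {type(entity).__name__}")

    def submit_insert(self, entity):
        # If the subclass defines insert_<entitytype>, call it; otherwise
        # fall back to the default insert.
        hook = getattr(self, f"insert_{type(entity).__name__.lower()}", None)
        if callable(hook):
            hook(entity)
        else:
            self.default_insert(entity)


class Customer:
    def __init__(self, name):
        self.name = name


class Order:
    pass


class ShopContext(ChangeSubmitter):
    # Defining this method is all it takes to override customer inserts.
    def insert_customer(self, customer):
        self.executed.append(f"log: adding {customer.name}")
        self.executed.append("EXEC AddCustomer")


ctx = ShopContext()
ctx.submit_insert(Customer("Ada"))  # routed to insert_customer
ctx.submit_insert(Order())          # no override defined, default path
```

Nothing in the sketch involves a mapping file or configuration; the extension point is simply a method, which is the design choice described above.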

You see it in the way that mapping can be defined using CLR custom attributes. Of course, an external mapping file variation is also available, but the attribute model was paramount. Some will argue that using attributes in the code breaks from pure POCO design. That might be true. However, it’s precisely the ability to declare mapping inline with the definition of your objects that makes LINQ to SQL simple to use and easy to get started because you always stay focused on your objects and your code.

You also see it in how LINQ to SQL operates internally or communicates to its provider. Queries are LINQ expression trees handed all the way down to the query translation pipeline. It's your object types and runtime metadata that are reasoned about directly, constructed, compared and navigated. Even stored procedure calls are understood as references to the actual signature of the method defined on the DataContext. The results of ad hoc queries (projections) are never some general object with awkward accessors like DataReader; they are always your objects or objects defined implicitly through anonymous type constructors, and are interacted with through strongly typed fields and properties.

Looking back, a lot of this just seems obvious now, but believe me, none of this was readily apparent at the time we started the project. For example, we designed ObjectSpaces with none of this in mind. Before LINQ not much of it was even possible to consider. Yet when it came time to build LINQ to SQL, tradeoffs in design were resolved by keeping true to your objects and the simplicity gained by using the built-in mechanisms of the runtime and language to manipulate them.

LINQ allowed us to finish the puzzle that was started when database access was first mashed together with object oriented languages. LINQ to SQL became the embodiment of this object-to-database solution, focusing the design on query, domain objects and the integration of both into the language and runtime.

Of course, there were many other design goals as well; simplicity, ease of use and performance led to many interesting consequences that are equally deserving of their own post. I suppose I ought to write about them too. :-)

Comments

  • Anonymous
    June 21, 2007
    Thanks, keep these good posts coming. I think these background concepts will be very useful to people like me who are still a bit fuzzy on exactly how LINQ to SQL is going to fit in.

  • Anonymous
    June 21, 2007
    Matt, Thanks for yet another interesting posting on the LINQ to SQL journey. It's refreshing to read about the thoughts, tradeoffs and reasons why LINQ to SQL became what it is today (as a fellow architect, I share your pain - great API design is an art that takes a long time to master). I am very curious about the performance of LINQ to SQL in beta 2 - hoping that the initialization of DataContexts as well as entity materialization will improve. As mentioned at the end of your essay, a dedicated essay on Performance within LINQ to SQL would be great. Again, thoughts shared on your design process may help others make smarter choices. With the anticipation of the "mini connectionless datacontext" (an ObjectTracker?) as well as the announced improvements arriving in beta 2, it's clear that the elegance of LINQ to SQL is about to improve.

  • Anonymous
    June 21, 2007
    Hi, thanks for your post. I have already started a new project using LINQ to SQL; before, I was using my own object-to-SQL library. But there are some design limitations in the current release which make implementation much harder. The main ones are described here: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1760703&SiteID=1 and http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1734001&SiteID=1 Are there any chances that there will be some work in this area? Nice day edvin

  • Anonymous
    June 22, 2007
    Quoting Frans - "you don't, the system will be pretty dumbed down: every feature you actually want will have to be written by hand into the domain classes" ... EXACTLY ... my thoughts! 90% of the pain is not solved in the current implementation. Then why introduce the extra complexity of yet another data access mechanism?

  • Anonymous
    June 22, 2007
    Frans, I understand that you do not seem to value Persistence Ignorance (POCO is a bad term), but those working in a Test-Driven or Domain Centric approach do rate it very highly. "in .NET land things like INotifyPropertyChanged on the entity, IBindingList/ITypedList/IListSource implementing collections, pk-fk syncing at runtime etc. etc. are must have elements of your domain/collection classes" Really? I've never needed them and I have delivered a lot of systems without them.

  • Anonymous
    June 23, 2007
    Hi Matt I've been looking for the DynamicUpdate, DynamicInsert and DynamicDelete methods you've mentioned in several posts. Are these methods already available in beta 1 (if so, where can I find them) ?

  • Anonymous
    June 23, 2007
    Anders, those methods are coming in Beta 2.

  • Anonymous
    June 23, 2007
    Ian, I sort of agree with Frans on the bit about bi-directional pointers. I'm guessing that a pure P.I. approach that does not model bi-directionality will tend to have subtle problems keeping the object model in sync with the database. However, I'm not in favor of a 'magic' solution that keeps your pointers correct without needing any user code or special classes either. Having your objects explicitly implement this behavior is actually ideal, since it is your objects that enable this behavior, not some side effect of the persistence system. I think there is some expectation by people that POCO means "I write nothing but a class that declares simple properties plus a few business methods and everything else just magically happens so I don't have to think about it." Yet what it really means is that your class is not littered with actual data access logic. Behaviors like bi-directionality are not data access logic. They are, however, not built-in behaviors of the runtime either, so you're going to have to opt into them manually by doing 'something.' So now we're only dickering over whether you have to write a few lines of code or have the behavior packaged into a custom class. We chose the 'few' lines of code to represent the bi-directionality behavior to keep the size of EntityRef down so we could maintain it as a value-type and not require the use of reflection to make it work.

  • Anonymous
    June 23, 2007
    Agreed.  I hope people only implement as much as they need to get the job done. :-)

  • Anonymous
    June 23, 2007
    Ian, if you don't use dyn. proxy using o/r mappers, and you do want to have your classes behave in databinding scenarios, how would you do that without INotifyPropertyChanged, and for example ITypedList? Or do you define your collections as BindingList<T>? Sure, if you say "I don't use databinding", then you don't need databinding enabled code, but if you DO, you won't get away with a simple class, as there's no code there which signals bound controls something has changed, etc. I simply don't buy this statement: "My objection is more to the argument that devs don't really want PI because they have to implement a lot of infrastructure code in the .NET world anyway. That just seems defeatist and in fact I know a lot of devs who avoid framework features like data-binding in many cases." If you don't use databinding, you have to do some other kind of MVC approach to make data show up in a form and have the data be transported back from the controls to the objects. All great, but you have to write that code manually as well. That's infrastructure plumbing code, Ian. :) Sure, you don't need the databinding interfaces etc., but it doesn't mean you don't have to do some work (and writing the controllers isn't a few lines of code either) somewhere else to get your app working. Make no mistake, I'm all for dropping the databinding framework TODAY, so I can get rid of the interfaces in my base classes and perhaps even get away with just code generation and no base classes at all. (See this code generation as the typing of the infrastructure code you need to get POCO classes persisted and MANAGED in memory.) That by itself asks for a different solution to the same problem, e.g. MVC could be that solution. This then asks for another way to avoid having to write a lot of code to be able to use data in a UI without having to write code which passes data from/to controls. Persisting a POCO class isn't free.
    Magic is involved, somewhere, OR you have to do it the slow way via reflection. The thing is though: using an o/r mapper isn't solely about persistence. Persistence is just a small part of using an o/r mapper. The biggest gains are won in the areas where validation, auditing, authorization, flexible concurrency control etc. are taken care of by the framework which takes care of the entities anyway, namely the o/r mapper. If you have to write that code as well by hand (and in the case of linq to sql you have to) all over the place in your POCO classes, have fun. :) Sure, you'll now say that by using additional frameworks like spring.net and castle it will be much easier. I fully agree, but not addressing that part when claiming POCO or PI is great and better is IMHO simply telling half the story. I also still find it odd that some TDD people are still blind to the fact that entities don't fall out of the sky. Even the purest die-hard agile TDD developer won't have visions behind the keyboard where a bright white light suddenly whispers in his ear "Thou shall create a customer class! With these 3 properties!". The developer will think about what to create, which entities are there in the system. Use DDD to discover which entities there are, define them first, then type in the code for the class definitions. This effectively means that the class is just another representation of the abstract entity definition that's the theoretical basis of the entity. Or are you suggesting that there's no theoretical basis for a random entity class in your system? :) Using that theoretical basis you can thus opt for a different route with entity classes which contain plumbing code and which allow you to STILL do TDD, simply because the TDD you'll be doing is about the system functionality, not about the plumbing code you need to get the functionality code up and running, as that's already been taken care of.
    Oh, and I do value PI, it's just that I react to the remarks made by some people in the blogosphere where they tend to claim that PI is free and won't give you anything to worry about. It's not free nor will it free you from a lot of work, on the contrary. It WILL force you to write other infrastructure code, whether you like it or not. If .NET and VS.NET were built around MVC, my framework would have been PI more or less today, but the opposite is true. Add to that that if you want to have really performant o/r mapping, you can't do it without either post-compile IL manipulation or dyn. proxies, IF you want to go the PI route. Both have side effects and downsides and limitations for your own code. So does using a base class/code gen. I won't deny that, it's just that saying that using POCO/PI has no side effects/limitations is simply a lie.

  • Anonymous
    June 23, 2007
    Btw, I'd like to add that I'm more or less suggesting that code inside the entity classes (be it via a base class, code genned or hand-typed for the people who like to type everything in despite the fact that it's repetitive and boring) should be targeted towards:

  1. entity management. This is infrastructure at its best. You either have to type this in by hand or get it from the framework in some form (dyn. proxy/post-compile IL man./codegen/baseclass)
  2. make o/r mapping more efficient. This is also infrastructure code. One might close his eyes and deny the db is there, but that's just stupid. The db is there, deal with it. That's not to say that the entity should have methods like Save() etc., that's not what I'm suggesting as I personally also don't like that (which is why we support both types).

    One other reason I often hear from people who want POCO is that it gives them the choice to switch to another framework. That's a big myth. Your code will be soaked with the o/r mapper's details, query language etc. Switching to another framework isn't going to be a walk in the park, on the contrary, even if you have poco classes. Take nhibernate: after myCustomer.Orders.Add(myOrder); it's not true that myOrder.Customer points to myCustomer. Some other frameworks DO make sure this is true. Switching a framework will break that kind of code, even if you use POCO classes, and even worse: you won't notice it until you run the app and it hits the situation where this is important. :)

    Ian, in a perfect world, I would definitely agree with you, and in theory I still do. The reality is however that poco/pi has disadvantages too, and as .NET isn't really tailored to support them (in java all methods etc. are virtual, so dyn. proxy is easy, also, byte code manipulation is easy, in .NET it's much harder as there's no real runtime IL manipulation possible which performs really well) you have to develop your software in a different way, where you can't utilize the tools available to you in the .NET framework.

  • Anonymous
    June 24, 2007
    I think what we're really saying here is that OR mapped domain objects are not just 'POCO' objects, since they embody behavior beyond what is typical to the runtime. This behavior is not persistence related, it is a by-product of the meta entity that is being modelled and represented in the runtime environment. Users and builders of these objects should realize that these objects are neither limited to the capabilities of 'normal' objects nor can they be described using all the qualities possible in the runtime. The data model of the meta entity overlaps with the CLR, neither encompasses the other.  In order to model what is missing in the runtime, user code or conventions must be added to the mix. It makes no sense to be opposed to this.

  • Anonymous
    July 02, 2008
    http://blogs.msdn.com/mattwar/archive/2007/06/21/linq-to-sql-objects-all-the-way-down.aspx : "EntityRef

  • Anonymous
    July 13, 2008
    I recently had some time on airplanes to read through Bitter EJB , POJOs in Action , and Better, Faster