Ramblings about POCO*, transparency and delayed load.

As someone pointed out, I have been referring to ObjectSpaces as supporting true POCO, which is technically incorrect. ObjectSpaces does prescribe the use of ObjectHolder and ObjectList for delayed load cases (more on the merits of delayed loading below). Beyond that, however, there is no prescribed type definition. So for the sake of being technically correct, I will start referring to ObjectSpaces as POCO*.

I think this leads to an interesting conversation around whether ObjectSpaces supports true transparency. In other words, can the domain model developer design types for a particular business problem without regards to the fact that the objects will be persisted to a permanent data store? To answer this question, I want to first look at one of the pillars of Indigo:

            Service Boundaries are Explicit and Costly to traverse

Think of the data store as one very complex and powerful service. There is no way to avoid this with current relational database technology unless the store 1) is local, 2) is accessed mostly read-only or used exclusively, and 3) has enough disk space to support a large number of indexes. In other words, the boundary between the database and the database application is explicit, and therefore no current data access technology can support transparent access. It then follows that ObjectSpaces cannot support true transparency. At the very least, the developer who designs the domain types, the mapping and/or the database must be aware of this explicit boundary. I also doubt that a developer who merely uses the domain types can use them transparently: there are still runtime concerns such as transaction management, batching considerations and dealing with concurrency errors. I suppose one could develop a framework that abstracts these realities away from the domain model, making the physical persistence transparent to it, but I have yet to see a framework that does so completely, and I doubt it can be done for significantly complex architectures.

So, in general, POCO* minimizes the burden on the domain model developer of needing to know details about the datastore; however, one must still know that the datastore exists and lives across a service boundary. In other words, it offers a nice abstraction but not complete transparency.

Matt Warren has some interesting points about the evils of loading on demand. For the most part, I can’t say I disagree with him. I do suggest, though, that for value types, particularly very large ones, the requirement justifies the means. At the end of the day, I don’t want to download the entire movie when I only want to see the date it was created. That said, I would still like to drill down into what I believe the true evil is: preserving graph fidelity. Suppose for a second that ObjectSpaces did not support delayed loading (then we would have true POCO instead of POCO*). ObjectSpaces would still support the concept of object identity across queries via the ObjectManager. That is, if the result sets of query 1 and query 2 intersect, ObjectSpaces materializes new objects for query 2 only for the non-intersecting subset. The rest gets thrown away, since those objects are already in memory. So I am going to take the liberty of extrapolating what Matt said about demand loading being evil to really mean that graph fidelity should not be maintained across query executions.
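To make the identity behaviour concrete, here is a minimal sketch in Python. It is not the ObjectSpaces API: `IdentityMap`, `Movie` and `materialize` are hypothetical names invented for illustration, and rows are plain dictionaries standing in for query results. It only shows the general idea that a second query reuses instances already in memory and discards the freshly fetched data for them.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    id: int
    title: str


class IdentityMap:
    """Hypothetical stand-in for per-key identity tracking across queries."""

    def __init__(self):
        self._by_key = {}

    def materialize(self, rows):
        """Turn result rows into objects, reusing instances already tracked."""
        results = []
        for row in rows:
            key = row["id"]
            existing = self._by_key.get(key)
            if existing is None:
                # Non-intersecting subset: build and track a new instance.
                existing = Movie(id=key, title=row["title"])
                self._by_key[key] = existing
            # Otherwise the freshly fetched row is thrown away and the
            # instance already in memory is returned unchanged.
            results.append(existing)
        return results


tracker = IdentityMap()
query1 = tracker.materialize(
    [{"id": 1, "title": "Alpha"}, {"id": 2, "title": "Beta"}]
)
# Row 2 has changed in the (imaginary) database between the two queries.
query2 = tracker.materialize(
    [{"id": 2, "title": "Beta (renamed)"}, {"id": 3, "title": "Gamma"}]
)
```

In this sketch `query1[1]` and `query2[0]` are the very same object, and it still carries the title from the first query. That is the kind of graph fidelity under discussion: identity is preserved, at the cost of the in-memory graph possibly lagging behind the store.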

Let’s table that discussion for a second and discuss a very specific evil of delayed loading when not using a prescriptive solution like ObjectHolder/ObjectList. Short of code injection, the only way to support delayed loading is through some sort of context that knows how to retrieve the related object(s) for a given object at runtime. The problem then becomes: what is the meaning of null?

Take, for example, the case where I have type A, which has a reference to type B called b. I run a query that loads A’s but does not include B in the span. In a true POCO model, b is set to null by the engine materializing an instance of A. What does that null mean? That A does not have a corresponding b, or that b has not been materialized in the in-memory graph? It is quite ambiguous unless, of course, one takes into account the original query. So if you buy that some sort of delayed loading support is desirable, then ObjectHolder/ObjectList are a necessary divergence from POCO, unless of course code injection is utilized. But that is another topic for another time.
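The ambiguity of null, and how a holder removes it, can be sketched as follows. This is a rough illustration in Python rather than the actual ObjectHolder type: `LazyHolder`, `is_loaded`, `b_holder` and the loader callbacks are all assumptions made up for the example. The point is only that the holder records whether a load has happened separately from what the load returned.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyHolder(Generic[T]):
    """Hypothetical holder: tracks 'not loaded yet' apart from 'no value'."""

    def __init__(self, loader: Callable[[], Optional[T]]):
        self._loader = loader
        self._loaded = False
        self._value: Optional[T] = None

    @property
    def is_loaded(self) -> bool:
        return self._loaded

    @property
    def value(self) -> Optional[T]:
        # Load on first access only; later reads reuse the cached result.
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value


class B:
    def __init__(self, name: str):
        self.name = name


class A:
    def __init__(self, b_loader: Callable[[], Optional[B]]):
        # The holder is the divergence from a plain field of type B.
        self.b_holder = LazyHolder(b_loader)

    @property
    def b(self) -> Optional[B]:
        return self.b_holder.value


calls = []


def load_b() -> B:
    calls.append("load")  # stands in for a round trip to the store
    return B("related")


a_with_b = A(load_b)
a_without_b = A(lambda: None)  # the store really has no related B
```

Before `a_with_b.b` is touched, `a_with_b.b_holder.is_loaded` is false, which says "not materialized yet". After `a_without_b.b` has been read and returned `None`, its holder reports loaded, so that `None` can only mean "there is no b". A bare field holding null cannot make that distinction.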

So where are we now? A limited amount of delayed loading is OK, but completely preserving graph fidelity across query executions is a very dangerous game with very bad consequences for those who embrace its dark side. Should preserving graph fidelity really be the job of an O/R mapping framework? Well, I can probably argue each side of the discussion. Perhaps another topic for another time.

So, to conclude – my points:

1) ObjectSpaces does not support true transparency. Unless there is a huge technological jump in datastore technology, this is not feasible. What it does support is a nice way to decouple the domain model from the data access layer; however, it is not a transparent technology for persisting objects.

2) ObjectHolder/ObjectList are a necessary divergence from the true POCO model. They allow the delayed loading of large values on demand while disambiguating the value of null. They can, however, be abused (see point 3).

3) ObjectSpaces can preserve graph fidelity across query executions and therefore allows the user to materialize sections of their domain model on demand. This, however, can be abused and can lead to cases where the in-memory representation is out of sync with the database in a significantly complex way. Like most software, this feature is a tool which can either be used correctly or abused; it is up to the developer to do what is right for their design. Further, utilizing this feature correctly probably means even less transparency for ObjectSpaces (see point 1).

Comments

  • Anonymous
    April 30, 2004
    The comment has been removed
  • Anonymous
    April 30, 2004
    "Unless there is a huge technological jump in datastore technology, this is not feasible."

    The necessary ingredients are present already. It is just that people haven't realized.
  • Anonymous
    April 30, 2004
    Transparent persistence is a buzzword anyway. I see some O/R mapper vendors claim their solution is transparent. But it's always transparent from a certain point of view and not transparent from another.

    This question: "In other words, can the domain model developer design types for a particular business problem without regards to the fact that the objects will be persisted to a permanent data store?" illustrates a major flaw in the domain model idea. "Can the designer create classes without the knowledge of where the instance' contents come from?" No s/he cannot. The reason for that is that the class instance is nothing, the data INSIDE the instance is something, so the class itself is the hull, the data inside is what's important.

    Example: entity. What's an entity? Is it (A) a class definition? (B) an instance of a class, (C) a definition of fields/attributes or (D) a bunch of data?

    A 'customer' is not a class, nor an instance of a class, it's a bunch of data, identified by the 'unique identifying attributes', which are part of the bunch of data.

    The data forming the customer, or the entity, are stored in a table in the database, or in an instance of a class. The table and the instance are instances of entity definitions, so they're entity definition instances. The table's DDL and the class definition are the entity definition.

    Why is this important in this context? Well, because what's important, what's it all about, is the data inside some container, be it a class instance or a table. So, does it really matter that (A) the data is stored in an object of type A, B or C? or (B) how the data gets there?

    No, not at all. And here's where the domain model goes the wrong route: it focusses on the entity definition and the entity definition instances, however those are just containers, it's the data inside these which is important. So, the question "In other words, can the domain model developer design types for a particular business problem without regards to the fact that the objects will be persisted to a permanent data store? ", is true, no matter what, because all s/he does is designing containers where data, the real entities, are stored and manipulated. So does it matter how these data is entering these containers or how the data is stored in its real reality, the database? No.

    Which makes the whole discussions about transparency in O/R mappers, true non-intrusive code and other academic discussions irrelevant.

    Besides that, transparent persistence doesn't exist by definition (even we claim some sort of transparency in the LLBLGen Pro marketing, hey, it's marketing ;)) as the in-memory entities (that is: the data) are never the reality. So you can never directly mirror your actions on the 'in-memory mirror' of the entities to the reality they live in, the database: there is always a delay, PLUS you always work with an incomplete mirror in memory. (Thread A loads customer C and his orders at time T. Thread B creates a new order for C at T+t. A doesn't see that order, and thus by definition can't say C's orders in memory are THE orders. So actions on C's orders, or on C based on those orders, should always be seen as actions in a given period of time, which implies an end of the time period in which the actions are performed, which implies an 'end of actions' marker, like a call to Save() or 'PersistChanges()' or what have you.) That call implies non-transparency.
  • Anonymous
    April 30, 2004
    " "Can the designer create classes without the knowledge of where the instance' contents come from?" No s/he cannot. "
    I meant: the designer of the classes always has to know that the data is coming from somewhere and also has to understand that the data is what's important, not the classes, so the designer has to design the classes as hulls, as containers, where data is placed into, which implies that the data has to be persisted and read, thus has to take into account that functionality.
  • Anonymous
    April 30, 2004
    Hmmm, re-reading what I wrote, I completely misformulated my point. Sorry about that, Andrew (It looks like I switch position in the same posting, aaaarg). I shouldn't write complex replies in the early morning...

    The sentence: "is true, no matter what, because all s/he does is designing containers where data, the real entities, are stored and manipulated. So does it matter how these data is entering these containers or how the data is stored in its real reality, the database? No."
    is very badly formulated. I meant: if you think about the data, you don't end up with the domain model, you end up with designing classes which will be containers (of functionality for example) and which will be entity consumers. For these classes it doesn't matter. For the domain model, it does matter, see my correction in the posting above this one.

    And now first some coffee. :) Sorry again, Andrew for messing up your reactions.
  • Anonymous
    May 01, 2004
    Ya, what he said.
  • Anonymous
    May 02, 2004
    If anyone is interested in transparently transparent persistence ;) check out the C# port of the Hibernate O/R framework... The project stalled for a while but seems to have started up again as strategies emerge for handling dynamic proxies.

    http://nhibernate.sourceforge.net
  • Anonymous
    May 03, 2004
    So, you can say that delayed loading really violates one of the Indigo Pillars ;)
  • Anonymous
    May 03, 2004
    I really don't see why Hibernate makes persistence any more transparent.
  • Anonymous
    May 03, 2004
    Some of this post seems to be mixing separate concerns into the same discussion. I see at least three very distinct concerns: distribution of objects, transparent representation of object graphs for storage to a datastore, and transaction management boundaries. Each of these concerns has different approaches with pros and cons. From my experience in the Java world working with Toplink (since 1999) and Hibernate, each implies transparency of the domain model from knowledge of how it is mapped and cruded to the underlying database. Does this mean we can casually design models without regard to distribution or txn mgmt? No, having put several systems into production, you HAVE to consider the other two factors. However, do I do that when I'm analyzing the domain model? No, only when I look at the system requirements for distribution do I need to consider whether I need to use a pattern such as transfer object and its subpattern of entity inherits transfer object to separate distribution from what is mapped. In terms of txn mgmt, I want the persistence engine to provide the calls to designate the demarcation of a txn, which then can be managed at the service boundary, which has the "controller" knowledge to do the correct thing. In the case of an external txn mgr, I'd like the engine to have pluggable behavior such that it delegates txn mgmt to the external mgr and does the correct thing (e.g. join an existing txn if present and the attributes specified don't override this, etc.).
    In terms of the delayed load holder or code injection, both are fine with me (given Toplink uses the holder approach and Hibernate uses code injection/replacement), since using the former is a design decision that is known when you develop the implementation and hence is not really an issue (esp. since this would be hidden by the property for the designated field).
    Hence, for me transparency is about:
    1) Not constraining the problem domain model's transition into the solution domain model.
    2) Allowing mappings that support physical schema decisions (e.g. need to support aggregate mappings, selection of partial properties/fields to be mapped, transformation mappings, rollup inheritance mappings, etc.)
    3) Allowing the mapping to be determined at runtime through configuration (i.e., I can change how classes are mapped without necessarily changing the class, which allows schemas to have some degree of freedom to change/evolve).
    4) Providing a framework around txn mgmt so that developers learn a simpler interface instead of all the details needed to deal with local and distributed txns.
    It is my hope that ObjectSpaces and other commercial persistence engines will look at the successful tools on the Java side and add similar functionality on the .NET side so that we have the option to apply the same tools in .NET systems.