Futility of Object-Relational technologies

[アーティクル]
12/21/2003

OR (Object-Relational) technologies attempt to abstract OO developer from knowing intricacies of relational database. It does so by de-coupling the object definition from its persistence. Persistence layer will do the 'right thing' to persist information in the persistent store. It also knows how to fetch information from the persistent store. In general, most technologies use OO language types such as classes or in some case structs to map to tables or views with properties of the type mapped to columns or fields. This post discusses some of the short falls of such technologies.

One of the main issues with attempting to map relational tables to objects is that of flexibility. For example, think of application like Outlook which lets you choose whatever the fields you want to see (like Size, received time, subject etc). In fact this feature is called Field Chooser. I am sure most of the applications require end-users to select a view of data. Relational database lets you create views of a table through select statement. What is the equivalent in OO world? As far as I know there is no way one can create a view of an object statically, leave alone dynamic definition. OR guys may object to this, possibly saying that the application can ignore the fields that it doesn't want to see. This is a lousy argument at best. For example, think of an Employee object that has hundreds of attributes. Let's say that I want to search for an employee and all I want to see is the name, title and department. Let's say that the search returned with 1000 possible matches. If you use OR technology, you probably have 1000 employee object instances out there that contains all the attributes (even though the attributes may not be set) that the user doesn't want to see, but still occupying memory. This is not the optimal use of working-set. Again, OO guys might object to this by saying that in today's systems memory doesn't matter. In my opinion, it always matters and world class applications always use memory intelligently and spartan use of memory is always preferable.

Another related issue is that binding the result sets to presentation layer widgets such as grid or list controls. In the above said case, where the view is dynamic based on user choice, application developer has to write some code to map the object or object set to controls. Instead, if you use datasets, then you don't have to do any extra tricks to map to user controls.

Yet another issue is that of security. For example, certain users may have access to only certain fields. In this case again, application has to write additional amount of code to enforce the authorization.

There are other cases, where OR has no answers at all. For example, continuing the flexibility argument, there are cases, where an application have to send a free-form SQL query to RDBMS. I can think of situations where the users are provided with a flexible interface where users can choose what they want to see and they can define search predicate. One fine example is Query by example (QBE). The implementation might require the application to generate a dynamic sql on the fly that may involve one or more tables (again determined on the fly). I would like to call this dynamic 'aggregation' of data from many different tables. There is no way by which one can achieve this functionality using current OR technology. Objects/classes, by definition are inflexible data-structures. As of today, type system supported by the OO languages or the runtime they are based on, do not support this semantics.

Predominant implementations of OR technologies uses type system provided by the host language. For example, if you take Java or CLR or C++, classes are used to represent an object. Most of these languages or the runtime based on which the language compilers are developed, doesn't support dynamic modification of the types. The type system of OO languages is not flexible where as RDBMS is very flexible by its very nature. One can define a type in RDBMS system, define a view on it, define an aggregated view using the semantics of relational calculus. Sure enough, RDBMS requires huge amount of meta-data that one has to supply in the form of DDL. But, that is the power of RDBMS.

So, the question would then be, what are the desirable characteristics of the data-structure that represents relational information in-memory. The main characteristic, I would think of, is that the data-structure should be flexible - one should be able to make modifications to the structure on the fly. I can think of three possible data structures that fit this criteria. They are

Datasets
XML
Roll your own which will ultimately look like datasets.

Comments

Anonymous
December 21, 2003
Don't have time to post a full response but just wanted to say this:

"Another related issue is that binding the result sets to presentation layer widgets such as grid or list controls..."

It is just as easy to bind a collection of objects, be it a strongly typed collection or a simple ArrayList.

A final note, there's a place for everything, and everything has its place.
Anonymous
December 21, 2003
"As far as I know there is no way one can create a view of an object statically, leave alone dynamic definition"
Of course you can :) I call it a typed list, it's a view defined using entity fields. Check the tutorial: http://www.llblgen.com/pages/tutorial1typedlists.aspx

You even can create them at runtime if you want to.

Also, if you roll your own data structures, they will NEVER look like datasets, because the DataSet is bloated and works very different than most custom class hierarchies. For example, can you do this in a dataset: (order-products is an m:n relationship!) myCustomer.Orders[orderIndex].Products[0].Supplier.Name ?

Working with objects doesn't mean you give up the flexibility of a relational database: the data you pull out of the database is fixed in the resultset format anyway, be it stored in custom classes or datarow objects in a datatable...
Anonymous
December 21, 2003
My thoughts also!
Just to add a few more
1) ORM requires tight coupled clients
2) Fails to represent untyped relations ie instead of order-orderlines->products i want the products->orders) or products->(orders hidden)->customers
3) Fails to merge entities, ie in the above example i want a record that is both product and customer
4) AFAIK updates are complete (all fields are set) making triggers that depend on specific fields to fire for no reason
5) Fail to represent cell attributes, ie in a row list a column is updatable in one row but not in another (Dataset also fails here)

i have succesfully solve all these by following the 3rd option (Roll my own which looks like dataset)
Anonymous
December 21, 2003
Panos:
1) No it doesn't. In fact, it creates an abstraction that makes writing clients more generic instead of tighter coupled to databases (you don't have to know field names to access data in a resultset for example)
2) Also untrue. Most O/R mappers offer a sophisticated dynamic query engine which can create filters on objects based on any other object in the database by allowing to construct relation traversals and field predicates. Also, my example just showed Orders[0].Products, which is an m:n relation via Orderlines. :)
3) If you solely focus on objects, then indeed it's hard, however a decent tool allows you to formulate views on entities as well.
4) Most O/R mappers know exactly which fields are changed and construct using their dynamic query engines updates which are solely focused on updating only the changed fields.
5) Also untrue because a 'row' is in fact an object, an entity. So changing a field in one entity and not in the other is perfectly allright: the query engine constructs a query for updating that field of that entity and not the other.

If you do all this, your own classes will not look like datasets, because they offer so much more, especially when you're working with individual objects (like 'a customer').
Anonymous
December 21, 2003
"The implementation might require the application to generate a dynamic sql on the fly that may involve one or more tables (again determined on the fly). I would like to call this dynamic 'aggregation' of data from many different tables. There is no way by which one can achieve this functionality using current OR technology."
An 'entity' is a fixed definition, by definition. Every O/R mapper I know can query data based on filters which span multiple entities, sort it on the fly etc. This is achieved by the dynamic query engines implementeed in these tools. My own O/R mapper, LLBLGen Pro, allows you to create such a form as you suggest.

The resultset is a set of fields from different tables. I can define that resultset on the fly as well and store the result inside a datatable or other structure if you want. You know, the classes are static, but the field collections are not. Furthermore, working with entities is something else than working with random data. If you want to retrieve a random column set from a random set of tables based on a random set of predicates and joins, you can't work with entities anymore, at least not on the storage front, you REQUIRE a flexible, generic data bucket. However, is such a bucket a gain ? No, not at all. Passing such a datastructure to a GUI tier will require that the GUI code knows the column names inside the bucket, you need addressing like bla.Rows[i]["MyColumn"].Value, you can't use typed code, you can't rely on a compiler to check if your column names are correctly formulated.

Remember: it's not about what you can and can't do, it's about what you HAVE TO do and then if you can do it in the best possible way.
Anonymous
December 21, 2003
Frans:
1+5) What objects does the resultset contain ?
If they are instances of the Customer class then it is tight coupled (client must have the assembly with the customer class locally). This may sound ok for small schemas (<100 tables) but its unrealistic for medium (~1000 tables) or bigger schemas

If not then how can i invoke customer specific methods ? ie check if updating his name is permitted or delete him or view his balance sheet.

2) That solves the first level only (master table) but not its details. How about the final resultset to be products(master) -> customers (details) ?

4) I cannot understand this. Do you mean that foreach field in a ORM object a replica field exists that keeps the old value? Or is just a flag ? both answers (field or flag) have many problems ;)
Anonymous
December 21, 2003
Exchange is basically an object database, and it's defined a pretty decent syntax for working with objects this way. For example:

SELECT "urn:schemas:httpmail:to", "DAV:contentclass"
FROM "http://myexchangeserver/exchange/stevex/inbox/"
WHERE "urn:schemas:httpmail:subject" LIKE '%Some Subject%'
AND "DAV:contentclass" = 'urn:content-classes:message'
ORDER BY "urn:schemas:httpmail:date"
DESC

This will give you the properties you asked for from the objects that match the select.. the Exchange store can have any number of properties, scoped by namespace, on any object. It's pretty slick.

The SCOPE option lets you say which objects in the hierarchy to look at:

FROM scope('shallow traversal of "http://myserver/exchange/stevex/inbox"')

You can specify the depth to search etc.. check out (unwrap as required):

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wss/wss/_exch2k_sql_web_storage_system_sql.asp
Anonymous
December 21, 2003
I don't think O/R mappers are ever going to be used in every case, but neither is any other technique applicable to every case. What I don't get, however, is why people keep saying O/R mappers are just "bad" because they don't do certain things when they obviously haven't done their research at all on the matter. My first thought when I read this post was that the author had a naive view not based on facts, and Frans has done a nice job trying to set them straight. But why make such bold statements about what O/R mappers cannot do when you haven't spent the time researching them in the first place? Something else for thought -- how many people that use more traditional methods actually do the things pointed out in these criticisms? I would venture that just like people can use O/R mappers to return or update too much data, so do other traditional development techniques. Most people write a single stored proc that returns all the columns and another that updates all the fields and how is that any different. Its actually a lot harder to avoid that using stored procs, since you either have to create multiple versions, or have all sorts of inefficient parameterized versions. Yes, you can solve all these problems with dynamic sql, but that's exactly the point of an O/R mapper -- it can create the dynamic sql for you better at runtime then you can yourself. Similarly with complex queries and the other concerns noted here. Again, O/R mappers aren't going to solve every problem, but they do allow you to avoid a lot of boring and redundant CRUD, while still providing more flexibility than you can usually do yourself without tons of code, and they also can allow you to seamlessly support multiple databases in many cases. So, end of my rant -- let's just start doing some basic research before making blanket statements about what something cannot do when the facts are obviously the other way around.
Anonymous
December 21, 2003
The comment has been removed
Anonymous
December 22, 2003
I've been contemplating the move to SOA style development for some time now, but recently I've taken the plunge. I'm talking, of course, about giving up OO as a way of life. Not everything is an object.

I have a lot to say on the subject, but my main focus will be on the impact SOA has for developing a system. That's right. One system. Not how to connect different systems using XML Web Services.

On that note, I'd just like to debunk a myth I ran into recently a la 15 Seconds : <a href="http://www.15seconds.com/issue/031215.htm">Realizing a Service-Oriented Architecture with .NET</a>. The following image appears to be "common knowledge" when talking about SOA. Anyone see the problem ?

Hint: its between the Business Objects and the Data/Persistence.

<img source="http://udidahan.weblogs.us/SOA/wrong_SOA.bmp" alt="wrong SOA" />

The problem is that persistence is ( or at least should be ) a service. The business objects themselves should not be aware of their persistence. Rather, the services above should use the persistence service in order to perform the work needed.

Instead of: myCustomer.Update();

Should be: Persistence.Update( myCustomer );

Why is this important ? Because it contributes to a separation of concerns ( see Agile Management - http://www.agilemanagement.net/Articles/Weblog/Separationofconcerns.html ). Obviously, putting the knowledge of persistence ( or, for that matter, anything else ) into business objects creates bloated objects that need to know everything about what can be done with them. I'm sure that this will just further inflame the great O/R mapping debate ( see <a href="http://weblogs.asp.net/ramkoth/archive/2003/12/21/44935.aspx">"Futility of Object-Relational Technologies"</a> and <a href="http://weblogs.asp.net/klaus.aschenbrenner/archive/2003/12/21/44983.aspx">"GENerative Object Mapping Layer"</a> ) but it is still true.

Coming up next:

How to do persistence with SOA.
Anonymous
June 24, 2004
OK. Both camps make very strong cases. However, if you’re about to build a big enterprise application and is the architect responsible for technology, design, tools, and so forth, you’re more interested in proven technology than low-level academic theory.

IMHO Fowler makes a very strong case for using «Domain Model» when you’re faced with complex business logic.

On the other hand, as a .net developer you find yourself in a situation optimized for «Record Set» and «Transaction Script»/«Table Module» (.net fw, vs.net, msdn, ...).

The latter makes a very strong case for these patterns given the technology domain.

I guess my point is that all ORM vendors seem to be one to five man/men shops (no offense Frans ;-) with unstable products, small user bases, and no strong customer cases.

Please, prove me wrong since I’m a very big fan of «Domain Model» in theory, but have been burned in practice.
Anonymous
June 24, 2004
The comment has been removed
Anonymous
June 24, 2004
The comment has been removed
Anonymous
June 24, 2004
The comment has been removed
Anonymous
June 16, 2009
PingBack from http://fixmycrediteasily.info/story.php?id=16100

次の方法で共有

Futility of Object-Relational technologies

Comments

その他のリソース