What's An Entity, Anyway?

These days, I seem to be encountering a lot of entities. Not in the sense of non-corporeal beings as usually depicted in certain science fiction TV shows, but in the sense of data structures. Sometimes, they are called business entities.

Although the concept of entities differs from project to project, I think I have identified at least one common trait of all the entities I come across: They contain (structured) data, but no behavior. Usually, these entities are being consumed and manipulated by something called the business logic. In some cases, entities are even used to transfer data from one layer of an application to the next layer (some people then call them data transfer objects). Since architecture diagrams with vertical columns adjacent to layers appear to be much in vogue these days, I'll use one as an example:

The idea here is to have a single definition for data that spans multiple levels so that you only have to write the data structure implementation once. The code in the different layers interacts with the entities: The data access layer creates and stores the entities, the business logic layer modifies the data, and the UI layer presents the data. Pretty clean architecture, right?

No.

So what's wrong with it? First of all, what does the name entity tell us? Nothing, really. Entity is a synonym for object, but surely, the term business objects is so last year that any self-respecting architect would never use such a term. On the other hand, an object with structure but no behavior sounds awfully familiar.

Your code takes one or more structures of data as input, operates on them, and outputs other structures. Fowler calls this pattern a Transaction Script; I call it procedural programming, and since I had my experiences with this programming style early in my career, I never want to go back. Domain Model is where it's at.

In Patterns of Enterprise Application Architecture, Fowler wrote that "a Data Transfer Object is one of those objects our mothers told us never to write." While the pattern itself is valid, it's only supposed to be used for communication across process boundaries, not across layers in the same process.

If you are still not convinced by my arguments, let's take a look at an example. Imagine that you want to model a product catalog. Since we are modeling with entities, we create Product and Category classes. Both are just dumb classes with default constructors, read/write properties, and no behavior. To decouple data access, we also define a data access interface:

 public interface ICatalogDataAccess
 {
     Category ReadCategory(int categoryId);
  
     Product ReadProduct(int productId);
 }
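
For reference, the two entities might look something like the following sketch. The Product property names are the ones used by the data access code further down; the Category members (CategoryId and Name) are assumptions made purely for illustration, since the post never shows what a Category contains.

```csharp
// Sketch of the two "dumb" entities: default constructors,
// read/write properties, and no behavior.
public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
    public decimal Discount { get; set; }
    public int InventoryCount { get; set; }
}

public class Category
{
    // Hypothetical members; not taken from the original example.
    public int CategoryId { get; set; }
    public string Name { get; set; }
}
```

Nothing in these classes says how the values relate to each other; that knowledge has to live somewhere else.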

Implementing this interface is fairly straightforward, and goes something like this:

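 public Product ReadProduct(int productId)
 {
     // GetProductReader is a helper on the implementing class (not shown here).
     using (IDataReader r = this.GetProductReader(productId))
     {
         if (!r.Read())
         {
             throw new ArgumentException("No such product.", "productId");
         }
  
         // Copy each column into the corresponding entity property.
         Product p = new Product();
         p.ProductId = (int)r["ProductId"];
         p.Name = (string)r["Name"];
         p.ListPrice = (decimal)r["ListPrice"];
         p.Discount = (decimal)r["Discount"];
         p.InventoryCount = (int)r["InventoryCount"];
  
         return p;
     }
 }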

This code is actually fairly benign; trouble only starts to appear in the business logic layer. Imagine that we need the business logic to calculate the discounted price and to determine whether the product is in stock (yes, rather inane business logic, I know). Since the Product entity is just a structure without behavior, it's necessary to create another class to implement this business logic:

 public class ProductOperator
 {
     private Product product_;
  
     public ProductOperator(Product p)
     {
         this.product_ = p;
     }
  
     public decimal DiscountedPrice
     {
         get { return this.product_.ListPrice - this.product_.Discount; }
     }
  
     public bool InStock
     {
         get { return this.product_.InventoryCount > 0; }
     }
 }

Now you are left with the problem of how to pass this information on to the next layer.

One alternative is to create an abstraction of ProductOperator (say, IProductOperator) and pass that to the next layer together with the Product entity. That approach can quickly grow quite unpleasant, since each layer adding content to the entity needs to define yet another auxiliary class to be passed along with the ProductOperator and the Product entity.
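
A minimal sketch of this first alternative follows. It assumes that a hypothetical IProductOperator simply mirrors the two properties of ProductOperator; the interface shape and the ProductPresenter consumer are illustrative assumptions, not code from a real project.

```csharp
// Minimal entity, repeated here so the sketch compiles on its own.
public class Product
{
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
    public decimal Discount { get; set; }
    public int InventoryCount { get; set; }
}

// Hypothetical abstraction over the business logic.
public interface IProductOperator
{
    decimal DiscountedPrice { get; }
    bool InStock { get; }
}

public class ProductOperator : IProductOperator
{
    private readonly Product product_;

    public ProductOperator(Product p)
    {
        this.product_ = p;
    }

    public decimal DiscountedPrice
    {
        get { return this.product_.ListPrice - this.product_.Discount; }
    }

    public bool InStock
    {
        get { return this.product_.InventoryCount > 0; }
    }
}

// Hypothetical consumer in the next layer: it needs both the entity
// and the operator to say anything useful about the product.
public class ProductPresenter
{
    public string Describe(Product p, IProductOperator op)
    {
        return p.Name + ": " + (op.InStock ? "in stock" : "sold out");
    }
}
```

Note how Describe already takes two parameters for a single product. If another layer added information of its own, the signature would need a third parameter, and so on.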

Another alternative is to model the Product entity to include properties for this information from the start. That would mean that the data access component would fill in only the properties of the Product entity that come from the database, and a variant of ProductOperator would then fill in the DiscountedPrice and InStock properties in the business logic layer:

 public partial class ProductOperator
 {
     public ProductOperator()
     {
     }
  
     public void UpdateDiscountedPrice(Product p)
     {
         p.DiscountedPrice = p.ListPrice - p.Discount;
     }
  
     public void UpdateInStock(Product p)
     {
         p.InStock = p.InventoryCount > 0;
     }
 }

Beware: Here be dragons.

One problem with this approach is that you'd end up with a lot of properties whose values may or may not be null (DiscountedPrice and InStock, in this case), so you always need to check for null before reading and using a property value.
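
To make the null problem concrete, here is a sketch that assumes the derived properties are declared as nullable value types (decimal? and bool?). The extended Product class is not shown in this post, so that declaration and the ProductView consumer are both assumptions.

```csharp
public class Product
{
    public decimal ListPrice { get; set; }
    public decimal Discount { get; set; }
    public int InventoryCount { get; set; }

    // Assumed to be nullable: they stay null until the business
    // logic layer has filled them in.
    public decimal? DiscountedPrice { get; set; }
    public bool? InStock { get; set; }
}

// Hypothetical consumer in the UI layer.
public static class ProductView
{
    public static string StockLabel(Product p)
    {
        // The consumer cannot know whether UpdateInStock has run,
        // so it has to guard against null before using the value.
        if (!p.InStock.HasValue)
        {
            return "unknown";
        }

        return p.InStock.Value ? "in stock" : "sold out";
    }
}
```

Every consumer of every derived property needs a guard like this, and each one has to decide for itself what a missing value means.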

The other problem with this design is that it railroads your components into a particular usage scenario. Ultimately, you model the entity in order to communicate it across your process boundary (via a UI, service interface, etc.). This boundary has a particular usage scenario; e.g. you need to show product information in a UI. Such a usage scenario then becomes the driver for the entity structure: You need to show the discounted price, so you need a property for that, etc. If you need to display product information in another screen, you include properties for this screen as well. You end up with a data structure that carries around a lot of data that may or may not be used in any particular scenario.

There are lots of nicer ways to pass data between layers in extensible ways, and in a future post, I'll describe one such approach.

Comments

  • Anonymous
    June 18, 2007
    While I hope that my previous post made it clear that Data Transfer Objects are not my first choice for

  • Anonymous
    June 19, 2007
    I think in the case of DTOs the best option would be to use extension methods (.NET v3.5). There are two primary benefits. First, extension methods can be contextual to the layer. Second, because all we need are derived properties (which are/should be read-only), they don't need to be stored, which saves memory. Cheers

  • Anonymous
    June 20, 2007
    Agreed. My post on Layered Architecture problems centers on exactly the same point. One solution is to define the entity interfaces in the lowest layer; however, that doesn't (by itself) solve the issue of how the UI Layer can create a new entity. This leads us to the creational aspects of Dependency Injection (DI), which I wrote about in Careful how you inject those dependencies. I think that it's just fine to use DI with entities / domain objects. I've done it so as to allow custom fetching strategies for getting them from the database. In short: +1

  • Anonymous
    June 21, 2007
    Hi Rishi. Thank you for your comment. Although extension methods are nice, I don't agree that they fit in this scenario. If you wrote an extension method to, say, implement DiscountedPrice, you'd essentially be writing business logic into your extension method. The point of this exercise is decoupling, so you don't want your business logic invading the next layer up. This means that you can't use the (business logic) extension methods in your UIP layer, but then you don't have any interface to extract the DiscountedPrice any more. Using extension methods only in the business logic layer would not be very meaningful either, as the implementation would be entirely transient and the data would not be available to the next layer, not even in abstract form.

  • Anonymous
    June 21, 2007
    Hi Udi. Thank you for your comment. Your post on layering mirrors mine very nicely :) As you probably know, there are lots of better ways to model the abstractions between layers. For the fun of it, my next post outlined a better approach, but when you really think about it, there should be no correlation between the objects in each layer. If you have a Product class that you share between the DAL and the BLL, extending that Product class to be shared between the BLL and the UIP layer ties all three layers together. You will not be able to modify the Product abstraction without impacting the UIP layer, and that may not be what you want to do. Ideally, you should be able to vary abstractions independently between layers, which means that the Product abstraction used between the DAL and the BLL should have no relation to the Product abstraction used between the BLL and the UIP layer.