Share via


Auditing Data Changes in the Entity Framework: Part 2

In my previous post I described the basic concept behind my auditing approach using the Entity Framework, and covered some of the problems I encountered. This post focuses on the solution I went with; do feel free to comment if you have any thoughts.

Audit Lifecycle

To get to the root of my requirements I wrote down the lifecycle of a piece of data, and what needed to be tracked against it.

Action

Audit Requirements

Record Added

The date it was created, by whom, and what the original values are.

Record Modified

* repeated many times

The date it was changed, by whom, and which values were changed.

Record Deleted

The date it was deleted, and by whom.

The problem is that creating an audit record is tricky based on point (2) in my previous post. Therefore, I chose to record the original values when each change is made. This means I actually track the following;

Action

Audit Requirements

Record Added

The date it was created, and by whom.

Record Modified

* repeated many times

The date it was changed, by who, and what the pre-change values were.

Record Deleted

The date it was deleted, by who, and what the pre-delete values were.

This has effectively flipped the approach on its head, but I can still get a point-in-time view of the data whenever I need it. Therefore my data model becomes something like this;

 

When a new Product is added, the CreatedBy and CreatedDate fields are recorded on the Product entity. When changes are made, the previous unchanged version of the entity is copied to the ProductHistory table, and then the new Product entity is saved. When a Product is deleted, it is removed from the Product table and a copy of its last values is saved to ProductHistory.

This means my ChangedDate field on ProductHistory could arguably be described better as “ExpiredDate”, as it is the date and time that the data on that record became out of date.

Enabling Auditing on the ObjectContext

I described in my previous post how I would use the SavingChanges event to create audit records. To wire this up, I’ve created an extension method that can be called on an ObjectContext;

public static void Audit<fromType, toType>(

    this ObjectContext context,

    Func<IDataRecord, EntityState, toType> mapping,

    Action<toType> addToContext)

{

    context.SavingChanges +=

        new EventHandler((o, e) =>

            CreateAuditRecord<fromType, toType>(

                context, mapping, addToContext));

}

Calling this method sets up auditing for one specific type of entity by adding an event handler to the SavingChanges event. The handler invokes a method called CreateAuditRecord.

You might think it would be better to set all audit types up at once, or you might want to configure auditing in the ObjectContext’s OnContextCreated partial method... and that should be easy enough. The point of this code is not to be final, but that it should be easy for you to understand and adapt.

So why have I used two type parameters, a Func<T,U,V> and an Action<T> on the Audit method then? It comes down to points (4) and (5) in my previous post; I wanted auditing to be very explicit about how it behaved, and I wanted to avoid passing string values around to identify field names or entity sets.

To demonstrate this, consider the following example of how I would enable auditing on the Product entity, saving changes to ProductHistory.

using (MyEntities db = new MyEntities())

{

    db.Audit<Product, ProductHistory>(

        (record, action) => new ProductHistory() {

            Id = record.Field<Product, int>(f => f.Id),

            Description = record.Field<Product, string>(

                f => f.Description),

            Price = record.Field<Product, double>(f => f.Price),

            CreatedDate = record.Field<Product, DateTime>(

                f => f.CreatedDate),

            CreatedBy = record.Field<Product, string>(

  f => f.CreatedBy),

            ChangedBy = "Simon",

            ChangedDate = DateTime.Now,

            ChangeType = action.ToString() },

        (ph) => db.AddToProductHistory(ph));

    // TODO: make some changes to Product here...

    db.SaveChanges();

}

In my Audit method I specify the source entity (Product) and target audit store entity (ProductHistory) as type parameters. This then enables me to easily create a strongly typed lambda for the first parameter that I know accepts an IDataRecord (“record”) and an EntityState (“action”), returning a ProductHistory record;

(record, action) => new ProductHistory() {

    Id = record.Field<Product, int>(f => f.Id), ... <snip>

The purpose of this method is to map a Product data record to a ProductHistory entity. It has an IDataRecord input as I pass the OriginalValues to it, not the full Product entity (remember I’m saving the previous values in my audit table, not the new ones). Therefore this code creates a ProductHistory record, and initialises its properties using object initialiser syntax. If this code became too unwieldy I could easily factor it out into a helper method.

Next, you’ll notice that I’m using another extension method named “Field” that applies to the IDataRecord type. I use this to fetch a property value in a strongly typed way, and to make it easy to get at fields that represent an entity property.

In this case, I’m specifying that I expect the IDataRecord to contain fields that represent Product’s members. I’m also saying the particular property I’m after is an integer, and then I use a lambda to identify the property itself (Id in this case).

The second parameter to my Audit call looks like this;

 (ph) => db.AddToProductHistory(ph)

This is because I need to know how to add audit records that are created to the ObjectContext, ensuring they are saved as part of the same SaveChanges transaction as the actual changes we’re tracking. The lambda receives a ProductHistory entity (which is known because of the “toType” type parameter in my call to Audit) which I choose to add to the ObjectContext by calling AddToProductHistory.

For this task I could have used the more general AddObject method available on an ObjectContext – but this needs a string value to identify the entity set, and I want as much compile time checking as possible.

The Mechanics of Auditing

Now you see how we setup auditing, let’s dive into the CreateAuditRecord method to see how it actually creates the audit trail.

First, the method retrieves a list of ObjectStateEntry objects that describe entities that have either been modified or deleted. Remember that my approach doesn’t record anything in the audit table when they are added;

IEnumerable<ObjectStateEntry> entities =

from e in context.ObjectStateManager.GetObjectStateEntries(

EntityState.Modified | EntityState.Deleted)

where

e.IsRelationship == false &&

typeof(fromType).IsAssignableFrom(e.Entity.GetType())

select e;

This also ensures that we only get results for entity types that are of type “fromType”, or inherit from it. Arguably this would be more efficient if it looped through all changes looking for the audit configuration for each type as it went... or there may be other desired behaviours for inheritance heirarchies – but that is out of scope for this post, and it should be easy for you to see how you might change it.

Next, we loop through all of our results;

foreach (ObjectStateEntry item in entities)

{

    toType auditRecord =

        mapping(item.OriginalValues, item.State);

    addToContext(auditRecord);

}

... and it is just a case of using the supplied helper functions to map the OriginalValues IDataRecord to a ProductHistory (in this example), and then add it to the ObjectContext. Of course these helper functions are in the form of the lambdas we passed to our call to the Audit extension method.

Retrieving Point-in-Time Products

This design is based on the assumption that I will rarely need to retrieve audited data, as it requires some calculations and slightly complex SQL to complete. This is fine, but if you were retrieving audit data very frequently I would consider doing some further testing, with a view to optimising the approach.

So how do I get a Product record as it looked at 3pm on the 20th April 2009, for example?

Simply put, to get the correct data fields for a given time I need to find the first ProductHistory record after the specified date & time. If there are none, I need the values on the Product record (as this means no changes had been made to the record at that point in time). Let’s break that down.

First I need to get the first record from ProductHistory for products that were created before the specified date & time, but were changed after it...

SELECT

      Id, [Description], Price, CreatedBy, CreatedDate

FROM

      (

      SELECT

            Id, [Description], Price, CreatedBy, CreatedDate,

            ROW_NUMBER() OVER(ORDER BY ChangedDate) AS DateOrder

      FROM

            ProductHistory

      WHERE

            CreatedDate <= @PointInTime

      AND ChangedDate > @PointInTime

      AND Id = @Id

      ) A

WHERE

      A.DateOrder = 1

Next, I need all Products that have no ProductHistory records that fall into the above category, but were created before the specified date & time;

SELECT

      Id, [Description], Price, CreatedBy, CreatedDate

FROM

      Product

WHERE

      NOT EXISTS (

            SELECT * FROM ProductHistory ph

            WHERE ph.Id = Product.Id AND

                  CreatedDate <= @PointInTime

                  AND ChangedDate > @PointInTime)

      AND CreatedDate <= @PointInTime

      AND Id = @Id

Put these two query sections together using a UNION statement and we have the content of my [GetPointInTimeProduct] Stored Procedure (see the code download), which allows us to find the exact state of a Product given its identifier and a specific date and time. I’ve brought it into my model using a Function Import.

The Good and the Bad

I hope all that has made sense – as usual I’ve included the code as a download (standard disclaimers apply). There are of course some pros and cons to my approach – some that spring to mind are below. If you have comments on better ways to do this, or thoughts about my approach, feel free to chip in!

· We can easily record the username of whoever made the changes as we can deduce it in our C# code; even in a trusted subsystem model.

· The approach is easy to configure in a compile-time checked fashion. It doesn’t rely on possibly flawed convention (e.g. always store audit records in tables ending “Audit”). It should also therefore be easy to refactor to suit your needs.

· Mappings between entity and audit records are explicit, so it does not dictate too much about how you would do this. You could store completely different audit data to me if you’d like.

· The audit records participate in the ObjectContext’s Unit of Work; that is, they are saved in a batch of SQL statements with the data changes. It also means that due to the ordering and batching of updates that the likelihood of locks and deadlocks is arguably reduced when compared to some other approaches.

· The configuration of auditing for each type independently keeps the code clear, but introduces some inefficiency (such as many subscribers to the SavingChanges event, and many loops through the ObjectStateEntry collection).

· The data model, although simple, needs explanation. The history records may not work quite how a newcomer would assume.

· The complexity of the point-in-time SQL is slightly higher than I would like.

· [Edit] It is important to use some kind of concurrency checking to ensure that multiple audit records are not written by different users... I tend to use a timestamp with “Fixed” concurrency.

· [Edit] The biggest drawback of the code so far is that it doesn’t record changes to relationships... so it is suited well to resource data tables, but not so well to those involved in a complex model.

On balance, overall I like the approach. What do you think?

Note: Regarding the edits, I just reread this and realised I’d oversimplified for this post and missed two important points from my notes... that’ll teach me to type up blog posts late at night! Sorry.

AuditingWithEfx.zip

Comments

  • Anonymous
    April 20, 2009
    PingBack from http://www.anith.com/?p=30345

  • Anonymous
    May 27, 2009
    Trying your approach on a schema involving Table Per Type Inheritance and using Entity Framework. Problem: Base type table columns "CreatedBy" and "CreatedDate" appear as duplicate because inheriting type table(s) also have these fields. Possible solutions:

  • Rename base type table columns "CreatedBy" and "CreatedDate"
  • Assume no auditing on base types. Merge columns in base type table history into inherited type table history. Eg. Table BaseType
  • Column1
  • Column2 Table InheritedType
  • Column3
  • Column4
  • CreatedBy
  • CreateDate Table InheritedTypeHistory
  • Column1
  • Column2
  • Column3
  • Column4
  • CreatedBy
  • CreatedDate
  • ChangedBy
  • ChangedDate
  • ChangeType What do you think?
  • Anonymous
    May 28, 2009
    @ Merritt, that's an interesting one. I wonder if you could make the base type abstract, and then have two types inheriting from it... so for example you could have;
  1. Chair (abstract)
  2. ArmChair (concrete, adds no columns other than audit columns CreatedBy etc...)
  3. CollapsibleChair (concrete, adds other columns and audit columns) You would then have 2 matching audit history tables that match ArmChair and CollapsibleChair. What do you think? I'm making some sweeping assumptions about your model of course! Simon
  • Anonymous
    June 01, 2009
    My solution for TPT: Table BaseType
  • ColumnX
  • CreatedBy
  • CreatedDate Table InheritedType
  • ColumnY
  • ColumnZ Table BaseTypeHistory
  • ColumnX
  • CreatedBy
  • CreatedDate
  • ChangedBy
  • ChangedDate
  • ChangeType Table InheritedTypeHistory
  • ForeignKey_To_BaseTypeHistory
  • ColumnY
  • ColumnZ
  • Anonymous
    June 01, 2009
    @ Merritt; that seems to make sense; let me know how you get on with it. Simon

  • Anonymous
    June 02, 2009
    Working brilliantly so far. I reworked your solution a bit so that the consuming code doesn't need to know about the auditing mechanism. Basically, everything is done within the SavingChanges event handler, and the mapping logic is defined within the entity itself (for the sake of encapsulation). This required creating some basic interfaces, but it's bit hard to explain quickly in words, so I'll send you some sample code if you are interested.

  • Anonymous
    June 14, 2009
    In my previous post I described the basic concept behind my auditing approach using the Entity Framework,

  • Anonymous
    June 14, 2009
    In my previous post I described the basic concept behind my auditing approach using the Entity Framework,

  • Anonymous
    June 18, 2009
    @ Merritt; that's great to hear - pleased it has helped. I suspected many people would prefer to wrap it up into the context itself, so that sounds logical. Simon

  • Anonymous
    June 26, 2009
    Meritt, Kindly plz post your implementation; would be of gr8 help

  • Anonymous
    July 01, 2009
    I was hoping to create a codeplex project of using simon's basic strategy with the addition of context wrapping, but haven't gotten around to it yet. However, if my last post isn't clear enough, I will answer any additional questions posted.

  • Anonymous
    July 23, 2009
    @Meritt I trying to do something like this with one of my applications that I am working on at the moment.  I would love to see how you did the mapping on the entity itself, rather then on the context. Would you be able to explain it some more. Thanks, Nathan

  • Anonymous
    August 05, 2009
    Simon, How does your code handle foreigh key? For example, if your PRODUCT table has many foreign keys like locationID etc., how do you handle update audit when product has been altered? Thanks! Kris

  • Anonymous
    August 06, 2009
    @ Kris, It doesn't... hence my very last comment about not recording changes to relationships. I didn't need to do this in the context of when I was writing this post. I'm ashamed to admit that I've been intending on writing then blogging an update to do exactly what you describe, but it's gotten left lower down the priority list! Briefly, the way I view it is slightly different to how others have explained it to me (although as I said, I've not proven my thoughts yet); I would detect a change in a relationship, and notify one individual entity of the change. This works well with my model described in this post, as when "enabling" auditing for a particular entity, I could also supply a list of relationships to watch. All I would then need is to invoke a callback to that entity that supplies slightly different information about what changed. As the responsibility to write the audit record is with the callback this has little impact on my framework; instead I just need to add a database column or similar. I hope that makes sense and helps - if you do give it a go I'd love to hear how you get on. You never know, I might get around to trying this sooner or later :-) Simon

  • Anonymous
    August 07, 2009
    Thanks Simon for your info! Hopefully I can read your solution on your next blog. Kris

  • Anonymous
    April 06, 2011
    The comment has been removed