Dela via


Discussion about API changes necessary for POCO:

Evolving an API to support new requirements, like POCO, while maintaining backward compatibility is challenging.

The following design discussion from members of the Object Services team illustrates some of the issues and hard calls involved.

Have a read, and tell us what you think.

In particular are we missing something, or overstating the importance of something? Let us know...

Anyway over to Diego and Mirek...

POCO API Discussion: Snapshot State Management

What is in a snapshot?

In Entity Framework v1 there is a single way for the state manager to learn about changes in entity instances: the change tracking mechanism is set in such a way that the entity instances themselves notify the state manager of any property change.

This works seamlessly if you use default code-generated classes, and it is also part of the IPOCO contract for developers willing to create their own entity class hierarchies.

For version 2, we are currently working on a snapshot-based change tracking mechanism that removes the requirement from classes to send notifications, and enables us to provide full POCO support.

The basic idea with snapshots is that when you start tracking an entity, a copy of its scalar values is made so that at a later point – typically but not always at the moment of saving changes into the database - you can detect whether anything has changed and needs to be persisted.

The challenge

When we created the v1 API, we made a few assumptions that were totally safe with notification-based change tracking but don’t completely prevail in a snapshot change tracking world.

We now need to choose the right set of adjustments for the API for it to gracefully adapt to the new scenarios we want to support.

Mainline scenario: SaveChanges() Method

In notification-based change tracking, by the time SaveChanges is invoked, the required entity state information is ready to use.

With snapshot thought, a property by property comparison needs to be computed for each tracked entity just to know whether it is unchanged or modified.

Once the snapshot comparison has taken place, SaveChanges can proceed normally.

In fact, assuming the process is triggered implicitly on every call to SaveChanges, a typical unit of work in which the program queries and attaches entities, then modifies and adds new entities, and finally persists the changes to the database, works unmodified with POCO classes:

Category category = context.Categories.First();

category.Name = "some new name"; // modify existing entity

Category newCategory = new Category(); // create a new entity

newCategory.ID = 2;

newCategory.Name = "new category";

context.AddObject("Categories", newCategory); // add a new entity

context.SaveChanges(); // detects all changes before saving

Things get more complicated when a user wants to use lower level APIs that deal with state management.

State Manager Public API

ObjectStateManager and ObjectStateEntry classes comprise the APIs that you need to deal with if you want to either get input from, or customize the behavior of Entity Framework’s state management in your own code.

Typically, you use these APIs if you want to:

  • Query for objects that are already loaded into memory
  • Manipulate the state of tracked entities
  • Validate state transitions or data just before persisting to the database
  • Etc.

As the name implies, ObjectStateManager is Entity Framework’s state manager object, which maintains state and original values for each tracked entity and performs identity management.

ObjectStateEntries represent entities and relationships tracked in the state manager. ObjectStateEntry functions as a wrapper around each tracked entity or relationship and allows you to get its current state (Unchanged, Modified, Added, Deleted and Detached) as well as the current and original values of its properties.

Needless to say, the primary client for these APIs is the Entity Framework itself.

ObjectStateEntry.State Property

The fundamental issue with snapshot is exemplified by this property.

With notification-based change tracking, the value for the new state is computed on the fly on each notification, and saved to an internal field. Getting the state later only encompasses reading the state from the internal field.

With snapshot, the state manager no longer gets notifications, and so the actual state at any time depends on the state when the entity began being tracked, whatever state transitions happened, and the current contents of the object.

Example: Use ObjectStateEntry to check the current state of the object.

Category category = context.Categories.First();

category.Name = "some new name"; // modify existing entity

ObjectStateEntry entry = context.ObjectStateManager.GetObjectStateEntry(category);

Console.WriteLine("State of the object: " + entry.State);

Question #1: What are the interesting scenarios for using the state management API in POCO scenarios?

Proposed solutions

In above example there are two possible behaviors and it's not obvious for us which one is better:

Alternative 1: Public ObjectStateManager.DetectChanges() Method

In the first alternative, computation of the snapshot comparison for the whole ObjectStateManager is deferred until a new ObjectStateManager method called DetectChanges is invoked. DetectChanges would iterate through all entities tracked by a state manager and would detects changes in entity's scalar values, references and collections using a snapshot comparison in order to compute the actual state for each ObjectStateEntry.

In the example below, the first time ObjectStateEntry.State is accessed, it returns EntityState.Unchanged. In order to get a current state of the "category" entity, we would need to invoke DetectChanges first:

Category category = context.Categories.First();

category.Name = "some new name"; // modify existing entity

ObjectStateEntry entry = context.ObjectStateManager.GetObjectStateEntry(category);

Console.WriteLine("State of the object: " + entry.State); // Displays "Unchanged"

context.ObjectStateManager.DetectChanges();

Console.WriteLine("State of the object: " + entry.State); // Displays "Modified"

DetectChanges would be implicitly invoked from within ObjectContext.SaveChanges.

Pros:

· User knows when detection of changes is performed

· DetectChanges is a method that user would expect to throw exceptions if some constraint is violated

· This alternative requires minimal changes in Entity Framework current API implementation

Cons:

· Since DetectChanges iterates through all the ObjectStateEntries, it is a potentially expensive method

· User has to remember to explicitly call DetectChanges() before using several methods/properties, otherwise, their result will be inaccurate:

o ObjectStateEntry.State

o ObjectStateEntry.GetModifiedProperties()

o ObjectStateManager.GetObjectStateEntries()

· This alternative implies adding a new method to ObjectStateManager

Alternative 2: Private ObjectStateEntry.DetectChanges() Method

In the second alternative, there is no public DetectChanges method. Instead, the computation of the current state of an individual entity or relationship is deferred until state of the entry is accessed.

Pros:

· User doesn't have to remember explicitly calling DetectChanges to get other APIs to work correctly

· Existing API works as expected in positive cases regardless of notification-based or snapshot tracking

· No new public API is added

Cons:

· The following methods now require additional processing to return accurate results in snapshot:

o ObjectStateEntry.State

o ObjectStateEntry.GetModifiedProperties()

o ObjectStateManager.GetObjectStateEntries()

· Existing API works differently in negative cases in notification-based and snapshot tracking:

o some of the existing methods would start throwing exceptions

or

o would require to introduce a new state of an entry – Invalid state (see below for details)

Question #2: What API pattern is better? Having an explicit method to compute the current state based on the snapshot comparisons or having the state to be computed automatically when accessing the state?

How this affects other APIs

While the State property exemplifies the issue, there are other APIs that would have different behaviors with the two proposed solutions.

ObjectStateEntry.GetModifiedProperties() Method

GetModifiedProperties returns the names of the properties that have been modified in an entity. Similar to the State property, with notification-based change tracking, an internal structure is modified on the fly on each notification. Producing the list later on only encompasses iterating through that structure.

With snapshot, the state manager no longer gets notifications, and at any given point in time, the actual list of modified properties really depends on a comparison between the original value and the current value of each property.

Therefore, for alternative #1, this APIs will potentially return wrong results unless it is invoke immediately after DetectChanges. For alternative #2, the behavior would be always correct.

ObjectStateManager.GetObjectStateEntries()

This is the case in which an implementation detail that was a good idea for notification-based change tracking stops offering performance benefits in snapshot. Internally, ObjectStateManager stores ObjectStateEntires in separate dictionaries depending on their state. But in snapshot, any unchanged or modified entity can become deleted because of a referential integrity constraint, and any unmodified entity can become modified.

In alternative #1, DetectChanges would iterate thought the whole contents of the ObjectSateManager, and thus it would update the internal dictionaries. Once this it done, it becomes safe to do in-memory queries using GetObjectStateEntires the same way it is done today.

In alternative #2, GetObjectStateEntires would need to look in unchanged, modified and deleted storage when asked for deleted entries; also in unchanged and modified storage when asked for modified entities.

Querying with MergeOption.PreserveChanges

In Entity Framework, MergeOptions is a setting that changes the behavior of a query with regards to its effects on the state manager. Of all the possibilities, PreserveChanges requires an entity to be in the Unchanged state in order to overwrite it with values coming from the database. In order for PreserveChanges to work correctly, accurate information on the actual state of entities is needed.

Therefore, with alternative #1, querying with PreserveChanges will not have a correct behavior unless it is done immediately after invoking DetectChanges. For alternative #2, querying with PreserverChanges would behavior correctly at any time.

Referential Integrity constraints

When there is a referential integrity constraint defined in the model, a dependent entity may become deleted if the principal entity or the relationship with the principal entity becomes deleted.

For alternative #1, DetectChanges would also trigger Deletes to propagate downwards throw all referential integrity constraints.

For alternative #2, finding out the current state of an entity that is dependent in a RIC would requires traversing the graph upwards to find if either the principal entity or a relationship has been deleted.

Mixed Mode Change Tracking

At the same time we are working on snapshot change tracking, we are also working on another feature, denominated dynamic proxies. This feature consists of Entity Framework automatically creating derived classes of POCOs that override virtual properties. Overriding properties enables us to inject automatic lazy loading on property getters, and notifications for notification-based change tracking in property setters. This introduces a subtle scenario: when using proxies, it is possible that:

a. Not all properties in the class are declared virtual. The remaining properties need to still be processed using snapshot.

b. Sometimes, non-proxy instances of POCOs and proxy instances of the same type have to coexist in the same ObjectStateManager. For the former, it will have to use snapshot tracking. For thelatter, a combination of snapshot and notifications.

All in all, it becomes clear that the actual state of an entity does not entirely depend on the value of the internal state field, nor on the snapshot comparison, but on a combination of both.

ObjectStateEntry.SetModified()

As with mixed mode change tracking, SetModified() requires a combination of the internal state and the snapshot comparison to return valid results.

Handling of invalid states

When working in notification-based change tracking, Entity Framework throws exceptions as soon it learns that entity key values has been modified. With the default code-generated classes, the exception prevents the change for being accepted.

For alternative #1, DetectChanges can throw exceptions if some model constraint (i.e. key immutability) is violated. It is too late to prevent the value from changing.

Example:

Category category = context.Categories.First();

category.ID = 123;  // Modify a key property. This would throw if category wasn't a POCO class.

context.ObjectStateManager.DetectChanges(); // Throws because key property change was detected.

For alternative #2, reading the state of modified properties from an entity with modified keys could throw an exception:

Example:

Category category = context.Categories.First();

category.ID = 123;  // Modify a key property. This would throw if category wasn't a POCO class.

ObjectStateEntry entry = context.ObjectStateManager.GetObjectStateEntry(category);

Console.WriteLine("State of the object: " + entry.State);

// Throws because key property change was detected.

Getting an exception thrown here would be unexpected. But there is an alternative design that is to define a new EntityState that indicates that an entity is Invalid. This new state would account for the fact that POCO classes per se do not enforce immutable keys.

Since EntityState is a flag enum, Invalid could be potentially combined with other states.

SaveChanges would still need to throw an exception if any entity in the state manager is invalid.

It would be possible to query the state manager for entities in the Invalid state using GetObjectStateEntries method.

Question #3: Is it better to have an Invalid state for entries or should the state manager just throw exceptions immediately every time it finds a change on a key?

Our questions:

1. What are the interesting scenarios for using the state management API in POCO scenarios?

2. What API pattern is better? Having an explicit method to compute the current state based on the snapshot comparisons or having the state to be computed automatically when accessing the state?

3. Is it better to have an Invalid state for entries or should the state manager just throw exceptions immediately every time it finds a change on a key?

---

We really want to hear your thoughts on the above questions.

Alex James
Program Manager,
Entity Framework Team

This post is part of the transparent design exercise in the Entity Framework Team. To understand how it works and how your feedback will be used please look at this post .

Comments

  • Anonymous
    August 01, 2008
    PingBack from http://blog.a-foton.ru/2008/08/discussion-about-api-changes-necessary-for-poco/

  • Anonymous
    August 01, 2008
    I certainly hope this isn't replacing ADO.NET.  I do appreciate the rewrite and there are some drawbacks to the existing ADO.NET paradigm but I think it has evolved quite nicely with the introduction of LINQ to DATASET. I hope you realize this is duplicating many of the ideas and features that already exist in DataSets.

  • Anonymous
    August 02, 2008
    The Entity Framework will not replace ADO.NET--it builds upon ADO.NET, and we will support both.  There will continue to be reasons for using ADO.NET.  It might be that this is just the style of coding you prefer or it might be that you want to use the EF but occassionally need to go around the EF and talk more directly to the DB or whatever. The EF has its place and so does ADO.NET (including DataSets).

  • Danny
  • Anonymous
    August 04, 2008
    The comment has been removed

  • Anonymous
    August 09, 2008
    Entity Classes & Architecture Patterns Part of the Entity Framework FAQ . 2. Architecture and Patterns

  • Anonymous
    August 09, 2008
    Entity Classes & Architecture Patterns Part of the Entity Framework FAQ . 2. Architecture and Patterns

  • Anonymous
    August 11, 2008
    The comment has been removed

  • Anonymous
    August 11, 2008
    The comment has been removed

  • Anonymous
    August 11, 2008
    "I’ve had some issues with my comments not posting when logged in. Finally got a chance to type this again..." Yeah I've had this every time I try to post on the EF blog, tried 4 times on one post and nothing appeared. Very frustrating.

  • Anonymous
    August 13, 2008
    Add the ability to require a POCO entity to be explicitly 'registered' for snapshot tracking (including its relationships?).  E.g. POCOs may not all come from the context anyway.  This obviates the performance problem of DetectChanges having to iterate through too much, because you only have to iterate through what is registered (only a subset of entities may need to be modified). Snapshot (or notification) state management must support true 'savepoints' to model nested units of work (e.g. popup cancelable modal edit dialogs in a UI, or any type of undo to a pre-specified savepoint). Following on from the above, state managements need to support TransactionScope - i.e. implicit savepoints and rollbacks to them on a TransactionScope rollback.  Ideally to be able to participate in distributed transations. State managements need to be extensible (interface based), with events to hook or methods to override, and also be fully replaceable. DetachChanges() should be public and not throw exceptions for validation changes (separate structure or method for querying that).  EDM constraints are only a subset of an overall validation model so don't have exceptions dictate an implementation approach (i.e. the output of DetectChanges() is valuable with or without EDM validation errors). (Jason says this earlier.) Making and restoring (overwriting an entity) of snapshots of entities should be able to use a custom 'serialization' mechanism if desired, e.g. to support customization of the 'restore' process (talking about an undo or 'rollback' here) for any custom code that needs to be run (e.g. (re)setting any transient properties). POCO scenario:  'context-free', save-pointable, TransactionScope-aware, efficient querying of what is modified. Cheers, -Mat Hobbs

  • Anonymous
    August 15, 2008

  1.  I have had times where it was not enough to know that an object was changed, but I actually needed to know what changed.  The fact that the object changed, means that it needs to be persisted; however, if I have another system that only cares about certain changes, i wouldn't want to notify it of these changes unless it was necessary.
  2.  Interesting dilemma.  From a pure academic point of view, I would say that since ObjectStateEntry.State is a property, best design practices would say it shouldn't do any heavy lifting (should be deterministic).  However, in this case, I think the best decision is to break this rule.  I just think its dangerous to return wrong values to a developer who is ill aware of the new changes to the API.  This is almost worse than breaking the interface, it preserves it, but alters the functionality.  So alternative #2 seems better to me.  One potential way to speed things up is to take a checksum of the scalar fields of an entity and store it along with the state.  That way when subsequent calls are made to properties / methods that require the recomputation of state, it could be done faster, since if the checksum was the same, you would know that there were no changes from the previous computation.  Just an idea.   #3 - I don't have strong feelings on this one either way you go.  Since its too late to correct anything, it seems like throwing an error would be a good thing.
  • Anonymous
    August 21, 2008
    Q1, Q2 and Q3 To manager business I need sinals to make decisions at any time, no matter scenarios. If I'm working with business objects (tables) and it have components (columns) I need to know the states of both at same time and together, no matter the Tier(WCF,ASP,WinForm, etc.). IMHO, IPOCO is the best option but without to implement many interfaces. I don't like the idea that another object(ObjectStateManager) control the states of objects.No problem to "identity management" but wrong for "State management". If I'm bad I say to my doctor "I'm bad.." and not my doctor say to me "you are bad". The state of me and any part of my body can't live out me. I think like it to object too. So, I would like to see in my objects : myObject.State   ( readonly, already exists ) myObject.Undo()  ( undo the changes ) myObject.Columns["MyColumn1"].State     ( readonly, the state on my column ) myObject.Columns["MyColumn1"].OldValue  ( readonly, the old value of my column ) myObject.Columns["MyColumn1"].Value     ( readwrite, the current value ) myObject.Columns["MyColumn1"].Undo()    ( undo the changes ) myObject.MyColumn1                      ( readwrite, the current value ) I'm using this approach with "Linq to Sql" and work very well. Now, I would like to "migrate" my objects to EF. My table will be like it :    [Serializable]    [DataContract]    public class MyIPocoClass : IEntity    {        public MyIPocoClass()        {        }        private EntityState _state;        [DataMember]        public EntityState State        {            get { return _state; }        }        [NonSerialized]        private EntityColumns _columns;        [XmlIgnore]        public EntityColumns Columns        {            get            {                if (_columns == null)                {                    _columns = new EntityColumns();                    _columns.PropertyChanging += new PropertyChangingEventHandler(OnPropertyChanging);                    _columns.PropertyChanged += new PropertyChangedEventHandler(OnPropertyChanged);                }                return _columns;            }        }        public event PropertyChangingEventHandler PropertyChanging;        public event PropertyChangedEventHandler PropertyChanged;        protected virtual void OnPropertyChanging(object sender, PropertyChangingEventArgs e)        {            if (PropertyChanging != null)            {                PropertyChanging(sender, e);            }        }        protected virtual void OnPropertyChanged(object sender, PropertyChangedEventArgs e)        {            if (PropertyChanged != null)            {                PropertyChanged(sender, e);            }        }        [DataMember]        public string MyColumn1        {            get { return Columns.GetValue("MyColumn1") as string; }            set { Columns.SetValue("MyColumn1", value); }        }

  • Anonymous
    August 22, 2008
    The comment has been removed

  • Anonymous
    August 27, 2008
    Talking about snapshots - my wish for your next release is to see a rollback mechanism which supports TrasactionScope. if the scope fails - the context returns to what it was before the transaction.

  • Anonymous
    August 27, 2008
    I created, like many others, entity classes derived from the database tables/Sp's.  I also implemented the "State" field (modified, deleted etc) and latched that to events used to implememt the databinding interfaces. Like someone else has said knowing it's changed is sometimes not enough, you need to know what has changed.  You could argue that if that is the case use a DataSet which implements the "before/after" model. This of course has the drawback in that it doubles that amount of data being passed around. The other area that needs consideration is DataBinding at the client.  Various interfaces allow data binding to "undo" changes (start/endedit) and synchronise code/screen updates through their respective managers.  Databinding works very well with datasets but not so well with Linq to SQL (I understand) so maybe some lessons there. My problem was with databinding and serialization. To support Databinding I implemented the required interface and the notification events. Tied into those properties I modifed the "state" at the same time. Of course the properties that raised these events are called by the serializer.  So during a web service call "everything" becomes "modified". Being able to pass "work in progress" across serialization boundaries and have the state maintained is vital. Ideally the properties need to know when they are being called by the serializer and in .NET2 the XML serializer is not very helpful, there are no "before" and "after" events to latch onto.  The Binary serializer and WCF both implement these now I think. My "cludge" was to event a "first" and "last" property to set and clear a flag and fix the "field order"!! These properties would only ever get called when serializing (I hope) so could enable and disable the change events.  With WCF is can de-gludge my code I think. One thing I considered was using a Property/Hashtable type structure rather than the usual Property/Private field.  Optionally using 2 hashtables to store before and after.  The 2nd hashtable storing only fields that have changed and storing the original value.  Of course you need a serializable Dictionary<> object which I wrote too. I like the hastable approach, you only store twice at the field level when the field changes.  Any records in the 2nd table means "modified".  No changes means no XML when serialized unlike a DataSet which always stores everything twice so doubles the XML created. Where I work data load across the network is a "big thing", hence our interest in WCF and Binary serialization!! Breaking "hassle free" databinding is a reason we haven't touched Linq for SQL. My story is simply pointing out that serialization, data load size, network performance and support for client Data Binding must all be factored into any decision.

  • Anonymous
    August 27, 2008
    I forgot to say that Dictionary<> objects do serialize the "key" which creates big XML too.  One reason I looked at it and then didn't bother!

  • Anonymous
    September 09, 2008
    The comment has been removed

  • Anonymous
    December 11, 2008
    In a DDD project, there will usually be s service object responsible for persist the domain object to the data store, such as UserSerivice.SaveUser(user) etc. So, why just let the consumer of the EF tell EF to persist the entity, no tracking, no comparison or notification. We just need something like _context.SaveObject(object), _context.DeleteObject(object) or _context.InsertObject(object). It is easy to implement and easy to use.

  • Anonymous
    January 02, 2009
    I would really love to see the equivalent of "Fetching Strategies" that appear in NHibernate. In following a DDD approach, POCO along with Fetching Strategies would allow me to inject and specify what my loading intentions are, whether they be to eager fetch of lazy load. I could today build some kind of interpreter perhaps that allows my Repository to build a chain of .Includes for eager fetching but lazy loading is currently extremely difficult to accommodate in this respect. Essentially what I would love to see is the ability to create single fetching strategies that embody my loading intentions either eager or lazy. For example in LINQ To SQL I was able to achieve this with eager loading in this fashion: public interface IFetchingStrategy {    /// <summary>    /// LINQ To SQL DataLoadOptions    /// to use on the given DataContext.    /// </summary>    DataLoadOptions LoadOptions { get; } } /// <summary> /// Marker Interface /// </summary> /// <typeparam name="TRole"> /// Specification of a role for /// dyanmic runtime type lookup. /// </typeparam> public interface IFetchingStrategy<TRole> : IFetchingStrategy{ } THE IMPLEMENTATION: public class CustomerLoyaltyDiscountFetchingStrategy : IFetchingStrategy<ICustomerLoyaltyDiscount> {    private readonly DataLoadOptions _loadOptions;    #region IFetchingStrategy<Customer,ICustomerFaxChange> Members    /// <summary>    /// The DataLoadOptions used for    /// eager loading.    /// </summary>    public DataLoadOptions LoadOptions    {        get { return _loadOptions; }    }    #endregion    #region Constructor    /// <summary>    /// The constructor which sets the fetching strategies    /// eager loading options.    /// </summary>    public CustomerLoyaltyDiscountFetchingStrategy()    {        _loadOptions = new DataLoadOptions();        _loadOptions.LoadWith<Customer>(c => c.Orders);        _loadOptions.LoadWith<Order>(o => o.OrderLines);    }    #endregion }

  • Anonymous
    March 02, 2009
    @Q1 and Q2: I think approach 1 of always giving correct results is the correct way to go forward with snapshots. It preserves existing code, provides correct results, seems like a win-win to me. @Q3: Throwing an exception is just limiting the user even more. There won't be any choice except try{ } catch(blah b){ } around the whole thing. Another approach would be to fail gracefully, populate an Errors collection instead.