Getting the Enterprise Canonical Data Model right

What is the correct level of abstraction for the Enterprise Canonical Data Model (ECDM)?

As I blogged before, the ECDM is used to decide what data should be passed through the integration infrastructure in the notifications that occur on business events.  The canonical schemas that define "things" are all subsets of the ECDM (or extensions, as we will see).

In some organizations, there are fairly few variations in basic 'things' like order, product, and agreement.  In other organizations, including Microsoft, the need for independent variation is more apparent.  As we move more toward "Software as a service," the number and types of products will only grow.  And what exactly is an order if we are using click-stream billing for a service call?  This will be fun.  So we need lots of flexibility as the business grows and changes.  An ECDM that is too prescriptive or too large can end up constraining the business' ability to grow and change.

There are basically two types of messages that need to rely on the ECDM: event notifications and full data entities.  Both are transitory, in that they state a fact at a particular point in time, but the event notifications are more transitory because they are only sent once across the infrastructure.  We need to be able to replay them, but (with the exception of BAM), we don't often query them.

In general, I'd say the rule for event notifications should be:

Communicate sparingly, communicate clearly, allow for questions.

Communicate sparingly: Define your entities to the minimum level needed to share "concepts" and "relationships" across the enterprise.  If an order happens from "company ABC" for 10,000 licenses of "product XSP" under marketing program "VLR", then the canonical schema for that order needs to be pretty short, and the event notification even shorter, so that receiving systems can decide if they even care.  Remember that your event system will send a LOT of events.  Keep them small but provide enough information for the recipient to decide if they need to know more.  So, perhaps the "order placed" notification has things like order id, customer id, partner id, reseller id, program id (sales are made under marketing programs) and a list of product categories that the items in the order represent.  That's it.  The receiving system can decide if they need to know more.
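
To make "keep them small" concrete, here is a minimal sketch of what such an "order placed" notification might look like.  The class name, field names, and sample values are my own assumptions for illustration, not a prescribed schema; the point is only that the event carries identifiers and categories, never order lines or prices.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class OrderPlacedNotification:
    """Hypothetical 'order placed' event: enterprise-wide ids only."""
    order_id: str
    customer_id: str
    partner_id: str
    reseller_id: str
    program_id: str
    product_categories: Tuple[str, ...]


def to_wire(event: OrderPlacedNotification) -> str:
    """Serialize the event to a compact JSON payload (format is an assumption)."""
    return json.dumps(asdict(event), sort_keys=True)


# Sample values are placeholders based on the example in the text.
event = OrderPlacedNotification(
    order_id="1234",
    customer_id="ABC",
    partner_id="P-01",
    reseller_id="R-07",
    program_id="VLR",
    product_categories=("licenses",),
)
payload = to_wire(event)
```

A subscriber can read the program id or the product categories from this payload and decide whether it needs to ask for anything more.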

Communicate Clearly: The IDs must be generic and enterprise-wide.  If a receiving system gets a notification or a canonical element (like the full order), it has to be able to interpret it consistently.  That means that the systems listening for the events have to know what the IDs mean and how to get more information on an ID if they don't already have it.

Allow for Questions: the infrastructure needs to provide a generic way to ask the question: "I need to know more about order 1234 to customer ABC on program VLR."
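
One way to picture a "generic way to ask" is a single lookup entry point keyed by entity type and enterprise-wide id.  The sketch below is an in-memory stand-in under that assumption; `QuestionService`, `register`, and `ask` are hypothetical names, and a real infrastructure would route the question to the owning system rather than to a local dictionary.

```python
from typing import Callable, Dict, Optional

# A resolver takes an enterprise-wide id and returns the entity, or None.
Resolver = Callable[[str], Optional[dict]]


class QuestionService:
    """Hypothetical single entry point for 'tell me more about X'."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, Resolver] = {}

    def register(self, entity_type: str, resolver: Resolver) -> None:
        """The owning system registers how to answer for its entity type."""
        self._resolvers[entity_type] = resolver

    def ask(self, entity_type: str, entity_id: str) -> Optional[dict]:
        """Return the entity, or None if the type or id is unknown."""
        resolver = self._resolvers.get(entity_type)
        return resolver(entity_id) if resolver else None


# Placeholder data standing in for the order system of record.
orders = {"1234": {"order_id": "1234", "customer_id": "ABC", "program_id": "VLR"}}

service = QuestionService()
service.register("order", orders.get)

answer = service.ask("order", "1234")
unknown = service.ask("order", "9999")
```

The caller only needs the entity type and the id from the notification; it never needs to know which system holds the answer.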

So if the needs of the event notification are for brevity and consistency, what are the needs for full data entities?

When a system gets an event notification, it will look at the event and decide if it cares.  Most of the time, it won't, and our use case ends.  Sometimes it will.  When it does, it needs to ask for full details of that data entity.  Perhaps it wants to store data.  Perhaps it wants to calculate something to append to the records for the customer, the partner, the reseller, the sales team that made the sale, or the product group that made the product.  There are lots of reasons why the system getting the message will need more data.  The ability to 'ask questions' listed above applies to full data entities as well.
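
The receiving side of that flow can be sketched in a few lines: look at the notification, drop it if it is not interesting, and only then ask for the full entity.  The filter on program id and the `fetch_order` callback are assumptions of mine; any field in the notification could drive the decision.

```python
from typing import Callable, Iterable, List, Optional


def handle_notification(
    event: dict,
    watched_programs: Iterable[str],
    fetch_order: Callable[[str], dict],
) -> Optional[dict]:
    """Return the full order if this subscriber cares, otherwise None."""
    if event.get("program_id") not in set(watched_programs):
        return None  # most events end here, with no further traffic
    return fetch_order(event["order_id"])


# Stand-in for the infrastructure call that returns the full data entity.
fetched: List[str] = []


def fake_fetch(order_id: str) -> dict:
    fetched.append(order_id)
    return {"order_id": order_id, "lines": []}


ignored = handle_notification(
    {"order_id": "1233", "program_id": "OEM"}, {"VLR"}, fake_fetch
)
wanted = handle_notification(
    {"order_id": "1234", "program_id": "VLR"}, {"VLR"}, fake_fetch
)
```

Only the second event triggers a fetch, which is the reason to keep the notification small in the first place.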

I'd say the rule for full data entities is:

Provide a complete document, at a point in time, allow for questions

Provide a complete document - the full data entity contains all of the data that the source system can share about it, including denormalized details about related entities.  For example, if I get an order as stated above, for 10,000 licenses for product XSP, we would provide the full "legal name" for the product and some attributes for the product (like the fact that it is a license, what country it is sold in, languages, product family id, etc). On the other hand, we don't want to constrain the business, so allow for optional fields in the semantics of the canonical object.  Allow a system that doesn't have a data element (like a price or even a quantity) to send the order anyway.  Also allow the system that is sending data to append 'system specific' data elements.  That way, a team can use the canonical model to send data to another closely related system in the same business stream, where those 'system specific details' can be understood and used.
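
Here is a rough sketch of a full order document with those two properties: nearly every field is optional, and the sender can attach system-specific data without changing the canonical shape.  All names here are illustrative assumptions, including the `extensions` dictionary keyed by source system and the placeholder legal name.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional


@dataclass
class OrderLine:
    product_id: str
    # Denormalized product details, present only if the sender has them.
    product_legal_name: Optional[str] = None
    product_family_id: Optional[str] = None
    quantity: Optional[int] = None
    unit_price: Optional[str] = None  # kept as text to avoid float rounding


@dataclass
class OrderDocument:
    order_id: str
    customer_id: str
    program_id: Optional[str] = None
    lines: List[OrderLine] = field(default_factory=list)
    # System-specific elements, namespaced by the sending system's name.
    extensions: Dict[str, Dict[str, Any]] = field(default_factory=dict)


# A sender that knows the quantity but has no price can still send the order.
order = OrderDocument(
    order_id="1234",
    customer_id="ABC",
    program_id="VLR",
    lines=[
        OrderLine(
            product_id="XSP",
            product_legal_name="Product XSP License",  # placeholder name
            quantity=10000,
        )
    ],
    extensions={"billing-system": {"invoice_batch": "B-42"}},
)
document = asdict(order)
```

Receivers that do not recognize the "billing-system" entry can ignore it, while a closely related system in the same business stream can read it.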

At a point in time - Recognize that your documents are not static.  Provide dates and version numbers for each and every document and allow a document to be called back up on the basis of those dates and version numbers.  This is key to being able to recreate a data stream later in time, an operational necessity that is often overlooked.  So, yes, your order has a version number. 
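
As a sketch of "called back up on the basis of those dates and version numbers", the store below keeps every version of a document and can answer either by version number or by date.  It is an in-memory illustration with hypothetical names, and it assumes versions are saved in chronological order.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Hypothetical store that never overwrites a document version."""

    def __init__(self) -> None:
        # doc_id -> list of (version, effective date, body), oldest first
        self._history: Dict[str, List[Tuple[int, datetime, dict]]] = {}

    def save(self, doc_id: str, effective: datetime, body: dict) -> int:
        """Append a new version and return its version number."""
        history = self._history.setdefault(doc_id, [])
        version = len(history) + 1
        history.append((version, effective, dict(body)))
        return version

    def get_version(self, doc_id: str, version: int) -> Optional[dict]:
        for number, _, body in self._history.get(doc_id, []):
            if number == version:
                return body
        return None

    def get_as_of(self, doc_id: str, when: datetime) -> Optional[dict]:
        """Return the latest version effective on or before `when`."""
        latest = None
        for _, effective, body in self._history.get(doc_id, []):
            if effective <= when:
                latest = body
        return latest


store = VersionedStore()
v1 = store.save("1234", datetime(2007, 6, 1), {"quantity": 10000})
v2 = store.save("1234", datetime(2007, 6, 15), {"quantity": 12000})
mid_june = store.get_as_of("1234", datetime(2007, 6, 10))
```

Replaying a data stream then amounts to asking for each document as of the time the original event was sent.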

Allow for questions: as complete as your order document is, it will still need to have codes in it referring to other things.  For example, each product may have a product family.  By including the product family code, you are stating this: "At the time this order was placed, product 'SharePoint' was part of the 'Office Family' of products."  For some products, this may not change much, but for others, it could.  So you include the product family code, but there is no need to include attributes of the product family.  The receiving system can ask the same infrastructure for product family details if it needs to follow up.

Hopefully, with these simple guidelines, we can build the ECDM at the right level of abstraction.

Comments

  • Anonymous
    June 30, 2007
    From your perspective, would you define the canonical data model at the SOAP/XSD level (as in service contract) or are you talking full-blown enterprise data model as in having a single data store (ala SQL/MDM)?

  • Anonymous
    June 30, 2007
    The comment has been removed

  • Anonymous
    June 30, 2007
    Hi Nick, Your post provides a good heuristic to determine notification granularity first of all. And tying it to a CDM would ensure every provider has a consistent vocabulary to deal with data elements. Neat post, thanks! Could you elaborate a little on the CDM itself? Is this an entity archetype that is communicated across the organization and fidelity to it enforced via best practice and governance? Where does one start on the CDM, if there are a hundred assorted data models, all loosely representing similar things? How does one version and manage this? Are there organizations that do this?! Any best practices you could share or point out? Cheers

  • Anonymous
    July 01, 2007
    @Mahesh You have asked a good question.  I'm not sure I have a 'right' answer.  I will say this: the ECDM cannot be formed centrally and communicated outward.  It has to be formed at the edges and communicated in.   Where one starts is with the business.  What does the business understand about its data?  If you don't start with the conceptual data model, the canonical data model is unattainable or fictitious.   Normally the business will use different terms for the same entities.  Most of the problem of developing the ECDM is in dealing with people, not technology.

  • Anonymous
    July 01, 2007
    @Nick, That makes sense.  It seems like an attempt to bring together a common "contract" to say, 10k services in a large enterprise. Do you consider the EDM to be some form of executable contract (ala SOAP/XML, etc) or more an attempt at establishing a set of guidance and documentation across a large set of services. Evan

  • Anonymous
    July 01, 2007
    @Evan, Do I consider the ECDM to be an executable contract?  Part of one, yes.  There is the behavior of the service as part of a message exchange pattern that is entirely outside the ECDM, and then there are the data elements that I pass that are formed from entries within the ECDM. I view it as more than guidance.  It is a consensus, hopefully.  An agreement for excellence.  

  • Anonymous
    July 05, 2007
    The comment has been removed

  • Anonymous
    July 05, 2007
    @Anthony You use the term Business Object Model.  I use the term Conceptual Data Model.  We mean the same thing. Thanks for your contribution. --- Nick

  • Anonymous
    July 05, 2007
    The comment has been removed

  • Anonymous
    July 05, 2007
    @Anthony, Point well taken.  I was not using the term correctly.  You are right.  I should not use the term Conceptual Data Model to refer to the Enterprise Business Object Model. Thanks, --- N