A distributed system's logical data model
There are many ways to describe data. I've seen data models that attempt to describe, conceptually, all of the data relationships for lines of business, marketing programs, fulfillment programs, etc. Conceptual data models are useful, primarily because they give you a starting point to work with the business to first understand, and then communicate, how the data can represent the business's requirements.
Normally, when creating a system, we drop down to a logical data model for that system. We indicate the "data on the inside" and the "data on the outside". Effectively, the diagram starts with a large 'box'. Inside the box are entities needed by the application. Outside are the entities that come from somewhere else but are referenced by the application.
One challenge, however, that appears to be stumping one of my teammates is how to create the conceptual model when there is not one system, but two or three systems that communicate. Effectively, we are talking about a distributed system, with distributed data. The data is not distributed because of geography, but rather in order to foster loose coupling.
This is a different way to look at the design of a system than is typically seen, but I feel pretty strongly that it is an important aspect, and one that we need to be fairly formal about.
I approach the systems from the standpoint of the business processes first, and the use cases second. For example, if you are creating a system that facilitates the creation of a standard business contract, it is entirely reasonable to break down the process into steps, where each step is performed by a different role.
The first step would be to define a marketing or fulfillment program that the contract will be tied to. The second would be to create legal clauses that can be fit into the document. The third would be to create a template with rules for how the clauses are to be assembled for the particular contract type, and the fourth would be to create the contract itself. Different people perform each step. Each step has distinct responsibilities. You could, if you wish, create a separate system for each. In an SOA world, I think that you would create a set of services for each.
Each set of services is, in itself, an independent system. In order to remain decoupled, the data may be referential, but not coupled. Therefore, you may need to add a customer before you add an invoice, but there is NO reason that adding a customer should create data records directly in the order management database (I'm being a purist... Master Data Management is the 'reality' behind this situation).
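Here is a rough sketch of what "referential, but not coupled" can look like. The two in-memory classes (CustomerService and InvoiceService) and their method names are hypothetical, made up for illustration. The invoice side keeps only the customer's identifier and asks the owning service whether that customer exists; adding a customer never writes into the invoice store.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerService:
    """Owns customer data. Hypothetical stand-in for a real service."""
    _customers: dict = field(default_factory=dict)

    def add_customer(self, customer_id: str, name: str) -> None:
        # Writes only to its own store; nothing is pushed into other systems.
        self._customers[customer_id] = {"name": name}

    def customer_exists(self, customer_id: str) -> bool:
        return customer_id in self._customers


@dataclass
class InvoiceService:
    """Owns invoice data and holds only a reference (the id) to a customer."""
    customers: CustomerService
    _invoices: dict = field(default_factory=dict)

    def add_invoice(self, invoice_id: str, customer_id: str, amount: float) -> None:
        # The referential check goes through the owning service's interface,
        # not through a shared table.
        if not self.customers.customer_exists(customer_id):
            raise LookupError(f"unknown customer {customer_id!r}")
        self._invoices[invoice_id] = {"customer_id": customer_id, "amount": amount}

    def invoice_count(self) -> int:
        return len(self._invoices)
```

The ordering constraint (customer before invoice) survives, but it is enforced by a call across the boundary rather than by a foreign key inside one database.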
So, if you are a developer who is used to creating a database with every bit of data that you think you will need in it, it can be quite a change to create not one, but many databases, bound together by master data that is copied locally on demand, and kept up to date by a cache engine (MDM).
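To make "copied locally on demand, and kept up to date" a little more concrete, here is a minimal sketch. The LocalMasterCopy class, the fetch callback, and the change notification hook are all assumptions of mine; a real MDM product will have its own mechanism for each.

```python
from typing import Callable, Dict


class LocalMasterCopy:
    """Local, on-demand copy of master records owned by another system."""

    def __init__(self, fetch_from_master: Callable[[str], dict]) -> None:
        self._fetch = fetch_from_master      # call into the owning system
        self._local: Dict[str, dict] = {}    # records copied so far

    def get(self, key: str) -> dict:
        # Copy the record locally the first time it is needed.
        if key not in self._local:
            self._local[key] = dict(self._fetch(key))
        return self._local[key]

    def on_master_changed(self, key: str) -> None:
        # Called when the master announces a change: drop the stale copy
        # so that the next get() fetches a fresh one.
        self._local.pop(key, None)
```

Only the records that this system actually touches are ever copied, which is the difference between this approach and a bulk data copy.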
Now, take one of those developers and ask him or her to create a data model that illustrates not "data on the inside" but "data in each room". That requires a different kind of thinking... because now, the problem of 'master data' becomes visible (and a little painful).
In this model, the Product data is brought across both for invoices and for shipments, but is it really the Product data that is in the shipments, or is it product and lot data? In other words, it is one thing to ask "who did we ship soap to," and another thing altogether to ask "who did we ship Lot 41 of tainted beef to?"
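The difference between the two questions shows up as soon as you write them down. This is only a sketch: the ShipmentLine record, its field names, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ShipmentLine:
    customer_id: str
    product_id: str
    lot_id: str  # without this field, the recall question cannot be answered


def customers_shipped_product(lines: List[ShipmentLine], product_id: str) -> Set[str]:
    """Who did we ship this product to?"""
    return {line.customer_id for line in lines if line.product_id == product_id}


def customers_shipped_lot(lines: List[ShipmentLine], product_id: str, lot_id: str) -> Set[str]:
    """Who did we ship this particular lot of the product to?"""
    return {
        line.customer_id
        for line in lines
        if line.product_id == product_id and line.lot_id == lot_id
    }


# Made-up sample data for illustration.
lines = [
    ShipmentLine("cust-a", "beef", "41"),
    ShipmentLine("cust-b", "beef", "42"),
    ShipmentLine("cust-c", "soap", "7"),
]
```

If the shipment system carries only the product reference, the first function is the best it can do, and every beef customer gets the recall notice.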
This distinction, between product and lot, becomes particularly visible when you model your systems this way, but more importantly, you can see the lines that cross the boundary between systems, and you can place services on each line: get product, get lot, get invoice, get shipment.
When designing the database, you will need to use replication, a cache, or a transactional store to ensure referential integrity.
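One way to read the services off the model is to record which system masters each entity and then look for references whose two ends live in different systems. In the sketch below, the system names, the ownership assignment, and the list of references are assumptions for illustration, not the actual model.

```python
from typing import Dict, List, Tuple

# Which system masters each entity (hypothetical assignment).
OWNER: Dict[str, str] = {
    "product": "catalog",
    "lot": "inventory",
    "invoice": "billing",
    "shipment": "shipping",
}

# (referencing entity, referenced entity) pairs from the logical model
# (also hypothetical).
REFERENCES: List[Tuple[str, str]] = [
    ("invoice", "product"),
    ("shipment", "product"),
    ("shipment", "lot"),
    ("shipment", "invoice"),
]


def boundary_services(
    owner: Dict[str, str], references: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return sorted (consumer system, provider system, service name) triples
    for every reference that crosses a system boundary."""
    services = set()
    for source, target in references:
        if owner[source] != owner[target]:
            services.add((owner[source], owner[target], f"get {target}"))
    return sorted(services)
```

References that stay inside one system produce no service; only the lines that cross a boundary do.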
Comments
Anonymous
September 15, 2006
Great article, Nick. This is something I have been trying to articulate for some time now: the idea of using the data from another system when you need it, rather than designing for every piece of information and having several systems with the same or redundant data. I know that isn't exactly what you were saying, but that can often be the result when trying to integrate with systems that were in place before the new project. It is also often the case as small customers become larger. They have all of the personnel data in some system that has its own DBMS. So why do I have to design my db for that, as opposed to just using theirs as a sort of master data and taking what I need from it when I need it?
Anonymous
September 15, 2006
Hi Mike,
You are correct... this is what I am saying, although the conundrum that triggered this post is about new development. The problem is, as you correctly pointed out, the same: design to bring data in as you need it, in a decoupled manner, not with large data copies.
--- Nick
Anonymous
October 16, 2006
I hope you don't mind the criticism, but the manner in which you are modeling is typically something to avoid unless you have to do it, i.e. unless you are being forced to tightly couple (tie) together a set of systems, typically in a synchronous manner. The purpose of SOA is to hide many of the details that you are referring to here. In this example, you are discussing what I refer to as "Data As A Service", but it's just one specialized case of SOA. Peace.
Anonymous
October 16, 2006
Hi Cameron, As for your opinion, I thank you for it, but I respectfully submit that you missed the point. By modeling a logical data model, I am not making any statement whatsoever about how the data will move across the boundaries... I am stating what "system" will own the data. We'd all like to think that SOA will create an environment of "brilliant pebbles" that are 100% decoupled, but that is absurd. 70% (or more) of all services are exposed by enterprise systems, not stand-alone service containers. This model helps the developers to understand how the data entities are related to one another in the enterprise systems and in which system they are mastered. That is essential to the notion of "data on the inside vs. data on the outside" that is explained by Pat Helland. Only after you understand this landscape is it possible to develop a large distributed business system that needs to interact across boundaries with a dozen other systems. That's not to say that there aren't SOA services that are justifiably 'stand alone', because there are. Workflow is a great example, and I can reel off a dozen more. But when it comes to the non-infrastructure business data, these models are mission critical.