EDA: Avoiding "coupling on the name"

A friend of mine, Harry Pierson, is a great thought-provoker.  I'm a big fan of thought-provokers.  Pat Helland is another, as are David Chappell and Martin Fowler.  Harry has been asking me to make sure we build a layer of indirection into our message addressing system (which I agree with, but haven't really been thinking about).

So this morning, as I was just waking up (literally), I thought about my recent post on Event-Driven Architecture and whether to use events or documents.  Both models share a key behavior: sending out a message without any idea of who will pick it up.

So what if no one picks the message up?  Is that an error? 

Let's say I have a system to handle a call center for a financial services company or a telco.  When a customer calls and asks to be enrolled in "Heavily Advertised Program ABC," three or four systems may need to interact to make that real.

In an async EDA world, we want to decouple the behavior of one system from another in time.  The orchestration can be spread across time and location.  But the connected process still has to follow an orchestration that does not live on the local machine.

Harry asked me whether the sending app would have to know the identity of the receiving app, since it would be an error if the receiving app didn't get the message.  Personally, I think this is coupling on the name: the sending app knows the name of the receiver.

Harry asked me to consider using a 'logical name' for the receiver.  The sender contacts a logical endpoint, the addressing infrastructure turns that into a physical endpoint, and we still have decoupling.
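To make Harry's suggestion concrete, here is a minimal sketch of logical-name addressing, assuming a simple in-memory directory. The class `AddressingInfrastructure`, the `send` helper, and the example URLs are all hypothetical; a real addressing layer would sit in the messaging infrastructure, not in the sender's process.

```python
from typing import Dict


class AddressingInfrastructure:
    """Hypothetical addressing layer: maps logical endpoint names to physical ones."""

    def __init__(self, directory: Dict[str, str]) -> None:
        # logical name -> physical address; the sender only ever sees the keys
        self._directory = dict(directory)

    def resolve(self, logical_name: str) -> str:
        if logical_name not in self._directory:
            raise LookupError(f"no physical endpoint registered for {logical_name!r}")
        return self._directory[logical_name]

    def rebind(self, logical_name: str, physical: str) -> None:
        # Moving a receiver changes the directory entry, not the sender's code.
        self._directory[logical_name] = physical


def send(infra: AddressingInfrastructure, logical_name: str, body: str) -> str:
    physical = infra.resolve(logical_name)
    # A real transport would deliver the message here; the sketch just reports the target.
    return f"delivered {body!r} to {physical}"
```

After `infra.rebind("orderhandler", ...)`, the same `send(infra, "orderhandler", ...)` call reaches the new host without any change to the sender. Note that the sender still knows one name, `orderhandler`, which is exactly the residual coupling discussed next.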

Honestly, I like it, but I think it is insufficient.  What if we need to contact 20 downstream systems in a complex workflow, but I don't want a single "orchestration coordinator" to be a bottleneck (or a single point of failure)?  I don't want to hand the orchestration off from my app to a central orchestration hub.

How about this: when I send a document, we begin a handshake between my app and a local agent. 

My app: I have a document.  Please tell me what you are going to do with it.

Local agent: Thank you for contacting your local routing agency.  Let me see your document.  (I look in my (cached) instructions for handling that document.)  Here is a routing slip for you.  Now you know everyone who will see your document (by logical name).  The error-handling criteria are included in the slip, so you know what the infrastructure considers an error.  Is this list OK?

My app: (I examine the routing slip.  It meets my criteria for handling.)  I approve the transmission of my document, but I have marked up the error-handling criteria... I want it to be an error if the 'logical order handler' doesn't pick up the document within 10 seconds.

Local agent: I'll get it started.  I'll stamp the message with your logical name and the endpoint of your callback service, so if someone wants more information, they know whom to contact.  (The local agent sends both the message and the routing slip.  Agents on other systems follow the instructions in the routing slip, which may include calls to a dozen other systems along the way.)
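The handshake above can be sketched in a few lines, assuming the local agent keeps a cache of routing slips keyed by document type. Every name here (`Step`, `RoutingSlip`, `LocalAgent`, `require_pickup_within`, the `EnrollmentRequest` type, and the callback URL) is hypothetical and only illustrates the shape of the exchange: propose a slip, let the app mark up one error rule, then send the document with the slip and the sender identity that the agent (not the app) supplies.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Step:
    logical_name: str                         # recipients are named logically, never physically
    pickup_timeout_s: Optional[float] = None  # None: no pickup deadline for this step


@dataclass(frozen=True)
class RoutingSlip:
    document_type: str
    steps: Tuple[Step, ...]


class LocalAgent:
    """Hypothetical local routing agent running on the sender's own machine."""

    def __init__(self, app_name: str, callback: str, instructions: Dict[str, RoutingSlip]) -> None:
        self._app_name = app_name          # the app's logical name comes from here, not from the app
        self._callback = callback
        self._instructions = instructions  # cached per-document-type routing instructions

    def propose(self, document_type: str) -> RoutingSlip:
        # "Let me see your document": look up the cached slip for this document type.
        return self._instructions[document_type]

    def send(self, document: dict, slip: RoutingSlip) -> dict:
        # Stamp sender identity and callback endpoint, then ship document and slip together.
        return {"sender": self._app_name, "callback": self._callback,
                "slip": slip, "body": document}


def require_pickup_within(slip: RoutingSlip, logical_name: str, seconds: float) -> RoutingSlip:
    """The app's per-message markup: tighten the pickup deadline for one recipient."""
    steps = tuple(
        replace(step, pickup_timeout_s=seconds) if step.logical_name == logical_name else step
        for step in slip.steps
    )
    return replace(slip, steps=steps)  # a new slip; the cached one is untouched


# Usage: the call-center app enrolls a customer and demands a 10-second pickup.
cached = {"EnrollmentRequest": RoutingSlip(
    "EnrollmentRequest", (Step("orderhandler"), Step("billing", 60.0), Step("provisioning")))}
agent = LocalAgent("callcenter", "https://callcenter.example/callback", cached)
slip = require_pickup_within(agent.propose("EnrollmentRequest"), "orderhandler", 10.0)
message = agent.send({"program": "ABC"}, slip)
```

Passing `agent.propose(...)` straight to `agent.send(...)` without the markup step corresponds to simply trusting the infrastructure. Returning a new slip rather than mutating the cached one keeps the override scoped to a single message.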

One thing that is not clear from this use case: the "approval" step is optional.  The sender (my app) could have chosen to simply trust the infrastructure in the first place.  That's a valid option.

A few benefits of this handshake:

  • The calling system doesn't know the names of all of the document's collaborators. It knows the logical name of one collaborator that it cares about (logical: orderhandler). 
     
  • The calling system gets local override over error handling criteria on a per-message basis.
     
  • The calling system doesn't know its own name.  That comes from the infrastructure.
     
  • When the recipient system wants to call back for more information, everything it needs is in the document, as inserted by the infrastructure.  The recipient doesn't have to know the name of the 'sender of information' either... it comes in the message.
     
  • Workflow coordination happens at the agent, not in a separate infrastructure.  Talking from point A to point B still involves one (and only one) message, not two as would be required in a typical hub-and-spoke model. 
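The last two benefits can be illustrated with a small simulation, assuming each system runs an agent that reads the itinerary carried inside the message and forwards it directly to the next hop. The `Agent` class, the `forward` function, and the envelope fields (`sender`, `callback`, `itinerary`, `trace`) are hypothetical; the dictionary `network` merely stands in for logical-name addressing. The recipient services read the sender's name and callback endpoint from the envelope rather than from their own configuration, and each hop costs exactly one delivery because no hub sits in between.

```python
from typing import Callable, Dict, List

Network = Dict[str, "Agent"]


class Agent:
    """Hypothetical per-system agent that follows the routing slip carried in the message."""

    def __init__(self, logical_name: str, service: Callable[[dict], None], network: Network) -> None:
        self.logical_name = logical_name
        self.service = service
        self.network = network           # stands in for logical-name addressing
        network[logical_name] = self

    def receive(self, envelope: dict) -> None:
        envelope["trace"].append(self.logical_name)  # one entry per message delivered
        self.service(envelope)                       # local work at this hop
        itinerary: List[str] = envelope["itinerary"]
        if itinerary:
            # Forward directly to the next hop named in the slip; no hub in between.
            forward(envelope, self.network)


def forward(envelope: dict, network: Network) -> None:
    next_hop, rest = envelope["itinerary"][0], envelope["itinerary"][1:]
    network[next_hop].receive({**envelope, "itinerary": rest})


# Usage: recipients learn who to call back from the message itself.
network: Network = {}
seen: Dict[str, str] = {}
Agent("orderhandler", lambda e: seen.update(orderhandler=e["callback"]), network)
Agent("billing", lambda e: seen.update(billing=e["sender"]), network)

envelope = {"body": {"program": "ABC"}, "sender": "callcenter",
            "callback": "https://callcenter.example/callback",
            "itinerary": ["orderhandler", "billing"], "trace": []}
forward(envelope, network)  # the sender's own agent puts the single message on the wire
```

The shallow copy in `forward` shares the `trace` list across hops, so after the run `envelope["trace"]` lists each delivery once. In a real system the forwarding would be an asynchronous send, not a nested function call.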

Of course, these ideas are not new.  This is right out of the ESB playbook, and why shouldn't it be? 

The point is that the caller doesn't know much about the orchestration... preferably they know nothing at all, but they have the RIGHT to know if they want to, and there is no 'central authority' that slows you down or decides things for you.  There is an agent, working on your behalf, on your own system.  Under the agent's covers: .NET Windows Workflow Foundation (WF).

Note: to my friends who love BizTalk, we still need many of BizTalk's capabilities... like transformation and the adapters to SAP and such.  The routing infrastructure can call BizTalk when it is important to do so.  When it is not important, it won't.

A management console is still needed for someone to manage routing information.  The local agents download updates when they get an event indicating that routing has changed.  The point is that the management point is not a bottleneck.  It is just an endpoint that also interacts on the same infrastructure.
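The update path can be sketched as plain publish/subscribe, assuming an in-process `EventBus` as a stand-in for the shared messaging infrastructure. `ManagementConsole`, `LocalRoutingCache`, and the `routing.changed` topic are hypothetical names. The console is just another publisher on the bus; each local agent answers sends from its own cache and re-downloads routing only when a change event arrives, so the console is never on the send path.

```python
from typing import Callable, Dict, List


class EventBus:
    """Minimal in-process publish/subscribe; a stand-in for the shared messaging infrastructure."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in list(self._subscribers.get(topic, [])):
            handler(event)


class ManagementConsole:
    """Just another endpoint: it owns routing data and announces changes as events."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._routes: Dict[str, List[str]] = {}

    def set_route(self, document_type: str, recipients: List[str]) -> None:
        self._routes[document_type] = list(recipients)
        self._bus.publish("routing.changed", {"document_type": document_type})

    def fetch(self, document_type: str) -> List[str]:
        return list(self._routes[document_type])


class LocalRoutingCache:
    """Each local agent keeps its own copy, so sends never wait on the console."""

    def __init__(self, bus: EventBus, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, List[str]] = {}
        bus.subscribe("routing.changed", self._on_routing_changed)

    def recipients_for(self, document_type: str) -> List[str]:
        if document_type not in self._cache:
            self._cache[document_type] = self._fetch(document_type)
        return self._cache[document_type]

    def _on_routing_changed(self, event: dict) -> None:
        # Download the updated routing as soon as the change event arrives.
        self._cache[event["document_type"]] = self._fetch(event["document_type"])


# Usage: an administrator adds billing to the route; the agent picks it up via the event.
bus = EventBus()
console = ManagementConsole(bus)
console.set_route("EnrollmentRequest", ["orderhandler"])
agent_cache = LocalRoutingCache(bus, console.fetch)
before = agent_cache.recipients_for("EnrollmentRequest")
console.set_route("EnrollmentRequest", ["orderhandler", "billing"])
after = agent_cache.recipients_for("EnrollmentRequest")
```

A production version would also need durable delivery of change events, so an agent that was offline does not keep routing on stale instructions.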

Comments

  • Anonymous
    August 12, 2007
    Hi again Nick, Firstly, the handshake solution you have described is, for me, not EDA, if I understand what you mean! Another problem for me is that you are pushing the coupling to the agents fronting the services, especially through the communication channels they involve. Also, in EDA, should we be trading in events and handshakes? And I would rather have a well-defined orchestration coordinator sitting between the business flows than 10 or 20 service agents. Another thing: if the agent is coordinating activities between your service and various other external services, is it really still a local agent? To me, if it is outside the service, it is not local anymore! It de facto becomes an orchestration coordinator, your single point of failure, so to speak! And if you have multiple agents each orchestrating from their own end, you simply multiply the complexity of the system and the cost of changing it. Now, how can we handle your major requirements with events:

  1. If the requirement is for a 10-second pickup, then we could use a CEP engine with a rule that if a corresponding event isn't raised within 10 seconds, it raises an alert or another known event to signify the logical error/requirement. This could just as easily be handled by a scheduling agent (by the workflow); I'm using a WCF-based agent for similar purposes myself.
  2. If you need to dynamically adjust error-handling criteria, then isn't it best to keep them out of the application's logic and put them straight into the hands of middleware like an "orchestration coordinator"?
  3. As for handling errors, we could again define an error event that the service in question could subscribe to and handle on its end. In addition, error handling at the level of the process can be handled accordingly by the orchestration coordinator.
  4. Also, your routing information or instructions (as per the routing slip) could be described in the event’s message headers with the orchestration engine working according to the prescribed rules.
  • Anonymous
    August 13, 2007
    So, I am totally thrilled in my new role at Neudesic , where today, I got to spend quite a bit of time