Distributed System Versioning Myth #1

One rather large pain point with distributed systems is versioning.  There are things you can do that make this problem better and things you can do that make it worse.  The worst thing you can do is pretend that this problem does not exist.

Myth: Loosely coupled systems don’t require versioning

Every piece of data that flows between things in a distributed system has both syntax and semantics.  Syntax can be communicated with schema, contracts and the like.  Syntax tells you that you are going to receive a string, an Int32 and a DateTime.  It tells you the names of the fields (which provide some clue to the semantics) and it might even communicate validation rules (max length, not null, etc.).

What syntax does not communicate is meaning; that is the world of semantics.  Suppose you receive an Int32 named Code.  What does it mean?  The syntax cannot tell you.  To get to the semantics, you must talk to a human and develop a shared understanding of what Code is.  When you consume data that contains this value we call Code, you take that understanding and spread it across the behavior of your app.
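To make this concrete, here is a minimal Python sketch.  The field names, the `matches_schema` helper and both meanings of Code are made up for illustration; the point is only that a message can pass every syntax check while two parties read Code in incompatible ways.

```python
from datetime import datetime


def matches_schema(message: dict) -> bool:
    """Syntax check only: field names and types, nothing about meaning."""
    try:
        datetime.fromisoformat(message["Timestamp"])
    except (KeyError, TypeError, ValueError):
        return False
    code = message.get("Code")
    return (
        isinstance(message.get("Name"), str)
        and isinstance(code, int)
        and not isinstance(code, bool)  # bool is a subclass of int in Python
    )


# Two hypothetical teams agree on the syntax but not on what Code means.
def describe_as_status(code: int) -> str:
    """Team A's assumed meaning: 0 is success, anything else is a failure."""
    return "ok" if code == 0 else "failed"


def describe_as_priority(code: int) -> str:
    """Team B's assumed meaning: 0 is the most urgent priority."""
    return "urgent" if code == 0 else "routine"


message = {"Name": "order-17", "Code": 0, "Timestamp": "2011-04-15T09:30:00"}
```

The same message satisfies `matches_schema`, yet `describe_as_status` and `describe_as_priority` tell two different stories about it.  Nothing in the schema can say which one is right.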

When we say that systems are loosely coupled, what we typically mean is that we have a standards-based way of exchanging data between the systems and (possibly) some form of contract that describes the syntax.  And today the trend is even moving away from describing the syntax – JSON just shows up at your door with a bag of mystery data which may change at any time.

Truth: Every copy of data carries with it a need to understand what it is and how I can reason about it

Messages that flow between things in a distributed system are copies of data.  The copy carries with it the syntax and semantics of the people who created it.  When we say this piece of data is V1, we are using the version as a tool to describe both syntax and semantics.  V1 is the label we use to describe our understanding of the syntax and semantics for that data.  This label is useful because we know that at some point in the future there will be a V2.

When V2 arrives, we have to apply a new understanding.  There may be new syntax or new semantics to consider as we process the data.  It is possible that V2 won’t change in any significant way, but it is quite possible that the changes will be very substantial.
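Here is a small Python sketch of what "applying a new understanding" can look like.  Everything in it is an assumption for illustration: a hypothetical Amount field that means whole dollars in V1 and cents in V2, so the syntax is identical and only the meaning has changed.

```python
def amount_in_cents_v1(payload: dict) -> int:
    """Assumed V1 meaning: Amount is a whole number of dollars."""
    return payload["Amount"] * 100


def amount_in_cents_v2(payload: dict) -> int:
    """Assumed V2 meaning: Amount is already expressed in cents."""
    return payload["Amount"]


# One interpreter per version label the receiver understands.
INTERPRETERS = {
    1: amount_in_cents_v1,
    2: amount_in_cents_v2,
}


def amount_in_cents(version: int, payload: dict) -> int:
    """Pick the interpretation that matches the version of the data."""
    try:
        interpret = INTERPRETERS[version]
    except KeyError:
        raise ValueError(f"no interpreter for version {version!r}") from None
    return interpret(payload)
```

Without the version label, `{"Amount": 500}` could be five dollars or five hundred, and the receiver would have no way to tell.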

Any time I get data from some external source (web service, socket, file, database, cache etc.) I need to know what version I am dealing with.

Really?  That sounds extreme, doesn’t it?  Perhaps, but consider the alternative: the chaos of dealing with data incorrectly because I can’t tell that the syntax or semantics have changed.  Let’s just be honest for a moment, shall we?  If your system blows up when it encounters data it does not understand, don’t blame the developer.  Blame the architect who designed a world where the developer would encounter data with no way to reason about the version of that data.

So what shall we do then?

  1. Any message sent to any other element in your distributed system needs to have some way to describe what version (syntax & semantics) it conforms to.
  2. Any code that receives this message from any external source (file, socket, database, web service, queue, etc.) should verify that the message matches the version it expects.  If V1 code receives a V2 message, what then?  You can do what you want at that point, but the good news is that you know what you are dealing with.
  3. When the syntax or semantics of the data change, increment the version number.
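The first two steps can be sketched in a few lines of Python.  This is only one way to do it, under stated assumptions: a JSON envelope with hypothetical `version` and `payload` fields, and a receiver that was written against version 1 and chooses to reject everything else.

```python
import json

SUPPORTED_VERSION = 1  # hypothetical: the version this receiver was written against


class UnexpectedVersionError(Exception):
    """Raised when a message does not carry the version the receiver expects."""


def send(payload: dict, version: int) -> str:
    """Step 1: every outgoing message states which version it conforms to."""
    return json.dumps({"version": version, "payload": payload})


def receive(raw: str) -> dict:
    """Step 2: verify the version before touching the payload."""
    envelope = json.loads(raw)
    version = envelope.get("version") if isinstance(envelope, dict) else None
    if version != SUPPORTED_VERSION:
        raise UnexpectedVersionError(
            f"expected version {SUPPORTED_VERSION}, got {version!r}"
        )
    return envelope["payload"]
```

Rejecting the message is just one policy.  You might log it, park it on a dead-letter queue or hand it to newer code instead.  What matters is that the receiver makes that decision knowingly rather than misreading the data.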

The day will come when you (or some poor soul who comes after you) will need to make a change.  This change will always be painful to some degree.  How painful will depend largely on how thoughtful you were about this issue.

If you want to see a good example of implementing this in WCF DataContracts, look at the WCF / WF Service Fault and Validation Example and open the ServiceFaults.WCF.Contracts / GetDataRequest class.

Happy Coding!
Ron Jacobs
https://blogs.msdn.com/rjacobs
Twitter: @ronljacobs https://twitter.com/ronljacobs

Comments

  • Anonymous
    April 15, 2011
    You say: "Messages that flow between things in a distributed systems are copies of data" what I understand is that "every message in transport (in DS) is copy". So where is original from which copy has been made? It is unclear to me...