The Problem of Versioning

The voting on our site “What do you want to see on endpoint.tv” is moving along nicely.  Yes, we are going to do something about OAuth with the HttpClient, but today I want to think about the #2 issue, which is versioning with workflows.  This post is foundational thinking.  As we move forward to consider solutions, I want to have this foundation to build upon.  Feel free to add your thoughts as we consider versioning.

What do we mean when we say “Version”? 

A version is an artificial construct that gathers together some behavior of a system and stamps it with an identifier that makes it understandable.  Today I’m writing this blog post on Windows 7 (a product name, not a version).  The actual version of Windows I’m using is 6.1 (Build 7600).  Someday in the near future I will install Service Pack 1, which represents a set of changes to the behavior that will be applied to my system.

Why bother?

We bother with versioning because it is necessary for managing the complexity of components that have to interact with each other.  If I create a component and you want to use it then there has to be an agreement about the things my component does at some given point in time.  The name we use for that agreement or understanding is a version.

Change Happens

Over time things change.  Perhaps I have a new requirement that my component must deal with or maybe I find a security hole or some other kind of code defect in my system.  I have to change my component.  Will you have to change yours? 

Maybe, or maybe not.  Not every change requires me to create a new agreement with you.  If your code can continue to use my component without any problem, then my change is backward compatible.  On the other hand, if my change will cause your component to fail or otherwise work incorrectly, then my change is a breaking change.

Inside vs. Outside

Object Oriented Programming introduced the notion of encapsulation as a mechanism to deal with the problem of change.  If I can limit the ways in which you can know about or depend upon something I minimize the scope of things that cause breaking changes.  The things in my component that you can use directly we think of as the public interface.

Change on the Outside is Syntax

Syntax – from the Greek suntaxis meaning to “put in order”

If I change the public interface of my component I’m really just changing the syntax.  In a distributed component this means that the structure of the messages that flow between components will be different.  In some distributed technologies any change to syntax is a breaking change, while others allow messages to change in some ways, such as adding new fields, without causing a breaking change.
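To make the difference concrete, here is a minimal sketch in Python of two ways a receiver might treat a message that has gained a field.  The message shape and the names (OrderV1, read_order_tolerant, read_order_strict, the currency field) are hypothetical and exist only for illustration; they do not come from any particular messaging technology.

```python
import json
from dataclasses import dataclass


@dataclass
class OrderV1:
    """The fields the original (hypothetical) agreement promised."""
    order_id: int
    quantity: int


def read_order_tolerant(message: str) -> OrderV1:
    """Pick out the known fields and ignore anything extra."""
    fields = json.loads(message)
    return OrderV1(order_id=fields["order_id"], quantity=fields["quantity"])


def read_order_strict(message: str) -> OrderV1:
    """Require the message to contain exactly the known fields."""
    fields = json.loads(message)
    # Unknown keys become unexpected keyword arguments and raise TypeError.
    return OrderV1(**fields)


old_message = '{"order_id": 7, "quantity": 2}'
# A newer sender adds a field the old agreement never mentioned.
new_message = '{"order_id": 7, "quantity": 2, "currency": "USD"}'
```

With the tolerant reader the added field is not a breaking change, because both messages produce the same OrderV1.  With the strict reader the very same addition breaks the receiver.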

Change on the Inside is Semantic

Semantic – from the Greek sēmantikos "significant"

If the change I am making involves the behavior of my component, I am changing the semantic.  For example, if I change a method so that it subtracts two numbers instead of adding them, the syntax of the response may be exactly the same but the semantic is now very different.  The previous semantic was addition and now the semantic is subtraction.
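The addition and subtraction example can be sketched in a few lines of Python.  The function names combine_v1, combine_v2 and same_syntax are hypothetical; the point is only that a check on the signature cannot see the change.

```python
import inspect


def combine_v1(a: int, b: int) -> int:
    """Version 1 semantic: addition."""
    return a + b


def combine_v2(a: int, b: int) -> int:
    """Version 2 semantic: subtraction, behind an identical signature."""
    return a - b


def same_syntax(f, g) -> bool:
    """Compare only parameter names, annotations and return annotation."""
    return inspect.signature(f) == inspect.signature(g)
```

Here same_syntax(combine_v1, combine_v2) is true, yet combine_v1(5, 3) gives 8 while combine_v2(5, 3) gives 2.  A caller that compiled and ran happily against version 1 still runs against version 2; it simply gets a different answer.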

Data Inside or Outside?

State – from the Latin status "way of standing, condition"

What about data stored in durable storage such as a file or database?  How does this data relate to the syntax and semantic of the system?  Most of us think of state as a private implementation detail of the system.  You probably don’t expect that others will read the state of your data directly from durable storage to manipulate it.  I think of it this way:

state = data + syntax + semantic

In other words, the state you find in durable store is the result of a particular version of syntax and semantic applied to data.  Therefore the data you find in a durable store is version dependent. When you make changes to syntax or semantic your change may or may not be compatible with the data in the durable store.
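A small sketch in Python may help show why stored data is version dependent.  It assumes, purely for illustration, that a hypothetical version 1 of a component stored an amount in whole dollars and that version 2 reads the same field as cents; none of these names come from a real system.

```python
import json


def write_v1(dollars: int) -> str:
    """Version 1 semantic: "amount" holds whole dollars."""
    return json.dumps({"amount": dollars})


def read_v1_as_cents(stored: str) -> int:
    """Version 1 reader: knows the stored amount is in dollars."""
    return json.loads(stored)["amount"] * 100


def read_v2_as_cents(stored: str) -> int:
    """Version 2 reader: assumes the stored amount is already in cents."""
    return json.loads(stored)["amount"]


# Data written under the version 1 semantic.
stored = write_v1(5)
```

The stored text is well formed for both readers, so the syntax is compatible.  Version 1 reads it as 500 cents and version 2 reads it as 5 cents, so the state only means what its author intended when it is read with the semantic that wrote it.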

Where does the pain happen?

Pain happens when

  • Syntax breaks compatibility with existing clients
  • Semantic changes break the understanding of the behavior of dependent components
  • New syntax or semantic meets old data from a durable store

State in the durable store is often more than just serialized data.  It is the result of the behavior and assumptions that wrote the data to the store.  You have to be very aware of the forces that shaped that data.  If you simply load it and resume processing you could be violating the intent that created that state in the first place.
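One possible way to avoid resuming blindly, sketched in Python under stated assumptions, is to stamp the stored state with the version that wrote it and to check that stamp on load.  The envelope layout, CURRENT_VERSION, VersionMismatch, save_state and load_state are all hypothetical names used for illustration, not a description of how any particular workflow host persists its state.

```python
import json

CURRENT_VERSION = 2  # hypothetical version of the running code


class VersionMismatch(Exception):
    """Raised when stored state was written by a different version."""


def save_state(data: dict, version: int = CURRENT_VERSION) -> str:
    """Wrap the data in an envelope that records who wrote it."""
    return json.dumps({"version": version, "data": data})


def load_state(stored: str) -> dict:
    """Return the data only if it was written by the current version."""
    envelope = json.loads(stored)
    written_by = envelope.get("version")
    if written_by != CURRENT_VERSION:
        # Refuse to resume; the caller must decide how to handle old state.
        raise VersionMismatch(
            f"state was written by version {written_by}, "
            f"but this code is version {CURRENT_VERSION}"
        )
    return envelope["data"]
```

The check does not solve the compatibility problem.  It only turns a silent misreading into an explicit decision point, where the old state can be upgraded, rejected, or handed to the code that originally wrote it.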

Where do we go from here?

I’m inclined to deal with syntax, then semantic, and finally state.  Let me know if you think differently.