Workflow Persistence in the Windows Server AppFabric
What does the persistence subsystem do?
The persistence subsystem is responsible for making a workflow instance durable. Durable workflow instances can be unloaded from memory and reloaded on the same or a different machine at a later point in time. This way, the persistence subsystem enables scenarios such as long-running workflow applications, increase system load by unloading idle instance, migration of instances between machines, scale out and load-balancing, and recovery of failed instances.
The persistence subsystem performs the following tasks:
· Storing the state of workflow instances in a persistence store.
· Recovery of durable workflow instances.
· Activation of durable workflow instances with expired durable timers.
· Instantaneous reactivation of durable workflow instances after a shutdown or crash of the Workflow Service Host.
Persistence for a workflow is enabled by defining an instance store for the Workflow Service Host. The .NET 4 framework comes with the SQL Workflow Instance Store, which is a SQL Server implementation of an instance store.
When a workflow instance persists, the SQL Workflow Instance Store (SWIS) saves the current instance state into the persistence store together with additional metadata that is required for activation, recovery and control. After the instance is persisted, the Workflow Service Host can unload the instance from memory. At a later point in time the Workflow Service Host may instruct the instance store to load the instance again. For example, the Workflow Service Host will automatically reload the workflow instance when a new message arrives for that instance or if any of the instance's timers expire.
Persistence is triggered in multiple ways:
· Some activities persist the instance. The SendReplyToReceive activityhas a PersistBeforeSend property, which can be set to persist the Workflow state before a reply is sent.
· The user defines the Workflow Idle behavior and sets the PersistOnIdle time. When specified the Workflow Service Host persists the instance after the instance has been idle for the specified time. An instance can go idle when it is waiting on a receive or delay activity.
· The user defines the Workflow Idle behavior and sets the UnloadOnIdle time. When specified the Workflow Service Host persists and unloads the instance after the instance has been idle for the specified time.
In addition to these mechanisms, the workflow can contain explicit persistence activities. Those explicit persist activities are only required to persist a workflow throughout a long episode of computation or to guarantee that certain well-known persist points are present.
How does the persistence subsystem work?
The persistence subsystem consists of four parts:
1. Persistence Framework API A set of persistent interfaces that define the contract with Workflow Service Host to persists its state. These enable you to build any type of persistence provider from database backed to say distributed in-memory cache.
2. The SQL Workflow Instance Store (SWIS) implements the abstract class InstanceStore of the Persistence Framework API. This class is used by the Workflow Service Host to create, load, save and delete durable instance data.
3. The SQL Server Persistence database stores all the durable instance state. It also stores the additional metadata that is used to activate and recover a service instance.
4. The Workflow Management Service (WMS) is a Windows Service that activates a Workflow Service Host whenever there are unloaded instances with expired durable timers, instances that need to be reactivated after a graceful shutdown or instances that need to be recovered after a crash. The WMS is also involved in the execution of instance control commands, which will be covered in a future blog post.
All workflow instances that are hosted by a Workflow Service Host that configures the SQL Workflow Instance Store save their instance state in the SQL Server Persistence Database. In addition to the instance state binary blob, the persistence store holds the following:
The message correlation key allows the Workflow Service Host to correlate an incoming message to a service instance if that instance is not loaded.
The instance lock indicates whether the service instance is loaded.
The service deployment information defines how the service is deployed in a IIS/WAS-hosted environment. It consists of site name, application path, service path and service name. Note that the instance store does not contain any deployment information for self-hosted workflows, meaning that the WMS is specifically tailored for IIS hosted services.
Loading and locking
If a workflow instance is loaded by a Workflow Service Host the persistence store locks that instance, and the instance cannot be loaded by any other service host. If the service host unloads the service instance the lock is released, and the instance can be loaded by a different service host. The new service host may reside on a different machine. This means that a workflow instance can run on multiple machines throughout its lifetime, whenever activation messages arrive. It also means that you can build a machine farm without employing an intelligent message router that remembers which machine is running a particular workflow instance. Instead, the router routes a message to any of the machines in the farm. If the message is correlated to an existing workflow instance, the instance will be loaded by that machine. Casually speaking, the service instance follows the message.
Processing of expired timers
If a workflow instance is executing a delay activity at the time it persists the instance store stores the expiration time of the activity. At the time the activity expires the SQL Workflow Instance Store notifies the Workflow Service Host, which then loads the workflow instance and processes the expired delay activity. If multiple machines are available, the instance may be loaded on any of these machines.
If no Workflow Service Host is running on a particular machine the WMS will activate a host on that machine. This causes the instances of a particular workflow type to be distributed among all machines in the farm.
Instance recovery
If a service host has loaded a workflow instance the Workflow Service Host must renew the instance's lock on a regular basis. If not renewed on time, the lock expires. An expired lock indicates that the service host or the machine the service host ran on has crashed. In this case another Workflow Service Host will load the instance. If no other Workflow Service Host is running the WMS will activate a host.
If the Workflow Service Host shuts down (e.g., due to an app domain recycle) it releases the locks of all the service instances it has loaded.