

Two-Phase Commit (2PC): Coordinating Transactions in Distributed Environments

There was a time when applications were commonly isolated, and each one usually had just one repository.

Sometimes we needed to save several related records to the repository, let's say a purchase order header with its purchased items. In those cases, the whole business operation had to be performed in an "all or nothing" manner. That is, either every record was saved, or else saving no record at all was the best fallback (raising an exception to warn that something went wrong and the business operation couldn't be saved). The fact is that saving some records and leaving the rest unsaved leads to inconsistencies.

So, to reflect this "all or nothing" policy, the concept of a transaction emerged. Transactions are said to be ACID :-) in the sense that they are:

  • Atomic: a transaction must occur completely or not occur at all, as described above
  • Consistent: a transaction is the transition between two consistent states of the application and/or its data
  • Isolated: a transaction is independent of other transactions occurring at the same time
  • Durable: once a transaction has committed, the resulting state of the application or its data persists, and can only be changed by another transaction

Up to here, nothing unusual. In the context of single back-end solutions (e.g. a database), if the back-end is transaction-enabled, applications just mark where the transactional scope starts and where it ends. They also keep, in case abnormal conditions arise, the power to roll back a transaction to the previous consistent state.
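As a minimal sketch of that single back-end case, the following uses Python's standard sqlite3 module with an in-memory database. The table names, the `save_order` function and the `qty > 0` rule are assumptions made up for illustration; the point is only that the application marks the scope and the back-end does the rollback.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical order tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_header (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE order_item (order_id INTEGER NOT NULL, product TEXT NOT NULL, "
        "qty INTEGER NOT NULL CHECK (qty > 0))"
    )
    return conn


def save_order(conn, customer, items):
    """Save a header and its items as one unit of work."""
    # 'with conn' marks the transactional scope: it commits if the block
    # finishes normally and rolls back if an exception escapes from it.
    with conn:
        cur = conn.execute(
            "INSERT INTO order_header (customer) VALUES (?)", (customer,)
        )
        order_id = cur.lastrowid
        for product, qty in items:
            conn.execute(
                "INSERT INTO order_item (order_id, product, qty) VALUES (?, ?, ?)",
                (order_id, product, qty),
            )
    return order_id


def count(conn, table):
    """Return the number of rows in a table (helper for checking results)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

If any item violates the check constraint, the header inserted a moment earlier disappears as well, so the repository never holds a header without its items.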

But nowadays applications aren't isolated anymore. Today it is usual for enterprise applications to be assembled from building blocks. Let's imagine, for instance, a call center application able to add or remove services on a customer account, resend bills or invoices, and accept different kinds of payment (credit card, debit, bank transfer, etc.).

Call center applications are frequently "composite applications": they expose consolidated GUIs from disparate applications, and invoke several back-ends (either directly or through middleware).

So, imagine a customer has his account blocked (no services available) because of unpaid bills. The customer calls the call center, wanting to settle his situation in order to get his services back. The employee retrieves the outstanding debt, informs the customer and offers the available payment mechanisms.

The customer chooses a bank account transfer, so...

  • A debit must be made on his account (thus, online access from the company to the customer's bank must be present)
  • A credit must be made on the company account as well (implying online access from the company to its own bank)
  • A payment record must be created (through the collection system)
  • Services must be reestablished, which could imply posting commands to several back-ends (cellular phone, internet, cable TV, etc.)
  • Accounting entries must be posted (accessing the accounting system)
  • Auditing records must be written (through the auditing system)
  • ...

In the past, this kind of operation was asynchronous. I mean, the company accepted the payment conditionally and updated every back-end through batch processing. In the meantime, the customer didn't get his services back until the last critical back-end was successfully updated.

Because of that, compensation processes had to be considered for the sake of consistency: if a later update failed, the updates already applied had to be undone explicitly.
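To make the idea of compensation concrete, here is a small sketch in Python. The `run_with_compensation` function and the shape of its input (a list of action/compensation pairs) are assumptions for illustration, not something taken from any particular product.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of the actions that already
    completed are executed in reverse order, and False is returned.
    Returns True when every action succeeded.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            # Remember how to undo this step only once it has succeeded.
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        return False
    return True
```

Note that this is not a transaction: between the failure and the end of the compensations, other readers can observe the intermediate state, and a compensation may itself fail.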

But why not include everything in a single transaction? The catch is that when just one back-end is involved, the transaction scope is managed by the back-end itself. But when, as in this case, several back-ends are involved, we need a higher-level authority able to establish a transactional scope, enlist the involved tiers and coordinate the transaction across the distributed environment.

Fortunately, when a problem is this common, the probability of finding a solution is greater. And that rule holds in this case: let me introduce Two-Phase Commit (2PC for short).

2PC is a standard protocol for coordinating transactions in distributed environments. Thus, we can think of 2PC-enabled back-ends as ready to handle a transaction locally while following directions from an external coordinator.
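The coordinator's role can be sketched in a few lines of Python. This is a toy, in-memory simulation under simplifying assumptions: the `Participant` class, its `can_commit` flag and the `two_phase_commit` function are hypothetical names, and there is no networking, timeout handling or durable logging, all of which a real coordinator needs.

```python
class Participant:
    """Hypothetical 2PC-enabled back-end with one pending local change."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether local work succeeded
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A "yes" vote is a promise to commit if asked later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinate one transaction; return True if it was committed."""
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]

    # Phase 2 (completion): commit only if the vote was unanimous.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

With three participants that all vote yes, every one ends in the "committed" state; if a single one votes no, every one ends in the "aborted" state, which is the "all or nothing" behavior carried over to several back-ends.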

For an in-depth explanation of the Two-Phase Commit protocol, I suggest visiting this link: https://en.wikipedia.org/wiki/Two-phase_commit_protocol

On the Microsoft platform, the Distributed Transaction Coordinator (DTC) is a Windows service capable of coordinating 2PC-enabled tiers, known as resource managers (XA-compliant ones, for instance). Here is an interesting link for gaining insight into MS DTC: DTC Developers Guide

From the point of view of .NET development, .NET 2.0 brought a new namespace, System.Transactions, which made coding transaction-oriented services easier than ever. That is, System.Transactions is an abstraction layer between the application logic and MS DTC as an infrastructure component. One of the best features of System.Transactions, in terms of transparency, is that the API is intelligent enough to detect that more than one back-end is involved in the transaction, and to bring in MS DTC only then.

I mean, the programming model is the same whether only one back-end participates in the transaction or several do.

You can start learning about this facility by reading this MSDN Magazine article:

.NET Matters: Scope<T> and More

Also, a deeper reference is here: Transaction Processing

As a final reflection: although the 2PC protocol is intended for both EAI (intra-organizational) and B2B (inter-enterprise) scenarios, it is not expected to be widespread in the latter for a while.
