More than one way to skin a (transactional) cat

Let's briefly look at how databases deal with transaction rollback.  I'm going to ignore a lot of other fun stuff around locking protocols, snapshots, isolation etc. and only focus on how atomicity & durability are achieved.  I'll keep the conversation in the realm of traditional database objects: tables and rows.

Approach 1: Buffer everything until it's time

[caveat: this solution smells like "be functional and then do nofail swap()" but since we're dealing with persistent data and real hard drives, there typically isn't a "nofail swap" type of thing]

The started/created transaction object maintains a list of all of the proposed changes to the long-lived system.  You maintain some in-memory model of the whole database.  Presumably you only have to keep deltas in memory and reads which aren't satisfied by the deltas are issued against the underlying persistent store.

When you go to commit the transaction you build a nice compact queue of operations and you run them really quickly and hope that the system doesn't crash and that the hard drive doesn't develop bad sectors right then.  Hmmm... I guess that's not transactional.  I guess instead I keep a log somewhere and for each item in the queue I'm going to do, I read the current state of the database and save it in the log before making the change.  When I'm done I write a special entry at the end of the log saying that I'm done and that the previous entries don't actually have to be used to roll back the state.

In this approach, aborting the transaction is trivial - you just drop the in memory buffer.

You could try the functional game by instead writing out a new database and then using a hypothesized atomic swap filesystem operation (which is allowed to fail) for the commit.  Probably doesn't scale too well.  :-)

Approach 2: Make changes allowing for rollback

[this is how many/most databases work.  Interestingly at least one high-profile database operates(operated) in the other mode and only developed persistent journalling/crash recovery relatively late in its lifespan...  I'll double check this anecdote before I'll name the database.]

The started/created transaction gets a unique ID and changes to the database are reflected directly into the persistent store, with the exception that the before-image of each record is written to the log under the txn id.  When the txn is committed by the client, an entry it written to the log marking the txn as committed.

In this model, rollback is nontrivial since you have to go backwards through the log finding the undo state for the changes made and move things back.  Since you can't fail rollback (hmm.... sounds familliar) you have to had been sure not to do things like reorganize/compress database pages too much while making the changes.  Maybe the rollback records are just whole pages if your concurrency locking granularity is a page.

---

At some level, both are the same; it's just a matter of how you spread out your impact and I/Os.  There are also fun topics around concurrency, isolation and guaranteed forward progress when there are concurrent transactions.

I'll summarize the models:

1. Buffer up all your changes so you don't make the first real change until you're sure you're done

2. Make your changes but ensure that you can roll them back without requiring a major amount of resources

Disk I/Os can and do fail whether due to lack of kernel resources, cosmic rays, disk controller failure, disk failure etc. etc. etc.  So what's really important is being able to apply the roll back logs for uncommitted transactions in the case of catestrophic failure, ... and! .. turning any failure which might result in the database being corrupt or the log records not getting written into a catestropic failure (e.g. bugcheck()).

Can we apply some of those techniques for in-memory structures?  At what cost?

Comments