Function transactionality

A topic that comes up repeatedly in designing function is what a coworker of mine calls "transactionality".

Since I have a little database background, I think that this is an abuse of the term but let's go with it for a while.

Function transactionality is really a pretty simple concept.  Either a function succeeds and its ESEs are visible or it fails in which case the effective state of the system is restored to what it was when the function began.

It's worth pointing out something that is often missed in analysis.  At some constructive level, a program or subprogram is just a series of operations to be performed.  You might call a subprogram that yields a value a "true function" but I'm using C as my machine language so basically every subprogram is a function and some don't happen to return a value.

In this framework there is no notion of "success" and "failure".  (Even exceptions in such languages are defined in terms of functional behavior; it's humans who develop patterns around their use which correspond to they being associated with failure cases.)

In order to make sense of this argument you have to promote the notion of a block of code succeeding or failing to be a first class concept.  Most everyone does this regularly, but since there are no explicit language features around such markings, it's all convention which makes it hard to develop the set of rules I'm leading towards eventually in this series.

Back to function transactionality.  To discuss this further, let's subdivide the world of ESEs into two parts: Transient ESEs and Persistent ESEs.  Transient ESEs are ones which, if the program is terminated, have no effects which persist past the program's termination and restarting.  Peristent ESEs are things which you can't just make go away by stopping and restarting the program.  In general, when a piece of code is working within a single terminatable address space, TESEs are basically things in memory - global variables, data structures, etc.  PESEs generally correspond to files for the sake of discussion.  (There is a third class of one-way ESEs which you cannot undo, such as dispensing cash from the ATM or printing a character on the printer.)

I'm going to ignore PESEs for today and see what it looks like to write code that gets at least TESEs right.

First let's define a nice function that lets us to addition without all the nasty security-bug-causing overflows:

int safe_add(size_t left, size_t right, size_t *result) {
size_t temp = left + right;
if ((temp < left) || (temp < right))
return EOVERFLOW;
*result = temp;
return 0; // success
}

It's clear that this function has only one ESE (the assignment at the end), has a clear notion of success and failure and meets the expectations of a "transactional function".  (I really do hate this term but I can't come up with something better...)

Let's use this to do some more work:

int safe_add_2(
size_t number_to_add,
size_t *number_to_add_to_1,
size_t *number_to_add_to_2
) {
int err;
if ((err = safe_add(number_to_add, *number_to_add_to_1, number_to_add_to_1)) != 0) return err;
if ((err = safe_add(number_to_add, *number_to_add_to_2, number_to_add_to_2)) != 0) return err;
return 0;
}

Obvious implementation.  But it's not transactional!  If you were really using this library function to get something  useful done, you could get yourself into a lot of trouble unless every error reported eventually terminated the process.  Let's fix it naively.

int safe_add_2(
size_t number_to_add,
size_t *number_to_add_to_1,
size_t *number_to_add_to_2
) {
int err;
if ((err = safe_add(number_to_add, *number_to_add_to_1, number_to_add_to_1)) != 0) return err;
if ((err = safe_add(number_to_add, *number_to_add_to_2, number_to_add_to_2)) != 0) {
safe_subtract(*number_to_add_to_1, number_to_add, number_to_add_to_1); // bug??
return err;
}
return 0;
}

There we go again like in the beginning of this series.  Calling a function in the error exit path depending on it not to fail.  This code is clearly not safe against multithreaded access or anything so it's hard to argue that this is not correct.  But it sure is unsatisfying.  Let's rewrite it so that the transactionality is more obvious.

int safe_add_2(
size_t number_to_add,
size_t *number_to_add_to_1,
size_t *number_to_add_to_2
) {
int err;
size_t temp1, temp2;
if ((err = safe_add(number_to_add, *number_to_add_to_1, &temp1)) != 0) return err;
if ((err = safe_add(number_to_add, *number_to_add_to_2, &temp2)) != 0) return err;
*number_to_add_to_1 = temp1;
*number_to_add_to_2 = temp2;
return 0;
}

Good or bad?  We used two new temporaries but I personally find this more satisfying because rolling back the changes during the part of the code which could fail was pretty obvious - don't do anything!  Obviously this pattern doesn't work when the proposed state changes are bigger but like I said, it's more obvious that it's correct and in this day and age that's critical.

(The reason that PESEs are much harder is because unless you have real transaction support from the underlying persistence engine, you probably have to issue counter-operations against the changes made which themselves may fail and you can get yourself into a situation where your persistent data is corrupt and you can neither go forward nor backward.)

Comments