Why is fclose()'s contract so confusing?

Because it’s a long-established pattern and contract, let’s explore fclose() today.

Here’s my ideal fclose() implementation:

int fclose(FILE *fp) {
    close(fp->fd);
    free(fp);
    return 0;
}

But of course close() can fail.  Yuck.  I found a man page for close(2) on the web; maybe it doesn’t represent the modern/common definition of close(), but even in my idealized world, the implementation of fclose() has to be something more like:

int fclose(FILE *fp) {
    while (close(fp->fd) == -1) {
        if (errno != EINTR) {
            return EOF;
        }
    }
    free(fp);
    return 0;
}

I picked on EINTR to loop on here because the man page for close() calls it out as a condition (without saying whether there are or are not un-rolled-back side effects).  My reading is that it means the close() was not able to complete.  If the I/O wasn’t able to complete and I don’t get a chance to retry it, that would be a very fundamental break in the I/O API design, so I’ll assume it has the contract I believe it does (a case of Raymond’s “What would happen if the opposite were true” analysis technique).

Notice also that something very important happened here.  There’s an implicit contract in close()-type functions which is that regardless of “success” or “failure” the resource will nonetheless be freed.

But we didn’t deallocate the FILE!  Maybe the right implementation was:

int fclose(FILE *fp) {
    int fd = fp->fd;
    free(fp);
    while (close(fd) == -1) {
        if (errno != EINTR) {
            return EOF;
        }
    }
    return 0;
}

Is this right?  Probably not; if we’re going to return an error like ENOSPC, callers need to be able to retry the operation.
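To make that concrete, here’s what a call site would have to look like if errors like ENOSPC were really meant to be retried.  The name close_with_retry is made up, and the loop is only legal against the second sketch above (the one that hasn’t freed the FILE yet), not against the real fclose():

#include <errno.h>
#include <stdio.h>

// What a call site would have to look like if fclose() kept the FILE alive
// on failure and really expected callers to retry transient errors such as
// ENOSPC.  Nobody writes this, and it only makes sense against the second
// sketch above, where a failed fclose() has not yet freed the FILE.
void close_with_retry(FILE *fp) {
    while (fclose(fp) == EOF) {
        if (errno != ENOSPC) {
            break;              // not obviously retry-able; give up
        }
        // make some disk space somehow, then go around again
    }
}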

The simple answer is that the close() pattern is a very special contract.  Nobody’s actually going to sit in a loop calling it.  It must deallocate/free the resources.  If the state of the address space/process is trashed, you might as well kill the process (ideally tying in with a just-in-time debugging or post-mortem-dump debugging mechanism).
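Here is a minimal sketch of that special contract at the file-descriptor level, assuming (as above) that EINTR means the close didn’t complete: retry the one condition we believe is retry-able, and fail fast on everything else.  The name close_or_die is mine, not part of any standard API; the article’s idealized fclose() could free the FILE and then call something like it.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Retry the one condition we believe is retry-able; treat anything else as
// a trashed process and fail fast, so a just-in-time debugger or a
// post-mortem dump catches the problem at the point of failure.
void close_or_die(int fd) {
    while (close(fd) == -1) {
        if (errno != EINTR) {
            perror("close");
            abort();
        }
    }
}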

But even my totally idealized fclose() implementation is unrealistically simple.  fclose() and its siblings are the buffered file I/O functions in the C run-time library.  Since people are too lazy to call fflush(), the real implementation has to look something like this:

int fclose(FILE *fp) {
    if (fp->bufpos != 0) {
        const void *buf = fp->buf;
        size_t bytestowrite = fp->bufpos;
        while (bytestowrite != 0) {
            ssize_t BytesWritten = write(fp->fd, buf, bytestowrite);
            // -1 is error; otherwise it’s the number of bytes written??
            if (BytesWritten == -1) {
                // what the heck am I going to do now?  Holy crud, look at
                // all the error codes in the man page!  Am I really going
                // to try to figure out which of those are retry-able?
                // I guess I’ll just cut and run
                return EOF;
            }
            // docs for write(2) don’t say that successful writes will have
            // written all the bytes.  Let’s hope some did!
            // but wait, what if, say, -2 is returned?  What if I asked for
            // more bytes to be written than ssize_t can represent as a
            // positive number?  The docs don’t say; chances are the
            // implementation is naïve, but it seems wrong to second guess
            // the apparently intentional use of ssize_t instead of size_t
            // for the return type.
            assert(BytesWritten >= 0); // do you sleep better with this?
            bytestowrite -= BytesWritten;
            buf = (const char *) buf + BytesWritten;
        }
    }
    while (close(fp->fd) == -1) {
        if (errno != EINTR) {
            return EOF;
        }
    }
    free(fp); // at least free is void!! Phew!
    return 0;
}

My point here is not to lambaste the U*ix syscall APIs in particular; they’re just one in a long line of badly designed contracts.  Deep down, CloseHandle() has a similar loop inside it, because one last IRP has to be issued to the file object, and if the IRP can’t be allocated the code loops.  Arguably the IRP should have been allocated up front, but people have a hard time stomaching such costs.

What I’m really trying to show is how you can design yourself into a corner.  The C buffered stream functions did this on two axes simultaneously.  The fclose() function has to try to issue writes to flush the buffer.  (Yes, it probably would have called fflush(), but since fflush() doesn’t guarantee forward progress, presumably the calls to fflush() would also have to be in a loop.)  And the underlying I/O implementation is fallaciously documented: it looks like due diligence was done in listing everything that can go wrong, but even with a mature man page for a mature API, there’s no real clue about what to do when those conditions arise.

The real point here is that rundown protocols don’t get enough attention.  I’m sure that someone thought the design for close(2) was great.  There’s tons and tons of information about what errors you can get back!  But, in essence, its contract isn’t very usable.  The C run-time function fclose() has it even worse, since nobody trained people to call fflush() if they really want their data written to disk.
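For what it’s worth, a caller who really does want the data on disk can hoist the failable work out of the rundown path and check it explicitly before closing.  Here is a sketch; write_out_and_close is my name, and the fsync() call is my addition, not something fclose() promises:

#include <stdio.h>
#include <unistd.h>

// Flush and sync explicitly, so that by the time fclose() runs there is
// (almost) nothing left that can fail, and any error is reported to the
// caller while it can still do something about it.
int write_out_and_close(FILE *fp) {
    int status = 0;
    if (fflush(fp) == EOF)                        // push the stdio buffer down
        status = -1;
    if (status == 0 && fsync(fileno(fp)) == -1)   // ask the OS to persist it
        status = -1;
    if (fclose(fp) == EOF)                        // the FILE is released either way
        status = -1;
    return status;
}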

The key that should be showing up is this:

Code that is executed in resource deallocation or invariant restoration paths is very special.  How special?  I’m still not sure; we’re taking this journey of discovery together.  But you can be sure that nobody’s going to write calls to close functions in a loop hoping that they’ll someday succeed, so I’m sure that returning errors is useless.  Performing operations which may fail due to a transient lack of resources is very, very problematic.

Tomorrow we’ll hop back over to invariant restoration and see a bunch of the same problems.   Later we’ll journey over to the Promised Land, where all side effects are reversible… until committed.

Comments

  • Anonymous
    May 04, 2005
Glad to see you write again after so long. I've been enjoying this series quite a bit, despite some initial worries when you chose to go with straight C... (now it makes perfect sense!)

    I keep thinking this out into too many topics, so I'll try and keep it focused to this one:

    It seems to me that closing a file (or I/O device in general) is a bit of a special case even in the already-special set of resource deallocation mechanisms. Actually, to be more specific, it's not so much a special case as a non-case: close()'s contract falls entirely outside of that set; it's a combination of resource deallocation and "other things".

    It sounds like your contract for an ordinary deallocation function is that it should guarantee that either 1) the resource it's responsible for (usually memory) is freed, or 2) the program fails and lets you know why it died.

    In other words, you don't or shouldn't care about possible failures in closing a resource, unless they're so catastrophic that you can't do much of anything else.

    This contract is entirely incompatible with the way something like close() is designed; by nature, file (and especially network!) operations can and occasionally will fail. When they do fail, the program almost always wants to know about it, and preferably not by terminating.

    The only way to resolve that discrepancy is by reducing the guarantee of an I/O close to freeing up the resources we have control over. Operations like flushing the buffer or notifying interested parties can't be made part of the guarantee--there's just no way to guarantee that they'll happen, and crashing if they don't isn't the right answer.

If someone wants to make certain that they do, they should be using a non-deallocating function to find out about their status. But that's not how it works.

    Short version: close() doesn't operate on the idea of a resource contract. Making file i/o have such a contract would mean altering the behavior and guarantee of what close() does.

    Hard truth: You can't build an interface on top of close() and provide a 100% guaranteed resource contract. (Do you think this is right?)

    But, you can build one that works 99.5% of the time, and appears to most people to have such a contract. The fact that this kind of failure is relatively rare is, I think, why we've always been able to get away with such poor definitions of what should happen when something goes wrong--since nothing does, usually, we haven't thought about it much.

    Or maybe I give too little credit to programmers in the not-so-distant past.

  • Anonymous
    May 04, 2005
    The comment has been removed

  • Anonymous
    May 05, 2005
    Unfortunately, I work in an environment (custom application development) where even something like 80% is "good enough". It's horrible.

    I'm serious, though: As long as every task that needs to be accomplished can be accomplished by at least one route, putting any more time into a custom project is not considered financially worthwhile. Even if any deviation from that route can cause irreversible data loss.

    In your situation, on the other hand, I can see how even the smallest possibility of failure is unacceptable. The more critical and/or popular an application is, the less it can get away with.

    Anyway, I guess what I was getting at at the very end there with the 99.5% number was that I don't think most programmers work at either of our extremes, and I do think that for the majority of programmers, something that almost always works is considered ok. The only people I really see talking about resource contracts are either in academia or working for the top software companies.

    I think that's a shame. I miss having the opportunity to come up with an elegant or perfect solution to a given problem.
