Software Contracts, Part 5: Hold on a second, why do we care about this stuff anyway?
I'm more discombobulated than usual on this series. I totally missed the third article when I should have gotten to it (this, by the way, is why Raymond writes his stuff 8 months in advance: it lets him fix stuff like this).
So consider this post the 3rd in the series (the first is "Software Contracts", the second is "There are two sides to every contract", this is the 3rd, the 4th is "Sometimes contracts are subtle", etc.).
Why do we care about software contracts?
Well, for the exact same reason we care about real-world contracts. Contracts define the expectations between two parties. Without fully understanding the contract for a function, you don't know how to correctly call it.
Take the ReadFile example I mentioned the other day. The ReadFile contract tells you which parameters to the function MUST be provided (hFile, lpBuffer, nNumberOfBytesToRead) and which MAY be provided (lpNumberOfBytesRead and lpOverlapped). It also tells you how to determine whether the function succeeded (or whether the function has success/failure semantics at all).
We're so used to interpreting software contracts that they become ingrained. Normally, we don't even bother to think about them, and for the most part, you don't need to wonder about them (just like in real-world contracts).
However, the instant you step outside the simplest case, understanding the contract for an API becomes critical.
As a simple example, consider one small aspect of the contract for the standard C++ library (from "Thread Safety in the Standard C++ Library"):
A single object is thread safe for reading from multiple threads. For example, given an object A, it is safe to read A from thread 1 and from thread 2 simultaneously.
If a single object is being written to by one thread, then all reads and writes to that object on the same or other threads must be protected. For example, given an object A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.
It is safe to read and write to one instance of a type even if another thread is reading or writing to a different instance of the same type. For example, given objects A and B of the same type, it is safe if A is being written in thread 1 and B is being read in thread 2.
These rules concisely lay out the threading guarantees for C++ library functions (to be honest, I really like this version of the text; usually I just hear it stated as "An object is thread safe for reading from multiple threads or writing from a single thread").
This part of the contract applies to the Microsoft implementation of the container classes (I don't know if it's in the standard, since I don't have a copy of the standard). From it you know that you can have multiple readers of an object, but the instant you have a single writer, you need to add some kind of lock to isolate the readers from the writer. On the other hand, you don't need a lock if all you're doing is reading the data.
Without this text being a part of the contract, you MUST assume that it's not possible to call the container classes in the C++ library from multiple threads (because the contract doesn't say that you can).
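As a rough sketch of what those rules mean in practice (the function names are made up for illustration, and this assumes a C++11 or later compiler with std::thread support): the writer threads below all modify the same vector, so every access goes through a mutex. Once the writers are finished, the reader threads share the vector with no lock at all, because nobody is writing to it any more.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: several threads append to ONE vector.
// The vector is being written, so every access must hold the lock.
std::vector<int> fill_with_lock(int writers, int per_writer) {
    std::vector<int> shared;
    std::mutex m;
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w) {
        threads.emplace_back([&shared, &m, per_writer] {
            for (int i = 0; i < per_writer; ++i) {
                std::lock_guard<std::mutex> guard(m);
                shared.push_back(1);
            }
        });
    }
    for (auto& t : threads) t.join();
    return shared;
}

// Hypothetical helper: several threads only READ the same vector.
// No thread writes to it while this runs, so no lock is needed.
long sum_without_lock(const std::vector<int>& data, int readers) {
    std::atomic<long> total{0};  // the only shared state that is written
    std::vector<std::thread> threads;
    for (int r = 0; r < readers; ++r) {
        threads.emplace_back([&data, &total] {
            total += std::accumulate(data.begin(), data.end(), 0L);
        });
    }
    for (auto& t : threads) t.join();
    return total.load();
}
```

If you removed the lock_guard from the first function, the code would still compile, and it might even appear to work on a lightly loaded machine. It would simply be violating the contract.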
A failure to appreciate software contracts can result in a myriad of different bugs, including various and sundry security bugs. In my experience, most subtle, hard-to-diagnose bugs ultimately turn out to be caused by a misunderstanding about the contract associated with a function.
For example, it's long been known that strcpy is a haven for security bugs. One of the naive suggestions for fixing strcpy bugs is to simply replace the calls to strcpy with calls to strncpy. Unfortunately, in many ways the strncpy API is just as bad as strcpy: it pads the destination with null characters up to the length provided, and it doesn't null-terminate the destination when the source is at least that long. But most people looking for a "safe" replacement for strcpy will ignore that part of the contract and thus introduce different security bugs while trying to fix existing problems (according to Michael Howard, this mistake has happened more than once in the wild).
If the people recommending replacing strcpy with strncpy fully understood the contract for strncpy, it's likely that they wouldn't have made that mistake (or would have added more caveats).
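To make the strncpy contract concrete, here is a small sketch. The helper names and the 8-byte buffer are my own, chosen for illustration. The first two functions probe what strncpy actually leaves in the buffer. The third is one possible always-terminating copy; it is not a standard function, and it trades the overflow for silent truncation, which you still have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kBufSize = 8;  // illustrative buffer size

// Copies src into a kBufSize buffer with strncpy and reports whether
// the buffer ends up containing a terminating '\0' anywhere.
bool strncpy_result_is_terminated(const char* src) {
    char buf[kBufSize];
    std::memset(buf, 'x', sizeof buf);
    std::strncpy(buf, src, sizeof buf);
    return std::memchr(buf, '\0', sizeof buf) != nullptr;
}

// For a source shorter than the buffer, reports whether strncpy
// zero-filled every byte after the copied characters.
bool strncpy_pads_with_zeros(const char* src) {
    char buf[kBufSize];
    std::memset(buf, 'x', sizeof buf);
    std::strncpy(buf, src, sizeof buf);
    for (std::size_t i = std::strlen(src); i < sizeof buf; ++i) {
        if (buf[i] != '\0') return false;
    }
    return true;
}

// Hypothetical replacement: copies as much of src as fits and ALWAYS
// terminates dst. Returns false if src had to be truncated.
bool copy_terminated(char* dst, std::size_t dst_size, const char* src) {
    if (dst_size == 0) return false;
    std::size_t len = std::strlen(src);
    std::size_t n = len < dst_size - 1 ? len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return len < dst_size;
}
```

With an 8-character source such as "12345678", strncpy_result_is_terminated returns false: the buffer is full and nothing marks the end of the string. With a 2-character source, strncpy_pads_with_zeros returns true, because the remaining six bytes have all been set to zero.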
Comments
Anonymous
January 15, 2007
Hmm. Looks like it's about time to invoke RFC 2119 . . . (Hope this is taken tongue in cheek.)
Anonymous
January 15, 2007
The problem with strncpy is that its contract is sociopathic. There is no conceivable situation in which strncpy could ever be useful. Everyone who uses it and sees the problem writes their own version to ensure that the result is null terminated.
Anonymous
January 15, 2007
Michael, I think I agree with you :) strncpy is a simply hideous API.
Anonymous
January 15, 2007
Michael, as someone once pointed out, strncpy works great when working with things such as directory structures. For example, if your directory structure is something such as struct { char name[16]; int length; int sectorstuff; }; then doing an strncpy into name will allow for a 16 character name while also blank filling old data. I think Larry's or Raymond's old blog entries talk about this.
Anonymous
January 15, 2007
Monday, January 15, 2007 8:58 PM by Michael Geary
> There is no conceivable situation in which strncpy could ever
> be useful.
100% wrong. strncpy is sociopathic because of its name, and it's obsolete because of what it does, but it served a 100% conceived and 100% existing situation when it was invented with its 100% then-meaningful contract and 100% sociopathic name.
First a digression, since we're in a mostly-Windows programming environment. When files are created in Windows, Windows often assigns them short names (8.3 format) in addition to their real names. Historically, some versions of Windows could only handle the 8.3 format. Historically, some file systems used by Windows could only store the 8.3 format.
Now for a big surprise. Windows isn't the only family of operating systems to have that kind of history. In antiquity, Unix stored filenames up to 14 bytes long. Structures that stored filenames had 14 bytes and didn't need a null terminator, but if a filename was shorter than 14 bytes then you wanted the remaining bytes to be zeroes. strncpy was invented for this purpose and it served this purpose. Only strncpy's name was hideous.
Anonymous
January 15, 2007
The C++ standard doesn't mention threads at all. Any thread interaction is implementation dependent (although I would expect all the current major implementations adhere to the same thread safety rules as the Microsoft (Dinkumware) version).
Anonymous
January 15, 2007
strcpy_s/strncpy_s anyone? :)
Anonymous
January 15, 2007
I remember reading somewhere that strncpy() was written in order to implement some file name copying in Unix, where the file name had a fixed length, padded with NULs, and not necessarily NUL-terminated - hence its weird, insecure contract. Can't find it at the moment though.
Anonymous
January 16, 2007
Oh, there is exactly ONE reason it works the way it does, and that's because it was part of the original UNIX file system code. The file name limit was 13 characters, and it was padded with NULLs to fill in the extra space. Strncpy matches the semantics of this very old data structure because it was written to fill in this field of this data structure. Since then its insanity has been a part of C.
Anonymous
January 16, 2007
Ditto on strncpy and its various cousins (strncat, snprintf, etc). Several of these look similar, but have slightly different expectations--snprintf will null terminate, but you have to give it a buffer size which doesn't include the null. It's not just strncpy's contract that is sociopathic. I like the solution of creating the safe CRT; it's not standard, but it really should be. I'd be curious to find out if there's been any effort to submit it to the C/C++ standards committees.
Anonymous
January 16, 2007
Except that strncpy doesn't even necessarily null-fill the remainder of the buffer, making it less than ideal for the directory entry case as well. It's neither fish nor fowl. It would have been far better to have two different versions of strncpy -- one that always null-terminated without exception, and one that null-filled the entire buffer first and then wrote the string second without doing any termination at all.
Anonymous
January 17, 2007
The comment has been removed
Anonymous
January 17, 2007
Larry, with your apparent attention to detail, couldn't you convince MS to let you do a short and concise (and 100% non-ambiguous) write-up of this subject - even if just for your own interface designers? This includes IOCTL/FSCTL "interfaces". The reason I ask is that I'm sick and tired of documents stating "You do not need to set the dwSize member" when it crashes without it, non-const pointers that should be const (when there never even is a chance it modifies the data I give the function), functions explicitly documented as accepting a NULL pointer argument (as one of the frequently too many arguments) that SEGV given a NULL pointer, and finally my favorite: usage of only half-documented ioctls/fsctls that BSOD the machine inside MS drivers (not only MS-provided, but MS' own code).
Anonymous
January 17, 2007
"Except that strncpy doesn't even necessarily null-fill the remainder of the buffer"
Yes, it does. These functions are from the standard C library, and so they're defined by K&R, who have this to say on the subject...
char *strncpy(s,ct,n) copy at most n characters of string ct to s; return s. Pad with '\0's if ct has fewer than n characters.
People asking for a "safer" strncpy are often about to make a mistake. Such functions have been defined (many Unix systems provide strlcpy() for example) but now you need to audit your application to be sure that simply truncating strings here and there doesn't have any unforeseen consequences. Often, instead of scattering MY_MAX_STRING_LENGTH or worse char buffer[1024] through the code and then trying to fix the resulting security problems with strlcpy() you ought to be using dynamic allocation for your strings in the first place.
Anonymous
January 20, 2007
lstrcpyn always null-terminates the destination string.