Partilhar via


Even good design can't succeed in isolation

I tried three times to write yesterday's entry which was intended to explore the same problems with trying to restore invariants in failure paths.  Instead I kept getting fouled up because the collections I was using were based on the STL design.  The STL designers actually did a good job in making it hard to recreate the problem I wanted to exhibit.

But instead I had to create artificially complex examples to avoid going outside the STL design patterns.  But there are problems due to the STL design which form a typical problem when trying to build reliability out of lower layers which are not conducive to it.  Much like the example of close(2) possibly failing for environmental reasons, they are just following a well established pattern of "when in doubt, return an error".

The particular case is the fact that the std::map::erase() overloads are documented not to fail, including the overload that takes a const key_type & reference (which must therefore also invoke comparisons between the key passed in and the key in the elements in the map).  This is a smart contract and we all know that strcmp() can't fail (usual caveat: except for bad parameters and in this case the failure is as catestrophic as we could hope for).

But humans aren't that interested in the kinds of comparisons that strcmp/wcscmp do; they want to do several standard things like ignore case or apply linguistic collation algorithms.  The key issue here is that for anything but these trivial byte-wise type comparisons, the comparison can have 4 results: a == b, a < b, a > b or failure.  A good example of such a string comparison API is the CompareString() API in Windows (https://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/resources/strings/stringreference/stringfunctions/comparestring.asp).

How can I be good and use a useful string comparison function and play into this pattern?  Let's write some code and see what it might look like.

int are_equal_case_insensitive(const wchar_t s1[], const wchar_t s2[]) {
int comparisonResult;
int err;
if ((err = compare_string(CASE_INSENSITIVE, s1, s2, &comparisonResult)) != 0) {
// We'll just "play it safe" and say that they're not equal
return 0;
}
return (comparisonResult == 0);
}

Was that really playing it safe?  I guess we couldn't prove they were equal, but we also didn't prove that they're not equal either.  I didn't bother trying to loop like I did in the other examples when the error is ENOMEM; I'll address why that isn't really a good idea anyways later.  I guess the answer is:

int are_equal_case_insensitive(const wchar_t s1[], const wchar_t s2[]) {
int comparisonResult;
int err;
if ((err = compare_string(CASE_INSENSITIVE, s1, s2, &comparisonResult)) != 0) {
// We'll just "play it safe" and say that they're not equal
bugcheck(err);
}
return (comparisonResult == 0);
}

bugcheck() is a function that would force a memory dump and then terminate the process with a failure exit code.

Is that a good solution or a bad solution?  It did seem pretty drastic but at the same time I converted a potential privilege escalation attack into a denial of service (DOS) attack.  DOS attacks are bad but not as bad as letting someone take advantage of inconsistent global state in your process.  At the same time, most managers may not feel like taking the high road here when it turns out that someone can crash the service process just by typing an unusual set of characters into some field in a data entry form.

If you don't believe this, look for all the code in Java and C# that just swallows exceptions because a "crashing" bug was reported against the app.

The point of this post is that we're going to see that it's really hard to engineer solid solutions if your framework doesn't provide you good solid guarantees.  STL actually sets a good example here, but the problem is that a good example that's in the middle of the conceptual stack can actually cause more mess than it cleans up.

My original problem I was going to talk about is because our collection classes, while modelled after the STL designs, allow erase() to fail because the key comparisons may fail.  This gets you into a real bind when you can't guarantee forward nor backwards progress.  The checked in code right now is chicken and leaves the global state torn.  I plan to fix it and we'll talk about what lengths we might have to go to in order to fix it when the normal operations don't have any guarantees of forward progress.

Comments