Sometimes people don't really get the point of defensive programming for security...

I was reading this month's issue of Dr. Dobb's Journal, and I ran into the column "Illusions of Safety" by Pete Becker.  Pete writes about enhancements to the C language, and I usually really enjoy his columns.

This month was an exception.  Pete basically spent the entire column explaining why you don't have to worry about the "unsafe" C runtime library functions like gets() and strcpy() as long as you design your application correctly.

I'm just fine until he gets to the 2nd page of the article, "Prevention by Brute Force".  First off, he presents a "safe" version of strcpy called safe_strcpy, which isn't actually a safe version of strcpy at all.  It's really a replacement for strncpy, and it preserves one of the critical bugs in strncpy - strncpy isn't guaranteed to null-terminate the output string, and neither is his "safe" replacement.  He also describes testing for success on the strcpy as "tedious".  Yeah, I guess that ensuring that you handle the failure of APIs is "tedious".  I'd also describe it as "correct".
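To make the termination bug concrete, here's a small sketch (the helper name strncpy_terminated is mine, not from the column) showing that strncpy leaves the destination unterminated whenever the source doesn't fit:

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst exactly as strncpy does, then report whether the
 * result is actually NUL-terminated.  Returns 1 if terminated, 0 if
 * not -- the 0 case is the bug safe_strcpy inherits. */
static int strncpy_terminated(char *dst, size_t n, const char *src)
{
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) != NULL;
}
```

When the source fits, strncpy pads with NULs and all is well; when it doesn't, the "string" in dst has no terminator at all, and the very next strlen or printf on it walks off the end of the buffer.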

But it's when we get to the 3rd page "Prevention by Design" that Pete totally loses me.

You see, on Page 3, Pete decides that it's OK to use strcpy.  You just need to make sure that you put an assert() to make sure it doesn't overflow.  And you need to make sure that the inputs to your functions are checked upon entry to the system.

But that's just plain wrong.  First off, the assert() won't be there in non-debug code - assert()s disappear in production builds when NDEBUG is defined.  So your production code won't have the protection of the assert.  And it completely ignores the REAL cause of the problem - what happens when the vulnerable function is called from an unchecked code path?  If (when) that happens, you've got a security hole.  And the bad guys ARE going to find it.  Michael Howard gives LOTS of examples where developers added checks to a vulnerable code path at the entry point of a function without realizing that there was another code path to the vulnerable function.  You also don't know what will happen four or five years in the future - a future maintainer of your code may not realize that your low-level routine has such a constraint and may call it improperly.
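To see why the assert-based approach fails, compare a copy guarded only by assert() with one whose check survives into release builds (both helper names here are hypothetical illustrations, not code from Pete's column):

```c
#include <assert.h>
#include <string.h>

/* The column's style: the assert "documents" the contract, but when
 * NDEBUG is defined (release builds) it compiles away to nothing and
 * the strcpy runs completely unguarded. */
static void column_style_copy(char *dst, size_t dstsize, const char *src)
{
    assert(strlen(src) < dstsize);  /* vanishes under NDEBUG */
    strcpy(dst, src);               /* overflows silently in release */
}

/* Defense in depth: the check is part of the function's behavior in
 * EVERY build, and failure is a visible error code. */
static int checked_copy(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);
    if (len >= dstsize)
        return -1;                  /* refuse, even in release builds */
    memcpy(dst, src, len + 1);
    return 0;
}
```

The assert is still worth keeping for debugging, but only checked_copy protects the code path the bad guys actually attack - the one nobody tested.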

It's far better to replace the strcpy() call with a strcpy_s() call and make the caller pass in the size of the target buffer.  That way you don't rely on others to protect your vulnerability.  There is at least one known vulnerability that was caused by someone taking Pete's advice.  A development organization had a vulnerability reported to them, and they fixed it by adding a check up-front to cut off the vulnerability.  Since they'd removed the vulnerability, they released a patch containing the fix.  Unfortunately, they didn't realize that there was another path to the vulnerable function in their code, and they had to re-release the patch.  This time they did what they should have done in the first place and not only did they add the up-front test, they also removed the vulnerable call to strcpy - that way they'll never be caught by that vulnerable call again.
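For readers on platforms whose C runtime doesn't ship strcpy_s, here's a hedged sketch of the strcpy_s-style contract - my_strcpy_s is a stand-in I wrote for illustration, not the actual CRT implementation:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Illustrative stand-in for strcpy_s semantics: the caller passes the
 * destination size, and an oversized source produces an error code
 * (and an empty-but-valid destination string) instead of a silent
 * buffer overflow. */
static int my_strcpy_s(char *dst, size_t dstsize, const char *src)
{
    if (dst == NULL || src == NULL || dstsize == 0)
        return EINVAL;
    size_t len = strlen(src);
    if (len >= dstsize) {
        dst[0] = '\0';          /* leave dst a valid (empty) string */
        return ERANGE;
    }
    memcpy(dst, src, len + 1);  /* copies the terminating NUL too */
    return 0;
}
```

The point isn't the specific implementation; it's that the buffer size travels WITH the call, so the function can defend itself no matter which code path reaches it.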

On page 4, he claims that it IS possible to use gets() correctly: it's OK to have programs that read from the console, as long as those programs are only called from other, known programs.  But of course, this totally ignores the reality of how the bad guys work.  Security vulnerabilities are rarely exposed by someone using an API or protocol in the way it was intended.  They almost always occur because someone DIDN'T use a program in the way it was intended.  For instance, I know there was at least one *nix vulnerability that was explicitly caused by a setuid application calling gets().  Someone fed a bogus command line string to that application and presto! they had root access to the machine.

Here's another way of thinking about it (oh no, not another stupid automotive analogy).  If you drive safely, you don't need seat belts (or airbags).  But I wouldn't even DREAM of riding in a car without at least seat belts.  Why?  Seatbelts and airbags are a form of defense in depth.  Your primary defense is safe driving.  But just in case, your seat belts (or airbags or both) may save your life.


Bottom line: A good architecture is no substitute for Defense in Depth.  Sure, apply the architectural changes that Pete described.  They're great.  But don't believe that they are adequate to ensure safety.

Comments

  • Anonymous
    October 04, 2006
    The comment has been removed

  • Anonymous
    October 04, 2006
    I think that every function checking the validity of its arguments is probably overkill (it gets into the "IsBadXxxPtr" problem).  If the function is internal, then assert() is fine. But assert() is not and can not be a substitute for defensive programming.

  • Anonymous
    October 04, 2006
    You're right, that article is pants.
    First, it's not clear that "safe" functions are any slower. Modern processors can schedule three instructions at the same time, especially trivia like using a register to keep track of the free space in a buffer.
    Second, if "safe" functions are too slow (implausible as this is), then use a better string representation. If you're checking all string lengths at the boundary, then why not convert them to a better representation, such as counted strings, at the same time? This has the further advantage that there's no chance of unchecked strings slipping through to your potentially unsafe code.
    But your automotive analogy is poor. Most drivers have a certain level of acceptable risk. If you reduce their risk with safety features, they'll drive more recklessly to increase the risk back to their acceptable level. So the gross effect of safety features is no improvement in driver safety coupled with increased risk for pedestrians, cyclists, etc. A net loss. Much research supports this.

  • Anonymous
    October 04, 2006
    I hate to say it, but Dr. Dobb's lost me many years ago when they published an article about a new super-fast sorting algorithm. What was the great advancement?
    Swapping pointers to elements instead of elements.
    Um... yeah...
    Makes me wonder if there is any type of review. It really isn't the articles that I know are stupid that scare me, it's the articles that I don't know are stupid.

  • Anonymous
    October 04, 2006
    The comment has been removed

  • Anonymous
    October 04, 2006
    The comment has been removed

  • Anonymous
    October 04, 2006
    The comment has been removed

  • Anonymous
    October 05, 2006
    OpenBSD guys created a safe strlcpy: http://www.gratisoft.us/todd/papers/strlcpy.html
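For anyone who can't link against the OpenBSD version, here's a minimal sketch of strlcpy's documented semantics (my_strlcpy is an illustrative stand-in, not the audited OpenBSD code):

```c
#include <assert.h>
#include <string.h>

/* Sketch of OpenBSD strlcpy semantics: always NUL-terminates (when
 * dstsize > 0) and returns the length of src, so the caller detects
 * truncation by checking whether the return value >= dstsize. */
static size_t my_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';      /* unconditional termination */
    }
    return srclen;          /* the length strlcpy TRIED to create */
}
```

Unlike strncpy, this fixes both problems at once: the destination is always a valid string, and truncation is reported through the return value instead of having to be probed for.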

  • Anonymous
    October 05, 2006
    This is off-topic, but your blog comments seem to randomly switch between supporting HTML syntax and not.
    E.g. this should be a new paragraph (using <p>).

  • Anonymous
    October 05, 2006
    Yeah, I know.  We recently switched the site to CS 2.1 and this apparently is a side effect.  I've complained to the appropriate people.

  • Anonymous
    October 05, 2006
    Why not use strcpy if I've proven that the parameters will not cause an overflow? Data that comes from file/network/user input can be anything and it has to be appropriately handled and checked. But internal (or processed by your carefully designed app) data can be guaranteed to be safe. In this case assert() is enough.

  • Anonymous
    October 05, 2006
    Sergei: It's possible to use strcpy safely.  But you need to first verify that the source string being copied is shorter than the destination buffer, which requires traversing the string twice (once to find the length, a second time to copy the data).  But why NOT use strcpy_s, which does both at the same time?  Otherwise you're trusting your callers.
    I've had MANY situations where developers made a change, tested it on the retail bits, and checked in the change.  Only later, when someone was testing the debug bits, did they realize that the other developer had caused an assert to fire.  That's because asserts are only used to validate invariants.  Asserts are fine, but since they're not always present, they aren't sufficient protection.
    People keep trying to give arguments for why they should use broken paradigms.  Why?  There are replacements with equivalent performance characteristics that are safe, so why not use them?

  • Anonymous
    October 05, 2006
    > one of the critical bugs in strncpy - strncpy isn't guaranteed to null terminate the output string
    If it did, there would be no way to detect the case when the destination is too small to hold the whole source.

  • Anonymous
    October 05, 2006
    sergei: Interesting.  I've never heard of that usage for strncpy (checking the byte at <length> for 0).  I much prefer APIs that return errors when they fail as opposed to having to probe for side-effects.
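That probing idiom can be wrapped so the side effect becomes a return value; copy_fit below is a hypothetical helper sketching the check described in the comments above:

```c
#include <assert.h>
#include <string.h>

/* The probing idiom: after strncpy, look for a NUL in the destination
 * to discover whether the source actually fit.  Returns 1 if the copy
 * fit, 0 if it was cut short (repairing the missing terminator so the
 * buffer is at least a valid string either way). */
static int copy_fit(char *dst, size_t n, const char *src)
{
    strncpy(dst, src, n);
    if (memchr(dst, '\0', n) == NULL) { /* no NUL: src didn't fit */
        dst[n - 1] = '\0';              /* repair termination */
        return 0;
    }
    return 1;
}
```

It works, but it illustrates the complaint: the caller has to probe the buffer for a side effect rather than being handed an error code directly.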

  • Anonymous
    October 05, 2006
    If you're the world's best and safest driver, seatbelts protect you from all the morons out there.
    If you're the world's best and safest programmer, safe string copy functions protect your code from all the morons out there who accidentally work around your safety checks.
    And, also, if you're not quite as great as you think you are, they keep you from getting a whole lot of egg on your face. ;)

  • Anonymous
    October 05, 2006
    I think the real point of the article is being missed here. Surely everyone here would agree that formal proof of buffer overflow impossibility through static analysis and design is a MUCH better way to produce secure software than runtime buffer size checks? Just checking string bounds isn't always enough -- you have to make sure that a truncated string isn't used in a way that could cause a different vulnerability, or that throwing exceptions doesn't allow an easy way to cause a DoS, etc. I don't think bounds checking is a bad idea, as it definitely helps with robustness as well as security (think hardware errors!), but it's not a great solution. For calls that are solely internal, it's a bit like speculatively adding try/catch.
    Personally, I would rather see more emphasis on language features and tools that help me declare and enforce guarantees at compile time, because that would help avoid a lot more problems than just buffer overflows.
    Also, I think the performance issue is a bit of a red herring, not because the checks are "free" -- they aren't, either in runtime or code size cost -- but because string operations themselves aren't cheap, and overuse of strings is a common performance problem.

  • Anonymous
    October 07, 2006
    > Friday, October 06, 2006 3:03 AM by Phaeron
    > formal proof of buffer overflow impossibility through static analysis and design is a MUCH better way to produce secure software than runtime buffer size checks?
    That is 100% true. For some kinds of programs you can do a formal proof.  How do you construct a formal proof?  In my experience it was done by humans applying logic to demonstrate that each fact is provable from previously proven facts.  In other words, humans were applying the same logic that humans apply in writing programs.  So the bug rate wasn't zero.  The bug rate might potentially be geometrically lower (two simultaneous failures being geometrically less probable than one failure), but the actual results weren't that good.  In the general case there won't always be even a theoretical possibility of a formal proof; it reduces to the halting problem.
    Now, C strings are miserable.  Most of the C language combines the power of assembly language with the beauty of assembly language, because it was invented as a substitute for assembly language.  Of course, since being standardized it's not always possible to use it that way any more, but it still reflects the beauty of its origins (^_irony).  The strings section of the library seems to have been designed more hastily.  It took advantage of machine instructions which an assembly language programmer could code quickly for one kind of machine.  It didn't prepare for efficient but harder coding that would measure once before copying and concatenating and comparing several times.  C++, being object-oriented assembly, inherits the same weak base, though it's a bit easier to make counted string libraries that will be easily usable by clients.
    A few days ago I used _tcscpy safely.  I ran a few _tcslen calls, added the results, did a malloc, checked the result of malloc, and then if still running I copied each string to the appropriate portion of the resulting buffer.
    Then I cringed at the thought of what will happen when someone copies and pastes the code without figuring out why it was OK this once.  I wished I could just use VB's & operator instead.

  • Anonymous
    October 10, 2006
    The real solution is to use C++

  • Anonymous
    October 11, 2006
    > The real solution is to use C++ Why stop here and not switch to C# or VB?

  • Anonymous
    October 24, 2006
    As I understand it, the main argument against "Why stop here [C++] and not switch to C# or VB[.NET]?" is that C++ still runs a LOT faster...like 16 to 64 times faster...mostly due to the absence of strict variable typing and the direct access it affords to memory.  That's how I used to think even just a few years ago, and I still do, to the extent that I think unmanaged code is still a heck of a lot better for, say, massive scientific simulations and the like, because it's still WAY WAY WAY faster, deny it all you like.
    However, for pretty much any business-related project I could imagine, and most anything else non-scientific (or gaming), the hardware is now simply so fast that who cares how much slower managed code executes?  It's just not generally an issue.  So, in the final analysis, I would never use C or even C++ for anything usual, especially if I needed the result to securely withstand public use.

  • Anonymous
    October 28, 2006
    "A few days ago I used _tcscpy safely.  I ran a few _tcslen calls, added the results, did a malloc, checked the result of malloc, and then if still running I copied each string to the appropriate portion of the resulting buffer."
    Did you remember to check for overflow on your addition?
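For the record, the overflow check in question can be written like this - concat_checked is a hypothetical helper sketching the pattern, not the commenter's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate two strings into a freshly malloc'd buffer, with the
 * length addition checked for wraparound before the allocation.
 * Returns NULL on overflow or allocation failure; caller frees. */
static char *concat_checked(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    if (la > SIZE_MAX - 1 - lb)     /* would la + lb + 1 wrap? */
        return NULL;
    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);    /* copies b's NUL too */
    return out;
}
```

Without that first test, two strings whose lengths sum past SIZE_MAX would malloc a tiny buffer and the copies would overflow it - the integer-overflow-to-heap-overflow pattern the follow-up comment is hinting at.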

  • Anonymous
    November 03, 2006
    He's not claiming that you should use assert(3) for parameter validation; indeed in that bit of the article he explicitly states that he wants to avoid doing parameter validation everywhere on efficiency grounds(!). The assert is just an expression of the "contract" of the function. Sure, the suggestions are pretty silly, but not really in the way you suggest, at least in that case.

  • Anonymous
    November 23, 2006
    If C# and C++ are too slow, then switch to Java. I switched and never looked back.