Refactoring C and C++ Code for Security
I have been programming in C and C++ since I was 15 years old. And no, I won’t tell you how long ago that was! I have always loved both languages, and still do, but when the first internal pre-releases of Visual Studio 2013 came out, I selected C# as my prime language. To be honest, I felt like a deserter!
When building and working on operating systems, C and C++ are the dominant languages, but most customers building line-of-business and cloud applications use higher-level languages such as C#, JavaScript and Java. That said, about 25% of the customers I work with have some form of legacy C and C++ code in production, and while it is often not exposed directly to the Internet, much of it sits right behind a web server. In other words, it’s still in the firing line of attackers. For example, some systems I have reviewed take a web request (ASP.NET, PHP, etc.), turn the request data into a proprietary format and shoot it over a TCP socket to a service or daemon written in C or C++. The C/C++ code then performs some parsing and queuing and forwards the request to a back-end system, which processes the data and returns the result back up the pipeline.
Of course, there are classes of systems that use C/C++ throughout. For example, many control systems use C/C++ for the core system and higher-level languages, such as C# and Java, for the management systems.
In short, there’s still a great deal of old, crusty C and C++ code out there that is directly or indirectly open to attack.
This code should be updated where possible to improve its security, but in a way that does not introduce regressions and requires very little engineering effort. I am not saying a customer should spend thousands of hours securing C and C++ code (though some should!), but there are easy things that raise the security bar, and that means refactoring the code with an eye on security.
What’s Refactoring?
Refactoring is a process where code is improved in some way without changing how it functions. Examples of refactoring include making code more legible or more maintainable. The rest of this commentary focuses on refactoring C and C++ code so that it is more secure, and for C and C++ code, that means reducing the number of potential memory corruption issues.
Memory corruption (also called memory safety) vulnerabilities have long been the bane of C and C++ code and every refactoring idea below attempts to reduce or mitigate many memory corruption issues.
Refactoring Idea #1 - Recompile and Relink
It really could not be simpler. The two main C/C++ toolsets in use today, Microsoft Visual C++ and GNU gcc, can add memory corruption defenses to compiled and linked code. All you need to do is flip a few compiler and linker flags, and the tools will add those defenses to the resulting binary.
For Visual C++ the compiler flags are:
- /GS
- /guard:cf
And the linker flags are:
The really good news is that with Visual C++ 2015 you don’t need to do anything other than recompile and relink the code, because these switches are enabled by default. The exception is /guard:cf, which is new in Visual C++ 2015 and must be set explicitly.
Also, add this to a commonly used header, such as stdafx.h:
#pragma strict_gs_check(on)
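For gcc, set the following compiler and linker switches. -fPIE generates position-independent code, and -pie at link time makes sure the linker actually produces a position-independent executable:
CFLAGS="-fPIE -fstack-protector-all"
LDFLAGS="-pie -Wl,-z,now -Wl,-z,relro"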
There is a “downside” to these changes: if there are memory corruption vulnerabilities in the code, there’s a good chance your code will fail when an issue is triggered during testing or in production. I see this as a good thing, because you just found a real security vulnerability with a nice, clean stack trace. Fix the code; the bug has probably been latent for decades and you never knew it.
Refactoring Idea #2 – Replace Insecure C Runtime Functions
There are many C runtime functions that we know are insecure because they either don’t constrain how much memory is copied or make it easy to get the bound wrong. At Microsoft we banned the use of these functions in new code. The Rogue’s Gallery includes:
strcpy
strcat
sprintf
strncpy
strncat
snprintf
gets
And many, many more. You are wrong if you think I am going to suggest you dive into the code and manually replace these functions with safer versions. A better option, because we’re trying to keep the work as small as possible, is to have the compiler do the work for you when it can. I wrote about this many years ago. All you need to do is add this to a commonly used header file, such as stdafx.h, and then recompile the code:
#define _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES 1
#define _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES_MEMORY 1
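The trick behind these overloads is C++ template argument deduction: when the destination is an array rather than a pointer, the compiler knows its size, so a call such as strcpy(buf, src) can be redirected to a bounded copy without touching the call site. That is also why the compiler can only do this "when it can": a plain char* carries no size. Below is a minimal sketch of the mechanism using a hypothetical bounded_strcpy (not a CRT function). The real overloads route to the _s functions such as strcpy_s, which handle an oversized source differently (by default they do not simply return an error); this sketch just returns false so the idea is easy to see.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of the CRT. It mimics the idea behind the
// secure template overloads: when dest is a real array, its size N is
// deduced at compile time, so the copy can be bounded without the caller
// passing a length.
template <std::size_t N>
bool bounded_strcpy(char (&dest)[N], const char* src) {
    const std::size_t len = std::strlen(src);
    if (len >= N) {
        // The string plus its terminating '\0' will not fit: refuse the
        // copy and leave an empty, still-terminated string behind.
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, src, len + 1);  // copies the terminator too
    return true;
}
```

Passing a char* instead of an array simply fails to match this template, which mirrors why legacy code that passes bare pointers around gets no help from the overloads.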
For gcc, these settings do something similar. Note that _FORTIFY_SOURCE only takes effect when optimization is enabled (-O1 or higher):
CFLAGS="-D_FORTIFY_SOURCE=2 -Wformat"
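It may seem odd that strncpy and strncat are in the Rogue’s Gallery when they take a length. One well-known reason is that strncpy does not NUL-terminate the destination when the source is at least n characters long, so a later strlen or printf on that buffer can read past its end. The sketch below demonstrates that behavior with a hypothetical strncpy_terminates function, plus an equally hypothetical copy_truncating helper showing a common mitigation. It is an illustration, not a recommended replacement: silent truncation can be a bug in its own right.

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dest with strncpy and reports whether dest ended up
// NUL-terminated. When strlen(src) >= n, strncpy writes exactly n bytes
// and no terminator.
bool strncpy_terminates(char* dest, std::size_t n, const char* src) {
    std::strncpy(dest, src, n);
    return std::memchr(dest, '\0', n) != nullptr;
}

// A common mitigation: copy at most n - 1 bytes and always terminate.
// The input is still silently truncated.
void copy_truncating(char* dest, std::size_t n, const char* src) {
    if (n == 0) {
        return;  // no room even for the terminator
    }
    std::strncpy(dest, src, n - 1);
    dest[n - 1] = '\0';
}
```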
Refactoring Idea #3 – Focused use of Static Analysis Tools
If you use Visual C++, compile with /analyze and fix any issue that relates to memory safety. We have tuned the tool to find these issues with a high degree of confidence, so there should be few false positives.
In my opinion, no code should be checked in with these warnings:
C6001, C6002, C6029, C6054, C6059, C6063, C6064, C6066, C6067, C6101, C6200, C6201, C6255, C6320, C6383, C6385, C6386, C6411, C6412
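To make the list concrete, here is a sketch of the kind of bug these warnings target, written as a hypothetical copy_field function for a request parser like the one described earlier. The unchecked pattern in the comment is the sort of overrun C6386 (buffer overrun while writing) is meant to report, although whether /analyze flags a particular instance depends on what it can infer about buffer sizes, for example from SAL annotations. The function itself shows one way to fix it: pass the destination size explicitly and check it before copying.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical parser helper. The unchecked version of this code,
//     std::memcpy(dest, src, src_len);
//     dest[src_len] = '\0';
// writes past dest whenever src_len >= the size of dest.
// This version takes the destination size and checks it first.
bool copy_field(char* dest, std::size_t dest_len,
                const char* src, std::size_t src_len) {
    if (dest == nullptr || src == nullptr || dest_len == 0) {
        return false;
    }
    if (src_len >= dest_len) {  // need room for the data plus '\0'
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, src, src_len);
    dest[src_len] = '\0';
    return true;
}
```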
John Carmack has some interesting things to say about the value of using /analyze.
Refactoring Idea #4 – Stretch a Little
This one is a little more work, because you’ll get build failures until you fix all the issues. If you add Microsoft’s banned.h to a common header, it deprecates the banned C runtime functions. After you have downloaded banned.h, the line to add is:
#include <banned.h>
If that’s a little too harsh, then at least fix all C4996 warnings. C4996 is the warning Visual C++ emits for deprecated functions, including the CRT functions it considers unsafe, and that set is smaller than the list in banned.h.
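When banned.h breaks the build, each call site needs a replacement. On Visual C++, the _s variants such as strcpy_s and strcat_s are the direct substitutes; in C++ code, another option is to drop the fixed buffer altogether. The sketch below shows that second approach for a hypothetical make_endpoint helper that once built a "host:port" string with strcpy, strcat and sprintf; the legacy code in the comment is illustrative, not taken from any real system.

```cpp
#include <string>

// Hypothetical replacement for a legacy helper that built "host:port" with
//     char buf[64];
//     strcpy(buf, host);
//     strcat(buf, ":");
//     sprintf(buf + strlen(buf), "%d", port);
// A long host name overruns buf; std::string grows as needed instead.
std::string make_endpoint(const std::string& host, int port) {
    std::string result = host;
    result += ':';
    result += std::to_string(port);
    return result;
}
```

The trade-off is a heap allocation per call, which is rarely a concern next to the cost of the network round trip such code usually sits in front of.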
Summary
The security purists out there are probably saying there is a lot more to do to old legacy C and C++ code than what I outlined, and the purists are totally correct. But I would much rather see large swaths of C and C++ code made somewhat more secure than see nothing done to any of it. For those who want to go beyond this list, feel free to do so!
Comments
- Anonymous
April 13, 2016
Couple of additions - strict_gs isn't really something they meant to expose, and enabling it can have a big perf impact for some types of apps. With each new iteration of /GS, they've gotten a lot smarter about what to check. I recommend that people just go with the default. Next, there's a way to get the safety of C# and the perf of C++ - the STL! This is a much heavier weight solution, but if you rewrite the code to always use STL containers, and always use accessors, out of range accesses all turn into non-exploitable C++ exceptions, not AVs. In terms of perf, it takes a bit of learning to know the new ways that the STL can both gain you perf and ways that you can make mistakes. My experience on a large library was that the perf tuning I did to find where I had misused the STL not only found my errors, but a lot of problems that already existed. By the time I was done, perf was about 25% better, and memory footprint was much lower. This is something that we've seen across many teams that have adopted the STL - for example, just about everyone has some home-brewed resizable array. They typically see substantial perf gains by removing those and using the STL implementation.
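The commenter’s point about accessors can be illustrated with a minimal sketch. std::vector::at checks its index and throws std::out_of_range, whereas operator[] is not required to check, so the same off-by-one there is undefined behavior that can read outside the buffer. The value_or function below is a hypothetical example, not part of the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns v[i] if i is in range, otherwise fallback.
// v.at(i) throws std::out_of_range on a bad index instead of
// reading past the end of the vector's storage.
int value_or(const std::vector<int>& v, std::size_t i, int fallback) {
    try {
        return v.at(i);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```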