Buffer overruns and old school exploits
I was asked to talk about buffer overruns and I am happy to do that, although I hope you will forgive me if I don’t give sample exploit code. We don’t often talk about this, but the black hats all know this material, so I guess that it doesn’t much matter.
In the old days, there was a programmer’s trick that I sometimes used to embed data into an application. The code read something like:
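Call OutputString
.string "Press enter"
.byte 13, 10, 0
…
OutputString:
Pop EAX ; the "return address" is really the address of the inline data
Mov [SomewhereSafe], EAX ; keep a copy while the print code uses the registers
; code to print the data, incrementing BytesHandled for every byte we process
Mov EAX, [SomewhereSafe] ; reload in case the print code clobbered EAX
Add EAX, [BytesHandled] ; step past the data, terminator included
Push EAX ; put the adjusted return address back on the stack
Ret ; resume at the instruction after the data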
In fact, it looked a bit different because it was in 6502 machine code, but the principle holds just fine. Now, what have we done here? We cheated and made the return address act like a parameter. The way that this works is that the call instruction (Jsr in 6502) pushes the address of the following instruction onto the stack. Because the data sits immediately after the call, the address that gets pushed is the address of the data, which means that the called function has a pointer to its data but no genuine return address. So we took a copy of the address for later and adjusted it so that we would return to somewhere other than where we came from, in this case just after the data. By pushing this calculated value back on the stack, we had an address to return to. In effect, we carefully overwrote the stack. Exploits that use buffer overruns often rely on the same thing.
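For anyone who does not read assembler, here is a rough C simulation of the same trick. It is only a sketch: real C cannot portably get at its own return address, so the "memory" and "stack" below are plain arrays, and every name beginning with toy_ is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy "code memory". Offset 0 is where the call's return address would
   point: the inline data. The bytes after the terminator stand in for
   the next instruction. */
static const char toy_memory[] = "Press enter\r\n\0NEXT";

/* Toy stack of return addresses (offsets into toy_memory). */
static size_t toy_stack[4];
static size_t toy_sp = 0;

static void toy_push(size_t value)
{
    assert(toy_sp < 4);
    toy_stack[toy_sp++] = value;
}

static size_t toy_pop(void)
{
    assert(toy_sp > 0);
    return toy_stack[--toy_sp];
}

/* Mirrors OutputString: pop the "return address", use it as a data
   pointer, print up to the terminator, then push back the address just
   past the data so that the "ret" resumes there. */
static void toy_output_string(void)
{
    size_t data = toy_pop();
    size_t handled = 0;

    while (toy_memory[data + handled] != '\0') {
        putchar(toy_memory[data + handled]);
        handled++;
    }
    handled++;                 /* step over the terminator as well */
    toy_push(data + handled);  /* the adjusted return address */
}

/* Mirrors the call site: "call" pushes the address of whatever follows
   it, which here is the inline data at offset 0. */
static size_t toy_call_output_string(void)
{
    toy_push(0);
    toy_output_string();
    return toy_pop();          /* where execution would resume */
}
```

Calling toy_call_output_string() prints the message and hands back the offset of the first byte after the data, which is where the real routine would carry on executing.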
Imagine that I have a C function that takes an integer and a string:
void MyFunc(int iStream, char* buffer);
In my documentation, I specify that the string must not be more than 99 characters (100 bytes once you count the terminating zero) because I know that I have some code in my function that says
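#include <string.h>
void MyFunc(int iStream, char* buffer)
{
    char szLocalBuffer[100]; /* fixed-size local, lives on the stack */
    strcpy(szLocalBuffer, buffer); /* no length check: copies until the terminating zero */
    …
}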
What happens if buffer contains a longer string? Well, we overwrite something. What do we overwrite? Stack. Because of the way that compilers typically work, the local variables are placed on the stack after the parameter list, the return address and what have you, and for C there is not much else there. Things are a little more complex for C++. On x86 and most other common processors the stack grows downwards while a string copy runs upwards through memory, so the overrun runs off the end of the local buffer into whatever was put on the stack before it: other local variables, then the return address and the parameters. We will probably crash because we have overwritten local variables, at least some of which will typically be pointers. If you write code then you will have seen it happen time and time again.
However, if the values were just right (and a little debugging would tell you what was there prior to the overwrite) then you could arrange for the contents of the string to be such that it put sensible values in those variables. After all, you are overwriting bytes with bytes when it comes down to it. If you did that then it wouldn’t crash until the end of the function, when the return pops the bad address and the program counter gets trashed, sending execution off to somewhere odd. Of course, you could choose to set the return address to something a bit more helpful to you, as I did in the initial assembler example, and change the flow of execution. It seems a lot of effort for little gain.
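To make the "overwriting bytes with bytes" point concrete without touching a real stack, here is a toy model in C. It is a sketch under loud assumptions: ToyFrame is a made-up, simplified picture of a frame (real frames hold saved registers and padding and vary by compiler), and the copy is clamped to the size of the toy so the demonstration itself stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified frame: a local buffer with the saved return
   address sitting just above it in memory. */
struct ToyFrame {
    char     local_buffer[8];
    uint32_t return_address;
};

/* Copy len bytes into the frame starting at the buffer, the way an
   unchecked copy walks upwards through memory. Clamped to the toy
   frame so that this demo never writes outside its own struct. */
static void toy_unchecked_copy(struct ToyFrame *frame, const void *src, size_t len)
{
    assert(frame != NULL && src != NULL);
    if (len > sizeof *frame) {
        len = sizeof *frame;
    }
    memcpy(frame, src, len);
}

/* Fill a fresh frame with len bytes of 'A' and report what the saved
   "return address" looks like afterwards. */
static uint32_t toy_demo(size_t len)
{
    struct ToyFrame frame;
    unsigned char input[sizeof frame];

    memset(&frame, 0, sizeof frame);
    frame.return_address = 0x1000u;   /* pretend this is where we return to */
    memset(input, 'A', sizeof input);

    toy_unchecked_copy(&frame, input, len);
    return frame.return_address;
}
```

With 8 bytes of input the saved value survives. With 12 it becomes 0x41414141, which is simply the letter A four times over. Nothing clever happened: the bytes that used to be an address are now bytes from the input.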
Now, you have probably been thinking of someone using our library function from an application. Imagine instead that the application calling the library was just passing on data that it got from another machine on the network. That data could have come from somewhere out in internetland. In that case, it may well be that someone far away and not at all trustworthy has changed your program flow. If they changed the return address to point to the string buffer which they provided, then they will be running their own code in your process with your account rights. That is not at all a good thing.
In practice, that simple version is now largely blocked by Data Execution Prevention (DEP). Basically, pages of memory which are not supposed to contain code (and the stack is one of these) do not have the execute permission set (see the page protection flags in the documentation for the VirtualProtect API). This came in with Windows XP SP2 and a lot of people complained about the performance overheads.
That doesn’t mean that buffer overruns are no longer exploitable, but it does make it a little harder. Sometimes security, like so many other things, is about delaying the inevitable. The generation of exploits that followed DEP used more sophisticated variants such as indirecting calls through OS functions (so that the code being run was in a code segment) and overwriting exception pointers. I won’t be going into the details of those today, or probably ever, since this is a public document and there are very few people who need to know quite how those tricks work.
So, the obvious solution is better parameter validation and I will be talking about some of the gotchas that you can see there in my next post.
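As a taste of what that validation might look like, here is a minimal sketch of a checked version of the function. The name MyFuncChecked, the LOCAL_BUFFER_SIZE constant and the 0 or -1 return convention are all my assumptions for illustration, not part of the original function.

```c
#include <stddef.h>
#include <string.h>

#define LOCAL_BUFFER_SIZE 100

/* Hypothetical checked variant of MyFunc.
   Returns 0 on success, -1 if the input is missing or too long. */
static int MyFuncChecked(int iStream, const char *buffer)
{
    char szLocalBuffer[LOCAL_BUFFER_SIZE];
    size_t len;

    (void)iStream;  /* unused in this sketch */

    if (buffer == NULL) {
        return -1;
    }

    /* Look for the terminator, but never look further than we can store. */
    for (len = 0; len < LOCAL_BUFFER_SIZE; ++len) {
        if (buffer[len] == '\0') {
            break;
        }
    }
    if (len == LOCAL_BUFFER_SIZE) {
        return -1;  /* no terminator within the limit: reject, do not copy */
    }

    /* Now the copy is known to fit, terminator included. */
    memcpy(szLocalBuffer, buffer, len + 1);

    /* … the real work on szLocalBuffer would go here … */
    return 0;
}
```

The choice here is to refuse over-long input outright instead of silently truncating it, so the caller finds out that something was wrong with what it passed in.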
Until then, signing off
Mark