StackOverflowException and IronPython
A stack overflow is not a recoverable exception under .NET. When your program runs out of stack space, the CLR will tear down your process without giving you the chance to do anything about it.
So, how many frames can you get onto the stack before the world blows up? Obviously, this depends both on the size of the stack and on the size of each individual frame, but we can get ballpark numbers by setting up a simple recursive test.
On my laptop, I ran the following program and found that a value of 61450 would reliably produce a stack overflow exception.
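using System;

public class Program {
    static int max;

    // Recurse until we reach depth max, then throw to unwind everything at once.
    static void Test(int i) {
        if (i >= max) throw new ArgumentException("done");
        Test(i + 1);
    }

    public static void Main(string[] args) {
        max = Int32.Parse(args[0]);  // recursion depth from the command line
        try {
            Test(0);
        } catch (Exception) {
            // Swallow the exception; we only care whether the stack survived.
        }
        System.Console.WriteLine("Test succeeded");
    }
}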
Now, let's change this program just a little bit so that we rethrow the exception once per stack frame.
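using System;

public class Program {
    static int max;

    // Same recursion as before, but every frame catches and rethrows.
    static void Test(int i) {
        try {
            if (i >= max) throw new ArgumentException("done");
            Test(i + 1);
        } catch (Exception) {
            throw;  // rethrow the same exception from this frame
        }
    }

    public static void Main(string[] args) {
        max = Int32.Parse(args[0]);  // recursion depth from the command line
        try {
            Test(0);
        } catch (Exception) {
            // Swallow the exception; we only care whether the stack survived.
        }
        System.Console.WriteLine("Test succeeded");
    }
}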
With this change, the number of stack frames I can create drops to about 18900. What's responsible for the difference? When an exception is thrown under .NET, the stack isn't unwound until a catch handler finishes executing normally. In this code, that doesn't happen until the end of the handler in Main is reached. When we enter that catch block, the stack still contains each of the 18900 frames that we got from calling the Test function recursively. It also contains some kind of data from each of the 18900 exceptions that were rethrown.
Eighteen thousand stack frames ought to be enough for anyone
These tests were all performed on a 32-bit edition of Windows Vista. When I rerun the first test under a 64-bit operating system, I get a much smaller number of frames: 13673. By default, a pure MSIL application will run as a 64-bit process on a 64-bit operating system. This has definite implications for the stack. For one thing, the return address now occupies 8 bytes instead of 4. The x64 calling convention also requires the caller to set aside 32 bytes of "shadow space" on the stack for every call, regardless of how many parameters the callee actually takes. On top of that, the stack must be kept 16-byte aligned. These variations add up quickly when you multiply by ten thousand frames.
When I run the second test under x64, the program dies after only 114 frames. This appears to be because each thrown exception in the 64-bit CLR takes nearly 8 KB of stack space. So the next time you read that it's a bad idea to rethrow exceptions, you'll have one more reason to agree with the sentiment.
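To put the four measurements side by side, here is a rough back-of-the-envelope calculation of how much stack each recursion level consumed. It is only a sketch: the 1 MiB stack size is my assumption (the tests above don't record the actual reserve), and `STACK_BYTES`, `measured`, and `bytes_per_frame` are names made up for this illustration.

```python
# ASSUMPTION: a 1 MiB main-thread stack. The real reserve size was not measured.
STACK_BYTES = 1024 * 1024

# Frame counts reported in the tests above: (platform, test) -> frames at overflow
measured = {
    ('x86', 'plain recursion'): 61450,
    ('x86', 'rethrow per frame'): 18900,
    ('x64', 'plain recursion'): 13673,
    ('x64', 'rethrow per frame'): 114,
}

def bytes_per_frame(frames, stack_bytes=STACK_BYTES):
    """Average stack bytes per recursion level, treating the whole stack as usable."""
    return stack_bytes / float(frames)

for (platform, test), frames in sorted(measured.items()):
    print('%s %-18s %6d frames  ~%.0f bytes/frame'
          % (platform, test, frames, bytes_per_frame(frames)))
```

Under that assumption, a plain 32-bit frame comes out at roughly 17 bytes, a plain 64-bit frame at roughly 77 bytes, and the 64-bit rethrow case at roughly 9 KB per level, which is the same order of magnitude as the "nearly 8 KB" figure. The estimate most likely overstates the per-frame cost, since some of the stack is used before Main ever runs.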
What does this have to do with IronPython?
As it turns out, plenty.
One of the differences between the CLR and the Python language is that a Python exception can be of any Python class, while the CLR basically requires that an exception be an instance of System.Exception or a class derived from it. Even if it didn't, we don't have a 1-to-1 mapping between Python classes and CLR classes, so we can't use the equivalent of "catch (PythonDefinedException pde) {}" in our emitted code. Instead, we throw a System.Exception with which we associate the thrown Python object. That means that every Python catch block defined by user code needs to "catch (Exception e)" and then look at the actually thrown object in the catch block. If its type does not match any except handlers, we have no choice but to rethrow.
In other words, consider the following code:
def inner(i):
    try:
        if i < 50: return inner(i + 1)
        raise RuntimeError, 'Raised after fifty'
    except TypeError:
        return 0

try:
    inner(0)
except:
    print 'Caught'
When IronPython generates code for the except handler of the inner() function, it has to catch all exceptions and examine them for their type. In this sample, that will first happen on the fifty-first invocation of inner(). Because RuntimeError does not match the TypeError criterion, the exception will be rethrown -- and caught again on each of the 50 subsequent exception handlers it meets in the inner() function while unwinding the stack.
It gets better.
In order to be able to display a Python stack trace when an exception happens we wrap generated code in a fault handler. A fault handler is a feature of MSIL that's not currently available when programming in C# -- it's basically the equivalent of a finally block that only runs if the guarded block is exited via an exception. However, there's a catch (if you'll pardon the pun). The fine print in the MSDN documentation tells us that exception fault blocks aren't supported when emitting dynamic methods. As a result, when building a dynamic method, the DLR will replace a fault block with an ordinary catch block which will (say it with me) rethrow the exception.
This means that -- in the Python code snippet above -- we may end up with as many as 102 exception objects on the stack before it is finally unwound.
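The doubling is easy to reproduce in a small simulation. In this sketch, a decorator plays the role of the fault block that has been turned into a catch-and-rethrow, and the function body plays the role of the user's except handler. The names `fault_as_catch`, `catches`, and `trace` are hypothetical, and the real DLR-generated code will look different.

```python
catches = 0  # every time any handler catches the in-flight exception
trace = []   # hypothetical stand-in for the data used to build a Python stack trace

def fault_as_catch(func):
    """Emulate a fault block with an ordinary catch block that rethrows."""
    def wrapper(i):
        global catches
        try:
            return func(i)
        except Exception:
            catches += 1
            trace.append((func.__name__, i))  # record the frame, then rethrow
            raise
    return wrapper

@fault_as_catch
def inner(i):
    global catches
    try:
        if i < 50:
            return inner(i + 1)
        raise RuntimeError('Raised after fifty')
    except Exception as e:
        # The user's "except TypeError:" compiled as catch-all plus a type test.
        catches += 1
        if isinstance(e, TypeError):
            return 0
        raise

try:
    inner(0)
except Exception:
    print('Caught after %d catch-and-rethrow steps' % catches)  # prints 102
```

Each of the 51 invocations contributes two rethrows: one from the except handler and one from the emulated fault block.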
Bottom Line
So far, we haven't gotten any reports of this issue causing trouble "in the wild". As such, I present it here largely as an intellectual curiosity and as a somewhat entertaining diversion. If you do run into this problem, please let us know by filing a bug at the IronPython website. It is always possible to force a .NET application to run as a 32-bit process under a 64-bit operating system, and that would be my suggestion for a temporary workaround.
May your cup run over, but your stack stay well below its limit line. Good night, and good luck.
Comments
- Anonymous
July 29, 2008
"One of the differences between the CLR and the Python language is that a Python exception can be of any Python class, while the CLR basically requires that an exception be an instance of System.Exception or a class derived from it."
IIRC, the CLR spec does not restrict the types of exceptions thrown (so long as they are reference or boxed value types). It's CLS that demands that exceptions should derive from System.Exception, but CLS can be (and often is) ignored.
"That means that every Python catch block defined by user code needs to 'catch (Exception e)' and then look at the actually thrown object in the catch block. If its type does not match any except handlers, we have no choice but to rethrow."
Shouldn't exception filters do the trick without the need to rethrow explicitly?
- Anonymous
July 30, 2008
"the CLR spec does not restrict the types of exceptions thrown" -- Technically true, but generally discouraged. And there are other reasons why this wouldn't work for us.
"Shouldn't exception filters do the trick without the need to rethrow explicitly?" -- Good question, and one which deserves the long answer I hope to give it later this week.