Udostępnij za pośrednictwem


Never Say Never, Part Two

Whether we have a "never" return type or not, we need to be able to determine when the end point of a method is unreachable for error reporting in methods that have non-void return type. The compiler is pretty clever about working that out; it can handle situations like

int M()
{
try
{
while(true) N();
}
catch(Exception ex)
{
throw new WrappingException(ex);
}
}

The compiler knows that N either throws or it doesn't, and that if it doesn't, then the try never exits, and if it does, then either the construction throws, or the construction succeeds and the catch throws. No matter what, the end point of M is never reached.

However, the compiler is not infinitely clever. It is easy to fool it:

int M()
{
int x = 123;
try
{
while(x >= x / 2) x = x / 2;
}
catch(Exception ex)
{
throw new WrappingException(ex);
}
}

Here x is always going to be a small, non-negative integer, and a small, non-negative integer is always greater than or equal to half of it. You know that, I know that, but the compiler doesn't know that. The compiler complains that this method has a reachable end point, even though clearly the end point will never be reached. We could put that logic into the compiler if we really wanted to; we could be more clever.

How much more clever could we be? Is there any limit to our cleverness? Could we in theory produce a compiler that perfectly determined whether the end point of a method was reachable?

Turns out, no. A proof of that fact is a bit tricky, but we're tough, we can do it.

First off, let's restrict the problem to a particular pattern. Consider programs of this form:

class Program
{
private static string data = @"some data";
public static int DoIt()
{
some code
}
}

The question is "given values for 'some data' and 'some code', does DoIt have a reachable end point?" If we can't solve the problem for programs with this very restrictive format then we can't solve it in general, so let's explore whether we can solve problems in this limited format.

Let's suppose that we have a class in our compiler library with a public method that answers that question. We assume that "code" is a legal body of a C# program and "data" is the legal contents of a string literal.

public class Compiler
{
public static bool HasReachableEndPoint(string code, string data)
{
// [Omitted: work out the result and return it]
}
}

And now things get really weird. What if we call:

string weird = @"
if (Compiler.HasReachableEndPoint(data, data))
return 123;
";
bool result = Compiler.HasReachableEndPoint(weird, weird);

What does that do? It attempts to work out whether DoIt has a reachable end point in this program:

class Program
{
private static string data = @"
if (Compiler.HasReachableEndPoint(data, data))
return 123;
";
public static int DoIt()
{
if (Compiler.HasReachableEndPoint(data, data))
return 123;
}
}

What is "result"? Suppose it is true. Then that means that DoIt has a reachable end point. Which means that when we run DoIt, the call in the conditional statement returns false. But "data" and "weird" are the same string, so why is "result" true if the result of the same call is going to be false? That's logically inconsistent, so result cannot be true. Clearly I cannot drink from the cup closest to me.

Suppose "result" is false. That means that DoIt does not have a reachable end point. Which means that Compiler.HasReachableEndPoint(data, data) always either returns true, or throws an exception, or runs forever. But again, "data" and "weird" are the same string, so why would it return true, throw, or run forever here, when by assumption it returns false normally when given that input? Clearly result is not "false" either, since that leads to a contradiction. Clearly I cannot drink from the cup closest to you.

Since result can logically be neither true nor false, HasReachableEndPoint must either throw or go into an infinite loop when given these inputs. But if it does that then there are some inputs for our reachability tester which cause the compiler to fail or run forever, which seems bad. The compiler needs to be able to compile all legal programs and error on all illegal programs; ideally we don't want to crash, run forever, or give a wrong answer for any possible input.

What we've shown here is, first, that you should never go up against a Sicilian when death is on the line, and second, that an endpoint reachability tester logically must either (1) sometimes give the wrong answer, or (2) sometimes fail to give an answer. The reachable endpoint problem is in general not reliably solvable by compilers no matter how clever the compiler writers are.

Now, you might reasonably push back on this proof, noting that the proof relies upon an absolutely crazy situation, namely, a program that contains part of its own code as data that itself calls the reachability analyzer in what Douglas Hofstadter calls a "strange loop". Even if you don't like the proof though, we have other evidence that a "perfect" reachable end point detector that always returns a correct result in finite time is probably impossible. Consider for example this totally trivial little method, with just five loops and some calculations on an arbitrary-precision integer type:

public static int DoIt()
{
Integer max = 0;
while(true)
{
max += 1;
for (Integer x = 1; x <= max; ++x)
for (Integer y = 1; y <= max; ++y)
for (Integer z = 1; z <= max; ++z)
for (Integer n = 1; n <= max; ++n)
if (x.Power(n+2) + y.Power(n+2) == z.Power(n+2)) goto done;
}
done: ;
// Uh oh, we "forgot" to return an int.
// But that's only a problem if the label is reachable.
}

Want to know if Fermat's Last Theorem is true? Just run your magical perfect C# compiler on a class containing that method and see if it compiles. If it fails with a "reachable end point in non-void-returning method" error then the goto must have been reachable, which means that there is a non-trivial solution to the equation, and Fermat's Last Theorem is false. If compilation succeeds (with a warning that there's an unreachable label!) then the goto was unreachable and the theorem is true. To answer the question you don't even need to run the program, you just have to run the compiler! The fact that this program is insanely inefficient in the amount of recalculation it performs is irrelevant, since we're not going to even run it.

Such a compiler would enable us to answer any open question in finite mathematics simply by writing a program that terminates if the hypothesis is true and runs forever if it is not. It seems inconceivable that we could even in theory construct a compiler that had the ability to answer all unsolved problems in the entire history of finite mathematics.

This famous problem is called the "Halting Problem" because it is usually posed for computational systems that do not have "throw an exception" as an option. The only alternatives are "halt with a result", or "go into an infinite loop".

The Halting Problem is of course solvable by heuristics provided that you can tolerate false positives. The C# compiler will cheerfully tell you if the end point of a method is absolutely guaranteed to be unreachable, but as we've seen, you can fool it into telling you that some unreachable end points are reachable. As long as it does not have false negatives (that is, sometimes tells you that the reachable end point of a method is unreachable) everything is fine.

Comments

  • Anonymous
    February 23, 2011
    Something about six fingers and Guilder.

  • Anonymous
    February 23, 2011
    You're trying to trick me into giving away something. It won't work.

  • Anonymous
    February 24, 2011
    This doesn't seem right.  Surely the contents of DoIt should be valid code, but it's attempting to "return 123" from a function declared to return bool.  If you change "return 123" to a throw your argument seems to make more sense. That was a typo; thanks for bringing it to my attention. - Eric

  • Anonymous
    February 24, 2011
    I can see future versions of the compiler becoming dangerous if you can engineer a crash... "Hello. My name is Eric Lippert. You killed my compiler - prepare to die!" Or perhaps every successful build will finish with "As you wish..." (Your timing is impeccable. I've recently introduced my kids to The Princess Bride, and just yesterday my eldest asked me to start teaching him programming. Maybe I won't start with the halting problem though...) Out of interest, am I right in saying there was a post about the "never" return type a couple of years ago? I tried to find it when part one was posted, but failed...

  • Anonymous
    February 24, 2011
    The comment has been removed

  • Anonymous
    February 24, 2011
    Either I'm misunderstanding something, or the code is wrong.  (Probably me misunderstanding.) > What is "result"? Suppose it is true. Then that means that DoIt has a reachable end point. Which means that when we run DoIt, the call in the conditional statement returns false. Why would the conditional return false?  If it returns true, then we hit the "return 123": > if (Compiler.HasReachableEndPoint(data, data)) return 123; Should the code be?: > if (!Compiler.HasReachableEndPoint(data, data)) return 123

  • Anonymous
    February 24, 2011
    @Derek: Same confusion I had.  "Does it have a reachable end point" means "can it reach the end of the function", not "does it return successfully".

  • Anonymous
    February 24, 2011
    "Such a compiler would enable us to answer any open question in finite mathematics simply by writing a program that terminates if the hypothesis is true and runs forever if it is not." Usually it's the other way around--the program terminates if the hypothesis is false (because it found a counterexample and exited, like your FLT method) or diverges if the hypothesis is true. You can always invert the sense of the hypothesis, but being precise here is important when you're talking about [co-][semi-]decidability.

  • Anonymous
    February 24, 2011
    @Carlos, thanks.  I was indeed confused by that.

  • Anonymous
    February 24, 2011
    "Such a compiler would enable us to answer any open question in finite [= discrete, I guess] mathematics" - not quite. Precisely, it would enable you to solve any question classified in Computability Theory as "solvable with an oracle to the halting problem" - "Finite mathematics" is another name for what is usually called "discrete mathematics"; it was called "finite" when I was taught it, and old habits die hard. And yes, you are absolutely right about the set of solvable problems, though of course the definition of that set is somewhat circular when stating it informally. (The set of problems that could be solved by such a device is... the set of problems that could be solved by such a device!) I decided to not go into that level of detail in this brief sketch. What I'm getting at here is that a whole host of unsolved (or until-recently-unsolved) problems believed to be extremely difficult would be solved by such a magical device; Goldbach's Conjecture, Fermat's Last Theorem, and so on. That's evidence that such a device is at the very least far beyond our present capabilities, and probably impossible. -- Eric There is a vast literature about questions that fall into this class and others that don't. See the Wikipedia article on the Arithmetical Hierarchy, or better yet, a good textbook like Floyd & Beigel's "The Language of Machines". Clearly, this class of questions is tantalizing enough to justify the "inconceivability" of an algorithm to solve it; but given that any PC can now perform tasks that 50 years ago were "inconceivable", I trust the proof by contradiction better.

  • Anonymous
    February 24, 2011
    Thanks for your comment.  As a lecturer in Computer Science, I value "practical" people who can explain the pertinence of basic theory (like undecidability) to practice (like compilers).

  • Anonymous
    February 24, 2011
    The comment has been removed

  • Anonymous
    February 24, 2011
    The compiler thinks I meant the function to exit, but if I know compilers, and I do, then I know that it knows that I think I meant the function to exit. Luckily I've been programming for years, so I've built up an immunity to the halting problem!

  • Anonymous
    February 24, 2011
    I recently saw a question on StackOverflow that was something like "Sometimes the user initiates an operation that never completes; how do I terminate the operation" and one of the answers was essentially "Rather than attempting to terminate the operation, just put it in a condition like this:    if (/* operation would complete for this input */) {        // perform operation    }" Apparently the person who wrote that answer failed to understand that the "operation would complete" function is decidedly non-trivial. Although in this case, it wasn't quite the traditional halting problem, but still it was an answer that you could only determine by actually trying the operation.

  • Anonymous
    February 24, 2011
    Both possible code paths halt. I've spent the past year building an immunity to logical contradictions.

  • Anonymous
    February 24, 2011
    The comment has been removed

  • Anonymous
    February 24, 2011
    The comment has been removed

  • Anonymous
    February 25, 2011
    Fermat's Last Theorem talks about positive integers - a set having infinite number of numbers -, however on computers I think one can have types with finite value range. Would it really help to answer Fermat's Last Theorem question to know that this code never reaches the label ? Since it would mean only that the theorem is true only for numbers from 0 to Integer.MaxValue ( assuming that You represent number in a format that Integer.Maxvalue+1 = 0). I think we can check if the function reaches the label or not in finite (but quite long) time since we just have to check all possible combinations knowing that the value range of the variables is finite.

  • Anonymous
    February 25, 2011
    Zsolt: Eric said arbitrary precision integers, I guess he is also assuming infinite memory is available aka your classic Turing machine with infinite tape.

  • Anonymous
    February 25, 2011
    Presumably, "legal" is what the language standard says is legal.

  • Anonymous
    February 28, 2011
    Nice article, but no mention of Turing, strangely. Correct, I did not mention Alan Turing, who first proved that the Halting Problem was not solvable by any conventional means of implementing algorithms. Neither did I mention David Hilbert, who first proposed a version of the Halting Problem (though in his time it was called the Entscheidungsproblem which means "Decision Problem"; I also did not mention Martin Davis, who actually coined the term "Halting Problem"). Neither did I mention Kurt Gödel, whose earlier work on undecidability of formal proof systems is clearly related. Neither did I mention Georg Cantor, whose technique of "diagonal argument" is used in the proof. I did however make passing mention of Douglas Hofstadter, whose book Gödel Escher Bach has a delightful discussion of the Halting Problem, amongst many other things. I'm not writing a history of the theory of decidability here, I'm writing a little blog post about the interesting things you run into when writing a C# compiler. See Wikipedia if you want a summary of the history of the problem. - Eric

  • Anonymous
    February 28, 2011
    Excellent post. Thanks!

  • Anonymous
    March 01, 2011
    The comment has been removed

  • Anonymous
    April 08, 2011
    In this particular case the compiler can't verify no matter how clever. It doesn't know that I replaced the code in Power at load time.

  • Anonymous
    November 27, 2012
    The first YouTube link is dead :(