Never Say Never, Part One

Can you find a lambda expression that can be implicitly converted to Func<T> for any possible T?

.

.

.

.

.

.

.

.

.

.

.

Hint: The same lambda is convertible to Action as well.

.

.

.

.

.

.

.

.

.

Func<int> function = () => { throw new Exception(); };

The rule for assigning lambdas to delegates that return int is not "the body must return an int". Rather, the rules are:

* All returns in the block must return an expression convertible to int.
* The end point of the block must not be reachable.

Both those conditions are met. The first one is vacuously met; zero out of zero returns meet the condition, so that's all of them. The second one is met because the compiler can deduce that no possible code path hits that end brace. Either "new Exception()" throws, or it goes into an infinite loop, or it succeeds and its value is thrown; no matter what, there's no possibility of the function completing normally. Of course the conditions are met for any type argument, not just int.

Similarly, it's assignable to Action because the rule for Action is simply that every return in the block must not have an expression. Again, that condition is met vacuously.
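The two rules above can be tried out directly. The following is a minimal sketch, not part of the original puzzle: the class name VacuousLambdaDemo and the helper ThrownBy are made up for illustration, and the lambda text is the one from the answer above.

```csharp
using System;

public static class VacuousLambdaDemo
{
    // Hypothetical helper (not from the article): runs a delegate and
    // reports the type name of whatever exception escapes it.
    public static string ThrownBy(Action action)
    {
        try
        {
            action();
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
        return "nothing";
    }

    public static void Main()
    {
        // The same lambda text converts to Func<T> for any T: it contains
        // zero return statements and its end point is unreachable.
        Func<int> asInt = () => { throw new Exception(); };
        Func<string> asString = () => { throw new Exception(); };

        // It also converts to Action: no return statement carries an expression.
        Action asAction = () => { throw new Exception(); };

        Console.WriteLine(ThrownBy(() => asInt()));
        Console.WriteLine(ThrownBy(() => asString()));
        Console.WriteLine(ThrownBy(asAction));
    }
}
```

All three assignments compile even though no int or string is ever produced, because neither rule asks the body to produce one.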

The rule for lambdas is just a special case of the rule for regular functions. This is perfectly legal, for precisely the same reasons:

int I.M()
{
    throw new NotImplementedException();
}

Why then is this application of the "extract method" refactoring not legal?

private static void AlwaysThrows()
{
    throw new NotImplementedException();
}

int I.M()
{
    AlwaysThrows();
}

The problem here is that the C# compiler does not perform interprocedural control flow analysis. We do analysis of one method body at a time, and we trust the declared return type of the method to be accurate. The declared return type of AlwaysThrows is void, and void means that it returns no value, but it does possibly return. Therefore, the end point of the call to AlwaysThrows is reachable, and therefore the end point of M is reachable without returning an integer. You and I both know that it is not reachable, but the compiler is not sophisticated enough to know that.

Of course, this is a silly example, but it doesn't take much to turn this into a realistic example. You see this sort of thing in unit testing frameworks all the time:

Frog frog;
try
{
    frog = Animals.MakeFrog();
}
catch(Exception ex)
{
    LogAndThrowTestFailure(ex); // always throws
}
frog.Ribbit();

The compiler complains that frog.Ribbit() is illegal because MakeFrog might have thrown before frog was assigned, and LogAndThrowTestFailure -- which we know always throws but the compiler doesn't know that -- might have returned normally, in which case frog is not definitely assigned at the point of the call. If instead it had been

catch(Exception ex)
{
    throw LogTestFailureAndReturnAnotherException(ex);
}

then the compiler would correctly reason that the call to Ribbit is only reachable if the assignment succeeded.
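For readers who want to compile the accepted form, here is a self-contained sketch. The Frog class, the MakeFrog(bool) stand-in for Animals.MakeFrog, and the wrapper RibbitOrFail are assumptions added for illustration; only the shape of the try/catch comes from the text above.

```csharp
using System;

public sealed class Frog
{
    public string Ribbit()
    {
        return "ribbit";
    }
}

public static class FrogTestDemo
{
    // Hypothetical stand-in for Animals.MakeFrog; the flag forces a failure.
    public static Frog MakeFrog(bool fail)
    {
        if (fail)
        {
            throw new InvalidOperationException("no frog available");
        }
        return new Frog();
    }

    // Logs the failure and returns a wrapping exception instead of throwing it.
    public static Exception LogTestFailureAndReturnAnotherException(Exception ex)
    {
        Console.Error.WriteLine("Test failure: " + ex.Message);
        return new Exception("test failed", ex);
    }

    public static string RibbitOrFail(bool fail)
    {
        Frog frog;
        try
        {
            frog = MakeFrog(fail);
        }
        catch (Exception ex)
        {
            // The throw statement is in this method body, so the compiler
            // knows the catch block never completes normally.
            throw LogTestFailureAndReturnAnotherException(ex);
        }
        // frog is definitely assigned here: the only way to reach this
        // line is through a successful assignment in the try block.
        return frog.Ribbit();
    }

    public static void Main()
    {
        Console.WriteLine(RibbitOrFail(false));
    }
}
```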

What, if anything, can we do about this?

In practice, nothing. You've got to write something like

int I.M()
{
    AlwaysThrows();
    return 0;
}

to shut the compiler up, or make AlwaysThrows return the exception and then throw it.
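The second workaround might look like the sketch below. The interface I with a single method M is assumed from the fragments above, and the names C and MakeNotImplemented are hypothetical.

```csharp
using System;

public interface I
{
    int M();
}

public sealed class C : I
{
    // Hypothetical factory (name invented for this sketch): it builds the
    // exception but leaves the throw to the caller.
    private static Exception MakeNotImplemented()
    {
        return new NotImplementedException();
    }

    int I.M()
    {
        // The throw is local to M, so the compiler can see that the end
        // point of M is unreachable and no dummy "return 0;" is needed.
        throw MakeNotImplemented();
    }
}

public static class Program
{
    public static void Main()
    {
        I instance = new C();
        try
        {
            instance.M();
        }
        catch (NotImplementedException)
        {
            Console.WriteLine("M threw, as expected");
        }
    }
}
```

One trade-off to keep in mind: the stack trace now starts at the throw in M rather than inside the helper.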

What about in theory? Is there anything the language designers could have done to ease this burden?

As I mentioned before, we could do interprocedural analysis, but in practice that gets real messy real fast. Imagine a hundred mutually recursive methods that all go into an infinite loop, throw, or call another method in the group. Designing a compiler that can logically deduce reachability from a complex topology of calls is doable, but potentially a lot of work. Also, interprocedural analysis only works if you have the source code for the procedures; what if one of these methods is in an assembly, and all we have to work with is the metadata? (Moreover, as we'll see next time, even interprocedural flow analysis is insufficient to solve the problem in general.)

What we need to solve this problem without interprocedural analysis of source code is another kind of return type. The CLR supports three kinds of return types today. You can return values of value type or reference type, like int or string. You can return nothing, in which case the method is marked "void". Or you can return an alias to a variable. (C# does not support this latter feature; C# only supports "ref" on variables going in to a method call, but we could also support ref on variables coming out if we chose to. Don't hold your breath while waiting for it.) What we need is a fourth kind of return type, the "this method never returns normally" return type. Such a method would have to contain no returns whatsoever and not have a reachable end point. We could know that it does not have a reachable end point by checking whether on every possible code path it always throws, always goes into an infinite loop, or always calls another "never" method.
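The end-point check that a "never" method would need is the same one the compiler already applies to non-void methods. The sketch below uses ordinary C# with made-up names (EndPointDemo, LoopsForever); it is not a proposal for syntax. It shows two of the three shapes compiling today, with the third left as a comment because it depends on the hypothetical feature.

```csharp
using System;

public static class EndPointDemo
{
    // No return statement, yet legal: the end point is unreachable
    // because every path through the body throws.
    public static int AlwaysThrows()
    {
        throw new NotImplementedException();
    }

    // Also legal: a while (true) loop with no break never reaches the
    // closing brace. It is never called below, since it would hang.
    public static int LoopsForever()
    {
        while (true)
        {
        }
    }

    // The third shape, a body consisting only of a call to another method
    // that never returns, is rejected today: the compiler assumes the call
    // can complete normally. A "never" return type would be what makes
    // that shape acceptable.

    public static void Main()
    {
        try
        {
            AlwaysThrows();
        }
        catch (NotImplementedException)
        {
            Console.WriteLine("AlwaysThrows did not return normally");
        }
    }
}
```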

Some programming languages do have a "never" return type; Curl, for example. A similar function annotation has also been proposed for ECMAScript. But since doing it properly in C# requires support from the verifier in the CLR, it's unlikely that it will become a feature of mainstream CLR languages. Particularly when there are such easy workarounds for the rare circumstances in which you are calling a method that never returns. (*)

Next time: could we be more clever? Just how clever can we be?

(*) For additional thoughts on programming styles in which methods never return, see my long series of articles on Continuation Passing Style.

Comments

  • Anonymous
    February 20, 2011
    Can't you solve by using a postcondition like Contract.Ensures? I would love to see support for Contract Programming in C#.

  • Anonymous
    February 20, 2011
    Actually, it's possible to refactor such methods into not throwing the exception but returning it to be thrown by the caller. private static Exception AlwaysThrows() {  return new NotImplementedException(); } int I.M() {  throw AlwaysThrows(); } That will result in a different exception call stack but that should not be a problem for such a scenario.

  • Anonymous
    February 20, 2011
    Actually, you can even throw the exception inside AlwaysThrows and never return; that will preserve the call stack.

  • Anonymous
    February 20, 2011
    If you wanted to support this feature entirely within C# without CLR-verifier support, it could be done with an attribute on the method. The difference between 'returns void' and 'never returns' only matters for reachability analysis which is only done at the compiler level, no? Even if not, it could still be done at the compiler level with some hackery; you could have the method (with an attribute) return the exception to be thrown, and make the compiler automatically translate every call to a method with that attribute into a throw of the result of calling that method... I think...

  • Anonymous
    February 21, 2011
    What would be the point of supporting ref return types? I can't think of any use case...

  • Anonymous
    February 21, 2011
    @Ihar - I think it would be cleaner to make AlwaysThrows() generic (with signature T AlwaysThrows<T>()), and then use "return AlwaysThrows()" instead of "throw AlwaysThrows()". @Thomas - one example is the Address method on the T[] type.

  • Anonymous
    February 21, 2011
    In what way is verifier support required?  This code compiles to verifiable MSIL: public ref struct NoReturn { __declspec(noreturn) static void DoesNotReturn() { throw gcnew System::Exception(); } static int Another() { DoesNotReturn(); } }; Yes the compiler uses a trick to achieve verifiability, but so could C#.

  • Anonymous
    February 21, 2011
    The generated MSIL is available at http://codepad.org/pFCwpCPD (to avoid being a spoiler). Oh, and the C++/CLI code I gave is also warning-free.

  • Anonymous
    February 21, 2011
    This is clearly beside the point of the post, but the first thing I thought of when looking at the initial question was: () => default(T) Are there situations where this would not work?

  • Anonymous
    February 21, 2011
    @Shawn: where would T come from in the lambda?

  • Anonymous
    February 21, 2011
    @Ben: it looks like C++/CLI does not preserve __declspec(noreturn) in assembly metadata. So if you define the method in one assembly or module, and reference it from another via #using, this no longer works.

  • Anonymous
    February 21, 2011
    I don't see why a distinct bottom type would be necessary for this. For an expression-centric language such as F#, sure, it's handy to have one if only to express the type of a throw-expression - though F# just says it's 'a forall 'a, i.e. universally substitutable, which is good enough for type analysis. Same can be done in C#, in fact. The reason why you need that extra piece of information about the fact that the value is never going to be returned is solely due to reachability analysis, and that exists only because C# distinguishes statements and expressions - again, to contrast versus F#, in the latter you cannot avoid returning a value from a function, because its entire body is an expression that has a value - and either the types match, in which case it's all good, or the types don't match, in which case it's an error. For C#, it could just as well be done by some attribute placed on a method to indicate that it never really returns (separately from its declared return type), similar to __declspec(noreturn) in VC++. And, of course, the compiler can just insert "ret" in IL as needed to achieve verifiability - there's no reason to bother CLR with it all, it's a higher-level concept and can be perfectly well expressed in terms that CLR understands already.

  • Anonymous
    February 21, 2011
    There is an interesting post in the C# Language Forum relating to overload resolution & lambdas that never return. social.msdn.microsoft.com/.../0BB53E51-E24E-4BF1-B388-655C460C19C0

  • Anonymous
    February 21, 2011
    @ Shawn () => default(T) wouldn't work for Func<T1,T2>

  • Anonymous
    February 21, 2011
    Does this really happen in practice? I can imagine having a function that crafts an exception object, but I don't get why you'd have that function throw the exception too. throw MakeMeAnException(x,y,z); vs ThrowAnException(x,y,z); return; // Required to avoid compiler error. What practical cases are there to prefer the second pattern over the first?

  • Anonymous
    February 22, 2011
    Bill P. Godfrey: Yes, it does happen in practice. If you use Reflector, for example, you can see that mscorlib has an internal class called ThrowHelper whose only purpose is to throw exceptions. I don't know why they throw the exceptions instead of just returning them, but maybe they save some minimal code space that adds up.

  • Anonymous
    February 22, 2011
    Gabe: That's me told. Thank you. I would be very interested to know why that function exists. I wonder if that reason is just because it's the .NET runtime and if that reason applies to non-runtime code that the rest of us would write. If I may rephrase my question; What practical cases are there for non-runtime code to prefer a function that always throws and dealing with the compiler error, over calling a function that builds an exception and throwing the returned instance?

  • Anonymous
    February 22, 2011
    @Pavel Minaev I agree, but emitting an attribute for a method that never returns is only half of the work. Everywhere a never-returning method is called, a throw new InvalidProgramException() should be inserted right after the call to the method. Why? What happens if someone uses the attribute on a method that actually returns? The compiler needs to ensure that that situation will be caught. With that slight addition, I agree with you that it's not necessary to have it supported in the CLR.

  • Anonymous
    March 02, 2011
    "In practice, nothing. You've got to write something like..." What are your thoughts on something like this? int I.M() {  return AlwaysThrows<int>(); } /Helpers/ T AlwaysThrows<T>() {    throw new Exception(); }

  • Anonymous
    March 15, 2011
    "interprocedural analysis only works if you have the source code for the procedures" Or if you have an attribute that claims it never returns. I think the verifier already does some reachability analysis to deal with methods (void or not) lacking a 'ret' instruction (void f() { throw new Exception(); } doesn't generate a ret instruction), it could just as easily declare code unverifiable if it has this attribute but can return. Also, always throwing an exception is not the only circumstance under which a method can never return. Calling Environment.Exit is another one.

  • Anonymous
    April 08, 2011
    In C, we have this. "volatile void" is the compiler hint for does not return and the compiler will dutifully omit the code to rebalance the stack or the function epilogue code in the caller if necessary.

  • Anonymous
    October 05, 2011
    Thomas, a "ref" return type IS useful for accessing collections more efficiently; I do it in my C++ code. Let's say you want to increment the value associated with a key-value pair in an IDictionary... today you have to first get the value, then set the value as a separate operation. If you could get a reference to the value, then you could simply increment it, doubling the speed of the code by not doing two separate lookups.