What is "binding" and what makes it late?

Статья
02/06/2012

"Late binding" is one of those computer-sciency terms that, like "strong typing", means different things to different people. I thought I might describe what the term means to me.

First off, what is "binding"? We can't understand what it means to bind late if we don't know what it is to bind at all.

A compiler is by definition a device which takes in a text written in one language and outputs code that "means the same thing" in another language. The compiler I work on, for example, takes in C# and outputs CIL(*). All the principle tasks performed by a compiler falls into one of three broad buckets:

Syntactic analysis of the input text
Semantic analysis of the syntax
Generation of the output text -- we will be unconcerned with this step in this article

Syntactic analysis is analysis of the input text that does not consider anything about the meaning of that text; the syntactic analyzer is concerned with first determining the lexical structure of the program (that is, where all the boundaries are between comments, identifiers, operators, and so on) and then from that lexical structure determining the grammatical structure of the program: where are the boundaries between classes, methods, statements, expressions, and so on.

Semantic analysis then takes the results of the syntactic analysis and associates meanings with various syntactic elements. For example, when you say:

class X {}
class B {}
class D : B
{
public static void X() { }
public static void Y() { X(); }
}

syntactic analysis determines that there are three classes, that one of them contains two methods, the second method contains a statement which is a call expression. Semantic analysis determines that the X in X(); refers to the method D.X(), rather than, say, the type X declared above. That's an example of "binding" in its most widely-agreed-upon sense: binding is the association of a syntactic element that names a method with a logical element of the program.

"Binding" in the sense of "early" or "late" binding is almost always used to refer to the evaluation of a name used as a method call. That, however, is far too strict a definition of "binding" for me. I would also use "binding" to describe how the compiler's semantic analyzer determines that class D inherits from class B; the name "B" is bound to the class.

Moreover, I would use "binding" to describe other analyses as well. If you had an expression 1 * 2 + 1.0 in your program then I would say that the "+" operator is "bound" to the built-in operator that takes two doubles, adds them, and returns a third. People do not normally think of that analysis as associating the name "+" with a method, but nevertheless I consider it to be "binding".

Speaking even less carefully, I might also use "binding" to describe the association of types with expressions that do not directly name the type. In the example above if speaking casually I might say that the syntax 1 * 2 is "bound" to the type int, even though obviously it does not name that type. The syntactic expression is strongly associated with that semantic element, even though it does not name it.

So, speaking generally I would say that "binding" is any association of some fragment of syntax with some logical program element. (**)

What then is "late" vs "early" binding? People talk about these two kinds of bindings as though it was a binary choice, either early or late. As we'll see, actually there is more of a spectrum than you might imagine; some bindings are entirely early, some are partially early and partially late, and some are very late indeed. But before we get into that, earlier or later than what?

Basically by "early binding" we mean "the binding analysis is performed by the compiler and baked in to the generated program"; if the binding fails then the program does not run because the compiler did not get to the code generation phase. By "late binding" we mean "some aspect of the binding will be performed by the runtime" and therefore a binding failure will manifest as a runtime failure. Early and late binding might better be called "static binding" and "dynamic binding"; static binding is binding performed using "static" facts known to the compiler, and dynamic binding is performed using facts "dynamically" known to the runtime.

So which is better? Clearly neither is unambiguously better than the other; if one was clearly the winner then we wouldn't be having this discussion in the first place. The benefit of early binding is that you gain confidence that your program will not fail at runtime; the down side is that you lose the flexibility that comes with late binding. Early binding requires that all information required to make the right binding decision be known before the program runs; but that information might not be available until runtime.

I said that early and late binding falls on a spectrum. Let's take a look at some examples in C# of how we can "turn the dial" on early-to-late binding.

We begin with our example above of calling the static method X. This analysis is entirely, unambiguously early. There is no doubt at all that when Y is called, it is going to call the method D.X. No part of that analysis is deferred until runtime and the call will unambiguously succeed.

Next, consider:

class B
{
public void M(double x) {}
public void M(int x) {}
}
class C
{
public static void X(B b, int d) { b.M(d); }
}

Now we know less. We do a lot of early binding here; we know that b is of type B and that the call is to B.M(int). But unlike the previous example, we have no guarantee from the compiler that this invocation will succeed. b could be null. We are essentially deferring the analysis of whether or not the receiver of the call is valid to the runtime to determine. One normally does not think of that as a "binding" decision though, because we are not associating syntax with a program element. Let's make the call in C; a little more late bound by changing B:

class B
{
public virtual void M(double x) {}
public virtual void M(int x) {}
}

We are now doing some binding analysis at compile time; we know that the invocation will be on the virtual method declared by B.M(int). We know that the call will succeed in the sense that there will be such a method to call. However, we do not know which method will actually be called at runtime! There could be a derived type that overrides that method; some completely other code could be called that appears nowhere in this program. Virtual dispatch is a form of late binding; the decision about which method to associate with the syntax b.M(d) is partially made by the compiler and partially made by the runtime.

How about this?

class C
{
public static void X(B b, dynamic d) { b.M(d); }
}

Now the binding is almost entirely deferred until runtime. The compiler generates code that tells the Dynamic Language Runtime that static analysis has determined that the static type of b is B and that the method being invoked is called M, but the actual overload resolution to determine whether it is B.M(int) or B.M(double) (or neither, if d is, say, a string) is done by the runtime based on that information. (***)

class C
{
public static void X(dynamic b, dynamic d) { b.M(d); }
}

Now the only portion of the binding performed early is the fact that this is a method call on a method named M. This is almost as late-bound as it gets, but we can in fact go one step further:

class C
{
public static void X(object b, object[] d, string m, BindingFlags f)
{
b.GetType().GetMethod(m, f).Invoke(b, d);
}
}

Now everything is late-bound; we don't even know what name we are going to be associating with a method. All we can possibly know is that the author of X expects that the object passed as b has a single method named by the string in m that match the flags in f, and that it takes the arguments given in d. There is nothing whatsoever we can do to analyze this call site at compile time. (****)

(*) Of course the output is encoded into a machine-readable binary format, rather than the human-readable CIL format.

(**) You might then ask whether "binding" and "semantic analysis" are not synonyms; surely semantic analysis is nothing more than the association of syntactic elements with their meanings! Binding is much of the semantic analysis phase of the compiler, but there are many analyses that we must perform after a method body is entirely "bound". Definite assignment analysis, for example, cannot by any reasonable stretch be called "binding"; it is not associating syntactic fragments with specific program elements. Rather, it is associating lexical locations with facts about program elements, facts like "the local variable blah is not definitely assigned at the beginning of this block". Similarly, optimizing arithmetic expressions is a form of semantic analysis but is clearly not "binding".

(***) The compiler could still do lots of static analysis here. For example, suppose B were a sealed class with no methods named M. Even with a dynamic argument to the call we could know statically that the runtime binding of M will always fail, and tell you that at compile time. And in fact the compiler does some such analyses; precisely how it does so is a good topic for another day.

(****) In some sense this example gives a counterexample to my definition of binding; we are no longer even associating a syntactic element with a method; we're associating the contents of a string with a method.

Comments

Anonymous
February 06, 2012
"(***) The compiler could still do lots of static analysis here. For example, suppose B were a sealed class with no methods named M. Even with a dynamic argument to the call we could know statically that the runtime binding of M will always fail, and tell you that at compile time." It could, but not as invariantly as it seems. That depends on whether a matching extension method could be found in surrounding compilation units. Dynamic code does not consider extension methods when performing overload resolution; we have no way of expressing in the call site what set of extension methods were "in scope" at the point of the call. -- Eric
Anonymous
February 06, 2012
...or in referenced assemblies, for that matter (for extension methods on object).
Anonymous
February 06, 2012
What is this "dynamic" type? Is that actual C#? Perhaps you missed the release of C# 4, which added improved interoperability with dynamic languages and legacy object models. (And also covariance and contravariance of generic delegates and interfaces constructed with reference types.) -- Eric
Anonymous
February 06, 2012
@JesperEC I don't think it's possible to "early bind" extension methods... what if one comes completely out of nowhere, say if the extension method is created through a MethodBuilder. (I don't know if this is actually possible, but I can't see why it wouldn't be, at least in theory).
Anonymous
February 06, 2012
@Aaron: Depends on how early. The exact method is definitely set in stone into the executable and bound before runtime; like Eric says, it depends on the code and assemblies available at compile-time, and the namespaces in scope at the call site. That's deterministic at compile-time, so there's no reason not to set it there (or to fail with an error if it can't work out).
Anonymous
February 06, 2012
@Eric: Of course, duh, sorry. I've been bit by this myself. Did you make any attempts to have LINQ work with dynamic via alternate mechanisms?
Anonymous
February 06, 2012
@JesperEC Your second example was actually my inspiration. The scenario I'm imagining has an extension method defined on a superclass of B at compile-time, and a runtime-generated extension method on B itself that would override the method on the superclass. As I said though, I don't know if that scenario is actually possible.
Anonymous
February 06, 2012
@Aaron: No, I think the compiler expands the extension method to a direct invocation of the specific extension method to prevent the method from being swapped at runtime.
Anonymous
February 06, 2012
"Now the only portion of the binding performed early is the fact that this is a method call on a method named M." - why not an Invoke call on a delegate which is the value of a property named M? (Visual Basic makes this more interesting with the way it handles indexed properties, too) One thing that is interesting is how C# and VB's dynamic differs in the handling of public members of non-public classes. (long story short: VB allows it, C# doesn't, at least in my test)
Anonymous
February 06, 2012
The comment has been removed
Anonymous
February 06, 2012
The comment has been removed
Anonymous
February 07, 2012
Since binding information is available during compile time itself, c# compiler can generate better optimized code than in cases where there is dynamic binding. Is this right?
Anonymous
February 08, 2012
No idea how good the JIT in .NET is, but I think that actually adds another aspect to it: The JIT may cause a virtual method to be called directly without virtual dispatch (now we also have to distinguish between "runtime" when the JIT compiles the code and the actual time when the method is called), but later when an additional class is loaded revert back to the virtual dispatch * Now we get from late to "early" binding and maybe even back to late all at runtime. *JVMs have done that for years.
Anonymous
October 23, 2012
so early binding is compile time check and late binding is run time check

Поделиться через

What is "binding" and what makes it late?

Comments

Дополнительные ресурсы