Compartilhar via


SimpleScript Part Seven: Binder Skeleton

In Part Five I was discussing modules: there is a "global module" and any number of additional modules. Each module is associated with a named item, and the only module which is associated with more than one named item is the global module. This means that each module is going to need its own name table to keep track of the functions, the variable names and values, etc, in it. We'll call such devices "binders" because they bind names to values.

For reasons which will become more clear in future episodes, it is convenient to have the binder implement the IDispatch interface. Normally if you had to implement IDispatch you'd build a type library, and then have your implementation of IDispatch call through to the ITypeInfo methods which look up dispatch ids, invoke methods, and so on. There's a good reason for that: rolling your own IDispatch code can be extremely tricky. But, we're pretty much stuck with it here -- we haven't got the convenience of a known vtable interface to wrap a type library around at compile time. The script engine name tables are going to be extremely dynamic, so we're going to have to roll our own IDispatch.

The code isn't nearly done yet, obviously, but I've got a good skeleton in place. Take a look at binder.cpp for the details. I've pretty much got the high-level semantics of the dispatch interface worked out, but the actual low-level implementation details all return E_NOTIMPL. What I need to build here is a special hash table which can efficiently look up values by both numeric id and name. That's not too challenging; what is going to be more difficult is implementing the SCRIPTITEM_GLOBALMEMBERS feature. We need to be able to add another arbitrary dispatch object to our binder and dynamically aggregate all of its methods, without causing any collision in dispatch ids!

Default property semantics are a little weird. In the wacky world of late bound objects, an object may have a "default" method. This feature was created so that you could call the Item method of a collection the same way you dereference an array in Visual Basic -- you make it look like a function call. That is, since this code works:

Dim arr(10)
arr(1) = 123

then so should this code

dict = CreateObject("Scripting.Dictionary")
dict(1) = 123

the last line is the same as

dict.item(1) = 123

Such is the wackiness of VB -- because the array dereference and function call syntax are conflated, they ended up making function calls on invisible methods be legal on the left hand side of an assignment statement. The mind fairly boggles. We'll be good citizens in SimpleScript. If we have an object in our binder and someone attempts to call it, we'll just pass on the arguments to its default method and let it sort out what the right thing to do is.

As you can see, there is a whole lot of parameter validation, and altogether about eleven separate cases that I cover in the implementation of Invoke. The actual implementation in VBScript and JScript is much, much more complex than this because they support IDispatchEx, garbage collection, property accessor functions, and numerous other features that complicate the code. But still, this is the longest function you're likely to see me write in this project; I'll be very surprised if I write any code more complicated than the dispatch logic.

Speaking of garbage collection, I'm not quite sure what kind of garbage collector I'm going to build into SimpleScript. Building a mark-and-sweep collector like JScript has would be instructive, but it would also take a lot of time and effort. What do you guys think? (If you look at invoke.cpp you'll see that I'm thinking ahead about issues we're going to run into in garbage collection. We're going to run into related issues in the script execution code; it is possible for ill-behaved hosts to shut down the script engine in a callback while it is executing! We need to be robust in the face of such behaviour.)

The next step is to get the actual guts of the binder working, and then build a list of code blocks to be executed when the script engine goes to started state. The clone semantics for the code blocks are a little tricky, so we'll get that straightened away, and then maybe, just maybe, actually write a language parser.

UPDATE: For your convenience, I've put a zip file containing all the sources at https://www.eric.lippert.com/simplescript.zip. I'll keep this zip file updated as I change the sources on the blog.

Comments

  • Anonymous
    May 04, 2004
    re: What collector to build?

    I think it would be easier to pick a GC algorithm after we know what the language is going to be like.

    Here would be my hit list of design decisions to make before thinking about how to GC:

    Does the language support references?
    Does the language support global variables?
    Does the language support closures?
    Are the scope and lifetime of a variable closely related?
    What are the major object lifetime patterns?
    Does the average object contain other objects?
    What's the size of the average object?

    Up until this point there's been very little code that cares about GC and I think that's going to continue to be true for a while. There's no need to commit to a method early on.
  • Anonymous
    May 04, 2004
    Indeed! As I've mentioned before in my blog, normally the design process consists of a series of questions which dig deeply into the customer requirements. You figure out what problem they need to solve, and that drives the design.

    But with a pedagogic example like SimpleScript, the cart gets before the horse. The "problem" that I'm looking to solve is to demonstrate practical issues in implementing language processors. What I'm looking for are what sort of problems people would like to see solved. If people want to see how a M&S GC is implemented, then I'll design a language that has circularly referenced objects that live beyond their scopes.

    I'll therefore broaden the question -- what sort of language features do you think would be instructive? I started this project with the intention of making the language simple because I figured that the interesting, poorly understood part was the semantics of the engine interfaces. But if it turns out that what people are really interested in are GCs, closures, expando objects, and the like, then I'll start laying the groundwork for those things earlier rather than trying to bolt them on at the end.
  • Anonymous
    May 04, 2004
    more re: What collector to build?

    What do you want to get out of this project? I'd love to see a copy collector because designing one is pretty interesting. And once you've got a working Cheney scan, it's easy to experiment with an incremental collector, generational collector, train collector, type segregating collector, whatever. But you seem to have some priorities in mind already about what you want to focus on.
  • Anonymous
    May 04, 2004
    And while that may look like a carefully crafted reply to your question, it was written before you post. I'd have to think a while about proper pedagogy and such.
  • Anonymous
    May 04, 2004
    The comment has been removed
  • Anonymous
    May 04, 2004
    > the .NET framework uses a generational M&S GC to deal with this problem

    I'm going to guess that it essentially uses a copy collector for the new generation and only uses mark-and-compact on the tenured generations (where the old objects live). If all of the objects are going to move anyways, there will be less of a pause for copying than M&C which is the goal for Eden. But then I've never seen the code...

    Train algorithm- a train collector divides memory into blocks called cars. An object has references either within its car (good) or to other cars (bad). The collector over time hooks together cars that refer to each other into a train. Collection within a train is fast since we've partitioned the heap into smaller sets that we don't need to trace between. This allows a train collector to run GCs with very short pauses even on older generations. Train algorithms are also used in many ways besides garbage collection... Downside: the train collector has less throughput compared to M&C.

    Type segregating collector- a type segregating collector arranges objects in memory according to their type. It can group together types by size (allowing for an arena-like layout to be used) or group together types by whether they can contain references. If all of the objects that can't contain reference are together on the heap, you can just test the pointer address of the object to see if the collector needs to bother looking at it. Downside: requires the collector to be very knowledgeable about your type system.

    All of the techniques that are being mentioned here are primarily for uniprocessor designs. Once you've got multiple processors, it's generally better to sacrifice throughput on a particular processor for parallel collection. And as the number of processors gets even larger, you can sacrifice latency on a particular processor for concurrent collection.
  • Anonymous
    May 04, 2004
    Speaking of comments...

    >>>
    // We must addref the pointer before the invocation. Why? Consider
    // this scenario: the invocation calls a method which calls back
    // into the script engine, which triggers a garbage collection, which
    // does the final release on the dispatch object. Then control returns
    // back to the call to Invoke, which dereferences its "this" pointer
    // and promptly crashes.
    <<<

    Part of the reason this big comment is necessary is because doing bare AddRefs and Releases is opaque and mysterious. Can the GCable thing be protected instead of the dispatch object? It would be natural to root a GCable thing here since the local stack is a root. Then I wouldn't have to think about what might happen to pdisp during the call.
  • Anonymous
    May 04, 2004
    Binder::GetIDsOfNames >>>
    if (IID_NULL != riid)
    {
    return hr= DISP_E_UNKNOWNINTERFACE;
    goto LError;
    }
    <<<

    This probably isn't what you wanted to do.


    Binder::Invoke >>>
    Bug("We don't support named arguments to except to property puts.")
    <<<

    Too many to's here.


    The goto LError; kind of bugs me. It makes a lot of sense when there actually is an error. But it's also used to skip to the end of a function even when everything is ok. I'd be quicker to understand what you were doing if it was goto out; on success or something.


    >>>
    HRESULT GetIdOfName(const WCHAR * pszName , DISPID * pdispid);
    HRESULT GetNameById(DISPID dispid, Name * * ppName);
    <<<

    Interesting that these aren't symmetric.
  • Anonymous
    May 04, 2004
    // Failing to
    // provide room for the return value is dumb, but not illegal.

    Arrgh.


    Binder::Invoke sure was a lot of fun to read.
  • Anonymous
    May 04, 2004
    Re: failing to provide room -- it's possible that the property get has a side effect and you care about that, not the return value. That's a very rare situation, but it is legal.
  • Anonymous
    May 04, 2004
    Re: IID_NULL -- Hmm, I'm surprised the compiler didn't catch that. Looks like I need to bump the warning level up to max.

    Re: Two to's is too many. Yep.

    Re: LError on non-error. Yeah, you're right. I might fix that up.
  • Anonymous
    May 06, 2004
    The comment has been removed
  • Anonymous
    May 08, 2004
    The comment has been removed
  • Anonymous
    May 10, 2004
    Thanks Nicholas. I think you're right on the money.

    I've never written a copy-based GC before, so that might be an interesting experiment. And I agree that a goal of the language should be brevity.

    I once considered writing up a .NET language that I was going to call "VERBOSE" -- basically a simple C#-like demonstration language but with the property that there were NO optional keywords. A typical class declaration would look something like

    class {
    name := "MyClass",
    sealed := true,
    extends := class {
    name := "MyBaseClass",
    namespace := "MyNamespace"
    },

    etc, etc. It would have been wonderfully horrid. I never did get around to it though.

    I agree that the runtime is the interesting part, but I will probably concentrate upon the infrastructure -- the GC, the error reporting mechanism, etc -- and not implement things like a big string manipulation library, a date object, etc.
  • Anonymous
    August 30, 2005

    Here’s a question I recently got about VBScript, where by &quot;recently&quot; I mean August 28th, 2003.&amp;nbsp;This...
  • Anonymous
    August 30, 2005

    Here’s a question I recently got about VBScript, where by &quot;recently&quot; I mean August 28th, 2003.&amp;nbsp;This...
  • Anonymous
    September 06, 2009
    The comment has been removed