T4: VBScript and the Terminator
I had some free time on the flight to Canada and my laptop with me, so maybe I will do a little blogging on my vacation after all.
A week ago or so a reader asked me to talk a bit about how class termination works in VBScript. Let’s start by looking at a simple case:
Class Foo
    Public Name
    Public Other
    Private Sub Class_Terminate
        MsgBox Name & ": Goodbye, world!"
    End Sub
End Class
As you can see, the class has an “event handler” which is executed when an instance is just about to be released. (I put scare quotes on there because, though it has the form of an event handler, in fact the underlying mechanism that runs the terminator is completely distinct from the other event handling code in VBScript.)
JScript, as you may know, has a nondeterministic mark-and-sweep garbage collector. It’s “nondeterministic” because it runs pretty much when it feels like it. There is a scheduling algorithm, but from the perspective of the program the timing is effectively random; it’s quite tricky to predict when a collection will happen. And it’s “mark and sweep” because on every collection it walks every script-allocated memory block, checks whether the block is still reachable from the script, and frees it if it isn’t. This makes it mostly immune to the “circular reference” problem, where two or more otherwise dead objects reference each other.
VBScript’s garbage collector is completely different. It runs at the end of every statement and procedure, and does not do a search of all memory. Rather, it keeps track of everything allocated in the statement or procedure; if anything has gone out of scope, it frees it immediately. We can see this in action by watching terminators run.
Let’s consider a simple example.
Sub Bar
    Dim X
    Set X = New Foo
    X.Name = "X"
End Sub
MsgBox "Starting Bar"
Bar
MsgBox "Done Bar"
outputs, as you’d expect:
Starting Bar
X: Goodbye, world!
Done Bar
As soon as Bar’s End Sub runs, X goes out of scope. Since that’s its last reference, the terminator runs.
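The same release happens if the last reference is dropped before the procedure ends. Here is a minimal sketch, assuming a host where MsgBox is available (such as Windows Script Host). The procedure name Baz is made up for this illustration, and Foo is repeated from above so the snippet stands alone.

```vbscript
' Foo as defined earlier, repeated so this snippet runs on its own.
Class Foo
    Public Name
    Public Other
    Private Sub Class_Terminate
        MsgBox Name & ": Goodbye, world!"
    End Sub
End Class

' Baz is a hypothetical procedure name used only for this illustration.
Sub Baz
    Dim X
    Set X = New Foo
    X.Name = "X"
    MsgBox "Before release"
    ' X holds the only reference, so dropping it here should run the
    ' terminator right away rather than at End Sub.
    Set X = Nothing
    MsgBox "After release"
End Sub

Baz
```

Because X is the only reference, the goodbye message should appear between "Before release" and "After release" instead of at the End Sub.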
It’s that “last reference” part that’s the rub. Objects are reference counted in COM – every object keeps track of how many references are keeping it alive, and when that count drops to zero, the object deletes itself.
The trouble is, what if two objects are each keeping a ref on each other, but are otherwise dead? The “circular reference” problem wasn’t so much a problem before we added classes to VBScript. After all, without the ability to define new objects in script, the only objects you really need to worry about are external ActiveX objects – and not even JScript can break circular references involving ActiveX objects, because the ActiveX objects know nothing about JScript’s garbage collector, and vice versa.
But with classes in VBScript, the problem suddenly becomes quite bad. You can very easily write scripts that tie up memory in objects that reference each other circularly, and those objects won’t go away when their variables go out of scope:
Sub XYZ
    Dim Alpha, Bravo
    Set Alpha = New Foo
    Alpha.Name = "Alpha"
    Set Bravo = New Foo
    Bravo.Name = "Bravo"
    Set Alpha.Other = Bravo
    Set Bravo.Other = Alpha
End Sub
MsgBox "Starting XYZ"
XYZ
MsgBox "Done XYZ"
outputs
Starting XYZ
Done XYZ
Uh oh. Neither of those guys was terminated when their variables went out of scope. All the memory associated with them is still reserved by the heap. But they’re not reachable now. There are no variables that reference them. There’s nothing the script programmer can do about it now.
It would be unfortunate indeed if such circular references really did live until the end of the process, when the whole heap is cleaned up, both because we’d leak memory, and because those terminators might be doing something interesting.
Fortunately, we can do slightly better than that. If you run the code above in Windows Script Host, you’ll see that the terminators run as soon as the script ends; they are cleaned up eventually. In IE, the terminators run when the page is torn down. And in ASP, the terminators run when the page is served. (Though of course in ASP you don’t want to be putting up message boxes, or accessing the ASP object model – the page is just about to get served, don’t try to mess with it!)
How was this implemented? And how does this solve the memory leak problem?
It’s pretty straightforward. Every time a new instance of a VBScript class is created, it adds itself to a special list maintained by the engine, and whenever one is destroyed, it removes itself from the list. When the VBScript engine itself is closed by the host, the engine runs down the list and runs all the terminators of all the VBScript objects left on the list. Then it runs down the list a second time and clears every field of the object. This breaks the circular reference. Better late than never.
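The two passes can be modelled in script, purely as an illustration. This is a toy sketch, not the engine's code: the class Node, its RunTerminator and ClearFields methods, and the Live array are all hypothetical stand-ins. Unlike the engine's list as described above, the array here holds ordinary references and so keeps the objects alive itself.

```vbscript
' Toy model only: the engine's real list is internal and not visible to script.
Class Node
    Public Name
    Public Other

    ' Stand-in for Class_Terminate.
    Public Sub RunTerminator
        MsgBox Name & ": Goodbye, world!"
    End Sub

    ' Stand-in for "clear every field": drop the reference this object holds.
    Public Sub ClearFields
        Set Other = Nothing
    End Sub
End Class

' Hypothetical stand-in for the engine's list of live instances.
Dim Live(1), N
Set Live(0) = New Node
Live(0).Name = "Alpha"
Set Live(1) = New Node
Live(1).Name = "Bravo"

' Build the same circular reference as in XYZ.
Set Live(0).Other = Live(1)
Set Live(1).Other = Live(0)

' Pass 1: run every terminator while all fields are still intact.
For Each N In Live
    N.RunTerminator
Next

' Pass 2: clear every field, which breaks the cycle.
For Each N In Live
    N.ClearFields
Next
```

After the second pass neither object refers to the other, so ordinary reference counting can release them.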
In Windows Script Host, this really isn’t much help because the engine isn’t closed until the process is about to be torn down anyway. But in IE and even more importantly, ASP, script engines are closed much more aggressively. It was very important that we not add a new feature to the language that made it really easy to write ASP pages which leaked memory and forced you to reboot your server frequently.
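Because the engine only breaks cycles at shutdown, a long-running script that wants its terminators to run earlier has to break the cycle itself. This is a general reference-counting workaround, not something the engine does for you. A minimal sketch, reusing Foo and XYZ from above and assuming a host where MsgBox is available:

```vbscript
' Foo as defined earlier, repeated so this snippet runs on its own.
Class Foo
    Public Name
    Public Other
    Private Sub Class_Terminate
        MsgBox Name & ": Goodbye, world!"
    End Sub
End Class

Sub XYZ
    Dim Alpha, Bravo
    Set Alpha = New Foo
    Alpha.Name = "Alpha"
    Set Bravo = New Foo
    Bravo.Name = "Bravo"
    Set Alpha.Other = Bravo
    Set Bravo.Other = Alpha
    ' Break the cycle by hand before the locals go out of scope.
    ' One cleared link is enough: Bravo is then held only by its variable.
    Set Alpha.Other = Nothing
End Sub

MsgBox "Starting XYZ"
XYZ
MsgBox "Done XYZ"
```

With the link cleared, both goodbye messages should appear before "Done XYZ", since each object's reference count can now reach zero when XYZ returns.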
There are some wrinkles that we ran into when implementing the shutdown logic. Let’s see if you can deduce what they are; I’ll give the answers in my next post.
QUESTION #1: When I first wrote the VBScript class code, that’s not exactly the shutdown sequence I wrote. The original logic ran down the list of objects once, going terminate, clear, terminate, clear, terminate, clear. Soon after that I rewrote the shutdown sequence to the present terminate, terminate, terminate, clear, clear, clear. Why did I make the change to a two-pass implementation? What was wrong with the original one-pass implementation?
QUESTION #2: The code that implements VBScript classes checks to see if the instance has been terminated already before it calls the Class_Terminate method. It should be clear from the foregoing why that is – if an object is in a circular reference and manages to survive until the engine shuts down, then the object will probably be terminated while there are still references held on it. The later “clear all fields” phase of the shutdown sequence could then release all the references, triggering the “we’ve just released the last reference, so call the terminator” logic. We therefore have to protect against the terminator running twice.
There’s another way that an object can be terminated twice that we protect against. In this scenario, not only can the object be terminated twice, but an incorrect implementation of the object deletion code can crash the process. What is it?
Hint: the scenario does not involve circular references. It has just a single object being terminated because it is going out of scope and has no other references.
QUESTION #3: Speaking of which, why do we want to ensure that terminators don’t run twice?
QUESTION #4: This is not a question about termination per se, but there is a termination angle in the answer. Why do we run the garbage collector at the end of every statement, instead of only at the end of every procedure/global block? After all, no local variable is going to go out of scope until the end of its procedure! Doesn’t that seem like a lot of work for no possible gain?
"I'll be back" in the new year -- have a good festive holiday season everyone!
Comments
- Anonymous
December 22, 2004
Q4) Because VBScript programmers liked to write very very long procedures - and only collecting at the end of a procedure would be a very long wait. These statements may not even be in a procedure - they could be global.
BTW - It cannot have been "very important" to add the circ ref cleanup code to VBScript - otherwise the same code would have appeared in VB6 and my colleague wouldn't have spent the past couple of weeks tracing memory leaks in code written by long-forgotten developers...
- Anonymous
December 22, 2004
Q3: You'd get problems when you try to clean up certain resources twice (like closing connections or something) when the programmer of the class does not expect his terminator to run more than once.
- Anonymous
December 22, 2004
Q1: If you terminated and then cleaned in object 1, then object 2's terminate would error out if it had a reference to object 1 and tried to call an obj1 method (for whatever reason).
- Anonymous
December 22, 2004
PatternGuru: Correct, we ensure that the terminator runs exactly once so that the author can rely upon that invariant.
- Anonymous
December 22, 2004
Steve: Correct! We need to ensure that an object is not cleaned until everyone is done with its properties.
- Anonymous
December 23, 2004
Eric> But only collecting WHAT [after a statement]?
Maybe to let "= Nothing" actually do something. Or maybe VBS creates a bunch of anonymous objects (ex: perhaps with the string addition operator).
- Anonymous
December 24, 2004
Setting an object to Nothing does immediately clean it up if that's the last reference.
"Anonymous" objects are the thing to think about here. A statement can create arbitrarily many temporary strings and objects that need to be cleaned up.