VBScript Quiz Answers, Part One
There were a grand total of eight entries to my VBScript quiz -- I think I made it too hard! Congratulations to Steven Bone and Nicholas Allen, who both got all twelve right, with more-or-less correct explanations of what was happening here. Guys, send me your addresses and I'll send you both autographed books just as soon as I get my next box. Should be any day now.
The answers are
1: only d is legal
2: only c is Integer
3: only c is illegal
4: only a is illegal
5: only c never prints False
6: only d is illegal
7: a
8: only c is legal
9: only a is legal
10: only a is illegal
11: only a is not a tautology
12: c
(Hmm, no "b"s. That wasn't intentional. Weird.)
I'll spend the next few entries explaining the answers. But before I get into the details, some jargon.
The standard design for a compiler is as follows: first the text of the program is broken up into tokens by a lexical analyzer, also known as a "lexer" or "scanner". This is analogous to breaking a sentence up into words, numbers, punctuation, etc. Then a parser organizes the tokens into larger units -- expressions, statements, programs. This is analogous to ensuring that a sentence is grammatical. Next, a code generator generates code from the parse tree. This is analogous to translating the parsed sentence into another language. Finally, the runtime engine executes the generated code.
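To make the four stages concrete, here is a minimal sketch in Python for a toy language that only adds space-separated integers. None of this is VBScript's actual implementation; the function names (`lex`, `parse`, `generate`, `run`) and the little stack-machine instruction format are made up for illustration.

```python
def lex(text):
    """Lexer: characters -> tokens. Assumes tokens are space-separated."""
    tokens = []
    for word in text.split():
        if word.isdigit():
            tokens.append(("INT", int(word)))
        elif word == "+":
            tokens.append(("PLUS", word))
        else:
            raise SyntaxError(f"unexpected text: {word!r}")
    return tokens


def parse(tokens):
    """Parser: tokens -> tree, for the grammar INT (PLUS INT)*."""
    if not tokens or tokens[0][0] != "INT":
        raise SyntaxError("expected an integer")
    tree = tokens[0][1]
    pos = 1
    while pos < len(tokens):
        has_operand = pos + 1 < len(tokens) and tokens[pos + 1][0] == "INT"
        if tokens[pos][0] != "PLUS" or not has_operand:
            raise SyntaxError("expected '+' followed by an integer")
        tree = ("add", tree, tokens[pos + 1][1])
        pos += 2
    return tree


def generate(tree):
    """Code generator: tree -> instructions for a hypothetical stack machine."""
    if isinstance(tree, int):
        return [("push", tree)]
    _, left, right = tree
    return generate(left) + generate(right) + [("add", None)]


def run(code):
    """Runtime engine: execute the generated instructions."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        else:  # "add"
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack.pop()


print(run(generate(parse(lex("1 + 2 + 3")))))  # prints 6
```

Note that `lex("1 2")` succeeds, because both words are perfectly good tokens. It is `parse` that rejects the result, which is the same division of labour that matters in question 1.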
The various weirdnesses I asked about are results of oddities in the design of the lexer, parser, codegen or runtime. I'll call out which is which as I go.
In general, we want every VBScript program to be a legal VB6 program; VBScript is a subset of VB6. I'll also point out areas where we violate that principle.
1) Which of the following are syntactically legal VBScript statements? Why?
(a) x = 10&987&654&321
(b) x = 10&987&&654&321
(c) x = 10&987&&654&&321
(d) x = 10&987&&654&&&321
Only (d) is legal. None are legal VB6 statements, so (d) is a violation of the subset principle.
It's well known that you can specify hex literals in VBScript like this: &h123. It's less well known that you can do the same for octal literals: &o123. It's even less well known that you can omit the o from the octal literal and VBScript will not complain. The VB6 editor will automatically insert the o for you if you try to leave it out.
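As a quick check of what those three spellings mean numerically, here is a small Python sketch. The helper `literal_value` is hypothetical and only handles the prefixes described above; it ignores any trailing type suffix.

```python
def literal_value(text):
    """Value of a VBScript-style &h / &o / bare-& literal (no suffix handling)."""
    if not text.startswith("&") or len(text) < 2:
        raise ValueError(f"not a hex or octal literal: {text!r}")
    body = text[1:]
    if body[0] in "hH":
        return int(body[1:], 16)  # hex: &h123
    if body[0] in "oO":
        return int(body[1:], 8)   # explicit octal: &o123
    return int(body, 8)           # bare ampersand is treated as octal: &123


print(literal_value("&h123"))  # prints 291
print(literal_value("&o123"))  # prints 83
print(literal_value("&123"))   # prints 83
```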
It has always seemed bizarre to me that VBScript will accept syntaxes which the VB editor will autocorrect. The reason why the editor automatically corrects the bad syntax is because it’s bad! But there are many cases where VBScript accepts the uncorrected syntax and treats it as though it were correct. In VB6 it is not legal to have the & operator immediately follow the left-hand operand. VB6 requires spaces around the & operator to prevent exactly this weird lexical ambiguity. VBScript does not require spaces around the & operator, which leads to trouble.
It's also little-known that you can specify that you want a literal to be a long integer rather than a short by appending an &. One reader pointed out that this is a holdover from the days in VB when you could put a type decoration onto variables. foo$ was a string, for instance. The same reader also noted that VBScript is inconsistent in the semantics of the literal suffix – that’s a bug.
Let’s consider (c). Why is it illegal? Well, look at it from the lexer’s point of view. It knows these rules:
& following a hex or octal literal is part of the literal
& followed by h, o or 0-7 is the start of a hex or octal literal
otherwise, & is a lone ampersand.
Consider (c) in the context of those rules and you’ll see that it breaks up as follows:
x = 10 & 987 & &654& &321
The lexer tells the parser that the program so far goes ID EQ INT AMP INT AMP INT INT EOL. And the parser takes one look at that thing and says "this isn't a grammatical sentence." You can't have two integers in a row without an operator between them.
Similarly, you can figure out why (a) and (b) are illegal.
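If you'd rather not work through (a) and (b) by hand, the three ampersand rules can be sketched as a tiny lexer in Python. This is an illustration of the rules as stated above, not the real VBScript scanner: the token names follow the ID / EQ / INT / AMP / EOL shorthand used here, and everything else (the function name `lex`, how identifiers and decimal literals are handled) is an assumption.

```python
DEC_DIGITS = set("0123456789")
OCT_DIGITS = set("01234567")
HEX_DIGITS = set("0123456789abcdefABCDEF")


def lex(text):
    """Tokenize one statement into (kind, lexeme) pairs, one char of lookahead."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            start = i
            while i < n and text[i].isalnum():
                i += 1
            tokens.append(("ID", text[start:i]))
        elif ch == "=":
            tokens.append(("EQ", ch))
            i += 1
        elif ch in DEC_DIGITS:
            start = i
            while i < n and text[i] in DEC_DIGITS:
                i += 1
            tokens.append(("INT", text[start:i]))
        elif ch == "&":
            nxt = text[i + 1] if i + 1 < n else ""
            if nxt in ("h", "H"):
                digits, body = HEX_DIGITS, i + 2
            elif nxt in ("o", "O"):
                digits, body = OCT_DIGITS, i + 2
            elif nxt in OCT_DIGITS:
                digits, body = OCT_DIGITS, i + 1
            else:
                digits, body = None, i
            if digits is None:
                # Rule 3: otherwise, & is a lone ampersand.
                tokens.append(("AMP", ch))
                i += 1
            else:
                # Rule 2: & followed by h, o or 0-7 starts a hex/octal literal.
                start, i = i, body
                while i < n and text[i] in digits:
                    i += 1
                # Rule 1: & right after a hex/octal literal belongs to it.
                if i < n and text[i] == "&":
                    i += 1
                tokens.append(("INT", text[start:i]))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    tokens.append(("EOL", ""))
    return tokens


for label, stmt in [("a", "x = 10&987&654&321"), ("b", "x = 10&987&&654&321"),
                    ("c", "x = 10&987&&654&&321"), ("d", "x = 10&987&&654&&&321")]:
    print(label, " ".join(kind for kind, _ in lex(stmt)))
```

Under these rules (a) comes out as ID EQ INT AMP INT INT INT EOL and (b) as ID EQ INT AMP INT AMP INT INT EOL, so both have adjacent integers with no operator between them. Only (d) alternates INT and AMP all the way to the end.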
This illustrates several important points. First, the lexer is "greedy". It tries to eat as many characters as it can when forming each token. Second, it also tries to never look ahead more than one character to figure out what to do next. Third, the parser does not "push back" on the scanner. The parser does not say "no, that didn't work out, see if you can find an alternative lexing that works". (The JScript parser does "push back" due to lexical ambiguities caused by the introduction of literal regular expressions, but that's another story!)
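The "no push back" point can be sketched too. The Python below takes a token stream exactly as the lexer handed it over and checks it in a single left-to-right pass against a drastically simplified grammar, ID EQ INT (AMP INT)* EOL. That grammar and the name `is_legal` are assumptions for illustration; the real VBScript grammar is of course much larger.

```python
def is_legal(kinds):
    """Single pass over the token kinds; never asks for a different lexing."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos < len(kinds) and kinds[pos] == kind:
            pos += 1
            return True
        return False

    if not (expect("ID") and expect("EQ") and expect("INT")):
        return False
    while expect("AMP"):
        if not expect("INT"):
            return False
    # Anything left over besides EOL (such as a second INT) is an error.
    return expect("EOL") and pos == len(kinds)


stream_c = "ID EQ INT AMP INT AMP INT INT EOL".split()
stream_d = "ID EQ INT AMP INT AMP INT AMP INT EOL".split()
print(is_legal(stream_c))  # prints False
print(is_legal(stream_d))  # prints True
```

When the check fails on the stream for (c), that is the end of it: nothing goes back to the scanner to ask whether `&654&` could have been split as `&654` followed by a lone `&`, which would have produced a perfectly grammatical stream.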
More next week!
Comments
- Anonymous
February 18, 2005
I'm a bit puzzled as to why it "defaults" to octal -- apart from Un*x permissions I don't think I've ever had call to use them, so I would have figured that in the absence of an o or h it would be a more reasonable assumption that the number is hex. Is there any historical or other valid reason for this, or is it just one of those things?
(And yes, you did make it too hard.)
- Anonymous
February 18, 2005
Octal masks come in handy for coding things like base64 encoders/decoders or any other time you are working with 3/6/9/12/etc. bit strings. They make the intent clearer.
Aside from that I've seldom used them though. The defaulting behavior may go back to 8-bit days. A lot of people came from 12-bit machines into 8-bit programming, and a lot of electronics types were frightened by hex initially in those days and didn't trust it. I can't remember what early 8-bit MS Basic had in the way of mask literals, maybe only octal?
- Anonymous
February 18, 2005
I'd echo James' comments. I use vbscript when I write ASP's, so I've got a working knowledge. But the quiz questions were absurdly out of my league - I didn't even know you could have octal literals (I don't use these in C even though I know how*). I am interested in finding out the answers, so I'll read along.
As an aside, aren't bit masking operations not possible in VBScript anyway (I recall having to div and mod to see if flags were set)? This is one of the main uses for hex and octal literals.
*for the nitpickers in the house, yes I use the octal literal "0", but nobody considers that a "real" octal constant.
- Anonymous
February 20, 2005
Eric,
>> I think I made it too hard!
For me, most of the problem was the length rather than the difficulty. Not to say that the problems were easy (they weren't - I only knew the answers to about a quarter of them) but after typing out the answer to some of them it just seemed to take a long time.
If you posted a couple of questions at a time (if you did this again), that would be good I think.
Don't take this as a complaint - I love the blog.
- Anonymous
February 22, 2005
If I recall correctly, & was the identifier for octal in the original Altair Basic too along with # for real (double?), % for int, and of course $ for string.
- Anonymous
February 23, 2005
Well, I don't know how helpful practical knowledge of VBScript would be for this quiz. Most of these are not practical situations! As I mentioned to Eric, I've never actually written VB or VBScript before. My last exposure to BASIC was about 20 years ago. On the other hand, knowledge about compilers was very helpful. Although there seems to be almost NO documentation for what VBScript is, almost every compiler works in one of a small number of ways... It was a tricky quiz though.