Unicode output and the Windows Script Host
I'm working on a prolix essay on the future of declarative programming languages, but I've been sidelined by, horrors, actually having to do lots of real work on the next version of VSTO these last couple weeks. Expect that rant sometime in the next week or so.
In the meanwhile, a coworker asked me the other day what the //U "use Unicode" switch does in the console version of Windows Script Host. "Is this so that the host can read in script files that are saved in Unicode?" he asked.
Not quite -- actually it is the output that we are worried about, not the input. The script engine (VBScript, JScript) expects the source code to be in a standard UTF-16 BSTR. WSH is responsible for getting the source code off the disk in whatever format and turning it into UTF-16 in memory. WSH does this exactly the way you'd expect -- it calls IsTextUnicode and then does whatever is necessary to get it into UTF-16. (Reverse bytes, strip the order mark, call MultiByteToWideChar, whatever.) The //U flag doesn't affect this at all.
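To make the input side concrete, here is a minimal sketch in Python of the kind of normalization described above. It is not WSH's code. The function name `load_script_source` and the `ansi_codepage` parameter are made up for illustration, and it only looks at the byte order mark, whereas IsTextUnicode applies a broader set of heuristics.

```python
import codecs


def load_script_source(raw: bytes, ansi_codepage: str = "cp1252") -> str:
    """Turn the bytes of a script file into text (hypothetical helper).

    Assumption: only the byte order mark is inspected. The real host
    calls IsTextUnicode, which uses more heuristics than this.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        # Little-endian UTF-16: strip the order mark and decode as-is.
        return raw[2:].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Big-endian UTF-16: the "reverse bytes" case.
        return raw[2:].decode("utf-16-be")
    # No order mark: treat as narrow text in some code page, the rough
    # analogue of calling MultiByteToWideChar. The default code page
    # here is an assumption, not something the post specifies.
    return raw.decode(ansi_codepage)
```

Note that nothing in this path takes a "use Unicode" argument, which matches the point that //U has no effect on how the source file is read.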
What we don't know is what output encoding the user wants. In short, the //U flag only causes a change in behaviour when you are running cscript.exe on an NT-based operating system and redirecting the output to a file. To expand on that a bit:
- If you're running cscript.exe on an operating system that descends from the NT kernel (Windows 2000, XP, Server, Longhorn… you get the picture) and you're dumping text (WScript.Echo, error messages, whatever) to a console window, we always call WriteConsoleW no matter whether the //U flag was passed or not. That is, we always dump the text as Unicode and let the console sort out how to display it.
- If you are running cscript.exe on an NT-based OS and you've redirected your console output to a file, and you did NOT specify the //U switch then we dump the text as ANSI by calling WideCharToMultiByte(CP_OEMCP).
- In the same scenario but with the //U switch we dump the text as UTF-16.
- When running cscript.exe on any Windows 95 descendant (95, 98, ME) we always dump text as ANSI whether you're redirecting output or not.
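The four cases in the list above boil down to a small decision table, sketched here in Python. Everything in it is illustrative: `choose_output_mode`, `encode_for_redirect`, the mode labels and the default code page `cp437` are assumptions, not names or values from WSH. The sketch also says nothing about whether a byte order mark is written in the //U case.

```python
def choose_output_mode(nt_based: bool, to_console: bool, unicode_switch: bool) -> str:
    """Pick how cscript output is written, per the list above (hypothetical)."""
    if not nt_based:
        # Windows 95 family: always narrow text, redirected or not.
        return "narrow"
    if to_console:
        # NT family, real console: WriteConsoleW regardless of //U.
        return "console-wide"
    # NT family, output redirected to a file: //U is the only deciding factor.
    return "utf-16" if unicode_switch else "narrow"


def encode_for_redirect(text: str, mode: str, narrow_codepage: str = "cp437") -> bytes:
    """Bytes that would land in the redirected file (illustrative only)."""
    if mode == "utf-16":
        return text.encode("utf-16-le")
    if mode == "narrow":
        # Stand-in for WideCharToMultiByte; unmappable characters become "?".
        return text.encode(narrow_codepage, errors="replace")
    raise ValueError("console output is not written as file bytes")
```

For example, `choose_output_mode(True, False, True)` gives `"utf-16"`, and it is the only combination of arguments in which the switch changes the result.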
The behaviour of the standard streams is kind of complicated: it depends on whether you're opening WScript.StdErr, StdIn or StdOut, whether the OS is NT-based or 95-based, and whether the streams are redirected or not. Suffice to say that basically it's the same as above.
Comments
- Anonymous
February 11, 2004
Shouldn't CP_OEMCP be CP_ACP, or am I misunderstanding WideCharToMultiByte? By my understanding, CP_OEMCP converts to the configured DOS code page (e.g. 850) rather than the ANSI code page (e.g. Windows-1252).
- Anonymous
February 13, 2004
One other bit worth noting - on NT-family operating systems, redirecting INPUT from a file apparently works the same way. If and only if the //U switch is used, stdin appears to be read as Unicode.
This can cause some frustration in certain circumstances, but I don't really see how this could have been done any other way without implementing all sorts of messy stream buffering. The "messy stream buffering" would be very handy for a variety of situations, but considering how little people have used the WSH console stream support it looks like it would have been an unappreciated effort (not to mention a potential bug source).
- Anonymous
February 13, 2004
Just out of curiosity, if you were to tweak the console capabilities of the scripting host today, what would you choose to do?
I'm interested in the simpler items which would be clearly "good" to add now.
For example, using newer APIs, it might be useful to have AttachConsole() reattaching to a parent console (to handle broken input streams from not explicitly using cscript). PeekConsoleInput would be good to CHECK for the prior existence of input, but might be a very bad idea to include across the board simply because it would introduce a major difference in behavior from the standard TextStream.
And yes, I am most definitely playing with an extension DLL idea. That's why I'm trying to cadge ideas from you. :)
- Anonymous
February 13, 2004
If I had the chance to change WSH today, there are two things I'd do to the console output.
First, I'd fix the implementation of the standard stream objects returned by the Exec method. I didn't do a very good job of implementing them; there are too many ways that you can accidentally get yourself into deadlocks.
Second, one of the most requested features is the one you mention -- being able to spawn a new process attached to the current console window.
- Anonymous
February 15, 2004
What really ticks me off is that even the .NET Framework doesn't have a _kbhit or _getch equivalent (well, neither does the standard CRT, but that's another story). Luckily it's really easy to write a few lines of Managed C++ code to expose that functionality though.