I see my favorite Ansi function has the behavior I want.

Occasionally I am asked about the A version of a W function.  For example, GetLocaleInfoA does something that appears more convenient to some user than what GetLocaleInfoW does.  The implied thought is that maybe they should just use the A version.

For the most part our A functions are just wrappers for the W functions, so any perceived benefit is probably not real.  Additionally, since the A function is just a wrapper, under the hood we have to convert any input and output strings to and from Unicode.  If nothing else, that will probably make the call take several times longer.
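To make the wrapper shape concrete, here is a minimal Python sketch. It is only an illustration: get_info_w and get_info_a are hypothetical names rather than real APIs, code page 1252 is assumed as the system ANSI code page, and Python's errors="replace" stands in for the Windows conversion, which may also apply best-fit mappings.

```python
CODE_PAGE = "cp1252"  # assumption: the system ANSI code page


def get_info_w(key: str) -> str:
    """Hypothetical Unicode ("W") function that does the real work."""
    table = {"language": "English", "native_name": "日本語"}
    return table[key]


def get_info_a(key: bytes) -> bytes:
    """Hypothetical ANSI ("A") wrapper: convert, call W, convert back."""
    wide_key = key.decode(CODE_PAGE)      # input: ANSI -> Unicode
    wide_result = get_info_w(wide_key)    # exactly the same work as the W call
    # output: Unicode -> ANSI; characters outside the code page become "?"
    return wide_result.encode(CODE_PAGE, errors="replace")


if __name__ == "__main__":
    print(get_info_a(b"language"))
    print(get_info_a(b"native_name"))
```

The A path does everything the W path does plus two conversions, and the second conversion can lose data.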

For inputs, the A version doesn't really restrict the code to a particular Unicode subset.  Sometimes it's perceived that some Unicode character(s) are undesirable in the input stream.  In this case, either the application already restricts the input to that subset, in which case the W version supports exactly the same characters and there is no difference; or the application passes unknown data, relying on the ANSI to Unicode conversion to filter unwanted data.  Of course, then the unwanted data is just mangled, converted to ? or whatnot.  So the unwanted data isn't really stripped and, even worse, it can be corrupted in a manner that causes a security hole.  For example, I've seen a password hashing algorithm whose output was then converted to code page 1252.  Quite often this caused a bunch of ??? in the resulting hash.  Since many different hashes turn into the same ???, the stored hash would match a very large number of inputs, pretty much defeating whatever security was provided originally.
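A rough Python sketch of that failure mode follows. The scheme here is an assumption made up for illustration (a SHA-256 digest viewed as 16 UTF-16 code units, then converted to code page 1252), not the algorithm I actually saw, and hash_as_text, lossy_hash and to_cp1252 are hypothetical helper names.

```python
import hashlib


def hash_as_text(password: str) -> str:
    """Hypothetical scheme: view a SHA-256 digest as 16 UTF-16 code units."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return "".join(
        chr(int.from_bytes(digest[i:i + 2], "little"))
        for i in range(0, len(digest), 2)
    )


def to_cp1252(text: str) -> bytes:
    """Lossy conversion: anything outside code page 1252 becomes b'?'."""
    return text.encode("cp1252", errors="replace")


def lossy_hash(password: str) -> bytes:
    """The 'stored' hash after the code page conversion."""
    return to_cp1252(hash_as_text(password))


if __name__ == "__main__":
    for pw in ("hunter2", "correct horse"):
        stored = lossy_hash(pw)
        # The count is approximate: a genuine "?" in the hash is counted too.
        print(pw, stored, stored.count(b"?"), "of", len(stored), "bytes are ?")
```

Only a few hundred of the 65,536 possible code units map to code page 1252, so most of each stored hash typically collapses to ?, and different passwords end up with hashes that are nearly or completely identical.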

For outputs, the A version also doesn't prevent any Unicode code points from being read; they're just converted to junk like ? when the call returns.  So the results are restricted to a subset of Unicode, but the restriction is done in a fairly useless manner.  I've seen configuration values stored like this (imagine a user name), and they pretty much just end up as ???? when read back.
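The configuration case can be sketched the same way. This is only an illustration under assumptions: save_setting_a and load_setting_a are hypothetical helpers that model an application writing and reading a value through code page 1252, with Python's errors="replace" standing in for the Windows conversion.

```python
def save_setting_a(value: str) -> bytes:
    """Hypothetical: store a value through the ANSI path (code page 1252)."""
    return value.encode("cp1252", errors="replace")


def load_setting_a(raw: bytes) -> str:
    """Hypothetical: read the stored bytes back as text."""
    return raw.decode("cp1252")


if __name__ == "__main__":
    for user_name in ("Müller", "山田太郎"):
        stored = save_setting_a(user_name)
        print(repr(user_name), "->", repr(load_setting_a(stored)))
```

A name that happens to fit in the code page survives the round trip, which is why this kind of bug tends to go unnoticed until a user with a different script shows up.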

So, in short, use Unicode! :)