My new "favorite" Win32 API

Every once in a while, you discover a new Win32 API that you've never heard of.  The other day, one of the guys in my group sent an email extolling the virtues of a new Win32 API that was added for Windows XP Professional x64 Edition and Windows Server 2003 SP1 (and, of course, Windows Vista).

To read a value from the registry, historically you called RegQueryValueEx.  Unfortunately, the RegQueryValueEx API suffered from a number of serious problems.  The biggest one was that it didn't adequately type check the data being returned.  For example, if the registry contained a string value, the data might not be null terminated, which led to the following warning in the documentation:

If the data has the REG_SZ, REG_MULTI_SZ or REG_EXPAND_SZ type, the string may not have been stored with the proper null-terminating characters. For example, if the string data is 12 characters and the buffer is larger than that, the function will add the null character and the size of the data returned is 13*sizeof(TCHAR) bytes. However, if the buffer is 12*sizeof(TCHAR) bytes, the data is stored successfully but does not include a terminating null. Therefore, even if the function returns ERROR_SUCCESS, the application should ensure that the string is properly terminated before using it; otherwise, it may overwrite a buffer. (Note that REG_MULTI_SZ strings should have two null-terminating characters, but the function only attempts to add one.)

Unfortunately, many people didn't implement this logic correctly (it's quite hard to get this right for all cases).  In addition to the null termination issue, the caller needed to deal with ANY data type being returned - you had to add in checks to ensure that the type of data returned matched the type of data you expected.  The root cause of this is a "leaky abstraction" issue - the NT base registry API simply stores blobs of data with the type information maintained as metadata alongside the data being stored.  Thus when you retrieve a value from the registry, you get the data in the underlying store and the metadata back.  But there's no attempt at ensuring that the metadata matches the intent of the application because the intent of the application isn't known.
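
To give a feel for what "this logic" involves, here's a rough sketch of the checks a caller had to make on whatever a RegQueryValueEx-style call reported back.  It is not the actual routine discussed here.  The names sketch_validate_string, sketch_wchar and the SKETCH_ constants are made up for illustration: the constants mirror the values of REG_SZ and REG_EXPAND_SZ in winnt.h, and a 16-bit integer stands in for WCHAR so the logic can be read (and compiled) without the Windows headers.  It makes no attempt to expand REG_EXPAND_SZ or to handle REG_MULTI_SZ.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the winnt.h constants REG_SZ (1) and REG_EXPAND_SZ (2). */
enum { SKETCH_REG_SZ = 1, SKETCH_REG_EXPAND_SZ = 2 };

/* Stand-in for WCHAR, which is 16 bits wide on Windows. */
typedef uint16_t sketch_wchar;

/*
 * Validates what a RegQueryValueEx-style call handed back.
 *   type    - the type reported for the value
 *   buf     - the caller's buffer, holding cb_data bytes of data
 *   cb_buf  - total capacity of buf, in bytes
 *   cb_data - number of bytes the call reported writing
 * Returns 0 (and guarantees a terminator) on success, -1 otherwise.
 */
int sketch_validate_string(uint32_t type, sketch_wchar *buf,
                           size_t cb_buf, size_t cb_data)
{
    size_t n;

    if (type != SKETCH_REG_SZ && type != SKETCH_REG_EXPAND_SZ)
        return -1;                  /* not a string value at all */
    if (cb_data > cb_buf || cb_data % sizeof(sketch_wchar) != 0)
        return -1;                  /* byte count isn't whole characters */

    n = cb_data / sizeof(sketch_wchar);
    if (n > 0 && buf[n - 1] == 0)
        return 0;                   /* stored with its terminator */
    if ((n + 1) * sizeof(sketch_wchar) > cb_buf)
        return -1;                  /* no room left to add a terminator */

    buf[n] = 0;                     /* terminate it ourselves */
    return 0;
}
```

Even this small version has four separate ways to fail, and each one is a check that every caller had to remember to write.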

So a new API was added to the Windows API set that resolves these issues: RegGetValue.  I just converted a 50-line routine to use it, and the entire routine turned into a one-line call to RegGetValue.  Using RegGetValue, I was able to remove:

  • The code that checked the type of data in the registry
  • The logic to handle REG_EXPAND_SZ (it's automatically handled by RegGetValue)
  • Code to ensure null termination of the registry string
  • Code to validate that the length of the registry string was "appropriate" (a multiple of 2)

The bottom line was that I was able to remove a whole chunk of potentially buggy code and replace it with a single API call.  Heck, I didn't even need to open the registry key, since the RegGetValue API will even open and close the key for you (it opens the key for KEY_QUERY_VALUE if you care).
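
For illustration only (this is not the routine I converted), the one-line call looks roughly like the sketch below.  The helper name read_string_value and its parameters are hypothetical; RegGetValueW and the RRF_RT_REG_SZ flag come from the Windows SDK.  The _WIN32 guard is only there so the snippet stays inert on non-Windows compilers, and the program needs to link against advapi32.

```c
#ifdef _WIN32   /* RegGetValue is only declared by the Windows SDK headers */
#include <windows.h>

#ifdef _MSC_VER
#pragma comment(lib, "advapi32")    /* the registry functions live here */
#endif

/*
 * Hypothetical helper: reads a string value into `out` in a single call.
 * `out_bytes` is the size of `out` in BYTES, not characters.
 * Returns ERROR_SUCCESS or a Win32 error code.
 */
LONG read_string_value(HKEY root, const wchar_t *subkey,
                       const wchar_t *value,
                       wchar_t *out, DWORD out_bytes)
{
    DWORD cb = out_bytes;

    /* RRF_RT_REG_SZ asks the API to do the type check for us; no separate
       RegOpenKeyEx/RegCloseKey pair is needed because the subkey is passed
       in directly.  How the flag interacts with REG_EXPAND_SZ values and
       RRF_NOEXPAND is worth confirming in the documentation. */
    return RegGetValueW(root, subkey, value, RRF_RT_REG_SZ,
                        NULL, out, &cb);
}
#endif
```

A caller might pass HKEY_LOCAL_MACHINE, L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion" and L"ProductName", along with a local wchar_t buffer and its sizeof, and then only has to check the return value.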

Comments

  • Anonymous
    January 12, 2006
    For new API like this, I wish we could get wrappers such as the ones that deal with multi-monitors. Thus we can convert our source to use the new API and when the program is run on a legacy OS, it uses the supplied wrapper code instead of the real API.

  • Anonymous
    January 12, 2006
    Cool -- a brand new function, and I get to report three different bugs against the documentation! I know it's not your job Larry -- but please mention to your boss that the Microsoft documentation department clearly needs more bodies! The number of silly mistakes that I find is leaps and bounds higher than for any other "big" company.

    The bugs are:

    1. It doesn't cross-reference SHRegGetValue

    2. The RRF_RT values are clearly a bit field, but they are documented as an exclusive enum (that is, according to the documentation you can't specify both a REG_SZ and a REG_EXPAND_SZ)

    3. It doesn't mention that the environment variables are not expanded the same way that the shell expands them.

  • Anonymous
    January 12, 2006
    Speaking of Win32 APIs, I just discovered CreateFile() because OpenFile() was failing to create files > 128 characters.

    It takes more parameters, but it works. Sometimes I wish Microsoft would just make OpenFile() use CreateFile() when creating a file, so I could use either.

    -greg-

  • Anonymous
    January 12, 2006
    "For new API like this, I wish we could get wrappers such as the ones that deal with multi-monitors. Thus we can convert our source to use the new API and when the program is run on a legacy OS, it uses the supplied wrapper code instead of the real API. "

    Agreed. Of course, if the new API can be implemented in terms of the old APIs, I have to wonder why it's an API in the first place. Making it an API introduces a new potential OS version dependency...

  • Anonymous
    January 12, 2006
    I'd love to be able to use this function, but I can't - because Windows XP32 will still be in use for about 10 years. (We're only just starting to think about removing Windows 98 support from our products, because a significant chunk of our customers still uses it.)

    Why can't this function be backported to at least 9x / 2k / XP32 in a service pack? Requiring the latest service pack for an OS is reasonable. Requiring that the user install a different OS isn't.

    Overall, it's great that this function is introduced, but it's not really going to help us for about a decade. :(

  • Anonymous
    January 12, 2006
    Greg: That limit for OpenFile is documented, and the API itself is only provided for compatibility with Win16 (which means behavior changes, like making it use CreateFile some of the time, are bad). Why on earth would you be using it in the first place?

    http://msdn.microsoft.com/library/en-us/fileio/fs/openfile.asp

  • Anonymous
    January 12, 2006
    Serge: I think that the ANSI version just converts lpSubKey and lpValue to Unicode and calls the Unicode version with that, like pretty much all the ANSI versions apparently do.

    I don't remember where I saw it, but I do remember reading that the ANSI versions of API functions are created automatically and just do that, converting inputs to Unicode and outputs back to ANSI.

  • Anonymous
    January 12, 2006
    The old function's documentation needs some bug reports too.

    > For example, if the string data is 12
    > characters and the buffer is larger than
    > that, the function will add the null
    > character

    So far so good. You need to take more care than Microsoft did in trying to figure out whether the buffer is larger than that, but anyway it looks OK up to this point.

    > and the size of the data returned is
    > 13*sizeof(TCHAR) bytes.

    That is true for Unicode but false for ANSI. Microsoft even went to the trouble of using the TCHAR macro and doing a computation but still didn't test it.

    In a Unicode compilation the size of the data returned is indeed 13*sizeof(TCHAR) bytes because that's 13 wchar_t elements and sizeof(TCHAR) is sizeof(wchar_t).

    In an ANSI compilation the 12 characters can occupy anywhere from 12 to 24 bytes, and with an appended null character that's anywhere from 13 to 25 bytes. The size of the data returned is anywhere from 13*sizeof(TCHAR) bytes to 25*sizeof(TCHAR) bytes because sizeof(TCHAR) is sizeof(char).

    > However, if the buffer is 12*sizeof(TCHAR)
    > bytes, the data is stored successfully

    Sometimes it is, sometimes it isn't.

    > but does not include a terminating null.

    That part of it is true again.

    Thursday, January 12, 2006 2:05 PM by Greg Wishart
    > Speaking of Win32 APIs, I just discovered
    > CreateFile() because OpenFile() was failing
    > to create files > 128 characters.

    That's OK. In an ANSI compilation CreateFile() can't open some existing files either.

  • Anonymous
    January 12, 2006
    A little bit too late. Who will use it anyway (and when, in 2020?) I wouldn't use it just to break compatibility with previous Windows versions...

  • Anonymous
    January 13, 2006
    What happens to those of us who want to write Win32 programs that run under Windows in general? Do we have to reimplement RegGetValue ourselves?

  • Anonymous
    January 13, 2006
    Sorry, but why is this being called an API, instead of just a function?

    I always thought that the term "API" referred to a related collection of functions and datatypes - so the "Win32 API" referred to everything in Win32, the "Win32 registry API" referred to the set of functions and datatypes in Win32 for dealing with the registry, etc...

    So, when I saw "new API", I thought it meant "a whole new set of functions and datatypes", and got excited. Talk about a let-down. :-)

    So, where does this usage come from? I've never been aware of it before. Is it common?

  • Anonymous
    January 13, 2006
    I believe RegGetValue is just the port of SHRegGetValue from shlwapi.dll to be exported from a lower level binary of the OS (thus making it available to a broader set of consumers, since not everyone can link to shlwapi). SHRegGetValue should be available as a public export from shlwapi.dll as far back as XPSP2 if that helps.

  • Anonymous
    January 13, 2006
    Too bad it won't backport to XP ;-)

  • Anonymous
    January 14, 2006
    Norman Diamond wrote:

    "In a Unicode compilation the size of the data returned is indeed 13*sizeof(TCHAR) bytes because that's 13 wchar_t elements and sizeof(TCHAR) is sizeof(wchar_t)."

    "In an ANSI compilation the 12 characters can occupy anywhere from 12 to 24 bytes, and with an appended null character that's anywhere from 13 to 25 bytes. The size of the data returned is anywhere from 13*sizeof(TCHAR) bytes to 25*sizeof(TCHAR) bytes because sizeof(TCHAR) is sizeof(char)."

    Um, what?

    If you're trying to pick on the language-level difference between code units and code points, you're failing miserably, because UTF-16 has surrogates too. And let's skip the whole user-level character/glyph thing...

    When you go at this from a language-level standpoint, then the documentation is understood to be counting in code units. 12 is 12 chars or wchar_ts, depending on which version of the API you used to get the number 12 -- not 12 to 24 of anything.

    If you meant something else entirely by claiming differences between wchar_t and char, please explain.

  • Anonymous
    January 15, 2006
    Sunday, January 15, 2006 5:03 AM by Random Reader
    > If you're trying to pick on the language-
    > level difference between code units and code
    > points, you're failing miserably, because
    > UTF-16 has surrogates too.

    You're right, I seem to have forgotten to say "except for surrogate pairs" in my statement about Unicode.

    > When you go at this from a language-level
    > standpoint, then the documentation is
    > understood to be counting in code units.

    Problems:

    (1) The documentation SAYS characters where it says characters, and it SAYS TCHARs where it says TCHARs. We cannot quite assume that every occurrence of "character" in MSDN means TCHAR. Also we cannot quite assume that every occurrence of "byte" in MSDN means TCHAR.

    (2) Some APIs (and MFC methods etc.) really do count characters, even in ANSI compilations. Some count characters for some arguments while counting bytes for other arguments.

    (3) Some APIs (and MFC methods etc.) really count bytes even when semi-intuitive understanding, based on making adjustments for many other misstatements by MSDN, would make us think they count TCHARs. In other words, even in Unicode compilations some of them still count bytes.

    (4) Even in the DotNet Framework section, some MSDN pages tell bald lies about counting characters not bytes when the fact is they count bytes not characters.

    By the way, surrogate pairs, even though they exist, are still comparatively rare. But ordinary characters are not rare at all. Most of the world's characters still take two bytes per character to represent in their respective ANSI code pages.

  • Anonymous
    January 05, 2007
    PingBack from http://blogs.msdn.com/oldnewthing/archive/2007/01/05/1416853.aspx

  • Anonymous
    March 23, 2007
    It happened some time ago. Right after the Community Server update on the blog, in fact. I was bothered

  • Anonymous
    June 22, 2007
    PingBack from http://computinglife.wordpress.com/2007/06/23/how-do-you-chose-your-apis/

  • Anonymous
    November 05, 2007
    After I posted my article on the SHAutoComplete , I mentioned it to one of my co-workers. His response

  • Anonymous
    November 05, 2007
    PingBack from http://msdnrss.thecoderblogs.com/2007/11/05/the-shell-used-to-get-all-the-cool-apis/