다음을 통해 공유


Avoiding #defines for constant data and using enums instead

I think that the C preprocessor is a very powerful tool, but I like to limit my use of #defines. I have already touched on this when i talked about why I liked FORCEINLINE and I want to talk about it some more. I realize I can't eliminate the use of #defines throughout all of my code for various reasons, among them being:

  • They can have unintended side affects. The classic example of this is #define MAX(a, b) (a >= b ? a : b) and invoking it with MAX(foo++, bar) such that foo++ is evaluated twice leading to a double increment.
  • Other headers use them to enable/disable certain pieces of functionality or functions.
  • They are the only way to wrap up other preprocessor macros like __FILE__ and __LINE__ to allow for easier logging without more pushing more typing onto the caller.
  • They are the only way you can generate code on the fly in C (a poor man's C++ template if you will)
  • They are the only way you can use the quotify (#) or concat (##) functions within the preprocessor

The main reason I try to limit my usage of the preprocessor is that because it is a pure substitution, you generally don't have type information available to you when debugging the application or driver. A great example of this is constant numeric values within your application. You could do this to define values in a bit field:

 #define OPTION_ONE   (0x00000001)
#define OPTION_TWO   (0x00000002)
#define OPTION_THREE (0x00000004)

but you will not be able to evaluate OPTION_ONE in the debugger because there is no OPTION_ONE symbol in the PDB. The preprocessor substituted OPTION_ONE with 0x0000001 before the linker was ever involved. To look up the value of OPTION_ONE you would have to open up a source file. Instead, I do the following when I have bit field values that I want to define and use within my app/driver:

 #include <windows.h>

typedef enum _OPTION_FLAGS {
    OptionOne =   0x00000001,
    OptionTwo =   0x00000002,
    OptionThree = 0x00000004,
} OPTION_FLAGS;

typedef union _OPTIONS {
    ULONG Bits;

    // NOTE:  this is for debugging only, *always* use Bits in code
    struct {
        ULONG One : 1;   // OptionOne
        ULONG Two : 1;   // OptionTwo
        ULONG Three: 1;  // OptionThree
    } BitsByName;
} OPTIONS;

int _cdecl main(int argc, char *argv[])
{
    OPTIONS o;

    o.Bits = OptionTwo;

    return 0;
}

For now, let's ignore the fact that I can use protected or public to abstract how the values are set and read and that I am using an unnamed union. (Some folks feel that unnamed unions are unholy, but I find them very practical in this pattern.) There are 2 key concepts to what I am doing here:

  1. The enumeration remains in the symbols so you can dump it within the debugger [1]. Now you don't have to look up the value in a source file.
  2. By creating a union of the true value (Bits) [3] and a bit field of the value (BitsByName) [4] I can see what each bit means without having to look at the enum at all! This means that you can see the value without digging into the structure at all if you don't need to [2].
 I have broken into the debugger right before returning.

[1] 0:000> dt OptionsFlags
   OptionOne = 1
   OptionTwo = 2
   OptionThree= 4

[2] 0:000> dt o Bits
Local var @ 0x6ff78 Type Options
   +0x000 Bits : 2

[3] 0:000> dt o BitsByName.
Local var @ 0x6ff78 Type Options
   +0x000 BitsByName  :
      +0x000 One         : 0y0
      +0x000 Two         : 0y1
      +0x000 Three       : 0y0

[4] 0:000> dt o
Local var @ 0x6ff78 Type Options
   +0x000 Bits             : 2
   +0x000 BitsByName       : Options::<unnamed-tag>

March 28, 2006: Updated the enum and union definitions to be C compliant and match my style guidelines.

Comments

  • Anonymous
    March 27, 2006
    Your solution has a problem: It uses undefined behaviour of the C compiler:

    1. You are not guaranteed that accessing the union elements under different names has any meaning at all, let alone a useful meaning. For example, the compiler could choose to implement a union as a struct, thus, accessing o.Bits and o.BitsByName might result in accessing different memory areas.

    2. The order in which a bit field is allocated is not defined by C, either. Do bit fields get allocated started with the highest  bits, or with the lowest bits? Do they even get allocated consecutively? I compiler might even choose to allocate them in a "random" manner (there have been educational compilers doing this).

    Thus, while your code might work as long as you use Microsoft compilers (things like these are used in MS headers, thus, I doubt MS will change that behaviour in the near future), you cannot generate completely portable code with it.

    To summarize: I think your suggestion with using the bitfield and the union are very bad programming habits, at least if you want to write C programs (and not "Microsoft C"), and should be avoided.

    The enum suggestion is another thing; this is something I would recommend.

    - Spiro
  • Anonymous
    March 27, 2006
  1.  yes, actually the start of each field in a union has to be overlayed with each other.  that is the whole point of a union.  See  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/html/_langref_union.asp.  It says "A union is a user-defined data type that can hold values of different types at different times. It is similar to a structure except that all of its members start at the same location in memory." Note that this is not under a Microsoft specific heading, this is part of the C standard.  

    2)  Yes, the bit field order is not defined by the C standard, but every production compiler I have ever encountered allows you to define the behavior or has a default which is documented.  I have seen code compiled with gcc rely on the ordering of bitfields, so Linux is just as dependent on its compiler as Windows is.  Besides, there is no harm if the bit field is wrong, you would realize it quickly.  Finally, purely portable code only allows for bit fields on (U)LONG/INT values, but almost every single compiler will let you get away with creating a bitfield on a char or a short.  Again, not purely portable if you use it, but any compiler is practical enough to let you do it so that you can create bit fields for things like one byte registers.

    To summarize, it is not a bad programming habit, it is just your opinion that it is bad.  Note that I don't use the bitfield in production code (because the compiler can do more shifting around then necessary), it is only for use within the debugger.  If it offends you, don't use the union.
  • Anonymous
    March 27, 2006
    The comment has been removed

  • Anonymous
    March 27, 2006
    I never said I was writing portable code across multiple compilers nor did I ever state that this was a goal for what I wrote.  I write Windows device drivers.  I write and have written drivers for very different architectures (x86, x64, IA64, MIPS, Alpha, etc).   Furthermore, I am blogging about Windows device drivers (and not about portable code).

    I would like to see a mainstream C/C++ compiler which did not start all fields at the same address.  You can bring up any number of edge cases and specialized compilers, but that is a completely different domain. Different practices and techniques apply when you are in such a specialized environment.  One size does not fit all.

    You are not seeing my point about bitfields. I have not advocated their use in production code, merely as a debugging tool.  If the bits are not laid out the same as my assumption, no harm is done.  If needed to construct a char/short/long full of bits, I would do it myself with masks and the correct shifting and not rely on a bit field with an arbitrary layout.

    I am finding it hard to figure how you are in a position to judge what I have done, what I do and how it applies to the practices I have developed over the years.  You have your opinion, I have mine.   I don't view what I wrote as bad programming habit/practice, I view it as a very practical and useful technique to get my job done.  Let me reiterate, it gets my job done. By using the pattern I laid out, I spend less time debugging failures and more time doing the other parts of my job.  

  • Anonymous
    March 27, 2006
    First of all, let me state that I never intended to offend you, and I appreciate your opinion and your blog very much. I know you know much more about many areas; this is the reason why I read your blog despite reading you in newsgroups and mailing lists.

    Having said this, I still stand by my opinion. Ok, I should have stated "in my opinion" more clearly, but the topics still remain.

    I am surprised you call other processors (Microcontrollers, DSPs) as "edge cases", although all statistics I know tell me that the market share of these embedded devices is much bigger than the market share of even the iA32.

    There has been more than one case where the habit "it works for me, I get my job done, so I do it this way, depending on some property of the current system, but I do not document it" - obviously, exactly what you are proposing - lead to severe, costly damage of real-world hardware (I am not speaking about PCs or similar things.) People read blogs like this here - again, I LIKE your blog very much! - and think they learn also about C, but do not see the problems in this.

    Have you ever tried to teach an experienced C programmer to "forget" about these things he cannot rely upon? It is very hard.

    Although you believe the opposite, I see your point about the bit-fields as being very helpful for debuging purposes. In fact, I am using similar techniques, "just to get the job done". Anyway, I am aware of the restrictions and problems. Is any reader here aware of these, too? This is the reason why I jumped in here in the first place.

    Again, I never wanted to offend you, nor judge what you have done or are doing. Anyway, I do not like to take these things "as given" and let them stand without telling where the pitfalls might be.

    Furthermore, aren't such comment sections in blogs there for expressing your opinion? This is exactly what I am doing. You read it, think about it, and decide if you think  a) "he is writing completely silly things", b) "he has a very good opinion, and what I wrote was wrong", or c) "he has some points, I have mine. This and that my apply in my case, this might not", and the like. Choose your opinion.

    And: Yes, in my opinion, it is still a bad programming practice. :)

    - Spiro.

  • Anonymous
    March 28, 2006
    Is there a reason not to do something like GDB does: (e.g.) http://devworld.apple.com/documentation/DeveloperTools/gdb/gdb/gdb_10.html

    It's a bit hacky, but so are most of the alternatives (as pointed out above)

  • Anonymous
    March 28, 2006
    Interesting stuff for sure.

    strik, technically the program isn't valid C, its C++. So perhaps you should be referencing the C++ standard?

    Thanks doron, I always learn something on this blog.

  • Anonymous
    March 28, 2006
    sayler: I forgot about that option with GDB, that's pretty cool.  macro evaluation in the debugger.  Are  -gdwarf-2 and -g3 debug settings?  If so, the one advantage my technique has over GDB is that you can still see my information in a free build.  Either way, that is a very cool and useful technique to have as well!

    d

  • Anonymous
    March 28, 2006
    mattd: Why isn't this valid C? I don't see anything that is not valid C in it (depite the facts I talked about before).

    Even the // - style comments are allowed since C99.

    saylor: Thank you for the link. Although I have used gdb before, I was not aware that there was something like that.

    - Spiro.

  • Anonymous
    March 28, 2006
    > Are  -gdwarf-2 and -g3 debug settings?

    They're compiler options, used for debugging info.  -gdwarf-2 tells gcc to use DWARFv2-format debugging information, and -g3 tells it to export level-3 debugging information.  From the gdb info page:

    -g
    Produce debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF). <...>

    -ggdb
    Produce debugging information for use by GDB.  This means to use the most expressive format available (DWARF 2, stabs, or the native format if neither of those are supported), including GDB extensions if at all possible.

    -gdwarf-2
    Produce debugging information in DWARF version 2 format (if that is supported).  This is the format used by DBX on IRIX 6.

    -glevel
    Request debugging information and also use "level" to specify how much information.  The default level is 2.

    <...>

    Level 3 includes extra information, such as all the macro definitions present in the program.  Some debuggers support macro expansion when you use -g3.

    So I'm a bit surprised they didn't use -ggdb (since it has support for gdb extensions), but maybe DWARFv2 isn't the default on OSX.

  • Anonymous
    March 28, 2006
    "mattd: Why isn't this valid C? I don't see anything that is not valid C in it (depite the facts I talked about before). "

    Options needs to be typedef'ed or declare in main as
    union Options o;

    The code as is won't compile as C. If it does for you then your most likely are using the /TP switch...

    Is this not correct?

  • Anonymous
    March 28, 2006
    mattd: Of course, you are totally correct. I completely overlooked that .

    So, I stand corrected on this. :)

  • Anonymous
    March 28, 2006
    yes, the example is C++ (I was lazy I guess).  i will edit it to make sure it is C compliant ;)

  • Anonymous
    March 28, 2006
    The comment has been removed

  • Anonymous
    March 29, 2006
    The comment has been removed

  • Anonymous
    March 29, 2006
    If, like many people, you package your driver code in .CPP compilation units to gain things like stronger type checking, anywhere variable declaration, etc, then you might as well take this one step further and use actual const data rather than enums.

  • Anonymous
    March 29, 2006
    const integer data has a few issues: it might actually take up a storage in the final binary if the linker cannot determine if it can safely remove the variable.  an enum value will not.   they are not self contained and groupings are not apparent. Yes you can scope it to a class and make it a static memember, but knowing which other constant values are associated with each other can quickly grow problematic. you don't get auto value increment behavior as you do with an enum (that may represent a state vs a bit field in which you are assigning specific values). For non integer data like strings or complex types, const global vars are a great way to retain that info at runtime and provide type safety.

  • Anonymous
    April 09, 2006
    While working with WinDbg I encountered the following problem: when the
    project (in C) is compiled with MS Visual Studio 6.0 C compiler enum
    symbols values are not recordered in the symbols file. The "dt" debugger
    command gives the answers like "Int4B" for an enum.

    When I tryed the same with a little "trial" C++ project it worked well,
    giving the enum's names. I tried to change/add different debugger and
    linker options but it didn't work. Do you have any idea regarding this
    issue?
    Thanks

  • Anonymous
    April 10, 2006
    yulka, I don't have a clue as to why this is happening. I would hazard a few guesses though

    1)  VS6 is pretty old and the C and C++ compiler could be more different then in later versions

    2)  I know that in C, an enum is not a type, it is just a glorified #define.  Any enum type is just a signed integer.  What you are seeing is probably a strict interpretation of that.  In C++, a enum is a real type that is not the same as a signed int.  In later versions of visual studio this distinction is not made and the C code (at least symbols) treat an enum as its own type.

    d

  • Anonymous
    July 07, 2006
    The debugging aid of using enums instead of #defines is undoubtedly cool.
    Wouldn't it be nice if all windows (user and kernel) error status codes where defined as enums? Then you could immediately know what GetLastError() returned, instead of having to look around in MSDN or browse WinError.h.

    I tried out the following simple user mode example and was really amused (I am easy to amuse...) to see the Win32 error codes as descriptive enum names while debugging.

    #define ENUMSTAT(a) _##a = a

    typedef enum _WINSTATUS : DWORD
    {
       ENUMSTAT(ERROR_SUCCESS),
       ENUMSTAT(ERROR_INVALID_FUNCTION),
       ENUMSTAT(ERROR_FILE_NOT_FOUND),
       ENUMSTAT(ERROR_PATH_NOT_FOUND),
       // ...
    }
    WINSTATUS;

    #define _GetLastError (WINSTATUS)GetLastError

    int _tmain(int argc, _TCHAR* argv[])
    {
       WINSTATUS status;
       HANDLE hFile;

       hFile = CreateFile("X:\nonexistentdir\random.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
       if (hFile == INVALID_HANDLE_VALUE)
       {
           status = _GetLastError();
           return -1;
       }

       return 0;
    }


    In fact I liked it so much, that I wrote a quick and dirty utility that parses WinError.h and produces an enumeration with all error codes defined in it.
    You can grab the generated header file from http://www.staikos-manousopoulou.net/Programming/WinStatusEnum.txt

    I tried to find a way to use the actual error code names as the enum constants (by undefining them before declaring the enumerator) but my preprocessor skills are not that good :-)
    So instead I just prepended an underscore.

    And I will definitely update our SDKs (FireAPI) to use enums for the error codes instead of defines.

    Thanks!

  • Anonymous
    July 07, 2006
    Also checkout http://www.staikos-manousopoulou.net/Programming/NTStatusEnum.txt for the DDK status codes as an enum.

  • Anonymous
    July 07, 2006
    both are very nice ;)...we did the same thing when I worked on bluetooth for the test  build so that we could dump out names in DbgPrint.    you do have other options other then rolling your own parsing/strings/enum:

    1)  when printing an NTSTATUS when using WPP, you can use %!STATUS! instead of 0x%x and traceview will give you the name and hex value for free.

    2) net helpmsg <error number in decimal>.  this does not give you the name of the error though, just the description.

    3) !error <value> <flags>.  if you specify 1 for flags, the value is force to be interpreted as an NTSTATUS.  this also just gives you the description, not the name.

  • Anonymous
    January 17, 2007
    One of the goals of KMDF was to use clear and concise types in our parameters and structures so that

  • Anonymous
    June 08, 2009
    PingBack from http://insomniacuresite.info/story.php?id=2169

  • Anonymous
    June 18, 2009
    PingBack from http://thestoragebench.info/story.php?id=7562

  • Anonymous
    May 28, 2010
    Interesting stuff. The one catch I had while I was working on a similar issue was the sizeof the enum (i.e. OPTION_FLAGS).  It seems like this is compiler-dependent and not customizable from cl.  Your code gets around this issue through a level of indirection using OPTION. An argument against that, though, is that now you have to come up with a name for the pass-through value (called Bits). It's an interesting problem and its implications are all over driver software.