Alignment (part 1)
I got an email the other day from someone (who will remain nameless) complaining about the fact that some of the NT structures had to be declared with #pragma pack:
Consider file WinBase.h from the Platform SDK, containing the following
declaration:
> typedef struct _WIN32_FIND_DATAA {
> DWORD dwFileAttributes;
> FILETIME ftCreationTime;
[...]
By this person's logic, since the ftCreationTime field in the WIN32_FIND_DATAA structure was an 8 byte FILETIME structure, the ftCreationTime should be at offset 8 from the start of the structure, but he discovered that it was at offset 4 from the start.
And he was convinced that either there was something incorrect in the documentation for structure packing in Windows or that there had to be a hidden #pragma pack directive (which changes the default structure packing) in the Windows headers.
I realized after reading this article that many people have forgotten the lessons learned from the early days of MS-DOS. This kind of stuff was known to every developer who worked in C and coded to MS-DOS. You see, back in the MS-DOS days, memory was king. So DOS packed all of its data structures as tightly as possible to save memory.
And if you attempted to make MS-DOS system calls from C, you invariably ran into packing issues. For example, the MS-DOS get country data (which returned internationalization information) returned a structure that contained:
Offset | Length | Description |
0x00 | 2 | Date format |
0x02 | 5 | Currency Symbol (ASCIZ string) |
0x07 | 2 | Thousands separator (ASCIZ string) |
0x09 | 2 | Decimal separator (ASCIZ string) |
0x0b | 2 | Date separator (ASCIZ string) |
0x0d | 2 | Time separator (ASCIZ string) |
0x0f | 1 | Bit Field |
0x10 | 1 | Currency places |
0x11 | 1 | Time format |
0x12 | 4 | Case-map call address (DWORD) |
0x16 | 2 | Data-list separator (ASCIZ string) |
0x18 | 10 | Reserved |
If I was to represent this as a C structure, the naive representation would be:
struct INTL_DATA{ WORD _DateFormat; CHAR _CurrencySymbol[5]; CHAR _ThousandsSeparator[2]; CHAR _DecimalSeparator[2]; CHAR _DateSeparator[2]; CHAR _TimeSeparator[2]; BYTE _Padding; BYTE _CurrencyPlaces; BYTE _TimeFormat; LPVOID _CaseMapCallAddress; BYTE _DataListSeparator[2]; BYTE _Reserved[10];};
The problem is that this structure definition wouldn't work.
You see, the compiler has a fairly straightforward set of rules defining how structures are aligned, and this structure violates them.
The compilers rules are: In general, data is aligned on it's "natural" boundary.
What's that mean? "Natural" boundary? What on earth is that thing?
Typically, the "natural" alignment of data is based on its size. A 1 byte field can be located at any address in memory. A 2 byte field (a short) should only appear at even addresses in memory. A 4 byte field (a long) should only appear at multiples of 4 bytes in memory. And an 8 byte field (a longlong) should only appear at multiples of 8 bytes in memory.
A simple example goes a long way towards explaining this:
struct A{ int _FieldA1; char _FieldA2; short _FieldA3; char _FieldA4; long _FieldA5; void *_FieldA6;};
Consider what the compiler's going to do with a variable of type struct A. The first thing it does is to lay the structure out in "memory":
Field Index | Field Name | Field Size |
0 | _FieldA1 | 4 |
1 | _FieldA2 | 1 |
2 | _FieldA3 | 2 |
3 | _FieldA4 | 1 |
4 | _FieldA5 | 4 |
5 | _FieldA6 | 4 (8 on 64 bit) |
The next thing it does it to assign an offset for each field:
Field Index | Field Name | Field Size | Field Offset |
0 | _FieldA1 | 4 | 0 |
1 | _FieldA2 | 1 | 4 |
2 | _FieldA3 | 2 | 6 |
3 | _FieldA4 | 1 | 8 |
4 | _FieldA5 | 4 | 12 |
5 | _FieldA6 | 4 (8 on 64 bit) | 16 |
This is where the "natural" alignment comes to play - The natural alignment for _FieldA3 is 2 bytes (it's a word), so it gets put at offset 6 - there's some empty space left in the structure between _FieldA2 and _FieldA3 (and between _FieldA4 and _FieldA5). The overall "sizeof" struct A is 20 bytes (24 on 64 bit platforms).
For a given structure, the alignment of the structure is determined by the worst case field within the structure. So in struct A's case, the alignment of the structure is either 4 bytes (on 32bit platforms) or 8 bytes (on 64bit platforms).
When the compiler lays out nested structures, it just follows its rules recursively. Consider this structure:
struct B{ short _FieldB1; struct A _FieldB2; int _FieldB3; struct A _FieldB4; }
Field Index | Field Name | Field Size |
0 | _FieldB1 | 2 |
1 | _FieldB2.FieldA1 | 4 |
2 | _FieldB2.FieldA2 | 1 |
3 | _FieldB2.FIeldA3 | 2 |
4 | _FieldB2.FieldA4 | 1 |
5 | _FieldB2.FieldA5 | 4 |
6 | _FieldB2.FieldA6 | 4 (8 on 64bit) |
7 | _FieldB3 | 4 |
8 | _FieldB4.FieldA1 | 4 |
9 | _FieldB4.FieldA2 | 1 |
10 | _FieldB4.FIeldA3 | 2 |
11 | _FieldB4.FieldA4 | 1 |
12 | _FieldB4.FieldA5 | 4 |
13 | _FieldB4.FieldA6 | 4 (8 on 64bit) |
And again, the compiler then assigns offsets to the fields:
Field Index | Field Name | Field Size | Field Offset |
0 | _FieldB1 | 2 | 0 |
1 | _FieldB2._FieldA1 | 4 | 4+0=4 (8+0=8 on 64 bit) |
2 | _FieldB2._FieldA2 | 1 | 4+4=8 (8+4=12 on 64 bit) |
3 | _FieldB2._FIeldA3 | 2 | 4+6=10 (8+6=14 on 64 bit) |
4 | _FieldB2._FieldA4 | 1 | 4+8=12 (8+8=16 on 64 bit) |
5 | _FieldB2._FieldA5 | 4 | 4+12=16 (8+12=20 on 64 bit) |
6 | _FieldB2._FieldA6 | 4 (8 on 64bit) | 4+16=20 (8+16=24 on 64 bit) |
7 | _FieldB3 | 4 | 24 (32 on 64 bit) |
8 | _FieldB4._FieldA1 | 4 | 28 (40 on 64 bit) |
9 | _FieldB4._FieldA2 | 1 | 28+4=32 (40+4=44 on 64 bit) |
10 | _FieldB4._FIeldA3 | 2 | 28+6=34 (40+6=46 on 64 bit) |
11 | _FieldB4._FieldA4 | 1 | 28+8=36 (40+8=48 on 64 bit) |
12 | _FieldB4._FieldA5 | 4 | 28+12=40 (40+12=52 on 64 bit) |
13 | _FieldB4._FieldA6 | 4 (8 on 64bit) | 28+16=44 (40+16=56 on 64 bit) |
Note that _FieldB2._FieldA1 has different offsets depending on whether it's 32bit or 64bit - this is because of the rule I mentioned earlier - _FieldB2 is aligned to the natural alignment of struct A, which is 4 bytes on 32 bit platforms and 8 bytes on 64 bit platforms.
It's also interesting to see what happens with a slight rearrangement of the fields in struct A:
struct A{ int _FieldA1; char _FieldA2; char _FieldA4; short _FieldA3; long _FieldA5; void *_FieldA6;};
All I did was to move _FieldA4 up next to _FieldA2. But this change dramatically changed what happens when the structure is laid out in memory:
Field Index | Field Name | Field Size | Field Offset |
0 | _FieldA1 | 4 | 0 |
1 | _FieldA2 | 1 | 4 |
3 | _FieldA4 | 1 | 5 |
2 | _FieldA3 | 2 | 6 |
4 | _FieldA5 | 4 | 8 |
5 | _FieldA6 | 4 (8 on 64 bit) | 12 (16 on 64 bit) |
So on 32bit platforms, by just moving one field, the structure's memory footprint shrunk by 4 bytes! Note that the size of the data contained in the structure didn't change at all - all that was changed was the packing of the data in the structure. Also note that the total size of the structure didn't change on 64 bit platforms - this is because _FieldA3 pushes the alignment off on _FieldA6.
So let's go back to the original question about WIN32_FIND_DATAA. The structure in winbase.h is:
typedef struct _WIN32_FIND_DATAA {DWORD dwFileAttributes;FILETIME ftCreationTime; :
How does this get laid out in memory?
Lets run through the exercise above.
A FILETIME is:
typedef struct _FILETIME { DWORD dwLowDateTime; DWORD dwHighDateTime;} FILETIME, *PFILETIME, *LPFILETIME;
So, continuing as before:
Field Index | Field Name | Field Size | Field Offset |
0 | dwFileAttributes | 4 | 0 |
1 | ftCreationTime.dwLowDateTime | 4 | 4 |
2 | ftCreationTime.dwHighDateTime | 4 | 8 |
: | : | : | : |
And the mystery of my reader is solved - ftCreationTime is aligned at offset 4 because it only has offset 4 member variables.
And finally, lets consider the INTL_DATA structure mentioned above - why did I say that it didn't work?
Field Index | Field Name | Field Size | Field Offset |
0 | _DateFormat | 2 | 0x0 |
1 | _CurrencySymbol | 5 | 0x2 |
2 | _ThousandsSeparator | 2 | 0x7 |
3 | _DecimalSeparator | 2 | 0x9 |
4 | _DateSeparator | 2 | 11 |
5 | _TimeSeparator | 2 | 13 |
6 | _Padding | 1 | 15 |
7 | _CurrencyPlaces | 1 | 16 |
8 | _TimeFormat | 1 | 17 |
9 | _CaseMapCallAddress | 4 | 20 |
10 | _DataListSeparator | 2 | 24 |
11 | _Reserved | 10 | 26 |
We're just fine up until we get to the _CaseMapCallAddress field. That one's supposed to be at offset 18 according to the MS-DOS documentation, but it's at offset 20 in the C structure! This is because the natural alignment of a far pointer to a function was 32bits even on Win16.
Tomorrow, I'll write about how this was resolved for MS-DOS clients (and refine the packing algorithm mentioned above)
Most of the content in this post was already been posted in GrantRi's blog post Alignment from last summer, it's an excellent reference to the topic. In addition, Raymond Chen wrote about the FILETIME issue here.
Comments
Anonymous
April 07, 2005
The comment has been removedAnonymous
April 07, 2005
Sorry, I'm a little confused on what the offsets in the structure are used for..
I mean when you ask for data within a structure the compilers going to point that at the final offset not where you would expect it to be, right?Anonymous
April 07, 2005
The comment has been removedAnonymous
April 07, 2005
The comment has been removedAnonymous
April 07, 2005
Larry - VMS (or DEC , at least) does generate packed structures. I've just compiled and linked the following code on VC++7.1 and DEC C 5.6
#include <stdio.h>
typedef struct
{
char a;
int b;
} A;
int main(int argc, char** argv)
{
A x;
printf("Offset of a in x = %dn", ((unsigned long)&x.a - (unsigned long)&x));
printf("Offset of b in x = %dn", ((unsigned long)&x.b - (unsigned long)&x));
printf("sizeof(x) = %dn", sizeof(x));
}
On the PC, this prints out:
Offset of a in x = 0
Offset of b in x = 4
sizeof(x) = 8
On VAX/VMS, it prints out:
Offset of a in x = 0
Offset of b in x = 1
sizeof(x) = 5
i.e. the struct is packed.Anonymous
April 07, 2005
Larry,
I think you're right about VMS. I certainly remember running into alignment issues in one enourmous structure my boss had created, and being utterly baffled by what was going on. I'd copied a block of memory from this structure and was using an offset to access one of the members, but for some reason the value wasn't what I expected. I'd only been programming for a few months and I was convinced that there was a compiler bug somewhere! Ahh the arrogance of youth!
IIRC we ended up putting a bunch of packing bytes in places to make things more obvious.Anonymous
April 07, 2005
The comment has been removedAnonymous
April 07, 2005
You'll have to admit that this is one of the things that Pascal (at least what's used in Delphi) got right. If you're using a record in a file, you can just write:
type
TMyRecord = packed record
B: Byte;
D: LongInt;
end
SizeOf(TMyRecord) will now return 5. That's much better than fiddling with #pragmas.Anonymous
April 08, 2005
VMS avoided alignment issues by ording structures. Sure, VAXC would pack structures, but you would get performance hits with alignment problems on the Alpha processor. The common (and quick) thing was to place your largest members first and the decend in size.
struct X
{
INT32 bob;
INT32 frank;
INT16 george;
INT8 tom;
INT8 harry;
};
Of course, this can break down over time with persistant structures.Anonymous
April 08, 2005
The comment has been removedAnonymous
April 08, 2005
Yesterday, I wrote a bit about how the C compiler determines the alignment of...Anonymous
April 09, 2005
I hadn't dealt with packing in quite a while, (probably not since the MSDOS days when I used to calculate how big my data files would be), and then one bit me recently - I included somebodies header file that had data structures and there was a #pragma pack(1) in there without a #pragma pack(push) / #pragma pack(pop) around it! Took awhile to chase that one down.Anonymous
June 08, 2005
Very good! This is the best article that explains the problem of data alignment so clearly.Anonymous
June 08, 2009
PingBack from http://quickdietsite.info/story.php?id=13395Anonymous
June 18, 2009
PingBack from http://fancyporchswing.info/story.php?id=2543