Keeping your module section count below 16 on Windows CE
Here's a quick blog about an issue that we just hit today; most will merely find it interesting, but I hope it saves someone somewhere a little time, effort, and confusion.
We recently got a new codec library drop which we integrated into our mainline code tree. The codec team spends a lot of time developing optimized ARM versions of Windows Media codecs, and every once in a while we get a new library that we need to integrate into our build system.
When we checked the library into our source tree and ran a Smartphone build, we got roughly this error from one of our build tools:
wmvdmod.dll(0) : fatal error RM0024 : Input File has more than 16 sections
In the ensuing investigation we discovered two things we hadn't previously known:
1. Our codec team has been subdividing their C/C++/Assembly language routines into multiple sections to keep certain code paths together and improve cache/page hit rates. As a result, they had created about 14 extra sections with names like ".decodeX_Pass1" (names changed to protect the innocent ;-). In general, one can view this type of information for any lib or dll by running "dumpbin -headers" on it.
2. Windows CE has some limitations on the number of sections that a module can contain (due to design decisions in the kernel and ROM image filesystems). Ultimately this results in a limit of 16 sections for some scenarios, which is the case we hit in our build tools.
The simplest short-term solution to this problem was to use the merge linker directive to force the linker to merge the different sections in the library back into the .text section. To accomplish this, we added something like the following to the appropriate sources file. This solved the build error without the need to rebuild the library (at the expense of removing all the goodness of using multiple sections to control code placement).
LDEFINES=$(LDEFINES) \
-merge:.decodeX_Pass1=.text \
-merge:.decodeX_Pass2=.text \
-merge:.decodeY_Pass1=.text \
-merge:.decodeY_Pass2=.text \
...
Note: I'm told one can achieve the same effect from within C/C++ source files using #pragma comment(linker,"-merge:.foo=.bar")
In the ensuing discussion of how to fix this in the correct way (e.g. removing the restriction on the number of sections, or using fewer sections in the codec lib), our compiler/linker guru came down firmly on the side that there's no reason to need more than 16 sections (or really more than four or five), and noted that this whole situation could have been easily avoided using the following techniques:
For performance, if you want page alignment, use __declspec(align). If you need to control code layout, use the linker’s /ORDER switch with a file containing the symbol ordering you need. Alternatively, use the linker’s automatic sorting of section suffixes, e.g. .text$FOO_A, .text$FOO_B, and .text$FOO_C are automatically merged with .text in alphabetical order.
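To make the suffix rule a little more concrete, here is a toy model of it in C. This is only a sketch of the behavior described above, not the linker's actual implementation: the function names sort_contributions and image_section_name are made up for illustration, and the model assumes that names are compared byte by byte (case-sensitively) and that everything from the $ onward is dropped to get the final section name.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte-wise (case-sensitive) ordering of names. */
static int compare_names(const void *a, const void *b)
{
    const char *const *lhs = a;
    const char *const *rhs = b;
    return strcmp(*lhs, *rhs);
}

/* Hypothetical helper: orders object-section names alphabetically, so
 * ".text" comes first, then ".text$FOO_A", ".text$FOO_B", and so on. */
void sort_contributions(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], compare_names);
}

/* Hypothetical helper: copies the part of an object-section name that
 * precedes '$' into out, truncating if out is too small. */
void image_section_name(const char *object_name, char *out, size_t out_size)
{
    size_t len = strcspn(object_name, "$");

    if (out_size == 0)
        return;
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, object_name, len);
    out[len] = '\0';
}
```

Sorting the names .text$FOO_C, .text, .text$FOO_A, .text$FOO_B with this model puts plain .text first and the suffixed names after it in alphabetical order, and every one of them maps to the single image section .text.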
We didn't previously know about the linker options to automatically sort and merge sections using the $ delimiter, and I suspect that most other people don't either. We'll now go back to the codec team and suggest that future drops can just use the automatic sorting mechanism to ensure that code is grouped as needed while keeping all the code in the .text section. As a nice side benefit, grouping code into the same section saves on the amount of ROM required for the code. Each section must start on a 4k boundary, so on average each section will waste 2k of ROM. Note that section names are case sensitive, so .TEXT is not the same as .text.
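The ROM arithmetic is easy to check with a few lines of C. This is a rough sketch under a simplifying assumption: each section is modeled as being padded out to the next 4k multiple, which is what starting every section on a 4k boundary amounts to. The helper names and the section sizes used below are invented for illustration and are not taken from the codec library.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds size up to the next multiple of alignment (alignment must be > 0). */
size_t round_up(size_t size, size_t alignment)
{
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

/* ROM consumed when every section is padded to the alignment on its own. */
size_t footprint_separate(const size_t *sizes, size_t count, size_t alignment)
{
    size_t total = 0;

    for (size_t i = 0; i < count; ++i)
        total += round_up(sizes[i], alignment);
    return total;
}

/* ROM consumed when the same code is merged into one section, so only
 * the combined size is padded. */
size_t footprint_merged(const size_t *sizes, size_t count, size_t alignment)
{
    size_t total = 0;

    for (size_t i = 0; i < count; ++i)
        total += sizes[i];
    return round_up(total, alignment);
}
```

For three hypothetical sections of 5000, 1200, and 9000 bytes with 4096-byte alignment, the separate layout comes to 24576 bytes and the merged layout to 16384 bytes, so merging recovers two whole pages of padding.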
Here are some other related details about sections which I've shamelessly stolen from some other developers here at MS:
Paging:
Code may be paged into a size-limited RAM buffer called a "page pool". The page pool helps limit the RAM impact of code by keeping resident only the code pages currently in use. Code that must always stay resident in RAM can be marked as non-pageable, but this will cause the full extent of that code section to be copied into RAM for as long as the module is loaded.
To limit the footprint of a module in the page pool, it’s best to group the functions and constant data that are in the working set together. This will allow the working set of code to exist in the page pool in the smallest number of pages. You can group them together using custom section naming. If section names are unique they will each be page-aligned (4k), so unless they truly need unique attributes, it’s best to name them such that automatic section merging can take place. Automatic section merging happens on sections named using a “section_name$subsection_name” convention, such that they all merge into one section named “section_name”.
For readability, give the subsection a name related to the grouping reason, such as “initialization”, “debug”, or “core”.
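To see why grouping the working set matters, the following C sketch counts how many 4k pages a set of functions touches. It is only a model: the code_range type, the pages_touched function, and the offsets in the note below are hypothetical, and the ranges are assumed to be sorted by offset and non-overlapping.

```c
#include <stddef.h>

enum { CODE_PAGE_SIZE = 4096 }; /* 4k pages, as described above */

/* Hypothetical description of one function's location in a module. */
typedef struct {
    size_t offset; /* start offset of the function within the module */
    size_t size;   /* size of the function in bytes */
} code_range;

/* Counts the distinct pages covered by the given ranges.
 * Assumes ranges are sorted by offset and do not overlap. */
size_t pages_touched(const code_range *ranges, size_t count)
{
    size_t pages = 0;
    size_t next_uncounted = 0; /* lowest page index not yet counted */

    for (size_t i = 0; i < count; ++i) {
        if (ranges[i].size == 0)
            continue;

        size_t first = ranges[i].offset / CODE_PAGE_SIZE;
        size_t last = (ranges[i].offset + ranges[i].size - 1) / CODE_PAGE_SIZE;

        /* Skip pages already counted for an earlier range. */
        if (first < next_uncounted)
            first = next_uncounted;
        if (last >= first) {
            pages += last - first + 1;
            next_uncounted = last + 1;
        }
    }
    return pages;
}
```

Three small functions scattered at offsets 0x0100, 0x3000, and 0x7F00 (sizes 200, 300, and 512 bytes) touch four pages in this model, while the same three functions laid out back to back from offset 0 fit in one.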
Example
To group function1 and function3 together in a custom subsection, you can do the following.
#pragma code_seg(".text$initialization") // Code that follows goes into named subsection
void function1(void) {return;}
#pragma code_seg() // Code that follows goes into default .text section
void function2(void) {return;}
#pragma code_seg(".text$initialization") // Code that follows goes into named subsection
void function3(void) {return;}
#pragma code_seg() // Code that follows goes into default .text section
Non-Pageable Sections
If you need only a small bit of the code to stay in RAM always for performance or reliability reasons (like time-critical driver code), you can make the module partially pageable by creating a completely new section with custom attributes.
The following pragma defines a section called "NonPageableCode" which is set to non-pageable.
#pragma comment(linker, "/SECTION:NonPageableCode,ER!P")
There is also a newer, more readable way of specifying the section properties which has been available since CE5:
#pragma section("NonPageableCode", execute, read, nopage)
Now, in the source code, to make a section of code non-pageable, put the following line before the code:
#pragma code_seg("NonPageableCode")
Afterward, you may use the following line to place the code that follows back in the default .text section:
#pragma code_seg()
Tools
DUMPBIN /HEADERS FOO.DLL (to see what sections exist in the module)
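If you would rather check the count in an automated build step than read DUMPBIN output by eye, the section count can also be read straight out of the module's header. The C sketch below is a minimal illustration, assuming the standard PE layout (an MZ header whose 4-byte field at offset 0x3C points to the "PE\0\0" signature, followed by the 2-byte Machine and 2-byte NumberOfSections fields). The function names and the CE_SECTION_LIMIT constant are our own, with the limit taken from the error message earlier in this post. It handles linked images such as DLLs, not .lib archives.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CE_SECTION_LIMIT 16 /* limit quoted in the RM0024 error above */

/* Reads a little-endian 16-bit value. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Reads a little-endian 32-bit value. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns NumberOfSections for a PE image held in memory,
 * or -1 if the buffer does not look like a PE image. */
int pe_section_count(const uint8_t *image, size_t size)
{
    uint32_t pe_offset;

    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;

    pe_offset = read_le32(image + 0x3C);

    /* Need the 4-byte signature plus Machine and NumberOfSections. */
    if (pe_offset > size || size - pe_offset < 8)
        return -1;
    if (memcmp(image + pe_offset, "PE\0\0", 4) != 0)
        return -1;

    return (int)read_le16(image + pe_offset + 6);
}

/* Nonzero if the count would trip the "more than 16 sections" error. */
int exceeds_ce_section_limit(int section_count)
{
    return section_count > CE_SECTION_LIMIT;
}
```

A build script could load the DLL into a buffer, call pe_section_count, and fail early with a friendlier message when exceeds_ce_section_limit returns nonzero.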
That's it for now.
Comments
Anonymous
March 31, 2008
Very interesting! Looking forward to hear about tips to improve cache/TLB misses and buffers management. Thanks a lot! Alex