Inside Windows CE API Calls
Posted by: Sue Loh
Windows CE APIs are implemented by a set of server processes. Besides the kernel (nk.exe), we have other server processes: filesys.exe, gwes.exe, device.exe, and services.exe. When an application calls an API in one of these servers, the app thread actually jumps into the server process, so the API call runs on the application thread. But how does all of that work? Let's trace the code. If you have Platform Builder you can look at some of this. If you have our shared source you can look at all of it.
Most Windows CE APIs are exported by a single central DLL: coredll.dll. All Windows CE applications link against coredll. When an application calls an API, such as GetTickCount, it is calling the GetTickCount export from coredll.dll. But the coredll export is just a small wrapper, also called a "thunk." Here's an example showing what a coredll implementation looks like:
DWORD xxx_GetTickCount ()
{
return GetTickCount ();
}
Reference: %_WINCEROOT%\private\winceos\coreos\core\thunks\*
You can find the coredll thunks in our shared source under %_WINCEROOT%\private\winceos\coreos\core\thunks. The thunk function name has "xxx_" in it, but is exposed from coredll by a different name. The rename happens inside coredll.def:
GetTickCount=xxx_GetTickCount
Reference: You can look at %_WINCEROOT%\private\winceos\coreos\core\coredll.def, though I think you could also find a copy of the .def file somewhere in your public tree even without shared source.
But, if coredll is implementing GetTickCount, then what is the GetTickCount that it is calling? You can find the answer in the public header files under %_WINCEROOT%\public\common\oak\inc. The GetTickCount "function" that coredll calls actually resolves to a specially-cooked invalid address.
Reference: %_WINCEROOT%\public\common\oak\inc\*:
#define GetTickCount COMPLICATED_MACRO(..., SH_WIN32, W32_GetTickCount, ...)
Where W32_GetTickCount is #defined to 13 in another OAK header, and SH_WIN32 is #defined to 0 in the SDK.
You can trace into the definition of IMPLICIT_CALL to find out how it works, but it quickly descends into macro nastiness. The important thing here is that the macros combine two numbers: SH_WIN32, which is the ID of the API set table that GetTickCount is part of, and W32_GetTickCount, which is the index of GetTickCount inside that API set table. The combination produces a 32-bit number, an invalid address into which the API "identity" is encoded. When the coredll thunk xxx_GetTickCount "calls" the GetTickCount macro, it jumps to that invalid address. If you look at the disassembly for the coredll thunks, you'll see the jumps to these addresses.
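To make the encoding idea concrete, here is a minimal sketch of how an API set ID and a method index could be packed into a single unmapped address and later unpacked. Every name and constant below (DEMO_TRAP_BASE, DEMO_SET_SHIFT, DEMO_ENTRY_SCALE, and the demo_ functions) is hypothetical and chosen only for illustration; the real values are in the OAK headers and vary by CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only. The real values live in
 * the OAK headers and are not necessarily the same on every CPU. */
#define DEMO_TRAP_BASE   0xF0010000u  /* top of a range assumed never mapped */
#define DEMO_SET_SHIFT   8            /* low 8 bits carry the method index   */
#define DEMO_ENTRY_SCALE 4u           /* address step between encoded APIs   */

/* Pack (API set ID, method index) into a fake "function address". */
static uint32_t demo_encode_api(uint32_t set_id, uint32_t method)
{
    assert(method < (1u << DEMO_SET_SHIFT));
    return DEMO_TRAP_BASE
         - ((set_id << DEMO_SET_SHIFT) | method) * DEMO_ENTRY_SCALE;
}

/* Recover the API set ID from a faulting address. */
static uint32_t demo_decode_set(uint32_t addr)
{
    return ((DEMO_TRAP_BASE - addr) / DEMO_ENTRY_SCALE) >> DEMO_SET_SHIFT;
}

/* Recover the method index from a faulting address. */
static uint32_t demo_decode_method(uint32_t addr)
{
    return ((DEMO_TRAP_BASE - addr) / DEMO_ENTRY_SCALE)
         & ((1u << DEMO_SET_SHIFT) - 1u);
}
```

With these made-up constants, demo_encode_api(0, 13) yields 0xF000FFCC, and decoding that address gives back set 0 and index 13. The point is only that the "address" carries no code; it is a number the fault handler can take apart again.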
The jump produces an exception. All exceptions go to the kernel first, and the kernel says, "A-ha! I know this invalid address. It's the encoding for an API, index 13 of API table 0." The kernel marshals (maps) arguments, adjusts permissions, flushes cache and TLB if necessary, and finally sets things up so that the thread continues execution at the desired API inside the desired server process.
Reference:
%_WINCEROOT%\private\winceos\coreos\nk\kernel\x86\fault.c, Int20SyscallHandler.
%_WINCEROOT%\private\winceos\coreos\nk\kernel\objdisp.c, ObjectCall()
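The lookup step described above (take the API set ID, find that set's table, then call the entry at the given index) can be sketched in ordinary C. This is a simplified model and not the kernel's code: the demo_ types, the table contents, and the stand-in GetTickCount are all hypothetical, and the argument mapping, permission changes, and process switch are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*demo_api_fn)(void);

/* One API set: a table of method pointers registered by a server. */
typedef struct {
    const demo_api_fn *methods;
    size_t             count;
} demo_api_set;

#define DEMO_W32_GETTICKCOUNT 13   /* index quoted in the article */
#define DEMO_WIN32_COUNT      14   /* hypothetical table size     */

/* Stand-in for the real API; returns a fixed value for the demo. */
static uint32_t demo_get_tick_count(void) { return 12345u; }

/* Unlisted entries are NULL, meaning "no such method" in this model. */
static const demo_api_fn demo_win32_methods[DEMO_WIN32_COUNT] = {
    [DEMO_W32_GETTICKCOUNT] = demo_get_tick_count,
};

/* Table of API sets; set 0 plays the role of SH_WIN32. */
static const demo_api_set demo_api_sets[] = {
    { demo_win32_methods, DEMO_WIN32_COUNT },
};

/* Look up and call an API by (set, method).
 * Returns 0 on success, -1 if the pair does not name a registered API. */
static int demo_dispatch(uint32_t set_id, uint32_t method, uint32_t *result)
{
    if (set_id >= sizeof demo_api_sets / sizeof demo_api_sets[0])
        return -1;
    const demo_api_set *set = &demo_api_sets[set_id];
    if (method >= set->count || set->methods[method] == NULL)
        return -1;
    *result = set->methods[method]();
    return 0;
}
```

Calling demo_dispatch(0, 13, &ticks) reaches the stand-in function, while an unknown set or an empty slot is rejected instead of being called.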
The thread, now running inside the server process, executes the real API call. When the call finally returns it takes another exception, because during the API call setup the kernel sets the return address to another specially-coded invalid address. During the return the kernel again adjusts arguments, permissions, and other state as necessary.
Reference: %_WINCEROOT%\private\winceos\coreos\nk\kernel\x86\fault.c, ServerCallReturn.
Finally execution returns to the coredll.dll thunk back inside the original process.
KMODE
As you can imagine, we pay a performance penalty to take these exceptions on the way into and out of every API call. That is part of the reason that "KMODE" and "ALLKMODE" exist. In Windows CE, "kernel mode" threads have permission to access memory addresses outside of their own process. Normal threads cannot execute code outside their process slot. However, kernel-mode threads have the ability to access any memory they like. A kernel-mode thread can jump straight into another process and execute code. Windows CE takes advantage of the expanded memory access to speed up kernel-mode threads. If you look around the coredll code, you'll find thunks like this (contrived) example:
DWORD xxx_GetTickCount ()
{
    // Kernel mode takes a direct jump
    if (IsInKMode ()) {
        return g_pKmodeEntries->m_pGetTickCount ();
    }
    // Non kernel mode takes a trap
    return GetTickCount ();
}
g_pKmodeEntries is a table that the kernel passes to each instance of coredll that's running inside a trusted process. So, only kernel-mode threads running inside trusted processes gain the performance benefit of these KMode short-circuits.
Reference:
%_WINCEROOT%\private\winceos\coreos\core\dll\coredll.cpp, CoreDllInit().
%_WINCEROOT%\private\winceos\coreos\nk\kernel\resource.c, SC_GetRomFileInfo().
%_WINCEROOT%\private\winceos\coreos\nk\kernel\KmodeEntries.cpp, g_KmodeEntries.
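Here is a small, self-contained model of the table hand-off described above. It is a sketch under assumptions, not the coredll code: the demo_ names are hypothetical, and it assumes the table pointer is simply NULL in a process that was never handed the table, which the article does not state. The counters exist only so the two paths can be told apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of direct entry points, one slot per short-circuit. */
typedef struct {
    uint32_t (*pfn_get_tick_count)(void);
} demo_kmode_entries;

/* Assumption: stays NULL unless the "kernel" hands the table over. */
static const demo_kmode_entries *demo_p_kmode_entries = NULL;

static int demo_direct_calls;  /* times the direct path was taken */
static int demo_trap_calls;    /* times the trap path was taken   */

static uint32_t demo_direct_get_tick_count(void)
{
    demo_direct_calls++;
    return 1000u;
}

/* Stand-in for the slow path that would fault into the kernel. */
static uint32_t demo_trap_get_tick_count(void)
{
    demo_trap_calls++;
    return 1000u;
}

static const demo_kmode_entries demo_kernel_table = {
    demo_direct_get_tick_count,
};

/* Models the kernel passing the table to a trusted process's coredll. */
static void demo_install_kmode_entries(const demo_kmode_entries *table)
{
    demo_p_kmode_entries = table;
}

/* Thunk: use the direct entry only when in kernel mode AND the table
 * was provided; otherwise fall back to the trap path. */
static uint32_t demo_thunk_get_tick_count(int in_kmode)
{
    if (in_kmode && demo_p_kmode_entries != NULL
        && demo_p_kmode_entries->pfn_get_tick_count != NULL) {
        return demo_p_kmode_entries->pfn_get_tick_count();
    }
    return demo_trap_get_tick_count();
}
```

In this model both conditions must hold for the fast path: a kernel-mode caller without the table still traps, and so does a user-mode caller in a process that has the table.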
Each of the short-circuit functions in the kernel does the work that normally would happen inside the API call trap: it switches the process, maps arguments, and such. This pseudocode might give you an idea of what the kernel short-circuit wrappers look like:
DWORD NKGetTickCount ()
{
    // This API takes no arguments, otherwise there'd be calls
    // to map each argument here.

    // Switch to the process that exports the "Win32" API table,
    // and get a pointer to the table
    pApiTable = SwitchProcess (..., SH_WIN32);

    // Call the GetTickCount entry in the table
    result = (*(DWORD (*) ()) (pApiTable[W32_GetTickCount])) ();

    // Return to the original process
    RestoreProcess ();

    return result;
}
For a real example, see:
%_WINCEROOT%\private\winceos\coreos\nk\kernel\kmisc.c, NKRegOpenKeyExW().
Most of the APIs in the system don't have kernel-mode short-circuits. Only a few APIs were chosen for kernel-mode speed-ups, for performance reasons. Windows CE was originally designed to NOT run in all-kernel-mode, for security reasons. Non kernel-mode threads cannot read or write other processes' memory, so process data is more protected. But for performance reasons, some of the Windows CE devices were built for all-kernel-mode. The way these thunks are organized represents a balancing act between coding for all-kernel-mode devices and coding for those which make use of the improved security of user mode.
Comments
- Anonymous
February 08, 2006
Great description of the thunking process, I always had a general idea of what was going on but never saw a real description.
One correction: I believe that kernel mode threads do not have access to any memory address they like; they can only access the memory for their slot plus the kernel memory. Other user processes are still protected. The kernel can't trash the memory of another process any more easily than a user mode thread can.
Any process (including the kernel) that wants to access the memory of another process must call SetProcPermissions to gain access to the virtual addresses in the slot.
- Dean
- Anonymous
February 08, 2006
Correction to the above - the kernel can trash all physical memory via kernel virtual addresses, since all physical memory is mapped into kernel virtual address space. It just can't arbitrarily access any virtual address in user space below 0x80000000.
- Dean
- Anonymous
February 09, 2006
Thanks Dean, you're right, I was being inaccurate when I said that kernel mode threads can access any memory they want. There is still address space protection for user mode virtual addresses. I think (but don't know for sure) that kernel mode threads are allowed to access all of the kernel virtual address space (over 0x80000000). And yeah that would mean that kernel mode threads could access all user mode memory via the static-mapped kernel address space, though they'd have a tough time figuring out the mapping between user mode address and static-mapped kernel address. In reality, any thread that's allowed to run in KMODE will also be allowed to call SetProcPermissions and access the user mode address directly. In KMODE the address space protection is more to catch accidental mistakes than intentional abuses. But, thanks for keeping me honest! ;-)
Sue
- Anonymous
February 09, 2006
Kernel mode threads can access all kernel memory, that is, addresses above 0x80000000. That's all that kernel mode really means, from a user point of view. The other benefits are under-the-hood performance, which you've described here.
Most people don't realize the difference between kmode and trust. In order for a thread to call SetProcPermissions (or SetKMode, or one of any number of other trusted APIs) the thread has to be trusted, not necessarily in KMode. Unless the OEM has done something to implement a trusted environment, all threads will be trusted, and all threads can move themselves in and out of KMode and independently call SetProcPermissions.
As you said, any thread in KMode can call SetProcPermissions, but that is because it could only have gotten into KMode if it was already trusted.
Maybe a blog on this topic sometime?
Keep up the good work!
- Dean
- Anonymous
February 09, 2006
Good points, Dean, thanks!
Sue
- Anonymous
February 16, 2006
Is generating an exception this way more efficient on ARM than a software interrupt? Or why was this method picked? Where can I learn more about the internals of Windows CE?
- Anonymous
February 17, 2006
I don't know why this method was picked, actually. I think it's probably linked to the fact that we continue executing on the current thread rather than switch to a different thread inside the server process.
The best description of Windows CE internals that I know of is the book "Inside Microsoft Windows CE" by John Murray. It's a little dated by now, but it lays out the groundwork of what you'd need to know to understand what's going on inside the OS.
Sue