Enhancing PTE Access Performance with ARM Processors (Windows Embedded Compact 7)
This article is intended for developers using Windows Embedded Compact 7 who are familiar with ARM processor architecture details. For more information about the ARM architecture, see the ARM Infocenter website.
Overview
This article refers to ARM processors in two ways, described in the following table.
| Syntax | Description |
|---|---|
| ARMvX | Indicates the ARM processor architecture version. Examples include ARMv5, ARMv6, and ARMv7. |
| ARMXX | Indicates the ARM processor family. For example, the ARM11 family of processors implements the ARMv6 architecture. |
Windows Embedded Compact 7 supports settings that enhance the performance of ARM processors when they access Page Table Entries (PTEs) in memory. You achieve this performance increase by configuring Windows Embedded Compact 7 so that the Memory Management Unit (MMU) can fetch a PTE from the cache when a Translation Lookaside Buffer (TLB) miss occurs. To enhance memory access performance further, you can enable write-through caching for any cache level that the MMU cannot fetch from directly.
The Windows Embedded Compact 7 kernel provides three global variables with which you can adjust the way that the ARM architecture accesses memory; these variables are declared in OEMGlobal.h. You can read more about the following variables in subsequent sections of this article.
dwTTBRCacheBits
dwPageTableCacheBits
pfnPTEUpdateBarrier
For more information about global variables in Windows Embedded Compact 7, see OEMGLOBAL (Windows Embedded Compact 7).
Supported Versions
The memory access performance enhancements that this article describes are available only in Windows Embedded Compact 7. They apply to ARM-licensed technologies for architecture versions v5, v6, v6 MP, and v7.
Translation Table Base Register Cache Bits
The dwTTBRCacheBits global variable specifies how the ARM processor should access the page tables that are referenced by the processor's Translation Table Base Registers (TTBR0 and TTBR1). The processor uses TTBR0 for processes and TTBR1 for the kernel and I/O. Each TTBR holds the physical address of a first-level page table. This table might be in RAM, or it might be in the data cache.
For more information about the TTBR, see table 3.54 in section 3.3.10 of the ARM1136JF-S and ARM1136J-S Technical Reference Manual.
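As an illustration of how a dwTTBRCacheBits value might be composed, the following sketch builds the low-order TTBR attribute bits from an inner cacheable flag and an outer cache policy. The bit positions (C at bit 0, RGN at bits 4:3) and the RGN encodings follow the ARM1136 TTBR layout and should be verified against the reference manual for your processor; other architecture versions lay out these bits differently. The helper make_ttbr_cache_bits and the macro and enum names are hypothetical and are not part of the Windows Embedded Compact 7 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as documented for the ARM1136 (ARMv6) TTBR.
   ASSUMPTION: your processor uses the same layout; check its manual. */
#define TTBR_C_BIT      (1u << 0)  /* page table walks are inner cacheable */
#define TTBR_RGN_SHIFT  3          /* RGN field occupies bits [4:3] */

/* Outer cache policy encodings for the RGN field (hypothetical names). */
enum ttbr_rgn {
    TTBR_RGN_NONCACHEABLE = 0x0,
    TTBR_RGN_WB_WA        = 0x1,   /* write-back, write-allocate */
    TTBR_RGN_WT           = 0x2,   /* write-through */
    TTBR_RGN_WB_NO_WA     = 0x3    /* write-back, no write-allocate */
};

/* Hypothetical helper: compose a candidate dwTTBRCacheBits value. */
static uint32_t make_ttbr_cache_bits(int inner_cacheable, enum ttbr_rgn outer)
{
    assert((unsigned)outer <= 0x3u);
    uint32_t bits = ((uint32_t)outer & 0x3u) << TTBR_RGN_SHIFT;
    if (inner_cacheable) {
        bits |= TTBR_C_BIT;
    }
    return bits;
}
```

For example, make_ttbr_cache_bits(0, TTBR_RGN_WB_WA) describes page table walks that bypass the inner cache but use a write-back, write-allocate outer cache.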
Page Table Cache Bits
The dwPageTableCacheBits global variable specifies the cache bits that the kernel uses to access the PTEs. This value is stored in each of the page table entries, and applies to the TEX, C, and B bits. Windows Embedded Compact 7 uses the small-page, 4-KB, second-level descriptor format. The TEX value is represented by bits 8, 7, and 6, and the C and B bits are bits 3 and 2, respectively. For details, see Figure 6.8 of the ARM1136JF-S and ARM1136J-S Technical Reference Manual.
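The bit layout described above can be sketched in C as follows. The masks come directly from the positions stated in this section (TEX at bits 8:6, C at bit 3, B at bit 2). The helper names make_pte_cache_bits and apply_pte_cache_bits are hypothetical; they only illustrate how such a value could be built and merged into a descriptor, and do not reproduce the kernel's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Cache attribute fields of a 4-KB small-page second-level descriptor. */
#define PTE_B_BIT       (1u << 2)
#define PTE_C_BIT       (1u << 3)
#define PTE_TEX_SHIFT   6
#define PTE_TEX_MASK    (0x7u << PTE_TEX_SHIFT)          /* bits 8:6 */
#define PTE_CACHE_MASK  (PTE_TEX_MASK | PTE_C_BIT | PTE_B_BIT)

/* Hypothetical helper: compose a candidate dwPageTableCacheBits value. */
static uint32_t make_pte_cache_bits(uint32_t tex, int c, int b)
{
    assert(tex <= 0x7u);
    uint32_t bits = tex << PTE_TEX_SHIFT;
    if (c) {
        bits |= PTE_C_BIT;
    }
    if (b) {
        bits |= PTE_B_BIT;
    }
    return bits;
}

/* Hypothetical helper: replace only the TEX, C, and B fields of a
   descriptor, leaving the address and permission bits untouched. */
static uint32_t apply_pte_cache_bits(uint32_t pte, uint32_t cache_bits)
{
    return (pte & ~PTE_CACHE_MASK) | (cache_bits & PTE_CACHE_MASK);
}
```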
The settings that you apply through dwPageTableCacheBits must correspond to the behavior that you assign through dwTTBRCacheBits. For example, if the MMU is configured to fetch a PTE from the L2 cache, you must also set dwPageTableCacheBits to enable L2 cache access. This allows the kernel to access the PTE with the L2 cache enabled. If these variables are not set consistently, a system crash can occur because of data inconsistency between the TLB (updated by the MMU) and memory (updated by the CPU).
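The L2 example can be sketched as a pair of values plus a consistency check. The structure cache_settings is a stand-in for the two kernel globals so that the sketch compiles on its own; a real OAL would assign the corresponding OEMGLOBAL members instead. The chosen values are illustrative assumptions based on the ARM1136 encodings (TTBR RGN = 01 for an outer write-back, write-allocate cache; TEX = 101 with C = 1 and B = 0 for the same outer policy with a write-through inner cache), not values taken from a shipping BSP. The check handles only the TEX = 1BB format.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the two kernel globals discussed in this article.
   ASSUMPTION: a real OAL writes to the OEMGLOBAL structure instead. */
struct cache_settings {
    uint32_t dwTTBRCacheBits;
    uint32_t dwPageTableCacheBits;
};

#define TTBR_RGN(bits)  (((bits) >> 3) & 0x3u)  /* outer policy, bits 4:3 */
#define PTE_TEX(bits)   (((bits) >> 6) & 0x7u)  /* TEX field, bits 8:6 */

/* Returns 1 when the page table bits use the TEX = 1BB format and the
   outer policy BB equals the TTBR RGN field; returns 0 otherwise. */
static int outer_policy_matches(const struct cache_settings *s)
{
    uint32_t tex = PTE_TEX(s->dwPageTableCacheBits);
    if ((tex & 0x4u) == 0) {
        return 0;  /* not 1BB format; this sketch does not decode it */
    }
    return (tex & 0x3u) == TTBR_RGN(s->dwTTBRCacheBits);
}

/* Hypothetical configuration: the MMU walks page tables through L2 only. */
static void use_l2_for_page_table_walks(struct cache_settings *s)
{
    s->dwTTBRCacheBits      = 0x1u << 3;                /* RGN = 01 */
    s->dwPageTableCacheBits = (0x5u << 6) | (1u << 3);  /* TEX=101, C=1, B=0 */
    assert(outer_policy_matches(s));
}
```

A check of this kind could run once during OAL initialization in a debug build, so that a mismatch is caught before the kernel starts updating page tables.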
Page Table Entry Update Barrier
The pfnPTEUpdateBarrier global variable is a pointer to an optional function of the type PFN_PTEUpdateBarrier that serves as the barrier for PTE updates. The Windows Embedded Compact 7 kernel contains a default implementation of the UpdateBarrier function, which issues a Data Synchronization Barrier (DSB) processor instruction for ARMv6 and later architectures. If you run the operating system on the ARMv5 architecture, you usually need to provide your own implementation of this function to drain the write buffer.
The following code example illustrates the instructions that are required to override the default PTE update barrier routine for iMX27 on ARMv5.
    LEAF_ENTRY OEMPTEUpdateBarrier
        mov     r0, #0                      ; value written by the CP15 operation
        mcr     p15, 0, r0, c7, c10, 4      ; drain write buffer
        mov     pc, lr                      ; return to caller
    END
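Registering an override can be sketched in C as a function pointer assignment. To stay self-contained, the sketch declares its own stand-ins: the structure oem_global_stub replaces the kernel's OEMGLOBAL structure, and the argument list of PFN_PTEUpdateBarrier shown here is an assumption, so use the declaration in OEMGlobal.h in real code. The C body of OEMPTEUpdateBarrier only counts calls so that the sketch can run on a host; on the target it would be the assembly routine shown above.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: argument list is illustrative; the real typedef is in
   OEMGlobal.h and takes precedence. */
typedef void (*PFN_PTEUpdateBarrier)(void *pte, unsigned long cbSize);

/* Stand-in for the relevant part of the kernel's OEMGLOBAL structure. */
struct oem_global_stub {
    PFN_PTEUpdateBarrier pfnPTEUpdateBarrier;
};

static unsigned g_barrier_calls;  /* host-side instrumentation only */

/* Host-side placeholder for the barrier routine. On an ARMv5 target this
   would drain the write buffer instead of counting calls. */
static void OEMPTEUpdateBarrier(void *pte, unsigned long cbSize)
{
    (void)pte;
    (void)cbSize;
    g_barrier_calls++;
}

/* Hypothetical helper: install the override during OAL initialization. */
static void install_pte_barrier(struct oem_global_stub *g)
{
    assert(g != NULL);
    g->pfnPTEUpdateBarrier = OEMPTEUpdateBarrier;
}
```

Leaving the pointer unassigned keeps the kernel's default barrier in place, which is why the override is described as optional.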
Additional Resources
For more information about the ARM architecture, see the ARM Infocenter website.
For more information about the global variables in Windows Embedded Compact 7, see OEMGLOBAL (Windows Embedded Compact 7).
For a specific example of how to set the dwTTBRCacheBits and dwPageTableCacheBits variables, see %_WINCEROOT%\Platform\imx313ds\src\oal\oallib\init.c.