Enhancing PTE Access Performance with ARM Processors (Windows Embedded Compact 7)

This article is intended for developers using Windows Embedded Compact 7 who are familiar with ARM processor architecture details. For more information about the ARM architecture, see the ARM Infocenter website.

Overview

This article refers to ARM processors in two ways, described in the following table.

Syntax    Description
ARMvX     Indicates the ARM processor architecture version. Examples include ARMv5, ARMv6, and ARMv7.
ARMXX     Indicates the ARM processor family. For example, the ARM11 family of processors implements the ARMv6 architecture.

Windows Embedded Compact 7 supports settings that improve the performance of ARM processors when they access page table entries (PTEs) in memory. These settings let the Memory Management Unit (MMU) fetch a PTE from the cache, instead of from main memory, when a Translation Lookaside Buffer (TLB) miss occurs. For cache levels that the MMU cannot fetch from directly, you can enable write-through caching to improve memory access performance.

The Windows Embedded Compact 7 kernel provides three global variables with which you can adjust the way that the ARM architecture accesses memory; these variables are declared in OEMGlobal.h. You can read more about the following variables in subsequent sections of this article.

  • dwTTBRCacheBits
  • dwPageTableCacheBits
  • pfnPTEUpdateBarrier

For more information about global variables in Windows Embedded Compact 7, see OEMGLOBAL (Windows Embedded Compact 7).

Supported Versions

This article describes memory access performance enhancements that are available only in Windows Embedded Compact 7. The enhancements apply to ARM-licensed processors that implement architecture versions v5, v6, v6 MP, and v7.

Translation Table Base Register Cache Bits

The dwTTBRCacheBits global variable specifies the cache bits for the processor's Translation Table Base Registers (TTBR0 and TTBR1), which control how the MMU accesses page tables. The processor uses TTBR0 for processes and TTBR1 for the kernel and I/O. Each TTBR holds the physical address of a first-level page table. This table might be in RAM, or it might be in the data cache.

For more information about the TTBR, see table 3.54 in section 3.3.10 of the ARM1136JF-S and ARM1136J-S Technical Reference Manual.
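
The following sketch shows one way to compose a value for dwTTBRCacheBits from named fields. It assumes the ARM1136 TTBR layout (C in bit 0, S in bit 1, RGN in bits 4:3). The macro, enumeration, and function names are hypothetical and are not declared in OEMGlobal.h. Verify the bit positions and encodings against the technical reference manual for your core before you use them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. Bit positions are assumed from the ARM1136 TTBR layout. */
#define TTBR_INNER_CACHEABLE  (1u << 0)  /* C bit: page table walks are inner cacheable */
#define TTBR_SHARED           (1u << 1)  /* S bit: page table memory is shared */
#define TTBR_RGN_SHIFT        3u         /* RGN field occupies bits [4:3] */

/* Assumed encodings of the outer cacheability (RGN) field. */
enum ttbr_rgn {
    TTBR_RGN_NONCACHEABLE = 0u,  /* outer noncacheable */
    TTBR_RGN_WB_WA        = 1u,  /* outer write-back, write-allocate */
    TTBR_RGN_WT           = 2u,  /* outer write-through */
    TTBR_RGN_WB_NO_WA     = 3u   /* outer write-back, no write-allocate */
};

/* Build the low attribute bits of a TTBR value from the individual fields. */
static uint32_t make_ttbr_cache_bits(int inner_cacheable, int shared,
                                     enum ttbr_rgn rgn)
{
    uint32_t bits = 0u;

    assert((unsigned)rgn <= 3u);  /* RGN is a two-bit field */
    if (inner_cacheable) {
        bits |= TTBR_INNER_CACHEABLE;
    }
    if (shared) {
        bits |= TTBR_SHARED;
    }
    bits |= (uint32_t)rgn << TTBR_RGN_SHIFT;
    return bits;
}
```

Under these assumptions, make_ttbr_cache_bits(1, 0, TTBR_RGN_WB_WA) requests inner-cacheable, outer write-back table walks.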

Page Table Cache Bits

The dwPageTableCacheBits global variable specifies the cache bits that the kernel uses to access the PTEs. This value is stored in each of the page table entries and sets the TEX, C, and B bits. Windows Embedded Compact 7 uses the 4-KB small-page, second-level descriptor format. In this format, the TEX value occupies bits 8, 7, and 6, and the C and B bits are bits 3 and 2, respectively. For details, see Figure 6.8 of the ARM1136JF-S and ARM1136J-S Technical Reference Manual.
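
The following sketch illustrates the bit positions described above by composing a cache-bits value from the TEX, C, and B fields and applying it to a small-page descriptor. The macro and function names are hypothetical; only the bit positions come from this article.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the text: TEX = bits [8:6], C = bit 3, B = bit 2. */
#define PTE_TEX_SHIFT   6u
#define PTE_C_BIT       (1u << 3)
#define PTE_B_BIT       (1u << 2)
#define PTE_CACHE_MASK  ((7u << PTE_TEX_SHIFT) | PTE_C_BIT | PTE_B_BIT)

/* Compose the cache bits of a small-page descriptor from its fields. */
static uint32_t make_pte_cache_bits(uint32_t tex, int c, int b)
{
    assert(tex <= 7u);  /* TEX is a three-bit field */
    return (tex << PTE_TEX_SHIFT)
         | (c ? PTE_C_BIT : 0u)
         | (b ? PTE_B_BIT : 0u);
}

/* Replace the cache bits of an existing descriptor, leaving other bits intact. */
static uint32_t pte_with_cache_bits(uint32_t pte, uint32_t cache_bits)
{
    return (pte & ~PTE_CACHE_MASK) | (cache_bits & PTE_CACHE_MASK);
}
```

For example, make_pte_cache_bits(1, 1, 1) sets TEX to 001 with C and B both set. In the ARMv6 encoding without TEX remapping, that combination is generally documented as write-back, write-allocate for both inner and outer caches; confirm it for your processor.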

The settings that you apply with dwPageTableCacheBits must correspond to the behavior that you assign with dwTTBRCacheBits. For example, if the MMU is configured to fetch a PTE from the L2 cache, you must also set dwPageTableCacheBits to enable L2 cache access. This allows the kernel to access the PTE with the L2 cache enabled. If these variables are not set consistently, a system crash can occur because of data inconsistency between the TLB (updated by the MMU) and memory (updated by the CPU).
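
One way to keep the two settings from drifting apart is to assign them in a single routine. The following sketch uses a stand-in structure, mock_oem_cache_settings, instead of the real OEMGLOBAL declaration, and the example values assume the ARM1136 bit layouts (TTBR C in bit 0 and RGN in bits 4:3; PTE TEX in bits 8:6, C in bit 3, B in bit 2). All names other than dwTTBRCacheBits and dwPageTableCacheBits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two fields named in this article; NOT the real OEMGLOBAL. */
struct mock_oem_cache_settings {
    uint32_t dwTTBRCacheBits;
    uint32_t dwPageTableCacheBits;
};

/* Example values only, under the assumed bit layouts described above. */
#define EX_TTBR_WB_WA  ((1u << 3) | (1u << 0))             /* walks: inner cacheable, outer write-back */
#define EX_PTE_WB_WA   ((1u << 6) | (1u << 3) | (1u << 2)) /* PTE access: TEX=001, C=1, B=1 */

/*
 * Set both variables together so that the MMU table walk and the kernel's
 * PTE accesses use matching cache behavior. When caching is not requested,
 * the existing values are left untouched.
 */
static void configure_page_table_caching(struct mock_oem_cache_settings *g,
                                         int walk_from_cache)
{
    assert(g != NULL);
    if (walk_from_cache) {
        g->dwTTBRCacheBits      = EX_TTBR_WB_WA;
        g->dwPageTableCacheBits = EX_PTE_WB_WA;
    }
}
```

In a real OAL, the equivalent assignments would be made during initialization through the pointer to the OEM global structure that the platform uses; see the init.c file listed under Additional Resources for a working example.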

Page Table Entry Update Barrier

The pfnPTEUpdateBarrier global variable is a pointer to an optional function of type PFN_PTEUpdateBarrier that acts as a barrier for updates to PTEs. The Windows Embedded Compact 7 kernel contains a default implementation of the UpdateBarrier function, which issues a Data Synchronization Barrier (DSB) processor instruction for ARMv6 and later architectures. If you run the operating system on an ARMv5 architecture, you usually need to provide your own implementation of this function to drain the write buffer.

The following code example illustrates the instructions that are required to override the default PTE update barrier routine for iMX27 on ARMv5.

  LEAF_ENTRY OEMPTEUpdateBarrier

  mov     r0, #0                          ; value written by the CP15 operation should be zero
  mcr     p15, 0, r0, c7, c10, 4          ; drain write buffer
  mov     pc, lr                          ; return to the caller

  END
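
The assembly routine takes effect only after its address is stored in pfnPTEUpdateBarrier. The following sketch shows that registration step with stand-in types so that it can be compiled on a host. The function pointer signature, the structure, and every name except pfnPTEUpdateBarrier are assumptions made for illustration; the real PFN_PTEUpdateBarrier type is declared in OEMGlobal.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape, for illustration only; see OEMGlobal.h for the real typedef. */
typedef void (*mock_pfn_pte_update_barrier)(void *pte, unsigned long cb_size);

/* Stand-in for the one field used here; NOT the real OEMGLOBAL. */
struct mock_oem_barrier_hook {
    mock_pfn_pte_update_barrier pfnPTEUpdateBarrier;
};

static int g_barrier_calls;  /* lets a host build observe that the hook ran */

/* Host stand-in for the assembly routine OEMPTEUpdateBarrier. */
static void mock_pte_update_barrier(void *pte, unsigned long cb_size)
{
    (void)pte;      /* the write-buffer drain does not depend on the arguments */
    (void)cb_size;
    g_barrier_calls++;
}

/* Override the default barrier by storing the OEM routine in the global. */
static void install_pte_update_barrier(struct mock_oem_barrier_hook *g)
{
    assert(g != NULL);
    g->pfnPTEUpdateBarrier = mock_pte_update_barrier;
}
```

On a target build, the assignment would store the address of the assembly routine instead of the host stand-in.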

 

Additional Resources

For more information about the ARM architecture, see the ARM Infocenter website.

For more information about the global variables in Windows Embedded Compact 7, see OEMGLOBAL (Windows Embedded Compact 7).

For a specific example of how to set the dwTTBRCacheBits and dwPageTableCacheBits variables, see %_WINCEROOT%\Platform\imx313ds\src\oal\oallib\init.c.