

A look at Windows Virtual Memory mechanisms (continuation of "A look at Virtual Address Space - VAS")

As I promised last time, here comes the next post on memory :-). Remember, my eventual goal is to reveal how memory management works in SQL Server, but for you to really appreciate it I think you need a good feel for how Windows manages memory. Understanding the details is great; at this point, however, I want you to understand the concepts! In some cases I skip details on purpose, because right now they are not important and many of them can change from one release to another. So let's continue!

 

In my previous post I mentioned that VAS regions can be bound to physical memory either right away or later on, using the VirtualAlloc API. As most operating systems do, Windows binds physical memory to the VAS on demand, the first time a page in a region is touched. (The story is a bit different when you are running without a paging file: then a VAS page is bound to a physical page right away.) The binding happens one page at a time. When memory is first touched, the hardware generates an exception called a page fault. The exception is handled by Windows. The OS verifies that the current VAS region is committed by looking into the VAD structure representing the region. If the region is committed and is being accessed for the first time, the OS finds a physical page in RAM that it can use. (Keep in mind that this page is zeroed out before it is handed over, for security reasons.) Finally, it binds the VAS page to the physical page by filling in the appropriate data structures and loads this information into the CPU; execution then continues right at the point where the page fault occurred.
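The on-demand binding described above can be sketched with a toy model. This is only an illustration of the concept, not how Windows implements it: the class `ToyAddressSpace`, the exception name `AccessViolation`, the 4KB page size and the frame numbers are all assumptions made for the example, and the model ignores reclaiming entirely.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class AccessViolation(Exception):
    """Hypothetical name for the exception raised on an uncommitted page."""


class ToyAddressSpace:
    """Toy model: committed pages get a zeroed physical frame on first touch."""

    def __init__(self, free_frames):
        self.free_frames = list(free_frames)  # physical frames not in use
        self.committed = set()                # committed virtual pages (the VAD's role)
        self.page_table = {}                  # virtual page number -> frame number
        self.ram = {}                         # frame number -> page contents
        self.page_faults = 0

    def commit(self, first_page, count):
        # Committing only records state; no physical page is bound yet.
        self.committed.update(range(first_page, first_page + count))

    def touch(self, address):
        vpn = address // PAGE_SIZE
        if vpn in self.page_table:
            return self.page_table[vpn]        # already bound, no fault
        self.page_faults += 1                  # the "hardware" raises a page fault
        if vpn not in self.committed:
            raise AccessViolation(hex(address))
        frame = self.free_frames.pop(0)        # find a usable physical page
        self.ram[frame] = bytearray(PAGE_SIZE) # zero it before handing it over
        self.page_table[vpn] = frame           # bind the VAS page to the frame
        return frame


vas = ToyAddressSpace(free_frames=[7, 8, 9])
vas.commit(first_page=16, count=2)
vas.touch(16 * PAGE_SIZE)        # first touch: page fault, one frame is bound
vas.touch(16 * PAGE_SIZE + 100)  # same page again: no fault
```

Note that committing two pages and touching only one of them binds a single frame: binding happens one page at a time.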

 

When RAM runs short, Windows might decide to take a physical page away from a process. Using a reclaiming policy, the OS picks a page to free. It then makes sure that the paging file contains an up-to-date image of the page, writing the page out to the paging file if necessary. Once the page is on disk, the OS fixes up all the required data structures so that the next time the page is touched it knows where to find it. When this is finished, the physical page is zeroed out and put on the free list to be used for subsequent memory requests.

 

As I said previously, a CPU generates an exception, a page fault, when it can't find a physical page corresponding to the virtual address it tries to access. An application can generate a page fault in two ways: first, when it touches a just-committed VAS region for the first time, and second, when the OS has previously paged the given page out to disk. (When an application touches a VAS page that hasn't been committed yet, Windows generates an exception.) A fault for which the page needs to be brought in from disk is called a hard page fault. If an application touches a just-committed VAS region for the first time, it generates a demand-zero page fault. There is also another type of page fault: the soft page fault. A soft page fault occurs when Windows can still find the page in physical memory and serve it without bringing the page in from disk.
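The fault types above can be summarized as a small decision function. This is a sketch of the classification only; the function name, the `Fault` enum and the four boolean parameters are hypothetical and do not correspond to any Windows data structure.

```python
from enum import Enum


class Fault(Enum):
    NONE = "no fault"
    DEMAND_ZERO = "demand-zero page fault"
    SOFT = "soft page fault"
    HARD = "hard page fault"
    INVALID = "exception: page is not committed"


def classify_access(committed, mapped, in_ram, in_page_file):
    """Classify one memory access in the toy model.

    committed    -- the page belongs to a committed VAS region
    mapped       -- the CPU can find a physical page for the address
    in_ram       -- the page contents are still somewhere in physical memory
    in_page_file -- the page was previously paged out to disk
    """
    if not committed:
        return Fault.INVALID
    if mapped:
        return Fault.NONE
    if in_ram:
        return Fault.SOFT       # served without going to disk
    if in_page_file:
        return Fault.HARD       # must be brought in from disk
    return Fault.DEMAND_ZERO    # first touch of a just-committed page


first_touch = classify_access(committed=True, mapped=False, in_ram=False, in_page_file=False)
paged_out = classify_access(committed=True, mapped=False, in_ram=False, in_page_file=True)
```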

Using the Windows virtual memory mechanism, the same physical pages can be mapped into different VASes, at different places and multiple times. Physical pages that can be mapped into only a single VAS are called private, because they can't be shared across VASes. Physical pages that are mapped into different VASes at the same time are called shared physical pages.

 

All VAS regions committed using the VirtualAlloc* interfaces are bound to physical pages that can't be shared across different VASes. One can think of them as private physical pages. The sum of all private physical pages, in RAM and on disk, is called Private Bytes in perfmon terminology, or VM Size in Task Manager's terminology. The sum of all private physical pages that are in RAM is called the working set, or Memory Usage in Task Manager.

 

As I mentioned before, depending on physical memory demands Windows might remove physical pages from a process's working set. This activity is usually referred to as paging. The OS provides a way to avoid paging of given VAS regions: a mechanism to "lock" VAS regions in physical RAM. As you would expect, an application that locks its VAS regions might destabilize the whole system. To mitigate this, Windows has the "Lock pages in memory" privilege turned off by default, so that only applications that are specifically given this permission by an administrator can lock pages in memory. In addition, the OS might still page out the whole process's working set if needed.

 

Even though x86 registers are 32 bits wide, the platform enables an OS to address up to 64GB of RAM using Physical Address Extension, PAE. On Windows, PAE can be turned on from the boot.ini file. If PAE is not turned on, Windows can address only up to 4GB of RAM, even if the box has more RAM installed.
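The 4GB and 64GB figures are plain powers of two. The short check below assumes 36-bit physical addresses under PAE; that width is general x86 knowledge rather than something stated in this post, and the helper name is made up for the example.

```python
GB = 2 ** 30


def addressable_gb(address_bits):
    """Return how many GB can be addressed with the given number of address bits."""
    return 2 ** address_bits // GB


without_pae = addressable_gb(32)  # plain 32-bit physical addresses
with_pae = addressable_gb(36)     # assumed 36-bit physical addresses under PAE
```

Here `without_pae` evaluates to 4 and `with_pae` to 64, matching the limits quoted above.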

 

Some user applications require more than 2GB of VAS. Windows can be configured to give user applications up to 3GB of VAS. This feature has a significant drawback: it cuts the VAS available to the kernel down to 1GB. The user VAS is increased by adding the /3GB switch to the boot.ini file. Once the modification is complete, the system requires a reboot.
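The trade-off is simple arithmetic on a 4GB (32-bit) address space: every gigabyte given to the user portion is taken from the kernel portion. The sketch below assumes the user portion starts at address 0 and the kernel occupies the top of the VAS; `split_vas` is a hypothetical helper, not a Windows API.

```python
GB = 2 ** 30
TOTAL_VAS = 2 ** 32  # size of a 32-bit virtual address space


def split_vas(user_gb):
    """Return (user bytes, kernel bytes, first kernel address) for a given user size."""
    user = user_gb * GB
    kernel = TOTAL_VAS - user
    return user, kernel, hex(user)


default_split = split_vas(2)    # default configuration: 2GB user, 2GB kernel
three_gb_split = split_vas(3)   # with the /3GB switch: 3GB user, 1GB kernel
```

With the default split the kernel portion starts at 0x80000000; with /3GB it starts at 0xc0000000 and is only 1GB in size.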

 

Limiting the kernel VAS to 1GB affects the whole box, not only the application that needs the larger VAS. For example, once the /3GB switch is turned on, the amount of RAM that the OS can support with PAE enabled drops from 64GB to 16GB. The /3GB switch affects all kernel components, including all drivers. Having it turned on can cause effects ranging from a drop in performance and memory allocation failures to system stalls. My suggestion is to avoid the /3GB switch unless you really, really need it.

 

As I described, on x86 platforms the VAS is limited to 2GB or 3GB, depending on the configuration in the boot.ini file. On AMD64, a 32-bit application can have up to 4GB. Such a small VAS can be a significant drawback for high-end servers handling gigabytes and terabytes of data; think about SQL Server. I also said that the PAE switch enables Windows to address up to 64GB of RAM. But how does a single process access that much memory? To enable a single process to work with more RAM than fits in its VAS, Windows provides the Address Windowing Extensions (AWE) mechanism. (Remember your DOS years :-).) Be careful: many authors use the term "AWE memory". There is no such thing as AWE memory; one can't go and buy AWE memory at a store :-). Since there is no AWE memory, there is neither low AWE nor high AWE memory. There is an AWE mechanism that enables one to work with amounts of RAM larger than the VAS! The principle is simple. Using the AWE API one allocates physical memory, then using the VirtualAlloc API one allocates VAS regions, and then using the AWE mechanism one binds and unbinds VAS regions and physical memory. As with locked pages, an application needs the "Lock pages in memory" privilege in order to use the AWE mechanism.
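The allocate, map and unmap principle can be illustrated with a toy model in which a process owns more "physical pages" than its window has slots. On Windows the real work is done by calls such as AllocateUserPhysicalPages and MapUserPhysicalPages; the class below does not call them and is purely a hypothetical simulation, with an assumed 4KB page size.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyAweProcess:
    """Toy model: a small VAS window onto a larger set of physical pages."""

    def __init__(self, window_slots):
        self.physical = []                   # physical pages owned by the process
        self.window = [None] * window_slots  # VAS window: slot -> physical page index

    def allocate_physical(self, count):
        # Allocate physical pages; they are not visible until mapped into the window.
        start = len(self.physical)
        self.physical.extend(bytearray(PAGE_SIZE) for _ in range(count))
        return list(range(start, start + count))

    def map(self, slot, page_index):
        if not 0 <= page_index < len(self.physical):
            raise IndexError("no such physical page")
        self.window[slot] = page_index       # bind the slot to the page

    def unmap(self, slot):
        self.window[slot] = None             # unbind; the page keeps its contents

    def _page(self, slot):
        index = self.window[slot]
        if index is None:
            raise RuntimeError("window slot is not mapped")
        return self.physical[index]

    def write(self, slot, offset, value):
        self._page(slot)[offset] = value

    def read(self, slot, offset):
        return self._page(slot)[offset]


proc = ToyAweProcess(window_slots=2)     # the window holds only 2 pages
pages = proc.allocate_physical(8)        # but the process owns 8 physical pages
for p in pages:
    proc.map(0, p)                       # bind slot 0 to page p
    proc.write(0, 0, p + 1)              # tag the page through the window
    proc.unmap(0)
proc.map(1, pages[5])
value = proc.read(1, 0)                  # contents survived being unmapped
```

The point of the model is that all eight pages are reachable through a two-slot window, one mapping at a time, and that unmapping a page does not discard its contents.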

 

The AWE API can be used even on boxes with less RAM than the size of a process's VAS. In fact, AWE can be used to avoid any type of paging. (Remember that when pages are locked in memory using the VirtualLock mechanism, Windows can still page out the whole process.) When the AWE mechanism is used, the OS can't interfere at all. With such flexibility comes a difficulty: misuse of the AWE mechanism can stall the whole box, so that the only way to recover might be a reboot.

 

Well, I think that is enough info for today :-) From this and the previous posts, all I want is for you to grasp the concepts. They are really important! Here is a quick summary:

 

- Every process has its own VAS

- VAS is often neglected by developers

- VAS is a limited resource even on 64-bit platforms

- VAS is managed by Windows the same way as one would manage a heap

- The smallest VAS region is 64KB

- Each VAS region is managed by a corresponding VAD in the kernel

- A VAD describes the state of a VAS region: allocated, committed, uncommitted

- A VAS page is bound to a physical page in RAM on demand, the first time the page is accessed (unless running without a page file)

- Physical pages can be private or shared

- The set of all private physical pages for a process is called Private Bytes (perfmon), or VM Size (Task Manager)

- The set of private physical pages in RAM is called the process working set

- Physical pages bound to a specific VAS region can be locked in memory

- The PAE switch enables Windows support for 64GB of RAM

- On x86, VAS is limited to 2GB, 3GB or 4GB

- The /3GB switch increases the user VAS to 3GB and decreases the kernel VAS to 1GB

- The /3GB switch limits the amount of physical memory enabled through the PAE switch to 16GB

- The /3GB switch is not recommended

- AWE is a mechanism; it is not memory

- AWE enables a single process to use memory beyond its VAS limits

- The /3GB switch is not required to enable the AWE mechanism

- The /PAE switch is not required to enable the AWE mechanism

Some interesting gotchas:

- Allocating physical memory using the AWE mechanism is slow on Windows 2000. (This is the reason why SQL Server 2000 doesn't manage memory dynamically when AWE is enabled.) The problem is fixed in Windows Server 2003, so Yukon does allocate physical memory dynamically.

- Physical pages allocated through the AWE mechanism are not part of the process working set, and neither are large pages (I will cover large pages sometime later). This is exactly why neither PerfMon nor Task Manager shows them.

 

Next time we will talk about memory pressures and then we will be ready to dive into SQL Server Memory Manager.

 

For now enjoy the weekend!

Comments

  • Anonymous
    January 30, 2005
    I'm convinced that Windows is too aggressive at reclaiming physical memory from other processes.

    My laptop has 1GB of ram in it, and yet a spyware scan essentially invalidates the entire 1gb of memory space. (Evident by how long it takes to bring up windows that were left open overnight.) Sometimes these windows take in excess of 15 seconds to come back to life.

    I wish there was some threshold I could set to limit the pushing of physical memory into a greedy process that is consuming ram without end.

    I'm beginning to wonder whether, in a GC'd world, this problem is only going to get worse in terms of responsiveness of "background" apps.

  • Anonymous
    January 31, 2005
    :-). What you actually hit in this case is a known problem, called the "standby list erosion" problem. It is possible for an application to eat into other processes' working sets. One is likely to hit it after leaving a machine idle for some time. As far as I know, this problem should be addressed in the next releases of Windows.

  • Anonymous
    February 05, 2005
    Hello Slava, great posts so far!

    Now that Eric mentions this, I too am convinced that the current algorithm employed by the Windows Memory Manager is too aggressive. I have a different benchmark than Eric:

    I have a laptop with 1 GB of RAM as well. If I boot Windows up, it will take roughly 120 MB of RAM. If I insert a CD and start copying it on HDD, Windows starts kicking pages out even though it would have enough physical memory to hold the existing working sets and the entire CD.

    I think this happens because of the cache manager that starts claiming pages and because of the paging algorithm that kicks pages out even when it doesn't have to. I suspect this is causing Eric's problems as well. Don't get me wrong, I think the Windows memory manager does a fine job on low end systems with 128 or 256 MB of RAM, but in order to get rid of the issue above, I disable the page file entirely on machines with at least 512 MB of RAM.

    It's good to know the issue will be addressed. Maybe Microsoft will address it on Windows Server as well: I have machines with 2 GB of RAM, yet Windows insists on creating a 3 GB swap file on installation. It should rarely get used, yet the problems described above show up very often on servers as well. I'm sure you know this better than I do: on a server, paging is death. I end up disabling the page file on servers as well, but it should be done by Windows by default.

  • Anonymous
    February 09, 2005
    The reason AWE pages are not part of private bytes is that they are treated differently from a process's private pages. The same is true of large pages. I don't know whether the OS team plans to add a new perfmon counter to report AWE and large pages. In the next version of SQL Server you will be able to see the amount of AWE pages allocated by SQL Server using dynamic management views (DMVs).

  • Anonymous
    February 20, 2006
    Thanks, Slava, this is a really useful post. But I'd like to ask a specific question about my PC running WinXP: I have a gig of memory on an Athlon 64 3500+, and I regularly get notifications that my Windows Virtual memory is overloaded. I now know how to tweak this, but why is it happening in the first place?

    Thanks for any feedback, y'all.

  • Anonymous
    March 05, 2006
    It happens because the applications you are running, together with the OS, require more memory than RAM (1GB) plus the size of the swap file. There are several ways you can address the issue:
    - Add more RAM
    - Increase swap file size
    - Limit number of applications you are running simultaneously

  • Anonymous
    June 26, 2006
    Q: Hi,

    you write:
    "A sum of all private physical pages that are only in RAM is called working set".

    I don't understand how the working set can count only private pages in RAM. In my Task Manager I see almost all processes with Mem Usage > VM Size. Since Mem Usage is the working set and VM Size is private bytes, it would mean that the private bytes in RAM are greater than the total private bytes in the process, which is absurd.

    am I missing something? Can someone explain?

    thanks
    A: You are actually correct. I made a mistake. The working set as reported by both perfmon and Task Manager includes non-private pages. I did the following experiment: I started notepad.exe and verified that both perfmon and Task Manager report the working set to be 2.7MB and private bytes to be around 733KB. Then I used vadump (http://windowssdk.msdn.microsoft.com/en-us/library/ms726766.aspx) to get a comprehensive working set report:
    >vadump.exe -p 444 -o

    Category                    Total            Private  Shareable    Shared
                                Pages   KBytes    KBytes     KBytes    KBytes
      Page Table Pages             15       60        60          0         0
      Other System                 11       44        44          0         0
      Code/StaticData             445     1780       172          0      1608
      Heap                         49      196       196          0         0
      Stack                         2        8         8          0         0
      Teb                           1        4         4          0         0
      Mapped Data                 146      584         0         24       560
      Other Data                   13       52        48          4         0

      Total Modules               445     1780       172          0      1608
      Total Dynamic Data          211      844       256         28       560
      Total System                 26      104       104          0         0
      Total Working Set           682     2728       532         28      2168

    Notice that the private working set is 532KB, which is smaller than the private bytes. Now it makes more sense, right?
    One thing to keep in mind is that we can't use the Task Manager or perfmon working set counters to find out how much physical memory is actually in use, because some pages might be counted multiple times.

  • Anonymous
    October 10, 2006
    Hi, this post was really helpful. Could you also post a little information about the "virtual memory" counter of PerfMon, please? Does it mirror the total VAS of a given process? I have a problem with virtual memory: I can't manage to release it, whereas I can release the private bytes. Thanks for your help, S.

  • Anonymous
    February 20, 2007
    Whatever happened to the "standby list erosion" issue? Did it get patched? Is it a case of being overcome by the GBs of excess RAM on most servers? If it still exists for Windows 2000/2003 servers, what is the key phrase or term most in use today? This page was the only one I could find referencing the issue via Google and the term "standby list erosion".