Still more misinformation about virtual memory and paging files

The wired network in my building is being unusually flaky, so I'm posting this from my laptop - sorry for the brevity.

Slashdot had a front page story today about an article by Adrian Wong posted in his Rojak Pot: "Virtual Memory Optimization Guide".

I've not finished reading it (the site's heavily slashdotted), but his first paragraph got me worried:

Back in the 'good old days' of command prompts and 1.2MB floppy disks, programs needed very little RAM to run because the main (and almost universal) operating system was Microsoft DOS and its memory footprint was small. That was truly fortunate because RAM at that time was horrendously expensive. Although it may seem ludicrous, 4MB of RAM was considered then to be an incredible amount of memory.

4MB of RAM?  Back in the "good old days" of 1.2MB floppy disks (those were the 5 1/4" floppy drives in the PC/AT), the most RAM that could be addressed by a DOS-based computer was 1MB.  If you got to run Xenix-286, you got a whopping 16MB of physical address space.

I was fuming by the time I'd gotten to the first paragraph of the first section:

Whenever the operating system has enough memory, it doesn't usually use virtual memory. But if it runs out of memory, the operating system will page out the least recently used data in the memory to the swapfile in the hard disk. This frees up some memory for your applications. The operating system will continuously do this as more and more data is loaded into the RAM.

This is SO wrong on so many levels.  It might have been true for an old (OS 8-ish) Mac, but it hasn't been true for any version of Windows since Windows 95.  And even for Windows 1.0, the memory manager didn't operate in that manner: it was a memory manager, always enabled and actively swapping data in and out of memory, but it didn't use virtual memory hardware (since there wasn't any hardware memory management available to Windows 1.0).
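
To make the point concrete, here's a rough, untested sketch using two standard Win32 calls (GlobalMemoryStatusEx and GetProcessMemoryInfo, linked against psapi.lib).  Run it on a box with most of its RAM free and you'll still see a working set and pagefile-backed committed memory for the process - virtual memory isn't a fallback that kicks in when RAM runs out, it's how every page in the system is managed, all the time.

#include <windows.h>
#include <psapi.h>   /* link with psapi.lib */
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms = { sizeof(ms) };
    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };

    /* System-wide physical memory picture. */
    GlobalMemoryStatusEx(&ms);

    /* This process's share of it, as the memory manager sees it. */
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));

    printf("Physical RAM free: %I64u MB of %I64u MB\n",
           ms.ullAvailPhys / (1024 * 1024),
           ms.ullTotalPhys / (1024 * 1024));
    printf("This process's working set:            %Iu KB\n",
           pmc.WorkingSetSize / 1024);
    printf("This process's pagefile-backed commit: %Iu KB\n",
           pmc.PagefileUsage / 1024);
    return 0;
}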

It REALLY disturbs me when articles like this get distributed, because it shows that the author fundamentally didn't understand what he's writing about (sort of like what happens when I write about open source :) - at least nobody's ever quoted me as an authority on that particular subject).

Edit: I'm finally at home, and I've had a chance to read the full article.  I've not changed my overall opinion of it: as a primer on memory management, it's utterly pathetic (and dangerously incorrect).  Having said that, the recommendations for improving the performance of your paging file are roughly the same as the ones I'd come up with if I were writing the article.  Most importantly, he explains the difference between putting a paging file on a separate partition and putting it on a separate drive, and he adds some important information on P-ATA and RAID drive performance characteristics that I wouldn't have included if I were writing the article.  So if you can make it past the first 10 or so pages, the article's not that bad.

Comments

  • Anonymous
    March 25, 2005
    Never believe anything you read on /. -- it's a shadow of its former self; mostly populated by a bunch of trolls and ne'er do wells, the editors are so far up themselves they're in danger of becoming Klein bottles, and it's nothing more than a mouthpiece for press releases these days.
  • Anonymous
    March 25, 2005
    Even Linux has the idea of a working set, although the Linux algorithm (http://www.linux-tutorial.info/modules.php?name=Tutorial&pageid=311) appears to be based on the number of free pages in the system as a whole, unlike NT which IIRC treats each individual working set (each process plus the system working set) separately.

    My source as always is "Windows Internals, 4th Edition" by Mark Russinovich and David Solomon.

    It's worth being clear that the system working set ('Memory: Cache Bytes' performance counter or 'System Cache' in Task Manager) does not just cover the file system cache, but all kernel-mode pageable code and data, including paged pool. Some memory is double-counted in 'Available' and in 'System Cache' in Task Manager - I've actually seen the sum of the two exceed the actual amount of memory available on my 1GB system at work.
  • Anonymous
    March 25, 2005
    The comment has been removed
  • Anonymous
    March 25, 2005
    Larry,

    As your comments above seem reasonable and you seem to know a thing or two, perhaps you can help out. I have been looking for a way to make WinXP limit the amount of memory it will use for disk caching. Win9x was configurable through system.ini with:
    [vcache]
    MinFileCache=4096
    MaxFileCache=32768

    This was a wonderful adjustment, given that Windows will try to use all of the "free" memory available for disk cache, even though anything beyond what is now a fairly small amount of cache yields negligible gains, if any.

    It's a pretty sorry statement when you have three programs running and one of them is writing a lot of data to disk (Photoshop), and Windows decides to swap your browser or mail program out to disk because 700 MB just isn't enough disk cache.

    Any ideas? Or contacts that could solve this problem?
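
    (Addendum: the closest thing I've turned up so far is the Get/SetSystemFileCacheSize pair that kernel32 exposes on Windows Server 2003 SP1, XP x64 and later - as far as I can tell there's nothing equivalent for 32-bit XP. A rough, untested sketch of how it would be called, on a system that has the API:)

    /* Hypothetical sketch: capping the system file cache on versions of
     * Windows that expose SetSystemFileCacheSize.  The call requires
     * SeIncreaseQuotaPrivilege to be enabled in the caller's token.
     * Link with advapi32.lib for the token/privilege calls. */
    #define _WIN32_WINNT 0x0600
    #include <windows.h>
    #include <stdio.h>

    static BOOL EnablePrivilege(LPCTSTR name)
    {
        HANDLE token;
        TOKEN_PRIVILEGES tp;

        if (!OpenProcessToken(GetCurrentProcess(),
                              TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
            return FALSE;

        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        if (!LookupPrivilegeValue(NULL, name, &tp.Privileges[0].Luid)) {
            CloseHandle(token);
            return FALSE;
        }

        AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL);
        CloseHandle(token);
        return GetLastError() == ERROR_SUCCESS;
    }

    int main(void)
    {
        /* Example numbers only: 4MB floor, 256MB hard ceiling. */
        SIZE_T minCache = 4   * 1024 * 1024;
        SIZE_T maxCache = 256 * 1024 * 1024;

        if (!EnablePrivilege(SE_INCREASE_QUOTA_NAME)) {
            printf("Could not enable SeIncreaseQuotaPrivilege\n");
            return 1;
        }

        /* FILE_CACHE_MAX_HARD_ENABLE makes the maximum a hard limit. */
        if (!SetSystemFileCacheSize(minCache, maxCache,
                                    FILE_CACHE_MAX_HARD_ENABLE)) {
            printf("SetSystemFileCacheSize failed: %lu\n", GetLastError());
            return 1;
        }

        printf("System file cache capped at %Iu bytes\n", maxCache);
        return 0;
    }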
  • Anonymous
    March 25, 2005
    Shouldn't one always read EVERYTHING that the writer has to say before giving out comments?

    Plus, at least he tried to write the article. Not accurate? So what - help the author correct it, and help save the world from further contamination with wrong information!

    I just keep seeing bashing - "I'm right and you're wrong" - but no further action to help right the wrong... Is that smart? I think not.
  • Anonymous
    March 25, 2005
    If it's any consolation, the article was written in 1999. And throughout the article, claims are made but not backed up with empirical evidence. We can see from pretty pictures how a pagefile (or swap file, or whatever you wish to call it) can be moved from somewhere across the disk to one edge, and we can see from a graph that the outer edges of a disk run faster, but from those two graphics the author infers that moving the swap file yields a 16+ percent improvement. Sure, there's some theory about seeking and reading added, but there are also latent assumptions about the workload strewn in there.

    The biggest problem with the article is that it needs a big "In theory," prepended to every statement.
  • Anonymous
    March 25, 2005
    The comment has been removed
  • Anonymous
    March 25, 2005
    Memory management is a difficult subject.

    A certain large company keeps its details secret.

    As Raymond demonstrated in his "how much memory can your app access" series (a per-process view of memory), most comments were about people's machines.

    How about you go get the "dev manager in charge of memory management" to answer common technical questions?

    In the absence of "official" data all I can say to "how do I disable paging to make my paid for app run faster" is "the processor is designed to page, the chipset is designed to page, Windows is designed to suit the hardware, so is also designed to page". This is not really an answer so I normally avoid debates.

    Theory is supposed to be a guiding thought for action. So how does one apply an unknown implementation to guide one's actions in common user scenarios?
  • Anonymous
    March 25, 2005
    The comment has been removed
  • Anonymous
    March 27, 2005
    The comment has been removed
  • Anonymous
    March 27, 2005
    I think what the author said about the OS gradually paging apps out to disk as data is loaded is correct, but not for the stated reason.

    Try typing this in an NT command prompt with a dir that has a lot of big files:

    for /r %x in (*) do @copy "%x" nul

    CMD itself uses a negligible amount of memory, but you'll find that most of your other apps are paged out by the time it finishes. My understanding is that the NT memory manager treats the cache manager like any other process, so when it sees the cache manager memory-mapping a lot of data, it gradually enlarges the cache's working set, paging other apps' memory out in the process. As Alan noted above, this happens in Windows 95/98 as well, even though the VCACHE architecture is different.

    In other words, I don't think your assertion that Windows only throws out pages when programs need them is quite correct -- when the disk cache is involved, Windows will also page when it thinks the app might need more memory for caching. This behavior was very useful in the days of Windows 95 when memory was very tight and paging memory to increase the disk cache from 1MB to 4MB helped considerably, but nowadays doing it to enlarge the cache from 500MB to 800MB isn't as helpful.
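
    Incidentally, a tool that wants to read huge files without causing that kind of working-set churn can say so when it opens the file. Here's a rough, untested sketch using standard Win32 flags - FILE_FLAG_NO_BUFFERING bypasses the cache manager entirely (reads must then be sector-size multiples into aligned buffers, which VirtualAlloc's page-aligned memory satisfies), and FILE_FLAG_SEQUENTIAL_SCAN is the gentler hint that the pages won't be revisited. Whether the hint alone is enough to avoid the pressure is something you'd want to measure rather than assume.

    #include <windows.h>

    /* Reads a file the way "copy file nul" does, but without dragging
     * the data through the system cache. */
    int main(int argc, char **argv)
    {
        const DWORD chunk = 64 * 1024;   /* a multiple of any sector size */
        HANDLE h;
        DWORD  got;
        void  *buf;

        if (argc < 2)
            return 1;

        buf = VirtualAlloc(NULL, chunk, MEM_COMMIT, PAGE_READWRITE);
        if (buf == NULL)
            return 1;

        h = CreateFileA(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL,
                        OPEN_EXISTING,
                        FILE_FLAG_NO_BUFFERING | FILE_FLAG_SEQUENTIAL_SCAN,
                        NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        /* Discard the data, just like copying to nul. */
        while (ReadFile(h, buf, chunk, &got, NULL) && got != 0)
            ;

        CloseHandle(h);
        VirtualFree(buf, 0, MEM_RELEASE);
        return 0;
    }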
  • Anonymous
    March 27, 2005
    The comment has been removed
  • Anonymous
    March 27, 2005
    Phaeron,
    In that case, some program needed to use the memory. Now it only needed to use it to copy a file, but it DID need to use the memory.

    Having said that, there IS an issue that I purposely ignored. If you have a boatload of low frequency threads running (like a bunch of tray icons that wake up to check the internet to see if a new version of software's available), then the system needs to page those apps in.

    And that may, in turn, cause the pages that hold your application to be paged out. Even though you, the user, weren't using those pages, someone wanted to use them.

    This particular issue's been one of great concern to the memory management people in Windows - they know that this leads to highly negative user experiences and they're working really hard to fix it.
  • Anonymous
    March 27, 2005
    The comment has been removed
  • Anonymous
    March 28, 2005
    Larry: there is a typo in your text: "fundimentally"

    Mike Dimmick: working set is not "an idea", it is a precisely defined replacement algorithm developed by Denning (http://cne.gmu.edu/pjd/PUBS/bvm.pdf). According to publicly available information, the Windows kernel uses a modified version of the working set algorithm.

    Linux, on the other hand, uses MACH/BSD active/inactive queues and physical scanning as a basis for its per-zone page replacement, which is quite different from working set.

    The 2.2 Linux kernel did use virtual-based scanning, which is somewhat closer to working set.
  • Anonymous
    March 28, 2005
    The comment has been removed
  • Anonymous
    March 28, 2005
    Mike: Yeah, we realize that it's a problem. Unfortunately when the devs working on IE3-or-so wrote the code they didn't have large files in mind. And until recently this code "just worked" or at least nobody complained, so the team was focused on solving other problems.

    Based on testing with huge PEs, I still doubt that signature verification is the root cause behind an install taking an hour.
  • Anonymous
    March 28, 2005
    The comment has been removed
  • Anonymous
    March 28, 2005
    Larry,

    Phaeron summarized the issue pretty simply. Sure, the disk cache thinks it needs pages, and to an extent it does, but being a cache it has no idea whether those pages will serve any purpose in the future. The point is that it is doing this at the expense of other pages that belong to applications. I would agree with you that the OS really doesn't have any more reason to believe those pages would ever be needed again either. So it is a difficult problem.


    My point is that it's fairly well proven that cache efficiency gains drop off very rapidly, and the performance gain you get past a certain cache size is minuscule. To that extent, I would certainly like to limit my disk cache size to something reasonable, to stop the memory manager from wastefully throwing away pages that I know will be used again in favor of pages that I know won't be. I realize it doesn't know this.

    In general I think the behaviour would benefit just about everyone; if Microsoft doesn't believe so, they can leave the default however they want. I just want a way to fix it so I don't have to suffer along with everyone else.

    Mike,

    The point isn't to fix the copy command; that was just an example that demonstrates the problem in a simple fashion. It will happen whenever any number of applications are running. For instance, when I run Photoshop, it will page in some huge image (that was paged out to accommodate a disk cache expansion) so that it can write it out to disk, and all the while the memory manager grows the disk cache to accommodate the write, throwing those pages (or worse, pages that were just loaded) away. Then when I go back to editing the image, it has to be reloaded. The net effect is that you need over 2x the memory of the actual in-memory image. Quite wasteful, especially when you don't quite have that much memory.
  • Anonymous
    March 29, 2005
    I did a search on the writer. He doesn't sound as stupid as you made him out to be. He even has a book on the BIOS on Amazon.com. Looks interesting. I may just buy it and see how good it is. Are you sure you are not being too harsh? You do sound like you have quite an ego. Can you prove he's wrong and you are right? Just wondering.
  • Anonymous
    March 29, 2005
    One of the problems with disk caching is that modern disks throw a monkey wrench into the works: most now include a large amount of cache of their own, so an OS-side disk cache is less useful than before, now that writing over an IDE/SCSI cable is just a (slow) direct memory copy. This makes things much faster and less fragmented, without needing nearly as much intervention from the OS. Maybe new versions of Windows should query the drive and step back if it has enough caching of its own.
  • Anonymous
    March 29, 2005
    The comment has been removed
  • Anonymous
    March 29, 2005
    The comment has been removed
  • Anonymous
    March 30, 2005
    Once again a small comment fleshed out into... this. :-)

    James Risto wrote:
    this file caching thing. Is there merit to it?

    I don't know what I found funnier - calling it "this thing" or questioning whether there's merit to it. :-)

    But seriously; yes, caching can have a dramatic (positive) effect on performance. The cache is, when a cache hit occurs, usually several orders of magnitude faster (think 10^3-10^6 times faster) than having to read the data from the disk (or redirector, or whatever). Besides performance it also has other positive effects, such as reducing the mechanical work disks have to do, reducing overall network usage (if you're in, e.g., an SMB environment) and so on. This makes disks last longer, wastes less energy, and lets other users on your LAN segment suffer less. :-)

    But this all assumes cache-hits. It unfortunately takes just one single (bad) application opening and accessing a single (large) file in cached mode to completely destroy these positive effects for all other applications, and indeed the whole system itself.


    Things I'd like to see researched, with results presented - or, had the option presented itself (*), researched myself - would be:

    - Limit cached data for (sequentially) accessed "large" files to, say, only the last two 64KB or 256KB chunks (or more, depending on how much RAM is currently free - but still marking those pages as "very old"). This would catch "bad" installers (such as the stuff Drew mentioned ;-> ).
    - Allow for a "sliding window", including read-ahead, for "large" files, perhaps especially memory-mapped files accessed sequentially (is the cache manager's read-ahead thread currently even involved in memory-mapped access to files?).
    - Prefer discarding recently read data (in the cache) for a sequentially accessed file, over discarding any other data.
    - Prefer discarding R/O data segments over code, or the other way around.
    - Prefer keeping directory entries.
    - Use fast (!) compression for data scheduled to be put in the pagefile, rather than actually writing it to disk. I read some research done on Linux regarding this, and IIRC they found that the data being swapped out was often needed again very soon, while the data just read (which caused the swap-out) was less likely to be needed again. I immediately considered LZO for its fast compression (there's a toy sketch of the idea at the end of this comment).

    Especially the compression idea has been haunting me for over half a decade now, since I know from experience it can help very much once you start to run out of RAM "just a little bit".


    (*) Had the NT kernel been open source (I mention this only for the technical/research/engineering aspects - I don't have a political motive, even if it provides a "real" argument for a comment to jasonmatusow), it would have been possible to research, and share the experiences of, such changes across sites and usage scenarios far more diverse than Microsoft can likely consider.

    Even if such research and tweaking would likely initially only benefit the ones doing it, and the ones with similar scenarios, it could have revealed useful patterns - not to mention testing tools - that Microsoft and all its customers could benefit from (I can easily imagine an AppPatch flag, perhaps most suited for installers, saying "don't cache large files from this app, even if it's dumb enough to request cached access" :-) ).

    Anyway, as such a scenario (open sourced NT kernel) seems unlikely, perhaps the ideas could serve as a seed for possible areas to look at in the cache manager for the people that do have access to it?

    Enough off-topic from me. Cheers.
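
    P.S. A purely illustrative user-mode toy of the compression idea mentioned above (zlib standing in for LZO, which is what I'd actually pick for speed; a real implementation would obviously live inside the memory manager, not in a user program): compress a page-sized buffer and only keep the compressed copy if it actually shrank, otherwise write the page to the pagefile as-is.

    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>   /* link with -lz; purely a stand-in for LZO */

    #define PAGE_SIZE 4096

    int main(void)
    {
        unsigned char page[PAGE_SIZE];
        unsigned char *cbuf;
        uLongf clen;
        int i;

        /* Fake "page" contents - fairly repetitive, like a lot of real memory. */
        for (i = 0; i < PAGE_SIZE; i++)
            page[i] = (unsigned char)(i % 64);

        clen = compressBound(PAGE_SIZE);
        cbuf = malloc(clen);
        if (cbuf == NULL)
            return 1;

        /* Level 1 = fastest; for swap, speed matters far more than ratio. */
        if (compress2(cbuf, &clen, page, PAGE_SIZE, 1) != Z_OK)
            return 1;

        if (clen < PAGE_SIZE)
            printf("Page compressed from %d to %lu bytes - keep it in RAM\n",
                   PAGE_SIZE, (unsigned long)clen);
        else
            printf("Page didn't compress - write it to the pagefile as-is\n");

        free(cbuf);
        return 0;
    }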
  • Anonymous
    March 31, 2005
    The comment has been removed