NonPagedPool Depletion
I was recently engaged on an issue where a server was depleting NonPagedPool (NPP) over a period of a few days. Ordinarily, we would use a tool such as PoolMon to identify the offending pool tag and then find the driver that uses that tag using the method in this article.
What made this case interesting was the pool tag itself, and the fact that we could not identify the driver using the normal methodology. You’ll see what I mean in a moment. The engineer supplied me with a kernel dump of the server taken while the problem was occurring, and this is what I found.
Let’s start by taking a look at the virtual memory usage:
2: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 851420 ( 3405680 Kb)
Page File: \??\C:\pagefile.sys
Current: 3584000 Kb Free Space: 3568552 Kb
Minimum: 3584000 Kb Maximum: 3584000 Kb
Available Pages: 573277 ( 2293108 Kb)
ResAvail Pages: 800628 ( 3202512 Kb)
Locked IO Pages: 1067 ( 4268 Kb)
Free System PTEs: 25102 ( 100408 Kb)
Free NP PTEs: 335 ( 1340 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 22 ( 88 Kb)
Modified PF Pages: 22 ( 88 Kb)
NonPagedPool Usage: 31369 ( 125476 Kb) <-- Very high
NonPagedPool Max: 31986 ( 127944 Kb)
********** Excessive NonPaged Pool Usage *****
PagedPool 0 Usage: 19071 ( 76284 Kb)
PagedPool 1 Usage: 735 ( 2940 Kb)
PagedPool 2 Usage: 747 ( 2988 Kb)
PagedPool 3 Usage: 720 ( 2880 Kb)
PagedPool 4 Usage: 746 ( 2984 Kb)
PagedPool Usage: 22019 ( 88076 Kb)
PagedPool Maximum: 38912 ( 155648 Kb)
********** 3 pool allocations have failed **********
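As a quick sanity check on the figures above, the page counts that !vm prints can be converted to kilobytes and compared against the maximum. This is only a minimal sketch: the 4 KB page size is an assumption that happens to match the Kb values shown in this output, and the variable names are made up for illustration.

```python
# Assumed page size: 4 KB, which matches the Kb figures in the !vm output.
PAGE_SIZE_KB = 4


def pages_to_kb(pages):
    """Convert a page count, as printed by !vm, to kilobytes."""
    return pages * PAGE_SIZE_KB


# Values copied from the "NonPagedPool Usage" and "NonPagedPool Max" lines.
npp_usage_pages = 31369
npp_max_pages = 31986

usage_kb = pages_to_kb(npp_usage_pages)
max_kb = pages_to_kb(npp_max_pages)
free_kb = max_kb - usage_kb
percent_used = 100.0 * npp_usage_pages / npp_max_pages

print(f"NonPagedPool: {usage_kb} Kb of {max_kb} Kb "
      f"({percent_used:.1f}% used, {free_kb} Kb left)")
```

With so little headroom left, even small allocation requests can fail, which is consistent with the failed pool allocations reported at the end of the output.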
NPP usage is very high: the server is booted with the /3GB switch, which limits NPP to 128 MB by default, and usage has almost reached the NonPagedPool Max value. We need to identify which pool tag is associated with the high NPP usage:
2: kd> !poolused /t2 2
Sorting by NonPaged Pool Consumed
Pool Used:
            NonPaged              Paged
Tag     Allocs      Used     Allocs      Used
None    246479  50827424          0         0   call to ExAllocatePool
MmCm      1198  18462512          0         0   Calls made to MmAllocateContiguousMemory , Binary: nt!mm
Interesting, so the offending tag is “None”. This means that these allocations were made by calling the function ExAllocatePool instead of ExAllocatePoolWithTag. ExAllocatePool is obsolete and should no longer be used.
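If you save the !poolused output to a text file, a small script can rank the tags for you. The following is a rough sketch rather than a supported tool: the function name and dictionary keys are hypothetical, the sample text is just the two rows shown above, and the parser assumes whitespace-separated columns, so it would mis-handle tags that contain spaces.

```python
# The two data rows from the !poolused output above, plus the header line.
SAMPLE = """\
Tag Allocs Used Allocs Used
None 246479 50827424 0 0 call to ExAllocatePool
MmCm 1198 18462512 0 0 Calls made to MmAllocateContiguousMemory , Binary: nt!mm
"""


def parse_poolused(text):
    """Parse whitespace-separated !poolused rows into dictionaries.

    Lines whose four numeric columns are not all digits (headers, banners)
    are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue
        tag, np_allocs, np_bytes, p_allocs, p_bytes = parts[:5]
        if not all(s.isdigit() for s in (np_allocs, np_bytes, p_allocs, p_bytes)):
            continue
        rows.append({
            "tag": tag,
            "nonpaged_allocs": int(np_allocs),
            "nonpaged_bytes": int(np_bytes),
            "paged_allocs": int(p_allocs),
            "paged_bytes": int(p_bytes),
            "description": parts[5] if len(parts) > 5 else "",
        })
    # Largest non-paged consumer first, mirroring the debugger's sort order.
    rows.sort(key=lambda r: r["nonpaged_bytes"], reverse=True)
    return rows


rows = parse_poolused(SAMPLE)
for r in rows:
    avg = r["nonpaged_bytes"] / r["nonpaged_allocs"]
    print(f'{r["tag"]}: {r["nonpaged_bytes"]} bytes in '
          f'{r["nonpaged_allocs"]} allocations (avg {avg:.1f} bytes)')
```

The average allocation size is worth a glance: a very large number of small allocations under one tag is typical of a leak in a frequently executed code path.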
Now, I need to find out which driver is calling this function. First, I need to know where ExAllocatePool lives:
2: kd> x nt!ExAllocatePool
e0894d1f nt!ExAllocatePool
Next, I need to search all the drivers to see which one is importing this function:
2: kd> !for_each_module s-d @#Base @#End e0894d1f
f50b8058 e0894d1f e0828e04 e089b708 e084011b .M..............
Hmm, looks suspiciously like an import table, let’s see:
2: kd> dps f50b8058
f50b8058 e0894d1f nt!ExAllocatePool
f50b805c e0828e04 nt!_wcsnicmp
f50b8060 e089b708 nt!ExFreePoolWithTag
f50b8064 e083e30a nt!KeInitializeEvent
<SNIP>
Yep, that’s an import table. You can also verify that this is the import table of a particular module by checking the header (!dh on the module’s base address and look for “Import Address Table Directory”).
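Conceptually, the search above walks each module's address range looking for a 32-bit value equal to the address of nt!ExAllocatePool. The sketch below illustrates that idea on an in-memory buffer. It is not how the debugger is implemented: the function name is hypothetical, the scan is assumed to be DWORD-aligned and little-endian (x86), and the fake image contains nothing but the four import entries shown in the dps output.

```python
import struct


def find_dword(image, base, value):
    """Return the virtual addresses in `image` that hold the 32-bit `value`.

    `image` is the raw bytes of a module mapped at address `base`. The scan
    steps four bytes at a time and compares little-endian DWORDs.
    """
    needle = struct.pack("<I", value)
    hits = []
    for offset in range(0, len(image) - 3, 4):
        if image[offset:offset + 4] == needle:
            hits.append(base + offset)
    return hits


# Addresses taken from the debugger output in this article.
MODULE_BASE = 0xF50B3000        # "Base Address" reported by !lmi
EX_ALLOCATE_POOL = 0xE0894D1F   # result of "x nt!ExAllocatePool"
IAT_ENTRY = 0xF50B8058          # address reported by the search

# Build a zero-filled fake image with the four import entries from dps.
iat_offset = IAT_ENTRY - MODULE_BASE
image = bytearray(iat_offset + 16)
struct.pack_into("<4I", image, iat_offset,
                 0xE0894D1F, 0xE0828E04, 0xE089B708, 0xE083E30A)

hits = find_dword(bytes(image), MODULE_BASE, EX_ALLOCATE_POOL)
print([hex(h) for h in hits])
```

A hit that sits among other pointers into nt, as it does here, is what makes the location look like an import table rather than a coincidental data value.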
As you can see, we have only one driver that imports ExAllocatePool. Let’s see which driver this is:
2: kd> !lmi f50b8058
Loaded Module Info: [f50b8058]
Module: {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
Base Address: f50b3000
Image Name: {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.sys
<SNIP>
I’ve removed the incriminating identifiers from the module information displayed above to protect the guilty. It is interesting to note that the driver name was a GUID and that this driver did not exist on disk. This is because the driver is created dynamically when its parent program loads.
The software package was removed and the server was happy again.
- David
Comments
- Anonymous
December 17, 2008
Hi! Thanks for the very helpful information! I have a problem with a non-paged memory leak on our production Win 2003 SP2 server. Non-paged memory grows over about three weeks to its highest value (~256 MB). Task Manager shows no growth in handles or threads, but PoolMon shows constant growth in the Diff and Bytes columns for the Thre tag (nt!ps). For example, on 12.12.08: non-paged 166700, Diff 110065, Bytes 68680560; on 15.12.08: non-paged 183820, Diff 137029, Bytes 85506096. I can use the debugger only in live mode locally, but what could I discover in live mode? (It’s a production server!) Could you give me some advice on how to find the source of the leak?
[ Hi Nikolay - I would use PoolMon to find the tag with the highest pool consumption. Alternatively, you can use the “!poolused” command in the debugger with the correct flags. After locating the leaking tag, you should be able to follow http://blogs.msdn.com/ntdebugging/archive/2006/12/18/Understanding-Pool-Consumption-and-Event-ID_3A00_--2020-or-2019.aspx. If debugging is required after finding the leaking tag, you can use the PoolHitTag method described at http://msdn.microsoft.com/en-us/library/cc267847.aspx. I hope this helps.]