Troubleshooting Pool Leaks Part 3 – Debugging

In our previous articles we discussed identifying a pool leak with perfmon, and narrowing the source of the leak with poolmon.  These tools are often preferred because they are easy to use, provide verbose information, and can be run on a system without forcing downtime.  However, it is not always possible to get perfmon and poolmon data.  If a system is experiencing poor performance you may have a business need to get the system up and running as quickly as possible without allowing time to troubleshoot.  It is also possible to completely exhaust memory through a pool leak, leaving the system in a state where tools such as perfmon and poolmon will not work.  In these scenarios it may be possible to troubleshoot the poor performance by forcing a bugcheck, gathering a memory dump, and performing a post mortem analysis.

 

Although a dump is not the ideal data to troubleshoot a leak, it can be done.  I say less than ideal because a dump is a snapshot of the system memory, and does not provide the historical data which perfmon would provide.  The lack of historical data makes it difficult to differentiate between high memory usage due to load and high memory usage due to a leak.  It is up to you, as the troubleshooter, to determine if the dump is sufficient evidence of a leak.  Sometimes identifying the tag and the driver will help you identify a known issue that causes a leak, or your knowledge of the driver architecture may allow you to determine if the memory usage is normal or not.  In some scenarios you may decide to start monitoring with perfmon and collect additional data for a future occurrence.

 

The first step to debug a pool leak using a dump is to load the dump in windbg, set the symbol path, and reload symbols.

 

1: kd> .symfix c:\symbols

1: kd> .reload

Loading Kernel Symbols

...............................................................

................................................................

.....

 

The !vm command will show memory utilization, the 1 flag will limit the verbosity of this command.  For the scenario of a pool leak, the significant values are “NonPagedPool Usage:” and “PagedPool Usage:”.  If the debugger identifies a value that is out of the normal range it will flag it, and we can see here that the debugger has flagged excessive nonpaged pool usage.  This is similar to the information we obtained in Part 1 using perfmon, but unlike perfmon we do not have trend data to indicate if this is temporary high pool usage due to load or if this is a leak.

 

1: kd> !vm 1

 

*** Virtual Memory Usage ***

        Physical Memory:      403854 (   1615416 Kb)

        Page File: \??\C:\pagefile.sys

          Current:   1048576 Kb  Free Space:   1015644 Kb

          Minimum:   1048576 Kb  Maximum:      4194304 Kb

        Available Pages:      106778 (    427112 Kb)

        ResAvail Pages:       225678 (    902712 Kb)

        Locked IO Pages:           0 (         0 Kb)

        Free System PTEs:   33533355 ( 134133420 Kb)

        Modified Pages:         4844 (     19376 Kb)

        Modified PF Pages:      4838 (     19352 Kb)

        NonPagedPool Usage: 155371 ( 621484 Kb)

        NonPagedPool Max:     191078 (    764312 Kb)

        ********** Excessive NonPaged Pool Usage *****

        PagedPool 0 Usage:     27618 (    110472 Kb)

        PagedPool 1 Usage:      3848 (     15392 Kb)

        PagedPool 2 Usage:       299 (      1196 Kb)

        PagedPool 3 Usage:       283 (      1132 Kb)

        PagedPool 4 Usage:       344 (      1376 Kb)

        PagedPool Usage: 32392 ( 129568 Kb)

        PagedPool Maximum:  33554432 ( 134217728 Kb)

        Session Commit:         7764 (     31056 Kb)

        Shared Commit:          6371 (     25484 Kb)

        Special Pool:              0 (         0 Kb)

        Shared Process:         5471 (     21884 Kb)

        PagedPool Commit:      32394 (    129576 Kb)

        Driver Commit:          2458 (      9832 Kb)

        Committed pages:      326464 (   1305856 Kb)

        Commit limit:         665998 (   2663992 Kb)

 

The debugger can parse the pool tagging database and present similar information as poolmon.  The !poolused command will do this, the /t5 option will limit output to the top 5 consumers, the 2 flag will sort by nonpaged pool usage (use the 4 flag if your leak is in paged pool).

 

1: kd> !poolused /t5 2

..

Sorting by NonPaged Pool Consumed

 

               NonPaged                  Paged

Tag     Allocs         Used     Allocs         Used

 

Leak 601 615424000          0            0    UNKNOWN pooltag 'Leak', please update pooltag.txt

Pool         6      1717840          0            0    Pool tables, etc.

nVsC       664      1531552          0            0    UNKNOWN pooltag 'nVsC', please update pooltag.txt

netv      4369      1172224          1          144    UNKNOWN pooltag 'netv', please update pooltag.txt

Thre       607       774048          0            0    Thread objects , Binary: nt!ps

 

TOTAL     43424    634209952      63565    126487760

 

The above output shows that the tag “Leak” is associated with almost all of the nonpaged pool usage.  This is the same information we obtained in Part 2 using poolmon.

 

Now we must identify what drivers use the pool tag “Leak”. Because we have a snapshot of the system memory we can search the dump for this tag.  We can match each address to a module using the command lm a.

 

1: kd> !for_each_module s -a @#Base @#End "Leak"

fffff880`044b63aa  4c 65 61 6b 3b c1 0f 42-c1 41 8d 49 fd 8b d0 ff  Leak;..B.A.I....

fffff880`044b6621  4c 65 61 6b 3b c1 0f 42-c1 33 c9 8b d0 ff 15 cc  Leak;..B.3......

1: kd> lm a fffff880`044b63aa

start             end                 module name

fffff880`044b5000 fffff880`044bc000   myfault    (no symbols)

1: kd> lm a fffff880`044b6621

start             end                 module name

fffff880`044b5000 fffff880`044bc000   myfault    (no symbols)

 

The tag and driver name can be used to search the internet for known problems.  If a known issue is found a driver update may be available, and installing this update may prevent a future memory leak.

 

If there are no updates available for the driver, or if this is your driver and you need to identify the cause of the leak, don’t panic.  In future articles we will show techniques for getting call stacks of pool allocations, these call stacks be used to identify under what conditions the driver leaks memory.

Comments

  • Anonymous
    August 31, 2012
    There may be also scenarios where you don't have system prepared for manual memory dump capture and you need to collect the dump file. In cases like this, LiveKd from Sysinternals is a great tool of choice. The latest "mirror dump" (-m) parameter for Vista and above is simply awesome as it solves some drawbacks of using this tool (data consistency etc.). Regarding !vm in windbg and the "excessive nonpaged pool usage" - there is one known bug in kdexts.dll (support.microsoft.com/.../2509968) which causes the extension to report some non-realistic numbers. This may lead to false positives for users with little debugging/memory management experience.

  • Anonymous
    April 27, 2015
    Great article - to the point. Helped me a lot in debugging a buggy pool allocation.