How to identify a driver that calls a Windows API leading to a pool leak on behalf of NT Kernel?
Hello my name is Gurpreet Singh Jutla and I would like to share information on how we can trace the caller which ends up allocating “Se “ Pool tag. When we use the Windows debugger and investigate the pool allocation and the binary associated with this pool tag, we see NT Kernel responsible for the allocations. But is the NT Kernel really responsible for a pool leak associated with this pool tag?
Issue at hand
- On windows 2003 x86 we see that the paged pool has depleted and we are running into event id 333.
- We can see the same behavior on later versions of the OS and even on x64
- We see that the leaking pool tag is “Se “ which is the pool tag for security objects. Is Microsoft component at fault or something is calling an API and using security objects on a large scale?
- This is windows 2003 x86 and we have limited options to root cause the issue. We need to really find out why we end up having so many allocations for this tag.
- We could enable Pool Tracking on the NT Kernel, but would that help?
Step 1
!vm 1 -> This tells us if there were any page pool allocation failures
0: kd> !vm 1
*** Virtual Memory Usage ***
Physical Memory: 2096922 ( 8387688 Kb)
Page File: \??\D:\pagefile.sys
Current: 16779264 Kb Free Space: 16552492 Kb
Minimum: 16779264 Kb Maximum: 16779264 Kb
Available Pages: 1607242 ( 6428968 Kb)
ResAvail Pages: 1991659 ( 7966636 Kb)
Locked IO Pages: 656 ( 2624 Kb)
Free System PTEs: 163671 ( 654684 Kb)
Free NP PTEs: 32766 ( 131064 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 10775 ( 43100 Kb)
Modified PF Pages: 10728 ( 42912 Kb)
NonPagedPool Usage: 7881 ( 31524 Kb)
NonPagedPool Max: 65279 ( 261116 Kb)
PagedPool 0 Usage: 67074 ( 268296 Kb)
PagedPool 1 Usage: 3266 ( 13064 Kb)
PagedPool 2 Usage: 3282 ( 13128 Kb)
PagedPool 3 Usage: 3268 ( 13072 Kb)
PagedPool 4 Usage: 3214 ( 12856 Kb)
PagedPool Usage: 80104 ( 320416 Kb)
PagedPool Maximum: 134144 ( 536576 Kb)
Session Commit: 14832 ( 59328 Kb)
Shared Commit: 19969 ( 79876 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 19362 ( 77448 Kb)
Pages For MDLs: 146 ( 584 Kb)
PagedPool Commit: 80140 ( 320560 Kb)
Driver Commit: 1602 ( 6408 Kb)
Committed pages: 520485 ( 2081940 Kb)
Commit limit: 6241313 ( 24965252 Kb)
!poolused /t10 4 -> This will list the top consumers of paged pool. It is “Se “ in our case.
0: kd> !poolused /t10 4
....
Sorting by Paged Pool Consumed
NonPaged Paged
Tag Allocs Used Allocs Used
Se 0 0 172204 232720312 General security allocations , Binary: nt!se
MmSt 0 0 15231 31835696 Mm section object prototype ptes , Binary: nt!mm
Ntff 9 1872 10434 8514144 FCB_DATA , Binary: ntfs.sys
WD 384 1251376 27 5591040 UNKNOWN pooltag 'WD ', please update pooltag.txt
UlHT 0 0 1 4198400 Hash Table , Binary: http.sys
NtfF 0 0 3259 3050424 FCB_INDEX , Binary: ntfs.sys
Toke 0 0 966 3034216 Token objects , Binary: nt!se
NtFs 13717 551176 19678 1758008 StrucSup.c , Binary: ntfs.sys
IoNm 0 0 12034 1737360 Io parsing names , Binary: nt!io
FSim 0 0 11336 1451008 File System Run Time Mcb Initial Mapping
CM16 0 0 293 1437696 Internal Configuration manager allocations
Wmit 6 11392 23 1376912 Wmi Trace
NtFU 0 0 8719 1237232 usnsup.c , Binary: ntfs.sys
Obtb 0 0 397 995600 object tables via EX handle.c , Binary: nt!ob
Key 0 0 7245 753432 Key objects
MmSm 0 0 11393 729152 segments used to map data files , Binary: nt!mm
TOTAL 192495 33751032 347461 310172416
Step 2
Once we know the pool tag we need to run the following command. Remember the pool tag has to be a 4 char case sensitive string so I would run the following command which tells me in which all modules we have the Pool tag “Se “ used. Please note that I have added two spaces in the string during my search. This is because the !poolused shows Se tag has caused the issue. Not providing the spaces in the search string will give different results. The following command searches each module that is loaded on the system.
!for_each_module s -a @#Base @#End "Se "
The result should be something like:
0: kd> !for_each_module s -a @#Base @#End "Se "
8096a1ae 53 65 20 20 6a 0c 6a 01-e8 bd a9 f2 ff 3b c7 89 Se j.j......;..
8096a5e3 53 65 20 20 6a 0c 6a 01-e8 88 a5 f2 ff 3b c7 89 Se j.j......;..
8096af9e 53 65 20 20 50 6a 01 89-45 fc e8 cb 9b f2 ff 8b Se Pj..E.......
8096c909 53 65 20 20 8d 1c 9d 10-00 00 00 53 6a 01 e8 5c Se .......Sj..\
8096c9e3 53 65 20 20 53 6a 01 e8-89 81 f2 ff 8b f8 85 ff Se Sj..........
8096ca3e 53 65 20 20 8d 04 85 10-00 00 00 50 6a 01 e8 27 Se .......Pj..'
8096caf2 53 65 20 20 8d 1c 9d 0c-00 00 00 53 6a 01 e8 73 Se .......Sj..s
8096cb3e 53 65 20 20 8d 1c 9d 0c-00 00 00 53 6a 01 e8 27 Se .......Sj..'
8096cb9d 53 65 20 20 56 6a 01 e8-cf 7f f2 ff 8b f0 3b f7 Se Vj........;.
8096cc44 53 65 20 20 6a 10 6a 01-e8 27 7f f2 ff 85 c0 0f Se j.j..'......
8096cc65 53 65 20 20 6a 04 6a 01-e8 06 7f f2 ff 85 c0 0f Se j.j.........
8096cc96 53 65 20 20 6a 04 6a 01-e8 d5 7e f2 ff 85 c0 0f Se j.j...~.....
8096ccbe 53 65 20 20 6a 38 6a 01-e8 ad 7e f2 ff 8b f8 85 Se j8j...~.....
809718ac 53 65 20 20 74 42 8b 46-68 3b c3 74 0f 53 50 e8 Se tB.Fh;.t.SP.
80971b48 53 65 20 20 74 42 8b 46-68 3b c3 74 0f 53 50 e8 Se tB.Fh;.t.SP.
8097372b 53 65 20 20 6a 0c 6a 01-e8 40 14 f2 ff 85 c0 75 Se j.j..@.....u
80976c38 53 65 20 20 6a 0c 6a 01-e8 33 df f1 ff 3b c6 89 Se j.j..3...;..
80a20d1b 53 65 20 20 bf 00 01 00-00 57 56 e8 4d 3e e7 ff Se .....WV.M>..
80a22698 53 65 20 20 c7 45 80 4f-62 43 6c c7 45 84 43 63 Se .E.ObCl.E.Cc
The above steps are also explained in detail at https://blogs.msdn.com/b/ntdebugging/archive/2012/08/31/troubleshooting-pool-leaks-part-3-debugging.aspx
Step 3
Now the tough part begins. We need to run “ln” command on each of the addresses shown by the output above. Example
0: kd> ln 8096c909
(8096c8cc) nt!SeQueryInformationToken+0x3d | (8096cdc0) nt!SeCaptureObjectTypeList
See the highlighted above. It is the nearest API to that address. Search on MSDN for all API’s listed. The ones that do not have an MSDN article can likely be ignored. SEQueryInformationToken is publically available and hence callable by any loaded driver. Example https://msdn.microsoft.com/en-in/library/windows/hardware/ff556690(v=vs.85).aspx
At this point we will make the assumption that it is a 3rd party driver making the calls to this Windows API. It is, of course, possible that a Microsoft module is doing this, but if such a problem exists in our code it usually becomes apparent very quickly when we get flooded by support calls.
Step 4
Method 1
We can use the “!for_each_module” debugger command to search the address space of each loaded module for the entry point of the function found in step 3.
0: kd> !for_each_module s-d @#Base @#End 8096c8cc
b94b5938 8096c8cc 8096000a 80960182 80894b78 ............xK..
ba7d50f4 8096c8cc 8081e5e0 8096b42e 8096b40c ................
ba83624c 8096c8cc 8081757e 8082b4f2 8094d438 ....~u......8...
bab8103c 8096c8cc 80960122 80960158 8096010a ...."...X.......
f74e2950 8096c8cc 8089e180 8096091a 80960638 ............8...
f74f0b68 8096c8cc 8089e180 80959546 80884150 ........F...PA..
f75cf094 8096c8cc 80960122 80960158 8096010a ...."...X.......
f797c110 8096c8cc 808847f0 80966c60 80894b78 .....G..`l..xK..
f7b643c4 8096c8cc 80960054 809e437c 808eadc6 ....T...|C......
The first column in the above output can then be passed to the “ln” command to get the nearest function to the address listed.
0: kd> ln b94b5938
*** ERROR: Module load completed but symbols could not be loaded for ABCMiniFilter.sys
…
…
0: kd> ln f797c110
*** ERROR: Module load completed but symbols could not be loaded for XVhdBusxxx.sys
Method 2
This is a bit more complicated. Run the lm command to dump all modules. You will get the output as follows
0: kd> lm
start end module name
80800000 80a5d000 nt
80a5d000 80a89000 hal
b476c000 b4775000 asyncmac (deferred)
b7f55000 b7fb6000 ABCD (deferred)
b817b000 b81da000 eeCtrl (deferred)
b82b6000 b82b9300 xyz (deferred)
b87a2000 b87cc000 Fastfat (deferred)
b8f04000 b8f0de80 ABCMiniFilter (deferred)
<snip>
Ignore the unloaded modules and the modules you know are Microsoft binaries. For the rest of them you need to run the following command on each module. Remember the first column is the start of the module and the second column is the end of module.
Use the !dh command for each header one by one and you will see the header information for the module. I just picked one module for reference you will need to run the command for each third party module.
0: kd> !dh b8f04000 -f
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (i386)
7 number of sections
51F7D252 time date stamp Tue Jul 30 20:18:50 2013
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
102 characteristics
Executable
32 bit word machine
OPTIONAL HEADER VALUES
10B magic #
9.00 linker version
8800 size of code
1200 size of initialized data
0 size of uninitialized data
86B2 address of entry point
480 base of code
----- new -----
00010000 image base
80 section alignment
80 file alignment
1 subsystem (Native)
6.01 operating system version
6.01 image version
5.01 subsystem version
9E80 size of image
480 size of headers
11E27 checksum
00040000 size of stack reserve
00001000 size of stack commit
00100000 size of heap reserve
00001000 size of heap commit
0 DLL characteristics
0 [ 0] address [size] of Export Directory
8A28 [ 50] address [size] of Import Directory
9180 [ 498] address [size] of Resource Directory
0 [ 0] address [size] of Exception Directory
9E80 [ 1FD0] address [size] of Security Directory
9680 [ 6C0] address [size] of Base Relocation Directory
7990 [ 1C] address [size] of Debug Directory
0 [ 0] address [size] of Description Directory
0 [ 0] address [size] of Special Directory
0 [ 0] address [size] of Thread Storage Directory
7A68 [ 40] address [size] of Load Configuration Directory
0 [ 0] address [size] of Bound Import Directory
7880 [ 108] address [size] of Import Address Table Directory
0 [ 0] address [size] of Delay Import Directory
0 [ 0] address [size] of COR20 Header Directory
0 [ 0] address [size] of Reserved Directory
The highlighted line is important for us. Remember this is the offset of the start of Import Address Table directory and the end of the same. In our case 7880 offset is start and 108 offset is the end. We run the following command to dump the import table. Running the dps command from the start offset in the module till the end of the import table will list all the functions it imports
0: kd> dps b8f04000+7880 b8f04000+7880+108
b8f0b880 f7873892 fltMgr!FltGetStreamHandleContext
b8f0b884 f7873988 fltMgr!FltAllocateContext
b8f0b888 f787459c fltMgr!FltSetStreamHandleContext
<snip>
b8f0b938 8096c8cc nt!SeQueryInformationToken
b8f0b93c 8096000a nt!RtlValidSid
<snip>
You see the module we searched in will have the API in its import table. Running the above command on each module, dumping the import tables will help us identify which modules have the API in its import table. However it cannot be confirmed if the driver calls the API and causes the leak. This is really tedious and time consuming but will help us identify each binary that calls the above API which leads to pool tag “Se “ allocation on behalf of the NT Kernel.
Conclusion
We can disable the binaries that call the SEQueryInformationToken API one by one and see if the issue still persists. Please note that the boot/System drivers should not be disabled as it can lead to a no boot situation. You can use Autoruns from Sysinternals to disable these drivers and if you run into a no boot situation, boot back into the “last known good configuration” to help recover the box.
Comments
- Anonymous
October 03, 2014
Very good stuff. I especially like the things you show in "Method 2" Thanks, and keep it coming!