ISA Firewall Service Process (wspsrv.exe) high CPU utilization issue
1. Introduction
When dealing with ISA high CPU utilization where wspsrve.exe is the one consuming more resources, the first impression is that ISA is the culprit for that. There are some scenarios where this statement is true, such as this one that I documented last year. But there are other scenarios where this is not true, including scenarios where wspsrv.exe is crashing – not always is because of ISA itself. I tried to demystify this on this post that I wrote back in 2009.
This post is about a scenario where ISA Server was facing a 100% CPU utilization and wspsrv.exe process was consuming most part of that (around 93%). This was happening at least twice a day and the only way to get CPU get to a normal utilization was by restarting Firewall Service. Sounds like an ISA issue right? Not really and you will see.
2. Gathering Data
Use the same data gathering approach as the one listed in this post with an addition of getting a full memory dump using notmyfault tool, see KB972110 for more info.
3. Analyzing the Data
By reviewing perfmon data from the time that the issue was happen it was possible to see that wspsrv.exe was 88% average of CPU utilization, but notice that within this process there is no thread spending too much CPU cycles as shown below:
In order to better understand this issue we started to review the manual dump of wspsrv.exe process. By reviewing the user mode dump we were unable to determine what the root cause of the issue was. Most of the threads were waiting for something in Kernel mode as shown below:
> First step was to review the top threads that had calls in Kernel mode:
0:000> !runaway 7
User Mode Time
... [hiding the values as they are not relevant here]
...
...
Kernel Mode Time
Thread Time
69:1310 0 days 0:00:49.500
53:129c 0 days 0:00:46.812
104:113c 0 days 0:00:42.390
45:127c 0 days 0:00:40.875
78:688 0 days 0:00:40.734
94:4d0 0 days 0:00:38.890
162:18a8 0 days 0:00:38.500
166:18c8 0 days 0:00:38.093
108:1418 0 days 0:00:37.062
> By taking a closer look into the threads we see that some of them are either performing RPC or DNS such as the one below:
0:000> ~104kb
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
28cada54 77c7feb0 28cada90 28cada74 77c80845 ntdll!KiFastSystemCallRet
28cada60 77c80845 28cada90 76ed42d8 28cade84 rpcrt4!I_RpcSendReceive+0x23
28cada74 77ce415a 28cadabc 24b86ff4 00000000 rpcrt4!NdrSendReceive+0x28
28cade64 76ed5049 76ed42d8 76ed421c 28cade84 rpcrt4!NdrClientCall2+0x1a8
28cade7c 76ed4f69 00000000 29e2f078 00000001 dnsapi!NetInfo_Copy+0x5c2
28caded8 76ed6f5d 29e2f078 00000001 00000000 dnsapi!NetInfo_Copy+0x4e2
28cadf0c 76ee9d0c 00000003 28cae1d8 00000001 dnsapi!DnsValidateName_W+0x31f
28cadf34 62ea2cc4 28cae1d8 00000001 00000000 dnsapi!DnsQuery_A+0x20
28cadf78 62ea30a2 28cae1d8 00000001 28cadfb8 msphlpr!ProxyDnsCacheInit+0x52f
28cadf98 62ea217c 28cae1d8 28cadfb8 62ea1f0f msphlpr!ProxyDnsCacheInit+0x90d
28cae2dc 64760c2a 021d7710 28caeedc 00000001 msphlpr!ProxyGetHostByName+0x26d
28cae764 64761d11 28caeed4 00000400 28caee74 W3Filter!DllUnregisterServer+0x42ca6
28cae9d4 647638dc 28caeed4 00000400 28caee74 W3Filter!DllUnregisterServer+0x43d8d
28caf534 647295c7 00000000 00000000 28cafe14 W3Filter!DllUnregisterServer+0x45958
28caf554 647262ef 021d7008 00000000 000003e5 W3Filter!DllUnregisterServer+0xb643
28caf574 647277f8 28cafe14 00000000 00000001 W3Filter!DllUnregisterServer+0x836b
28caf590 647281c5 28cafe14 28038ce0 28038728 W3Filter!DllUnregisterServer+0x9874
28cafe1c 64728ad0 000000a7 00000000 0000000c W3Filter!DllUnregisterServer+0xa241
28cafe90 0046051a 28038728 00000001 00000000 W3Filter!DllUnregisterServer+0xab4c
28cafefc 0046032d 000000a7 00000000 00000001 wspsrv!SSL_CREDENTIALS::operator=+0x2a870
28caff20 00460bea 2685d018 004602b1 28caff50 wspsrv!SSL_CREDENTIALS::operator=+0x2a683
28caff30 00460266 000000a7 00000000 00000001 wspsrv!SSL_CREDENTIALS::operator=+0x2af40
28caff50 0044b0ff 2685d0bc 000000a7 00000000 wspsrv!SSL_CREDENTIALS::operator=+0x2a5bc
28caff7c 0044c08e 2685d0bc 000000a7 00000000 wspsrv!SSL_CREDENTIALS::operator=+0x15455
28caffb8 77e6482f 0000004f 00000000 00000000 wspsrv!SSL_CREDENTIALS::operator=+0x163e4
28caffec 00000000 0044bf26 0000004f 00000000 kernel32!GetModuleHandleA+0xdf
It was time to review the kernel dump file and verify if we had more information there:
> In this case we first started reviewing if there was anything locked in kernel space:
1: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks............................................................
Resource @ 0x8a7f3008 Shared 1 owning threads
Contention Count = 856651
NumberOfExclusiveWaiters = 5
Threads: 84149020-01<*>
Threads Waiting On Exclusive Access:
84166a80 8961d620 84166750 89649240
841777a0
> Notice that we have 5 exclusive waiters wanting on a thread. Let’s see which thread is that:
1: kd> !thread 84149020
THREAD 84149020 Cid 1074.18a8 Teb: 7ff12000 Win32Thread: 00000000 READY
IRP List:
fe5eae28: (0006,01d8) Flags: 00000884 Mdl: 00000000
88bb1a68: (0006,01d8) Flags: 00000000 Mdl: 89128478
fee50ba8: (0006,01d8) Flags: 00000000 Mdl: 880eb6c8
f9d62af8: (0006,01d8) Flags: 00000000 Mdl: 887adee8
fa284008: (0006,01d8) Flags: 00000000 Mdl: 87fb7bb8
feb2dd80: (0006,01d8) Flags: 00000000 Mdl: 88bbdc48
ff8c95f0: (0006,01d8) Flags: 00000000 Mdl: 88bbe488
fac4b820: (0006,01d8) Flags: 00000000 Mdl: 8847d798
fea1d008: (0006,01d8) Flags: 00000000 Mdl: 89252968
ff097008: (0006,01d8) Flags: 00000000 Mdl: 8a5ee320
fe64b008: (0006,01d8) Flags: 00000000 Mdl: 88a114a0
feba1b30: (0006,01d8) Flags: 00000000 Mdl: 8975beb8
ff499550: (0006,01d8) Flags: 00000000 Mdl: 87cd53a8
ff5cf9f0: (0006,01d8) Flags: 00000000 Mdl: 8856dd18
fc26ec38: (0006,01d8) Flags: 00000000 Mdl: 883df0a8
Not impersonating
DeviceMap e1750820
Owning Process 89f4ead8 Image: wspsrv.exe
Wait Start TickCount 4518588 Ticks: 0
Context Switch Count 3718083
UserTime 00:01:15.546
KernelTime 00:01:02.906
Win32 Start Address 0x0044bf26
Start Address 0x77e617ec
Stack Init b7b39000 Current b7b38898 Base b7b39000 Limit b7b36000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr Args to Child
b7b388b0 8083d5b1 84149020 841490c8 00000000 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
b7b388dc 8083df9e 84149020 8a7f3008 00000000 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo])
b7b38924 8082e057 88f99718 0000001b 00000000 nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])
b7b3895c 80824ba8 89a0a400 00000000 8082c38e nt!ExpWaitForResource+0x30 (FPO: [Non-Fpo])
b7b3897c 8085110e 8a7f3008 00000001 b7b389d4 nt!ExAcquireResourceSharedLite+0xf5 (FPO: [Non-Fpo])
b7b3898c b9fb5c70 8a7f3008 89a0a400 00000000 nt!ExEnterCriticalRegionAndAcquireResourceShared+0x19 (FPO: [Non-Fpo])
b7b389d4 b9fb5b17 b7b38a18 b7b389ec 00000028 afd!AfdGetTransportInfo+0x1b (FPO: [Non-Fpo])
b7b389f0 b9fb5d9f b7b38a34 b7b38a18 00000000 afd!AfdAllocateEndpoint+0x1e (FPO: [Non-Fpo])
b7b38a38 b9fbf7a4 89b1e438 8a9a9f38 b7b38a5c afd!AfdCreate+0x13b (FPO: [Non-Fpo])
b7b38a48 80840153 8a9bc030 fe5eae28 fe5eae28 afd!AfdDispatch+0x79 (FPO: [Non-Fpo])
b7b38a5c 8092e87e b7b38c04 8a9bc018 00000000 nt!IofCallDriver+0x45 (FPO: [Non-Fpo])
b7b38b44 8092c3ea 8a9bc030 00000000 89062228 nt!IopParseDevice+0xa35 (FPO: [Non-Fpo])
b7b38bc4 8092d80d 00000000 b7b38c04 00000042 nt!ObpLookupObjectName+0x5b0 (FPO: [Non-Fpo])
b7b38c18 8092c6cd 00000000 00000000 dffa7c01 nt!ObOpenObjectByName+0xea (FPO: [Non-Fpo])
b7b38c94 80931d92 29b2f958 c0140000 29b2f90c nt!IopCreateFile+0x447 (FPO: [Non-Fpo])
b7b38cf0 8092fb28 29b2f958 c0140000 29b2f90c nt!IoCreateFile+0xa3 (FPO: [Non-Fpo])
b7b38d30 80833bef 29b2f958 c0140000 29b2f90c nt!NtCreateFile+0x30 (FPO: [Non-Fpo])
b7b38d30 7c82860c 29b2f958 c0140000 29b2f90c nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b7b38d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
29b2f9ec 00000000 00000000 00000000 00000000 0x7c82860c
> This stack has a very good amount of IRPs (I/O Request Packet), let’s see if there is any pending IRP:
1: kd> !irp 88bb1a68
Irp is active with 4 stacks 2 is current (= 0x88bb1afc)
Mdl=89128478: No System Buffer: Thread 84149020: Irp stack trace.
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
>[ f, 8] 0 e1 8a94ddf0 870404b0 b9e964d0-870f66c8 Success Error Cancel pending
\Driver\Tcpip*** ERROR: Module load completed but symbols could not be loaded for idmtdi.sys
idmtdi
Args: 0000127b 00010020 00000000 00000000
[ f, 8] 0 e0 8a85d020 870404b0 b9fc1b72-833955a0 Success Error Cancel
\Driver\IDMTDI afd!AfdRestartBufferReceiveWithUserIrp
Args: 0000127b 00010020 00000000 00000000
[ e, 5] 5 0 8a9bc030 8583a190 00000000-00000000
\Driver\AFD
Args: 0000127b 00000000 20000020 00000000
> Notice that there is a third party transport driver interface (idmtdi.sys) loaded within this IRP and it is pending.
4. Conclusion
As you could see not always the user mode dump will give us the definitive answer for a problem, mainly when most of the threads on user space are waiting for something to happen on kernel space. In this particular case, when reading the kernel dump, we used the !locks command, which displays all locks held on resources by threads. A lock can be shared or exclusive, which means no other threads can gain access to that resource. In this case we saw that we had five threads waiting for exclusive access. The threads that were waiting for exclusive access were part of the wspsrv.exe.
What we can conclude here is that we have a condition where wspsrv.exe was allocating more and more threads without releasing resources (releasing threads) because it was getting stuck waiting for an answer from kernel driver. Some threads were working on I/O, mainly socket and LRPC call (almost socket) that run in Kernel mode. The third party driver is causing every AFD call to take more time and more CPU cycles, the fact that ISA is extremely using sockets amplify the problem. The third party kernel driver was uninstalled from ISA Server and the issue was resolved.
Another takeaway from this particular scenario is that not always that wsprsv.exe is using lot of CPU resources is because it’s an ISA issue. Keep that in mind.
Author
Gabriel Koren
Sr. Support Escalation Engineer
Microsoft CSS Forefront Edge Team
Technical Reviewer
Franck Heilmann
Sr. Escalation Engineer
Microsoft CSS Forefront Edge Team
Comments
Anonymous
January 01, 2003
I understand your point, however in not always that we have a wait condition CPU will be zero, it depends how the process that is consuming CPU works in order to perform other actions (allocating new resources) while other threads are waiting for other things to happen.Anonymous
January 01, 2003
I understand your point, however in not always that we have a wait condition CPU will be zero, it depends how the process that is consuming CPU works in order to perform other actions (allocating new resources) while other threads are waiting for other things to happen.Anonymous
January 26, 2011
The comment has been removed