Yes, We Recycle!...That Includes AppPools
And I am not just talking about Aluminum and Polystyrene. Application Pool recycling is one great feature of IIS6 but how do you know if it is helping you or hurting you? It’s great for availability and reliability but it can also be masking real problems with code running in your applications. If you are recycling too quickly based on memory limits you may have a memory leak that should be looked at, if you are recycling due to ping fails or other errors, you may have a more serious issue to look at.
We monitor our IIS worker process uptime to measure general health of our applications and make capacity planning decisions (how many apps per app pool, etc). Process startup and .Net framework initialization does have a performance cost, If you are recycling too frequently, this can affect your web servers throughput. We’ve had our share of misbehaving applications that have recycled as frequently as every 5 minutes or just flat out crashed the worker process. You can set Application pools to recycle based on time, number of requests, virtual and private memory limits. Application pools may also recycle due to health reasons, Ping Fail, Fatal Communications Error, etc., or there’s always the admin recycle or the config change recycle. We have many applications (sometimes hundreds) spread out across several app pools. We use our WP uptime data to check if we have a misbehaving application out there (maybe someone published some bad code) or make decisions on how many app pools we can run, which applications to put into separate app pools and how many applications an App Pool should have.
How do you verify and monitor that is really happening?
Well, the first thing you have to do is log all recycle events before you can start collecting and analyzing the data. To enable recycle event logging, if you haven’t already:
cscript adsutil.vbs Set w3svc/AppPools/LogEventOnRecycle 255
This enables all recycle events for all AppPools to be logged in the system event log.
More information on Application Pool recycling events and logging is available in the following KB article:
332088 How to modify Application Pool Recycling events in IIS 6.0
https://support.microsoft.com/default.aspx?scid=kb;EN-US;332088
(note: there may be an error in the above article, I use event ID 1117 for private memory)
Once you have logging enabled you can start collecting this info and creating baselines for your application pools. One of our favorite tools, Log Parser, is great for collecting this data.
For more information on Log Parser:
https://www.microsoft.com/technet/scriptcenter/tools/logparser/default.mspx
You can use a simple log parser query to get the recycle events:
Logparser "Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime from \\SERVERNAME\System where SourceName = 'W3SVC' and EventID in (1009;1010;1011;1074;1077;1078;1079;1080;1117) and Message like '%DefaultAppPool%'"
This query will give all the recycle and process failure events for the application pool named like defaultapppool. I’ve included a few other events in this query that aren’t necessarily recycle events but worker process failure events that do affect the uptime of your worker process.
Here is a list of events that I am interested in:
1009 = Worker process Terminated UnExpectedly
1010 = Worker process failed a Ping
1011 = Worker process suffered a Fatal Comummincations Error
1074 = Recycled based on the defined Time Limit
1077 = Recycled based on the Virtual Mem Limit
1078 = An ISAPI reported unhealthy
1079 = An Admin requested a recycle
1080 = A Config Change required a recycle
1117 = Recycled based on Memory limit
That’s a good start but I want to see why we recycled as well, so this query is a little better:
Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime,
case
EventID
when 1009 then 'UnExpEnd'
when 1010 then 'PingFail'
when 1011 then 'FatalComErr'
when 1074 then 'TimeLimit'
when 1077 then 'VMem'
when 1078 then 'ISAPIUnHealth'
when 1079 then 'Admin'
when 1080 then 'ConfigChange'
when 1117 then 'PMem'
end as Reason
from \\SERVERNAME\System
where SourceName in ('W3SVC';'WAS') and
EventID in (1117;1080;1079;1078;1077;1074;1011;1010;1009) and
Message like '%defaultapppool%' and
TimeGenerated > to_timestamp('01-01-2006 00:01:01','yyyy-MM-dd hh:mm:ss')
Save this as wpuptime.sql and run “logparser file:wpuptime.sql” and this will give you an ouput like:
dateTime Reason
------------------- ------------
09/23/2006 01:18:51 ConfigChange
09/23/2006 01:19:52 VMem
09/23/2006 01:20:04 ConfigChange
09/28/2006 09:28:03 Admin
10/04/2006 18:24:16 ConfigChange
10/05/2006 07:50:02 PingFail
10/05/2006 08:21:26 PingFail
That’s better, I can see event date/time and reason but I still want more info, like what is the average up time for the app pool that I am interested in? What was the shortest recycle time, what was the longest, what are the count of each events? To go further with this, I turn to vbscript and invoke the log parser COM object. To use the log parser COM object you must register the DLL on the machine where you will run the script.
Regsvr32 logparser.dll
Here’s an example script that uses the above query and calculates the info, it doesn’t have much error checking and output isn’t exactly pretty, but I did say it was an example right?
' WPUPTIME.VBS - Queries event logs for W3SVC Recycle events (last 100) and calculates up time stats between events
' Requires LogParser.dll to be registered
' 1009 = TermUnExp
' 1010 = PingFail
' 1011 = FatalComErr
' 1074 = TimeLimit
' 1077 = VMem
' 1078 = ISAPIUnhealthy
' 1079 = Admin
' 1080 = ConfigChange
' 1117 = PMem
'
Dim cConfChg, cAdmin, cISAPI, cVMem, cPMem, cTimeLimit, cFatalComErr, cPingFail, cTermUnExp
Dim myQuery
Dim myInputFormat
Dim MinDate
Dim recordSet
Dim lastD, curD, oldD, firstD
Dim uptime_avg, uptime_mins, uptime_mins_last
Dim reccount
Dim timetotal
Dim MinTime
Dim MaxTime
Dim strReason, strReasonLast
Dim strComputer
Dim strAppPool
Dim szQuery
WScript.Echo
ParseCommandLine()
MinDate = CStr("01-01-2006 00:01:01")
Set myQuery = CreateObject("MSUtil.LogQuery")
Set myInputFormat = CreateObject("MSUtil.LogQuery.EventLogInputFormat")
myInputFormat.direction = "BW"
Call QueryServer(strComputer, strAppPool)
CalcUpTime()
Wscript.Echo strComputer, "," , strAppPool, "Avg Uptime:", uptime_avg, "minutes"
WScript.Echo
WScript.Echo "Count of Recycle Events"
WScript.Echo "VirtualMem:" + Chr(9)+ CStr(cVMem)
WScript.Echo "PrivateMem:" + Chr(9) + CStr(cPMem)
WScript.Echo "TimeLimit:" + Chr(9) + CStr(cTimeLimit)
WScript.Echo "PingFail:" + Chr(9) + CStr(cPingFail)
WScript.Echo "Admin:" + Chr(9)+ Chr(9) + CStr(cAdmin)
WScript.Echo "FatalComErr" + Chr(9) + CStr(cFatalComErr)
WScript.Echo "TermUnExp" + Chr(9) + CStr(cTermUnExp)
WScript.Echo "ConfChg" + Chr(9)+ Chr(9) + CStr(cConfChg)
WScript.Echo "ISAPI" + Chr(9) + Chr(9) + CStr(cISAPI)
WScript.Echo
Set myInputFormat = Nothing
Set myQuery = Nothing
set Locator = Nothing
WScript.Quit(0)
Sub CalcUpTime()
WScript.Echo("RecycleTime Uptime(mins) Reason")
reccount = 0
timetotal = 0
MinTime = 1000
MaxTime = 0
cConfChg = 0
cAdmin = 0
cISAPI = 0
cVMem = 0
cPMem = 0
cTimeLimit = 0
cFatalComErr = 0
cPingFail = 0
cTermUnExp = 0
Do While recordSet.atEnd() <> True
Set record = recordSet.getRecord()
curD = CDate(record.GetValue(0))
strReason= CStr(record.GetValue(1))
If reccount <> 0 then
uptime_mins = DateDiff("n", curD, lastD)
WScript.Echo lastD, " ", uptime_mins, strReasonLast
timetotal = timetotal + uptime_mins
if MinTime > uptime_mins Then
MinTime = uptime_mins
End if
if MaxTime < uptime_mins Then
MaxTime = uptime_mins
End if
Else
firstD = CurD
End IF
Select Case strReason
Case "VMem" cVMem = cVMem + 1
Case "PMem" cPMem = cPMem + 1
Case "Admin" cAdmin = cAdmin + 1
Case "ISAPIUnHealth" cISAPI = cISAPI + 1
Case "TimeLimit" cTimeLimit = cTimeLimit + 1
Case "PingFail" cPingFail = cPingFail + 1
Case "FatalComErr" cFatalComErr = cFatalComErr + 1
Case "UnExpEnd" cTermUnExp = cTermUnExp + 1
Case "ConfigChange" cConfChg = cConfChg + 1
End Select
lastD=curD
reccount = reccount + 1
strReasonLast = strReason
uptime_mins_last = uptime_mins
recordSet.moveNext()
If Err.number <> 0 Then
WScript.Echo "Error in CalcUpTime: ", Err.Description
Err.Clear
Exit Do
End If
Loop
oldD = lastD
uptime_avg = Round(timetotal / (reccount-1))
WScript.Echo ""
WScript.Echo "Time Span: " + CStr(oldD) + " through " + CStr(firstD )
WScript.Echo "MinTime:", MinTime
WScript.Echo "MaxTime:", MaxTime
WScript.Echo ""
End Sub
Sub QueryServer(tmpComp, tmpAppPool)
szQuery = "Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime, case EventID when 1077 then 'VMem' when 1117 then 'PMem' when 1079 then 'Admin' when 1078 then 'ISAPIUnHealth' when 1074 then 'TimeLimit' when 1010 then 'PingFail' when 1011 then 'FatalComErr' when 1009 then 'UnExpEnd' when 1080 then 'ConfigChange' end as Reason" + _
" from \\" + tmpComp + "\System where SourceName in ('W3SVC';'WAS') and EventID in (1117;1080;1079;1078;1077;1074;1011;1010;1009) and Message like '%"+tmpAppPool+ _
"%' and TimeGenerated > to_timestamp('"+MinDate+"','yyyy-MM-dd hh:mm:ss')"
WScript.Echo "Querying", tmpComp, tmpAppPool, " events"
WScript.Echo
Set recordSet = myQuery.Execute(szQuery, myInputFormat)
If Err.number <> 0 then
WScript.Echo "Could not execute query in QueryServer: ", Err.Description
End if
End Sub
Sub ParseCommandLine()
Dim vArgs
set vArgs = WScript.Arguments
if vArgs.Count <> 2 then
DisplayUsage()
Else
strComputer = vArgs(0)
strAppPool = vArgs(1)
End if
End Sub
Sub DisplayUsage()
WScript.Echo "Usage: cscript.exe " & WScript.ScriptName & " ServerName AppPoolName" & vbLF & vbLF & _
"Example: " & vbLF & _
WScript.ScriptName & " MyServer DefaultAppPool" & vbLF
WScript.Echo
WScript.Quit(0)
End Sub
The above script takes two parameters to run – the servername and the app pool name to match. The query uses a LIKE comparison, so make sure your app pool name given is unique enough to get correct data. Here is the sample outoput (abbreviated):
RecycleTime Uptime(mins) Reason
10/18/2006 1:49:40 PM 304 PMem
10/18/2006 8:45:56 AM 535 PMem
10/17/2006 11:50:04 PM 592 PMem
10/17/2006 1:58:28 PM 842 PMem
10/16/2006 11:56:41 PM 14973 Admin
…
… (cut for brevity)
…
9/27/2006 9:06:14 AM 14 VMem
9/27/2006 8:52:02 AM 96 VMem
Time Span: 9/27/2006 7:16:08 AM through 10/18/2006 1:49:40 PM
MinTime: 1
MaxTime: 14973
MyServer , MyAppPool Avg Uptime: 309 minutes
Count of Recycle Events
VirtualMem: 81
PrivateMem: 6
TimeLimit: 0
PingFail: 11
Admin: 1
FatalComErr 0
TermUnExp 0
ConfChg 1
ISAPI 0
It prints out the date/time of the event, minutes elapsed since the previous event and the reason. Followed by the time span of the events collected (first/last event used). It gives you minimum, maximum and average uptime of the application pool and a count of the various events the query found.
If you see you are recycling based on your time limits, you probably have a pretty healthy app. Looking at the above output we can see the majority of the recycling is due to memory related limits with a few ping fails. This app pool may be have a memory leak, may be using ASP.Net cache excessively or could just have too much content to be contained in a single app pool process (ASPX content is compiled into DLL’s that consume virtual memory), more investigation may be required.