HOWTO: Basics of IIS6 Troubleshooting
I recently sat down and thought a little about the typical user experience when troubleshooting IIS6, assuming s/he had little or none of the IIS context that long-time users have... and the picture did not look so good.
Now, I know that IIS7 will make huge improvements in this area (and will unfortunately obsolete some of this information... but not the general concepts! :-) ), but it is not available yet. I am also certain that users will continue to push us to literally start to self-diagnose the issue instead of providing useful information for users to take corrective actions (and certainly not merely dump trace data, like we do in WS03SP1...), so there has to be a good balance.
But in the meantime, I wanted to gather together some of the basic troubleshooting steps that I go through to diagnose issues which appear to involve IIS, as well as explain some of the rationale of the steps, so that the reader can better understand the process.
Preserve System State
First and foremost, when you are trying to troubleshoot why something is not working, you want to preserve the system state for examination. I cannot emphasize this enough... especially when you want someone else to help you. I can tell you that developers will not even look at your issues unless you give them the exact state that triggers the issue, and their rationale is simple - they want to diagnose the actual issue... not some pseudo or related issue. Only the Real Thing.
Thus, you want to treat the system more like a "crime scene" where you take measured steps around the chalk figures, take non-destructive samples, record the location of everything, etc... If you must make changes, make note of the order in which you do them as well as any consequential effects. I prefer to keep a little temporary "journal" in notepad.exe of my actions - so that I remember what I did, why I did it, and in what order... so that subtle patterns may emerge in later analysis without needing to gather data again.
For example, RESIST the urge to try to uninstall/reinstall IIS, change filesystem ACLs to wide-open, make user accounts to be administrator, or lower security settings on the server or in the browser just to see if things work. These sorts of actions merely destroy system state such that you may never know *why* something started working (and since you destroyed system state, no one else can help re-interpret for you)... so you never know *how* to deal with it in the future. In other words, you fail to learn and grow from the experience, and it can only cost you more time/effort in the future. Resist the urge to snap up short-term gains and focus on the long-term benefit. Simply finding solutions is ok; learning how to solve a class of problems is even better. Gee... these same ideas apply to life equally well... ;-)
This is why on newsgroups/Q&A, when someone tells me that they reinstalled, wiped ACLs, changed Administrators, etc during their "troubleshooting", I simply have to turn off and stop helping them... not because I do not care... but because at that point I am no longer dealing with the original issue but some new meta-issue.
Now, I know that some of you enjoy "troubleshooting" by tinkering around with settings in the UI to see if any combination works and if it does, great. But realize that this simply means that you condemn yourself to be limited by whatever is accessible via the UI, and in the case of an open platform like IIS, understanding fundamentals of how things plug together is really important because the UI simply cannot and does not represent all useful interactions. Thus, the astute reader should notice that I never write a blog entry from the perspective of the UI but always from the perspective of the IIS Core Server and what is going on... and the UI is merely a means to configure the necessary settings as appropriate.
In general, I prefer the classic warfare approach of first gathering and analyzing my "recon data" before formulating any strategy and making any intrusive deployments and movements...
Useful Logs
A common pitfall that I see users fall into is to claim:
I did not see any useful errors in X, so I decided to do something more drastic.
While I understand that users are trained to look for errors in a variety of places (Event Log seems to be a common place where people expect everything to be logged), and IIS does not log everything everywhere (and certainly does not flood the event log), it pays to be patient. Even when I am dealing with some system I do not fully understand (like the Windows Firewall, Exchange, SQL...), I still first search the web for any diagnostic aid, search the filesystem/registry for any log file clues, etc... before trying anything else... simply because I value preserving system state.
The most valuable log files for IIS6 (and their locations) are:
- HTTP.SYS Error Log - %windir%\System32\LogFiles\HTTPERR (Default location; configurable)
- IIS Website Log - %windir%\System32\LogFiles\W3SVC# (Default location; configurable) **
- Event Log (both System and Application)
** I'm not going to complicate things with Centralized Binary Logging and such... because the point of the logfile remains the same; just location differs. Materially it does not change the discussion nor rationale.
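As a minimal sketch of locating those logs without disturbing system state, the following Python only reads directory listings. It assumes logging still points at the default locations listed above; `find_iis_log_dirs` and `newest_log` are hypothetical helper names chosen for illustration, not part of IIS or any Microsoft tool.

```python
import os
from pathlib import Path


def find_iis_log_dirs(windir=None):
    """Return the default IIS6 log directories that exist under windir.

    Read-only: nothing on the system is created, changed, or deleted.
    """
    # Assumption: log locations have not been reconfigured away from the defaults.
    root = Path(windir or os.environ.get("windir", r"C:\Windows"))
    logfiles = root / "System32" / "LogFiles"
    found = []
    httperr = logfiles / "HTTPERR"
    if httperr.is_dir():
        found.append(httperr)
    if logfiles.is_dir():
        # One W3SVC<site id> directory per web site.
        found.extend(sorted(p for p in logfiles.glob("W3SVC*") if p.is_dir()))
    return found


def newest_log(directory):
    """Return the most recently modified *.log file in directory, or None."""
    logs = sorted(Path(directory).glob("*.log"), key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None
```

Calling `find_iis_log_dirs()` with no argument uses the `windir` environment variable; if the log locations were reconfigured, the returned list will simply be incomplete or empty.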
One of the more non-intuitive things when dealing with IIS log files is that there is no single location to correlate all of them. Worse, it often appears that information is haphazardly split amongst them, so it is never clear where to look for what. Barring unintentional bugs, here is how I logically classify the log files:
IIS Website Log contains information related to the actual execution of a request (HTTP Response Code and Win32 Error Code). It tells me that IIS Core in user-mode got the request and sent some response (no guarantee the client got the response, of course).
Examples include:
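To illustrate reading the HTTP Response Code and Win32 Error Code together, here is a rough Python sketch. It assumes the site logs in W3C Extended format with the `sc-status` and `sc-win32-status` fields enabled; the function names are hypothetical, and the parser relies only on the `#Fields:` header line that W3C logs carry.

```python
from collections import Counter


def parse_w3c_log(lines):
    """Yield one dict per request line, keyed by the names in '#Fields:'."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header may reappear when logging restarts, so re-read it each time.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):
                yield dict(zip(fields, values))


def failed_requests(lines):
    """Count (sc-status, sc-win32-status) pairs for responses with status >= 400."""
    counts = Counter()
    for entry in parse_w3c_log(lines):
        status = entry.get("sc-status", "")
        if status.isdigit() and int(status) >= 400:
            counts[(status, entry.get("sc-win32-status", "-"))] += 1
    return counts
```

Pairing the two codes is the point: a 404 with Win32 error 2 (file not found) and a 401 with Win32 error 5 (access denied) suggest different next steps, even before any monitor is attached.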
HTTP.SYS Error Log contains information related to the TCP connection, pre-dispatch of request to user-mode, and process-management. Unexpected connection drops (either before, during, or after the request is dispatched, for whatever reason), w3wp crashes, application pool disabled (HTTP Status 503), invalid parsing of HTTP request (HTTP Status 400), etc. Basically, everything surrounding the actual execution of the request by user-mode w3wp.exe.
Examples include:
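A similar hedged sketch for the HTTP.SYS Error Log tallies the status and reason of each entry. It assumes the HTTPERR file carries a `#Fields:` header that includes `sc-status` and `s-reason`; `httperr_reasons` is a hypothetical name, and the sample values in the usage note are illustrative only.

```python
from collections import Counter


def httperr_reasons(lines):
    """Tally (sc-status, s-reason) pairs from HTTPERR log lines."""
    fields = []
    tally = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip lines that do not match the declared fields
        entry = dict(zip(fields, values))
        tally[(entry.get("sc-status", "-"), entry.get("s-reason", "-"))] += 1
    return tally
```

A tally dominated by one pair, say status 503 with an application-pool related reason, points at process management rather than at the execution of any particular request.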
Event Log contains interesting information or state-change on the server that is not tied to a given request. For example, failure to load/read configuration, detection of some invalid configuration values, failure to load ISAPI Filters, detection of w3wp crash, disabling of Application Pool, etc. In particular, one should not expect event log to contain everything necessary to troubleshoot. Treat it more like an emergency phone call to 911 and not the actual crime scene.
Examples include:
One common source of "debugging information" that I dissuade you from relying upon for troubleshooting is the HTTP response displayed by your web browser. Quite frankly, do not trust the browser for troubleshooting unless you have nothing better. Use a tool like WFetch from IIS Resource Kit Tools. Browsers simply have too many "usability" features that limit their usefulness, including:
- Browsers may not display the actual HTTP Response but some rationalized response, so you never detect the flawed output of custom ISAPI code running on the server
- Browsers may re-interpret HTTP Response and generate some other pre-canned response, so you never get the actual HTTP Response Code
- Web Server may not send a useful Custom Error page to the Client despite logging all the information to Log files on the server
In short, first look at the aforementioned Log files on the server before doing anything else...
Use Non-Invasive Monitors
When the log files do not tell the whole story, I resort to pragmatic, non-invasive monitors like FileMon, RegMon, and NetMon to observe and record what happens in the situation in question.
Suppose the problem has to do with the request accessing some file or registry key and returning access denied or file not found. I suggest using FileMon (aka File Monitor) or RegMon (aka Registry Monitor) from www.sysinternals.com to track which resource is accessed and the associated error.
Some representative blog entries:
Suppose the problem is a repeating user dialog popup, or the browser is "hanging" and eventually times out, or just any other unexpected HTTP sequence. I suggest using NetMon (aka Network Monitor) from Add/Remove Windows Components / Management and Monitoring Tools / Network Monitor Tools to capture the incoming/outgoing request/response.
In particular, this approach is necessary if ISAPI Filters or ISAPI Extensions are involved in the request because they can cause arbitrary server behavior. So, you need to capture these arbitrary behaviors, determine which is wrong/right, and go from there.
Suppose you know the actual request which will generate the misbehavior or hang; then you can consider using a tool like WFetch to independently make that request and observe the raw HTTP Response to figure out what is wrong with it. Using browser plugins like Fiddler in IE is neither as independent nor as direct as you need.
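The idea of replaying a known request and looking at the raw bytes can be sketched in a few lines of Python. This is not a replacement for WFetch, only an illustration of what "raw HTTP Response" means; `fetch_raw` and `split_response` are hypothetical names, and the sketch assumes plain HTTP (no SSL) and a server that closes the connection when done.

```python
import socket


def fetch_raw(host, port, request_bytes, timeout=10.0):
    """Send request_bytes verbatim and return every byte the server sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_bytes)
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # keep whatever arrived before the server went quiet
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def split_response(raw):
    """Split a raw response into (status line, header lines, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], body
```

For example, `fetch_raw("localhost", 80, b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")` returns the unmodified status line, headers, and body, with none of the rewriting a browser might apply. Sending `Connection: close` keeps the read loop from waiting on a kept-alive connection.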
Use Real Debuggers
Suppose the log files and pragmatic monitors fail to tell the whole story... I then attach debuggers like NTSD or WINDBG from the Microsoft Debugging Toolkit (do NOT use Visual Studio because installing/using that tool changes too much machine state which may be relevant. Visual Studio is more a development platform than a debugging tool), set up symbols as directed, and then investigate the process responsible for handling the request in question.
A couple of useful breakpoints to use include:
- w3isapi!ServerSupportFunction - any time an ISAPI Extension makes a pECB->ServerSupportFunction call, you can trap it and based on the ID, examine every single parameter value. Now, with IIS6 on WS03SP1, this information can be obtained by turning on ETW Tracing, but I still enjoy the generic method of simply setting a breakpoint on whatever I am interested in trapping and then observing it.
- w3core!FilterServerSupportFunction - any time an ISAPI Filter makes a pfc->ServerSupportFunction call, you can trap it and based on the ID, examine every single parameter value. With IIS6 on WS03SP1, this information can be obtained by turning on ETW Tracing as well.
In conjunction with Log files indicating the error response/codes, Network Monitor indicating what is wrong with the response, and trapping what ISAPI Filters/Extensions do on the server, you can usually track down whether the problematic response came from IIS or some particular ISAPI... which is a huge step forward in troubleshooting.
Conclusion
Ok, I am going to stop at this point because I do not want this blog entry to be some all-encompassing novel that takes me forever to write, edit, and perfect and hence never publish. ;-) I hope that this information provides a useful scaffold for any user of IIS to effectively gather data to troubleshoot a variety of IIS-related issues by employing various associated diagnosis techniques (which I intentionally did not mention... though they would be nice subjects for future blog entries).
Please note that my troubleshooting steps do not involve changing the system's or even IIS's configuration other than replaying the request or action that triggers the issue under investigation... because to me, preserving system state and recording my actions/observations is most important. Why? Well... suppose I cannot actually resolve the issue... then I definitely do not want to prevent anyone else from helping me resolve the issue and want to provide them with the best environment and all the information I had already independently gathered to save them time.
Yes, I realize that this approach does not make you feel empowered because you do not actively change anything... but please realize that troubleshooting is about making correct changes quickly; not quickly making correcting changes... ;-)
//David
[2006-07-14] Hmm... minds thinking alike. I recommend this URL:
https://weblogs.asp.net/steveschofield/archive/2006/07/08/Troubleshooting-process.aspx#comments
Comments
Anonymous
January 01, 2006
When referring to "NetMon", do you mean any special application? I searched Google for "NetMon" and "SysInternals NetMon", but found various unspecific items only.
Anonymous
January 02, 2006
Uwe - When I say "NetMon" I meant "Network Monitor", which I described how to obtain/install. I'll make the association more clear. It's similar to how "RegMon" is "Registry Monitor".
//David
Anonymous
January 02, 2006
Hi David,
A useful post, thanks. One minor typo:
> I suggest using RegMon (aka Network Monitor)
I think you mean NetMon there.
Paul
Anonymous
January 02, 2006
Paul - Thanks. A typo from copy/paste/edit for the prior comment...
//David
Anonymous
January 17, 2006
I'm looking for an answer to the (silly) question, what does a http status code 200 in the iis website log really mean.
You say: "IIS Website Log contains ... It tells me that IIS Core in user-mode got the request and sent some response (no guarantee the client got the response, of course)."
That I was thinking also, but there are guys arguing "if there is an http status code 200 for a GET is in the IIS logs it means >>guarantee of delivery<< to the client, like a fax-reception-o.k."
I know this looks silly to you, but help me anyway ;-) thanx a lot!!!
Anonymous
February 07, 2006
Hi David:
We have lots of 400 error codes registered in the IIS logs and the HTTP.sys logs. But the HTTP.sys logs are not that useful, as they don't have enough information about cs-uri; most of the time it is "-". So I am wondering whether there is a way to debug the issue behind this error.
Thanks.
Anonymous
February 07, 2006
The comment has been removed
Anonymous
March 03, 2006
If you are looking for information on how to troubleshoot a variety of IIS-related issues, the following...
Anonymous
April 10, 2006
Sigh... it seems that the Application Health Monitoring features added in IIS6 are merely used by VARs...
Anonymous
May 02, 2006
As i just noticed, IIS6 will also go all "eek!" on multiple dots within abs_path, too.
"http://server/abc+.+def/ghi.aspx" will work perfectly fine, but
"http://server/abc+..+def/ghi.aspx" will throw some "Bad Request" at me.
RFC 2396 saying
> unreserved = alphanum | mark
> mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
would lead me to thinking "abc+..+efg" was a legal segment, though.
Anonymous
May 02, 2006
The comment has been removed
Anonymous
May 02, 2006
The comment has been removed
Anonymous
May 02, 2006
Robert - You probably have either:
1. something configured on your IIS6 server
2. between your server and client
that is rejecting those requests because I do not see what you claim.
I am able to make your exact failing requests against my default IIS6 installation and get the correct responses (not 400).
Obviously, just because HTTP.SYS/IIS6 allows a request does not mean other applications running on the server or between the server and client likes the request. The observed behavior there is obviously arbitrary, so you must always be clear on WHAT is handling and rejecting the request because it may not be IIS6 at all.
//David
Anonymous
May 02, 2006
Hi David.
Following your hints I tried using a new virtual folder on the server in question.
As you suggested, it worked well for defaul..t.htm and defaul&t.htm.
As the original scenario involves an asp.net 1.1 application with rewriting happening within the "begin_request" event (which is supposed to map 'virtual' urls to database-driven content).
Trying to reproduce the behavior using a simpler setup I set up a wildcard-handler "C:\WINNT\Microsoft.NET\Framework\v1.1.4322\aspnet_isapi.dll", and unchecked the "verify if file exists" checkbox.
Then I tried opening "default.htm", "defaul&t.htm" and "defaul..t.htm"
"default.htm" still worked.
"defaul&t.htm" and "defaul..t.htm" caused "Bad request"
So, I conclude, it is not the fault of http.sys or IIS6 at all, but all aspnet_isapi.dll's.
Thank you for pointing me to this direction.
Robert
Anonymous
May 02, 2006
Hi David,
it's me again. There's one last thing I forgot to mention:
One thing, that might point at IIS6 is that the same application works fine under IIS5 (and the same version of the .net framework). But then it might as well have something to do with some difference between Windows 2000 Server and Windows 2003 I do not know about.
Hopefully I'll find some clue.
Thank you again.
Robert
Anonymous
May 03, 2006
Robert - You've already proven that your issue is not with IIS6 but something ASP.Net specific.
I would start looking at whether they are the same version and running the exact same httpModules.
WS03 (and WS03SP1) contains updated binaries of .Net Framework 1.1. Verify that you have the same updated binaries on Windows 2000 Server.
I suspect that one of the versions of ASP.Net is doing path validation/canonicalization and now deciding to fail your requests.
//David
Anonymous
May 03, 2006
Hi David,
I guess I finally found out, why it would work on IIS5/Windows 2000, and not on IIS6/Windows 2003.
It had nothing to do with the version of IIS nor the version of Windows, nor the versions of the .Net-Framework (1.4322.2032 / 1.4322.2300).
The different behavior was caused by the registry setting "VerificationCompatibility" under "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ASP.NET" being set to "1" on the 2000-Server and not on 2003.
(This setting seems to be introduced by kb826437).
I think I'd never found out without you pointing me away from my first suspicions, so thank you again for your patience with me.
Robert
Anonymous
May 03, 2006
Robert - thanks for finding the KB. Actually, it does have to do with the "version" of the .Net Framework since it includes ASP.Net.
I didn't have the exact KB#, but I knew the guys that made the change in ASP.NET, so I was trying to say that during one of the ASP.Net 1.1 SPs a "canonicalization check" was introduced (that resulted in the behavior you observed), and the Registry Key from the KB basically turns that check off (it is on by default for security reasons).
So:
1. if you ran the original ASP.Net 1.1, this check would not be present at all and things would work
2. if you ran ASP.Net 1.1 with the latest service packs, the check would be present and enabled by default so things would fail
3. and if you then add the Registry Key to turn the check off, things would "work" again
//David
Anonymous
May 04, 2006
Hi David.
I see. Thank you for the explanation.
Regarding the KB-Entry, there's one thing that I'd just like to share though:
I think It'd be nice if the KB-entry explained, what consequences, security-wise, setting the option would have.
It reads a bit like "It won't work, but set this setting, and it will." and one gets to thinking "Where's the catch? If there wasn't any, why would they use a registry-setting instead of changing it altogether?" But there's no hint as to why leaving this setting alone could possibly be wiser.
I think that if the consequence of this setting might be decreased security, one should be informed. And if there is no risk at all, it'd be nice to know, too, so you don't get sleepless nights wondering what evil you might have done by changing this setting.
And a possibility to not disable this check for all the server but to override a "failed" result within one's application would be quite nice to have.
But this shall not bother you, I'm very grateful for the time you voluntarily invested helping me with my problem.
Best wishes
Robert
Anonymous
May 05, 2006
The comment has been removed
Anonymous
May 05, 2006
David,
I agree on your point about full security being preferable over a false sense of security - It's just that sometimes things look completely harmless, like "defaul..t.aspx". I see the need for preventing canonicalization issues, but I don't see how "defaul..t.aspx" might cause such.
This is probably the reason why I should leave such things to the .Net people, who know what they are doing, as I generally dislike the idea of being the one responsible when something bad happens to the server; and it's just one, whereas they are responsible for thousands of servers.
Hope they can sleep well.
Robert
Anonymous
May 08, 2006
HI DAVID
I agree on your point about full security. But I seem to have a little problem when I log on to Internet Explorer. It gives me an error message that says "Internet Explorer has encountered a problem and needs to close. We are sorry for the inconvenience."
Anonymous
May 12, 2006
Robert - Canonicalization issues are by-definition arbitrary and usually hard since paths crossing multiple domains can suffer from such naming issues. I am glad someone else is thinking about it. ;-)
However, I agree that I do not see anything wrong with defaul..t.aspx. It seems that the ASP.Net validation is just checking for presence of ".." and not necessarily "/../"
//David
Anonymous
May 12, 2006
HOVIK - sounds like something is crashing Internet Explorer. Just attach a debugger to iexplore.exe, and on the crash, look at the stack trace, determine what is crashing Internet Explorer, and follow up with the appropriate support party for that software.
//David
Anonymous
July 14, 2006
The comment has been removed
Anonymous
July 14, 2006
Sorry for the second post right after the first, but I did eventually figure out the root cause. It looks like ASP.NET automatically sets the 'bin' dir of any application to 'None' the first time it executes something there. I had some ASP.NET DLLs mixed in with ISAPI ones. Moving them out to their own, separate, dir made it work right. Probably more secure anyway.
Anonymous
July 16, 2006
Dan P - glad you found the answer - that was what I was going to suggest, but you found it first. :-)
ASP.Net will reset the permissions of the /bin directory to "None" on its own for security reasons - .Net assemblies are essentially source code, and having that directory readable is like revealing your source code. To protect on IIS5/5.1, ASP.Net changes the AccessFlags property because it runs as LocalSystem. To protect on IIS6, it runs a filtering algorithm because it runs as Network Service, which cannot change IIS configuration.
//David
Anonymous
August 18, 2006
It's a 10K entry!
//David
Anonymous
October 07, 2006
This article is great. I always have to lecture my hosting company when I need error logs. They normally don't give me any information, or they tell me "We have found nothing". Now I know what to tell them next time. Good Job Dave
Anonymous
May 23, 2007
Hola all How I can change avatar in this forum?
Anonymous
October 10, 2007
The comment has been removed
Anonymous
November 07, 2007
Hi, I am involved in migrating all the web sites from a web server running IIS 5.0 to another web server with IIS 6.0. One web site on the source server doesn't have a default document, but it displays some links which lead to another web site. I have made copies of those folders and also created a web site which is very similar to it. But when I tested the new web site on the target server, it gave the following error: "Directory Listing Denied - This Virtual Directory does not allow contents to be listed". I have checked the permissions on all the subfolders, including the Virtual Directory, but the error is still displayed. Please give me some guidance to solve this issue.
Anonymous
November 11, 2007
Hi David I have IIS 6 on a Win (SBS) 2K3 domain controller (together with ISA 2000, Exchange 2003 and SQL 2000). On IIS there's only the stuff from Exchange installation (web-mail etc). IIS gives me "Service Unavailable" no matter what I browse from it, "DefaultAppPool" crashes after every "browse" command from IIS Administration Console. Any idea ? I am desperately trying to put a website on that server (web service goes under port 1234 to avoid interaction with ISA on 80/8080) Pls help me...
Anonymous
November 17, 2007
The comment has been removed
Anonymous
November 17, 2007
George - Is your SBS installation the "Premium" edition of SBS2003, or your proprietary combination of software? If it is your proprietary combination, then do you have support statements from Microsoft saying your combination is possible? If your combination is allowed, then please contact Microsoft PSS for support. If your combination is not possible, then please read this blog entry to find the blog URL on how to diagnose your "503 Service Unavailable" issue. Otherwise, you should expect to pay someone to consult on your issues. I am happy to tell and teach people on how to do things, but if you want me to diagnose and do the work, then I must be compensated.
//David
Anonymous
January 26, 2008
The Problem IS In Host Header Value In IIS . Delete It If You Fill A Value it's Work fine ;)
Anonymous
February 20, 2008
The comment has been removed
Anonymous
May 20, 2008
IIS can't start; the service does not respond to the start or control request! What should I do?
Anonymous
June 18, 2008
When going into a folder on my USB hard drive, a message comes up saying 'Windows Explorer has encountered a problem and needs to close. We are sorry for this inconvenience'. It is ok in other folders
Anonymous
June 21, 2008
Ronniebatch - it sounds like you have installed some application which added folder handlers to Explorer (for example, WinZip), and that handler is crashing Explorer on that folder on your USB hard drive. You'll have to figure out which buggy application extension you installed and get rid of it (or fix it). It may even be a bogus extension left over from a previously installed folder extension. In any case, this sounds like a misconfiguration of your computer and not a problem with Windows Explorer. For example, many people, including myself, have no problems going into a folder on our USB Hard Drives. So the problem has to be specific to some software you've installed/uninstalled on your computer, which can only be fixed if you determine the broken folder extension and get rid of it.
//David
Anonymous
August 12, 2008
Hello, here is my problem. We are running a web farm in an NLB configuration, unicast mode. The farm has 3 servers running Windows 2003 Standard Edition SP1. The web farm runs ASP and ASP.NET 1.1 and 2.0 sites flawlessly. Now, we want to add a 4th server to the farm. We configured it identically to the 3 other machines in the farm. We tested all web applications, and all seemed normal. Now, and here is the problem, we upgraded this 4th machine to SP2, with the idea of testing the SP2 upgrade on this machine BEFORE we upgraded our 3 other servers in the farm. At first, during the test, all seemed normal. But then, after a few minutes, the ASP pages in the sites started getting TCP errors. ASP.NET pages keep on running. What is even stranger is that we do not see any error message whatsoever in the log files, Event Viewer, or HTTP.SYS error log; nothing! I can get the applications stable by reducing the number of worker processes in the application pool of the site to 1. Since this server has 4 CPUs, I had configured it to have 4 worker processes (as I did on the other 3 servers, because they too are either dual-processor or 4-way CPU machines). Any ideas on how I should troubleshoot this? Or any ideas on possible issues with SP2 regarding this matter? Thanks, Thierry
Anonymous
August 21, 2008
The comment has been removed
Anonymous
January 13, 2009
The comment has been removed
Anonymous
February 25, 2009
Hello. I have run debug diagnostics and am a little confused by its output.
[2/25/2009 2:02:23 PM] Thread created. New thread system id - 4728
[2/25/2009 2:02:23 PM] Thread created. New thread system id - 4016
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
[2/25/2009 2:02:23 PM] First chance exception - 0xc0000005 caused by thread with system id 2448
This is just a sample but from what I understand, First Chance Exceptions are not necessarily a problem. However....
Should I be seeing this many so often on a site that is not being used hardly at all?