Nobody ever reads the event logs…

In my last post, I mentioned that someone was complaining about the name of the bowser.sys component that I wrote 20 years ago.  I also mentioned that he included a screen shot of the event viewer.

What was also interesting was the contents of the screen shot.

“The browser driver has received too many illegal datagrams from the remote computer <redacted> to name <redacted> on transport NetBT_Tcpip_<excluded>.  The data is the datagram.  No more events will be generated until the reset frequency has expired.”

I added this message to the browser 20 years ago to detect computers that were going wild sending illegal junk on the intranet.  The idea was that every one of these events indicated that something had gone horribly wrong on the machine which originated the datagrams, and that a developer or network engineer should investigate the problem.  These illegal datagrams were often caused by malfunctioning networking hardware, which was not uncommon 20 years ago.
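The "no more events will be generated until the reset frequency has expired" behavior in that message is a form of log throttling. Here is a minimal sketch of the idea in Python. It is not the bowser.sys implementation; the class name, the parameters `max_events` and `reset_seconds`, and the in-memory `entries` list are all hypothetical stand-ins chosen for illustration.

```python
import time


class ThrottledEventLog:
    """Write at most `max_events` entries per `reset_seconds` window.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(self, max_events=1, reset_seconds=60.0, clock=time.monotonic):
        self.max_events = max_events
        self.reset_seconds = reset_seconds
        self.clock = clock          # injectable so the behavior can be tested
        self.window_start = None    # start time of the current window
        self.emitted = 0            # entries written in the current window
        self.suppressed = 0         # entries dropped in the current window
        self.entries = []           # stands in for the real event log

    def report(self, message):
        """Return True if the message was logged, False if it was suppressed."""
        now = self.clock()
        # Open a fresh window once the reset period has expired.
        if self.window_start is None or now - self.window_start >= self.reset_seconds:
            self.window_start = now
            self.emitted = 0
            self.suppressed = 0
        if self.emitted < self.max_events:
            self.emitted += 1
            self.entries.append(message)
            return True
        self.suppressed += 1
        return False
```

With `max_events=1`, a misbehaving machine produces one log entry per window instead of one per bad datagram, which keeps a single faulty network card from flooding the log.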

But you’ll note that the person reporting the problem only complained about the name of the source of the event log entry.  He never bothered to look at the contents of this “error” event log entry to see if there was something that was worth reporting.

Part of the reason that nobody bothers to read the event logs is that too many components log to the event log.  The event logs on customers' computers are filled with unactionable, meaningless events (“The <foo> service has started.  The <foo> service has entered the running state.  The <foo> service is stopping.  The <foo> service has entered the stopped state.”).  Customers stop reading the event log because there’s never anything actionable in it.

There’s a pretty important lesson here: Nobody ever bothers reading event logs because there’s simply too much noise in the logs. So think really hard about when you want to write an event to the event log.  Is the information in the log really worth generating?  Is there important information that a customer will want in those log entries?

Unless you have a way of uploading troublesome logs to be analyzed later (and I know that several enterprise management solutions do have such mechanisms), it’s not clear that there’s any value to generating log entries.

Comments

  • Anonymous
    May 03, 2011
    I think actionable messages should be given to the user by more noticeable means. The log is not a way of communicating with the user, it's a troubleshooting tool. And when you get to troubleshoot some weird misbehavior of the system, and you've got nothing but the logs, there is no such thing as "too much information". It may be more or less convenient to browse the logs, but that is a matter of using good tools for log analysis.

  • Anonymous
    May 03, 2011
    Isn't that what the filter function is for? That's always the first thing I click, filter out information messages so I can look at the relevant warning/error/critical entries. Once something interesting has been identified that way, it sometimes makes sense to remove the filter and look at the information entries around the same time. The real issue with the event log is that nobody looks into it if they don't think they have a reason to look at it. Many quite important messages can get lost that way. What I would like is a program that shows an icon and bubble hint in the notification area if a warning or error entry gets added to the system log. But I haven't found anything like that yet. Guess I'll have to write one myself sooner or later.

  • Anonymous
    May 03, 2011
    Log in XML. Provide XSL that filters only error messages.

  • Anonymous
    May 03, 2011
    The problem is that you are trying to log information for two different groups of people.  Users generally only want to see error messages, maybe the occasional informational message.  Developers (or tech support) want to see everything (or at least more than error messages).  Since there is only one log sink it is inevitable that one of those groups is not going to be happy.  We ship a product that defaults to error-only logging.  The problem is that at this level there is not enough context to figure out what went wrong, so we almost always have to get the customer to turn up the log level and reproduce the issue to get usable log files.  The flipside is that the verbose logs are filled with so much stuff that it can be hard to figure out what is relevant for the issue you are tracking.
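    The trade-off described in this comment can be sketched with Python's standard `logging` module. This is only an illustration of error-only versus verbose logging, not the commenter's product; the logger name `example.product`, the `ListHandler` class, and the `do_work` function are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted records in a list (stands in for a log file)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


logger = logging.getLogger("example.product")  # hypothetical logger name
logger.propagate = False                       # keep output out of the root logger
handler = ListHandler()
logger.addHandler(handler)


def do_work():
    """Hypothetical operation that logs context and then fails."""
    logger.debug("opening connection")
    logger.info("connection ready")
    logger.error("request failed")


# Default shipping configuration: errors only, so the context is lost.
logger.setLevel(logging.ERROR)
do_work()
error_only = list(handler.messages)

# Support scenario: turn up the log level and reproduce the issue.
handler.messages.clear()
logger.setLevel(logging.DEBUG)
do_work()
verbose = list(handler.messages)
```

    Here `error_only` holds just the failure, while `verbose` also holds the two records that led up to it. In a real product the verbose run would contain many unrelated records as well, which is the noise problem the comment describes.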

  • Anonymous
    May 03, 2011
    I'll jump in here and make the claim that the writers/maintainers of the event viewer at Microsoft do not actually use the tool themselves.  If they did, it wouldn't be so difficult and unhelpful to use, and they would have improved it a bit in the course of 15 years.  These people could learn a lot from how the interface for many of the NirSoft tools works.

  • Anonymous
    May 03, 2011
    mpbk: Actually the event viewer was totally redesigned in Windows Vista and is dramatically better than it was in XP and before.  If you haven't used Vista yet, you should try it.  

  • Anonymous
    May 03, 2011
    Unfortunately the mess in the event log is being leveraged by scammers who are calling up unsuspecting average (i.e. non-technical) PC users claiming to be from Microsoft/their ISP/whatever saying "we believe that your computer is causing a lot of errors on the internet. Let's look in the event log to see if that's the case". They then walk the customer through opening event viewer, and the resulting overwhelming amount of info (including info, warnings and errors) is then used to convince said user to allow the scammer to remotely connect to their machine to "fix it". Or to sell them a bogus product to "fix it". In either case maliciousness ensues. I hear from these people (victims) all the time. Have a peek at the comments here: ask-leo.com/what_is_the_event_viewer_and_should_i_care.html (it's an old article, but you can see people are finding it after they've been called), and here for one person's "transcript" of his experience: ask-leo.com/is_my_isp_calling_me_to_clear_up_my_problems_with_windows.html Leo

  • Anonymous
    May 04, 2011
    "If you haven't used Vista yet, you should try it" - You meant "try Windows 7", right? ;-)

  • Anonymous
    May 04, 2011
    Actually no I meant Vista.  The rewritten eventviewer came online in Vista, not Win7.

  • Anonymous
    May 04, 2011
    "Nobody ever reads the event logs…" <-- almost true "Nobody ever bothers reading event logs because there’s simply too much noise in the logs." <-- it's like saying nobody reads Wikipedia because there's too much information in there. Sometimes it's better to have more information than no information at all. More information/noise shouldn't stop users from investigating or monitoring issues. With a powerful search/filtering mechanism the impact of noise could be reduced considerably. The main problem I found related to logs is not knowing what to search for, which events are relevant and which are not, especially when troubleshooting or monitoring.

  • Anonymous
    May 07, 2011
    In System management (I work with System Center Operations Manager) we face every day the issue of choosing the right events to act upon and what to ignore as simply noise... or more generally, the issue with inconsistent instrumentation that makes it hard to tell if an application is healthy or if it isn't... and more importantly what to do if it is not healthy. It's a tough problem to solve, as it essentially boils down to educating developers to move away from "debug" logs (easy to write and good when you are testing your code) to reliable instrumentation that tells if the app is "behaving" correctly, which is what the sysadmin is worried about. I am NOT blaming the developers here - finger pointing is NOT solving anything! - they might not have the mindset for "monitoring" their application and that's why this type of instrumentation only improves over time when the application developers AND the monitoring guys work hand in hand for a few release cycles, IMHO. So, I second the "only a few useful and actionable events" in EVT movement I read in between Larry's lines. Everything else can be moved to ETW/ETL, for example.

  • Anonymous
    May 10, 2011
    That's why whenever I install Windows XP or 7 (on 7 I use the Classic XP Event Viewer msc copied from an XP installation), the first thing I do is create two new views in the Event Viewer that filter Information logs. Only errors and warnings are shown in these two views (Application Errors) and (System Errors).

  • Anonymous
    June 02, 2011
    Larry, the only thing I have noticed with the new Event Viewer is that it is noticeably slower than the old one. It can, on some machines, take ages to navigate to the system events and start diagnosing a user's problems. I love Windows 7, and I even liked Vista, but I am not 100% onboard yet when it comes to the Event Viewer. I was able to diagnose problems just fine using the old one... And it was faster. But I will get used to it I suppose, and I consider it a good thing that the team spends time on improving such bits.
