Interpreting Event 153 Errors

Hello, my name is Bob Golding, and I would like to share with you a new event that you may see in the system event log.  Event ID 153 is an error associated with the storage subsystem.  This event was introduced in Windows 8 and Windows Server 2012, and was added to Windows 7 and Windows Server 2008 R2 starting with hotfix KB2819485.

An event 153 is similar to an event 129.  An event 129 is logged when the storport driver times out a request to the disk; I described event 129 messages in a previous article.  The difference between the two is where the timeout happens: a 129 is logged when storport times out a request, while a 153 is logged when the storport miniport driver times out a request.  The miniport driver, which may also be referred to as an adapter driver or HBA driver, is typically written by the hardware vendor.

Because the miniport driver has better knowledge of the request execution environment, some miniport drivers time requests themselves instead of letting storport handle request timing.  This allows the miniport driver to abort the individual request and return an error, rather than having storport reset the drive after a timeout.  Resetting the drive is disruptive to the I/O subsystem and may not be necessary if only one request has timed out.  The error returned from the miniport driver is bubbled up to the class driver, which can log an event 153 and retry the request.

Below is an example event 153:

Event 153 Example

This error means that a request failed and was retried by the class driver.  In the past, no message was logged in this situation because storport did not time out the request.  The lack of messages caused confusion when troubleshooting disk errors, because timeouts would occur but there would be no evidence of them in the event log.

The Details section of the event log record shows which error caused the retry and whether the request was a read or a write.  Below is the details output:

Event 153 Details

In the example above, the byte at offset 0x29 is the SCSI status, the byte at offset 0x2A is the SRB status that caused the retry, and the byte at offset 0x2B is the SCSI command that is being retried.  The offsets are hexadecimal, so these are the second, third, and fourth bytes of the row labeled 0028.  In this case the SCSI status was 00 (SCSISTAT_GOOD), the SRB status was 09 (SRB_STATUS_TIMEOUT), and the command was 28 (SCSIOP_READ).
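If you would rather pull these bytes out of an event programmatically than read them by eye, the Python sketch below shows one way to do it.  It assumes you have copied the "In Bytes" view of the binary data as text, one row per line, with the hex bytes separated by single spaces and the ASCII column separated from them by two or more spaces (as in the dumps pasted in the comments below).  The names parse_in_bytes_dump and decode_event_153 are hypothetical helpers, not part of any Windows tool.

```python
import re

# Hexadecimal byte offsets of the three status bytes in event 153 binary data
# (the 2nd, 3rd and 4th bytes of the row labeled 0028).
SCSI_STATUS_OFFSET = 0x29
SRB_STATUS_OFFSET = 0x2A
SCSI_OPCODE_OFFSET = 0x2B

# A dump row: a four-digit hex offset, a colon, then the rest of the line.
ROW = re.compile(r"^\s*([0-9A-Fa-f]{4}):\s*(.*)$")


def parse_in_bytes_dump(text: str) -> bytes:
    """Convert an "In Bytes" dump (one row per line) into raw bytes."""
    data = bytearray()
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip headings such as "In Bytes" and blank lines
        offset = int(match.group(1), 16)
        if offset != len(data):
            raise ValueError(f"row {offset:04X} does not follow byte {len(data):04X}")
        # Hex bytes are separated by single spaces; two or more spaces separate
        # them from the ASCII column, which is discarded.
        hex_part = re.split(r"\s{2,}", match.group(2).strip(), maxsplit=1)[0]
        data.extend(bytes.fromhex(hex_part))
    return bytes(data)


def decode_event_153(data: bytes) -> tuple[int, int, int]:
    """Return (SCSI status, SRB status, SCSI opcode) from event 153 binary data."""
    if len(data) <= SCSI_OPCODE_OFFSET:
        raise ValueError("binary data is too short to contain the status bytes")
    return data[SCSI_STATUS_OFFSET], data[SRB_STATUS_OFFSET], data[SCSI_OPCODE_OFFSET]
```

Fed the "In Bytes" dump from the August 18, 2014 comment below, decode_event_153 returns (0x22, 0x04, 0x2A).  The "In Words" view is deliberately not handled: its rows hold little-endian 32-bit values, so the byte order differs, and the offset check in the parser will reject it.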

The most common SCSI commands are:

SCSIOP_READ - 0x28

SCSIOP_WRITE - 0x2A

The most common SRB statuses are below:

SRB_STATUS_TIMEOUT - 0x09

SRB_STATUS_BUS_RESET - 0x0E

SRB_STATUS_COMMAND_TIMEOUT - 0x0B

A complete list of SCSI operations and statuses can be found in scsi.h in the WDK.  A list of SRB statuses can be found in srb.h.
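For readers without the WDK handy, the codes can be kept in a small lookup table.  This Python sketch includes only the codes discussed in this article and in the replies to the comments, using the values defined in scsi.h and srb.h; the helper names lookup and describe are hypothetical.  Anything not in the tables is reported as unlisted, and the WDK headers remain the authoritative source.

```python
# Subset of scsi.h / srb.h constants: only the codes discussed in this article.
SCSI_OPCODES = {
    0x28: "SCSIOP_READ",
    0x2A: "SCSIOP_WRITE",
}

SCSI_STATUSES = {
    0x00: "SCSISTAT_GOOD",
    0x02: "SCSISTAT_CHECK_CONDITION",
    0x22: "SCSISTAT_COMMAND_TERMINATED",
}

SRB_STATUSES = {
    0x04: "SRB_STATUS_ERROR",
    0x08: "SRB_STATUS_NO_DEVICE",
    0x09: "SRB_STATUS_TIMEOUT",
    0x0A: "SRB_STATUS_SELECTION_TIMEOUT",
    0x0B: "SRB_STATUS_COMMAND_TIMEOUT",
    0x0E: "SRB_STATUS_BUS_RESET",
}


def lookup(table: dict[int, str], value: int) -> str:
    """Return the symbolic name for value, or a hex placeholder if it is not listed."""
    return table.get(value, f"unlisted 0x{value:02X}")


def describe(scsi_status: int, srb_status: int, opcode: int) -> str:
    """Format the SCSI status, SRB status and opcode bytes as 'code - NAME' pairs."""
    return ", ".join([
        f"{scsi_status:02X} - {lookup(SCSI_STATUSES, scsi_status)}",
        f"{srb_status:02X} - {lookup(SRB_STATUSES, srb_status)}",
        f"{opcode:02X} - {lookup(SCSI_OPCODES, opcode)}",
    ])
```

For example, describe(0x02, 0x08, 0x28) produces "02 - SCSISTAT_CHECK_CONDITION, 08 - SRB_STATUS_NO_DEVICE, 28 - SCSIOP_READ", the same decoding given in the replies below.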

The timeout errors (SRB_STATUS_TIMEOUT and SRB_STATUS_COMMAND_TIMEOUT) indicate that a request timed out in the adapter.  In other words, a request was sent to the drive and there was no response within the timeout period.  The bus reset error (SRB_STATUS_BUS_RESET) indicates that the device was reset.  Because a drive aborts all outstanding requests when it receives a reset, the request is being retried.
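The grouping in the previous paragraph can be expressed as a small Python function.  One detail is hedged here: srb.h defines two flag bits, SRB_STATUS_QUEUE_FROZEN (0x40) and SRB_STATUS_AUTOSENSE_VALID (0x80), that can be OR'd into an SRB status, and its SRB_STATUS() macro masks them off before comparison.  The examples in this article do not show these bits set, so the masking below is a defensive assumption; the function names srb_status and classify are hypothetical.

```python
# Flag bits from srb.h that may be OR'd into an SRB status byte.
SRB_STATUS_QUEUE_FROZEN = 0x40
SRB_STATUS_AUTOSENSE_VALID = 0x80

# Status values discussed in this article.
SRB_STATUS_TIMEOUT = 0x09
SRB_STATUS_COMMAND_TIMEOUT = 0x0B
SRB_STATUS_BUS_RESET = 0x0E


def srb_status(raw: int) -> int:
    """Python equivalent of the SRB_STATUS() macro: clear the two flag bits."""
    return raw & ~(SRB_STATUS_QUEUE_FROZEN | SRB_STATUS_AUTOSENSE_VALID)


def classify(raw_srb_status: int) -> str:
    """Group an SRB status byte into 'timeout', 'bus reset' or 'other'."""
    status = srb_status(raw_srb_status)
    if status in (SRB_STATUS_TIMEOUT, SRB_STATUS_COMMAND_TIMEOUT):
        # The request reached the drive but got no response within the timeout period.
        return "timeout"
    if status == SRB_STATUS_BUS_RESET:
        # The device was reset, which aborted every outstanding request.
        return "bus reset"
    return "other"
```

With this split, classify(0x09) and classify(0x0B) both report a timeout, classify(0x0E) reports a bus reset, and codes such as 0x04 (SRB_STATUS_ERROR) fall into "other" and need a closer look at the hardware.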

A system administrator who encounters event 153 errors should investigate the health of the computer’s disk subsystem.  Although an occasional timeout may be part of the normal operation of a system, the frequent need to retry requests indicates a performance issue with the storage that should be corrected.
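To tell an occasional timeout from a frequent need to retry, one rough approach is to count event 153 records per disk over a recent time window.  The Python sketch below assumes you have already extracted the time and disk number of each event (for example from the "Disk #" in the event text); the Retry class, the function name frequent_retry_disks, and the window and threshold values are all hypothetical and need tuning for your environment, since the article does not give a cutoff.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Retry:
    """One event 153 record (hypothetical shape): when it was logged and which disk."""
    logged_at: datetime
    disk: int


def frequent_retry_disks(events: list[Retry], window: timedelta,
                         threshold: int) -> dict[int, int]:
    """Return {disk: count} for disks with at least `threshold` retries
    logged within `window` of the newest event."""
    if not events:
        return {}
    newest = max(event.logged_at for event in events)
    recent = Counter(event.disk for event in events
                     if newest - event.logged_at <= window)
    return {disk: count for disk, count in recent.items() if count >= threshold}


if __name__ == "__main__":
    start = datetime(2015, 9, 22, 12, 0)
    # Hypothetical data: disk 0 logs a retry every 10 seconds, disk 1 only once.
    events = [Retry(start + timedelta(seconds=10 * i), 0) for i in range(30)]
    events.append(Retry(start, 1))
    print(frequent_retry_disks(events, window=timedelta(minutes=10), threshold=20))
    # prints {0: 30}
```

Counting relative to the newest event, rather than the current time, keeps the result reproducible when you analyze an exported log after the fact.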

Comments

  • Anonymous
    August 31, 2013
    I don't have the WDK, so I don't have access to the full scsi.h and srb.h. Could you please post more codes online to make it easier to find more information about this error? I'm getting 00 04 28, so SCSI is good but SRB is ????, all while doing a read. [Hi Mauricio.  The WDK is a free download; you can get it from http://msdn.microsoft.com/en-us/library/windows/hardware/hh852362.aspx.  Regarding the error you are seeing, 04 is SRB_STATUS_ERROR.]

  • Anonymous
    October 06, 2013
    Downloading the whole WDK just to find the codes does seem a little overkill. I'd be especially thankful if you could decode the following: "02 08 28". [Thank you for your feedback.  To decode your message: 02 - SCSISTAT_CHECK_CONDITION, 08 - SRB_STATUS_NO_DEVICE, 28 - SCSIOP_READ.  It appears your system attempted to read from a device which was not present.  Most likely the device was removed.]

  • Anonymous
    November 18, 2013
    Hey Bob, this is great stuff, man. Would this also be reported with the Microsoft iSCSIPORT driver? Can it help troubleshoot dropped frames on an Ethernet/iSCSI SAN? [Most iSCSI implementations now use msiscsi rather than iscsiport.  To answer your question, it depends on whether msiscsi returns an error that will bubble up to the class driver to be retried.  I do not recall any errors that are handled in that way.  The only one I can think of that may bubble up is SRB_STATUS_BUS_RESET (0xE).  Most of the retryable errors occur between msiscsi and storport.  Msiscsi does not time requests, so you should not see a timeout.]

  • Anonymous
    January 23, 2014
    Hi, following up on Mauricio's question, can you please tell me where and how to look next when getting SRB_STATUS_ERROR (04)? Can you list all the possible causes? Thanks. [If the requests to the disk are failing, most likely there is an issue with the controller, cable, disk, etc.]

  • Anonymous
    May 08, 2014
    FYI: We use Altaro Hyper-V Backup to back up our Hyper-V machines. Now the Hyper-V host (where the backup software runs) logs this event every time the backup completes, with the code 02 08 28, so the status is: 02 - SCSISTAT_CHECK_CONDITION, 08 - SRB_STATUS_NO_DEVICE, 28 - SCSIOP_READ. The disks with the numbers in the logged event really are removed (in my case Disk 1 and Disk 2). When I go to the storage manager, I have only Disk 0. Altaro mounts the VHDs of the virtual machines to do the backup and unmounts them when the backup is completed. So the logged event is correct and can be ignored in this case (when you back up your virtual machines with Altaro or any other software that backs up Hyper-V machines the same way). I was really surprised and feared that the disks were going to fail soon :) Have a nice day, everybody. [You may also be getting event 157 messages if the disks are being surprise removed.  For more information see http://blogs.msdn.com/b/ntdebugging/archive/2013/12/27/event-id-157-quot-disk-has-been-surprise-removed-quot.aspx.]

  • Anonymous
    May 31, 2014
    I also assume this could be related to a DVD? [Possibly.]

  • Anonymous
    August 18, 2014
    Can you please help me decode this error message:
    0000: 0F 01 04 00 04 00 2C 00   ......,.
    0008: 00 00 00 00 99 00 04 80   ....™..€
    0010: 00 00 00 00 00 00 00 00   ........
    0018: 00 00 00 00 00 00 00 00   ........
    0020: 00 00 00 00 00 00 00 00   ........
    0028: 00 22 04 2A               .".*
    [22 is SCSISTAT_COMMAND_TERMINATED, 04 is SRB_STATUS_ERROR, and 2A is SCSIOP_WRITE.  Usually this happens because a frame is dropped.  This is an error at the hardware level (controller, disk, cabling, SAN fabric, etc.).]

  • Anonymous
    October 09, 2014
    Is there any way of determining exactly which disk was having this error?  Many thanks. [The text of the error indicates which disk had the error.  The example shown in this article occurred on Disk 0.  You can correlate the "Disk #" string to a specific disk using Disk Management.]

  • Anonymous
    October 17, 2014
    I have the following.  0028: 00 00 04 28 ? [04 is SRB_STATUS_ERROR.]

  • Anonymous
    November 10, 2014
    Great article, Bob. Thanks for the heads-up, Stephan; I'm getting 02 08 28 for 3 missing disks on my Hyper-V host, logged every hour, which relates to the DPM backup of my 3 VMs. [Hi Paul.  Unfortunately we are not able to provide 1:1 support through this blog.  The issue reported does not seem to match a known issue.  You can obtain 1:1 support through http://support.microsoft.com/.]

  • Anonymous
    December 01, 2014
    Hi, can you please help me decode this error message:
    Binary data:
    In Words
    0000: 0004010F 002C0003 00000000 80040099
    0008: 00000000 00000000 00000000 00000000
    0010: 00000000 00000000 2A0A0000
    In Bytes
    0000: 0F 01 04 00 03 00 2C 00   ......,.
    0008: 00 00 00 00 99 00 04 80   ....™..€
    0010: 00 00 00 00 00 00 00 00   ........
    0018: 00 00 00 00 00 00 00 00   ........
    0020: 00 00 00 00 00 00 00 00   ........
    0028: 00 00 0A 2A               ...*
    [0A is SRB_STATUS_SELECTION_TIMEOUT.]

  • Anonymous
    December 01, 2014
    Hello, I'd be especially thankful if you could decode the following: "00 0A 2A". [0A is SRB_STATUS_SELECTION_TIMEOUT.  Usually this means the device has been removed from the system.]

  • Anonymous
    December 23, 2014
    Hi, I am getting an Event ID 153 warning on one of my servers running Windows Server 2012 Standard, and I am finding it difficult to decode. Can you please help me decode it? Below is the event log:
    Binary data:
    In Words
    0000: 0016010F 003E0003 00000000 80040099
    0008: 00000000 00000000 00000000 00000000
    0010: 00000000 00000000 28040200 00050070
    0018: 0A000000 00000000 00000021 0000
    In Bytes
    0000: 0F 01 16 00 03 00 3E 00   ......>.
    0008: 00 00 00 00 99 00 04 80   ....™..€
    0010: 00 00 00 00 00 00 00 00   ........
    0018: 00 00 00 00 00 00 00 00   ........
    0020: 00 00 00 00 00 00 00 00   ........
    0028: 00 02 04 28 70 00 05 00   ...(p...
    0030: 00 00 00 0A 00 00 00 00   ........
    0038: 21 00 00 00 00 00         !.....
    [This does not resemble the binary data from an event 153.  Usually there are only 28 bytes of data.]

  • Anonymous
    August 13, 2015
    Hello, we get this error almost every day, sometimes with an effect on production. Could you please check this? Is it a timeout during writing? What is the * at the end? Could this be a hardware or configuration problem? We have new storage, and since then it happens every day. 0028: 00 00 09 2A               ...* Thank you [00 is SCSISTAT_GOOD, 09 is SRB_STATUS_TIMEOUT, and 2A is SCSIOP_WRITE.  Your disk timed out a write request.  This indicates a problem with the storage hardware.  The * character is the ASCII representation of the 0x2A byte; the output you copied shows hexadecimal on the left and ASCII characters on the right.  Sometimes the ASCII characters are present on purpose (such as when the error logs a string) and sometimes they are simply a coincidence.]

  • Anonymous
    September 04, 2015
    In my case it was Kaspersky that triggered this error. [Perhaps the antivirus was issuing requests to the storage which failed; however, antivirus does not usually plug in at the storport miniport layer.]

  • Anonymous
    September 22, 2015
    I'm getting flooded with event 153 warnings on my computer and I don't know what's causing it. Looking in the details I get "0028: 00 00 04 2A" in every warning. I'm running Windows 8.1 and using a Samsung 850 EVO SSD. I get the warning about every 10 seconds at different block addresses. It doesn't seem to affect my computer, but I guess it is something I should be worried about? [04 is SRB_STATUS_ERROR.  I would not ignore storage timeouts; they are a sign of unhealthy storage hardware.]

  • Anonymous
    February 22, 2016
    Nice post! Thanks!