Blue Screen: Beware Verifier Settings on Production Machines
Recently we had a case where a customer reported blue screen crashes whenever the server was under load. Under normal conditions the server was only mildly sluggish, but a backup or other heavy workload would cause it to blue screen and go down.
The customer was able to provide a memory dump, and the information was as follows:
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught. This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 000000a0, A CRC error was detected on the sector (synchronously).
Arg2: 882906a0, Request Irp.
Arg3: 8a76aca0, Device object of the lower device.
Arg4: 02a9abe0, Sector number on which the CRC error was detected.
The stack text looked like this:
STACK_TEXT:
f78bad48 f75391c0 000000c4 000000a0 882906a0 nt!KeBugCheckEx+0x1b
f78badac f75396e8 89f590d0 02a9abe0 0000e429 crcdisk!VerifyOrStoreSectorCheckSum+0x284
f78bae00 f75383cd 89f590d0 882906a0 02a9abc0 crcdisk!VerifyCheckSum+0x14a
f78bae70 f75389c7 89f590d0 882906a0 89f590d0 crcdisk!CompleteXfer+0x2a9
f78bae88 809b523e 89f59018 882906a0 8835ea58 crcdisk!CrcScsiReadCompletion+0x31
f78baeac 8081e123 89f59018 882906a0 f78baf10 nt!IovpLocalCompletionRoutine+0xb4
f78baedc 809b577e 00000000 882906a0 8a76ad58 nt!IopfCompleteRequest+0xcd
f78baf48 f783b3cc 88290834 f78baf8c f7848343 nt!IovCompleteRequest+0x9a
f78baf54 f7848343 882906a0 00000001 00000000 storport!RaidCompleteRequestEx+0x1c
f78baf8c f783b910 8a00c008 8a7bda14 f78baff4 storport!RaidUnitCompleteRequest+0x8f
f78baf9c 8083211c 8a7bda14 8a7bd9a0 00000000 storport!RaidpAdapterDpcRoutine+0x28
f78baff4 8088dba7 f791a048 00000000 00000000 nt!KiRetireDpcList+0xca
f78baff8 f791a048 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x37
WARNING: Frame IP not in any known module. Following frames may be wrong.
8088dba7 00000000 0000000a 0083850f bb830000 0xf791a048
So we have a bugcheck of type DRIVER_VERIFIER_DETECTED_VIOLATION (c4). The debugger help file tells us: "The DRIVER_VERIFIER_DETECTED_VIOLATION bug check has a value of 0x000000C4. This is the general bug check code for fatal errors found by Driver Verifier."
The help file instructs us to look at the first parameter to determine the source of the verifier bugcheck:
Arguments:
Arg1: 000000a0, A CRC error was detected on the sector (synchronously).
Arg2: 882906a0, Request Irp.
Arg3: 8a76aca0, Device object of the lower device.
Arg4: 02a9abe0, Sector number on which the CRC error was detected.
The first parameter is 0xA0, which means the bugcheck was caused by a CRC check error. The help file describes it as "A cyclic redundancy check (CRC) error was detected on a hard disk". A bug check with this parameter occurs only when the Disk Integrity Checking option of Driver Verifier is active.
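As a rough illustration of reading the argument block, the sketch below parses the four "ArgN" lines from the dump and converts the sector number to decimal. The function name `parse_bugcheck_args` is hypothetical, and the 512-byte sector size is an assumption; the dump does not state the sector size of the disk.

```python
import re

# Argument block copied from the bugcheck analysis in this article.
ARGUMENTS = """\
Arg1: 000000a0, A CRC error was detected on the sector (synchronously).
Arg2: 882906a0, Request Irp.
Arg3: 8a76aca0, Device object of the lower device.
Arg4: 02a9abe0, Sector number on which the CRC error was detected.
"""

# Assumption: 512-byte sectors. The dump does not tell us the real sector size.
ASSUMED_SECTOR_SIZE = 512


def parse_bugcheck_args(text):
    """Return {arg_number: (value, description)} from 'ArgN: hex, text' lines."""
    args = {}
    pattern = r"^Arg(\d): ([0-9a-fA-F]+), (.*)$"
    for match in re.finditer(pattern, text, re.MULTILINE):
        number, value, description = match.groups()
        args[int(number)] = (int(value, 16), description)
    return args


args = parse_bugcheck_args(ARGUMENTS)
subcode = args[1][0]   # Arg1 selects the kind of verifier violation
sector = args[4][0]    # Arg4 is the sector that failed the CRC check

print(f"0xC4 first parameter: {subcode:#x}")
print(f"Failing sector (decimal): {sector}")
print(f"Byte offset if sectors are {ASSUMED_SECTOR_SIZE} bytes: "
      f"{sector * ASSUMED_SECTOR_SIZE}")
```

The decimal sector number can be handy when comparing the crash against disk or event log entries that report sectors in decimal.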
We can use the !verifier command to view the verifier options that are active in the dump:
2: kd> !verifier
Verify Level ff ... enabled options are:
Special pool
Special irql
Inject random low-resource API failures
All pool allocations checked on unload
Io subsystem checking enabled
Deadlock detection enabled
Enhanced Io checking enabled
DMA checking enabled
Looking at the output above, we can see that the customer has eight verifier options enabled (Verify Level ff). These options are generally used by driver developers to test their code and should not be enabled on a production server. It would appear that the customer had performed some testing at some point and never disabled the settings.
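The Verify Level value reported by !verifier is a bitmask, so ff simply means that the low eight option bits are all set. A minimal sketch of decoding it follows. The bit values are the ones documented for the verifier.exe /flags option; pairing them with the exact strings printed by !verifier is my own mapping, and the function name `decode_verify_level` is hypothetical. Later Windows versions define additional bits above 0x80, which are left out here.

```python
# Bit values as documented for verifier.exe /flags.
# Names are the strings printed by !verifier in the dump in this article;
# matching each string to its bit is an assumption of this sketch.
VERIFIER_FLAGS = [
    (0x01, "Special pool"),
    (0x02, "Special irql"),
    (0x04, "Inject random low-resource API failures"),
    (0x08, "All pool allocations checked on unload"),
    (0x10, "Io subsystem checking enabled"),
    (0x20, "Deadlock detection enabled"),
    (0x40, "Enhanced Io checking enabled"),
    (0x80, "DMA checking enabled"),
]


def decode_verify_level(level):
    """Return the names of the options whose bit is set in level."""
    return [name for bit, name in VERIFIER_FLAGS if level & bit]


# Verify Level ff from the dump: every option in the table is enabled.
for name in decode_verify_level(0xFF):
    print(name)
```

A level such as 0x09 would decode to only "Special pool" and "All pool allocations checked on unload", which is much closer to what a targeted driver test would normally use.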
To resolve the issue, we advised the customer to reset the verifier settings to their defaults by running verifier.exe and choosing "delete existing settings".
In addition, we noted that the customer was using a storage driver with known problems that are documented in a published KB article, so we advised a driver update as discussed here: https://support.microsoft.com/default.aspx?scid=kb;EN-US;969550
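The same reset can also be done from the command line: verifier.exe accepts /querysettings to display the stored settings and /reset to clear them. The sketch below wraps those two commands. The function name `reset_verifier` and the dry-run behavior are illustrative choices, not part of any Microsoft tooling. Running the commands for real requires an elevated prompt on Windows, and the change takes effect after a reboot.

```python
import platform
import subprocess

# Real verifier.exe switches: show the stored settings, then clear them.
QUERY_SETTINGS = ["verifier", "/querysettings"]
RESET_SETTINGS = ["verifier", "/reset"]


def reset_verifier(dry_run=True):
    """Return the commands involved; run them only on Windows when dry_run is False."""
    commands = [QUERY_SETTINGS, RESET_SETTINGS]
    if dry_run or platform.system() != "Windows":
        return commands  # nothing is executed in this case
    for command in commands:
        # Must be run from an elevated prompt; a reboot is needed afterwards.
        subprocess.run(command, check=False)
    return commands


# Dry run: only print what would be executed.
for command in reset_verifier(dry_run=True):
    print(" ".join(command))
```

Defaulting to a dry run keeps the sketch safe to try on any machine; pass dry_run=False only on the affected server.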
And because storage is involved, a chkdsk is always a good idea as a last step.
The customer reported that the issue was resolved and that the server is now stable.
For more information on the use of verifier settings please review the article: https://support.microsoft.com/kb/244617