Windows 2008 and the IsAlive Sanity Check

Admins running Exchange Clusters are likely already familiar with volume mount points, but I'll provide a brief overview for those new to the topic:

A mount point is a location on a disk that looks and acts like a folder (in a logical sense) but actually points to a physical disk logical unit number (in a physical sense). Most cluster administrators use mount points to overcome limitations on the number of drive letters that can be assigned in a multi-node cluster. Some admins choose to use them just for convenience as it results in simplified paths to your databases and logs while still allowing you to divide the dbs and logs across many separate disk LUNs. (Our Exchange Team blog mentions mount points explicitly when discussing storage configurations - see https://msexchangeteam.com/archive/2007/01/15/432199.aspx). Mount points are still around in Windows 2008, but if you're thinking of implementing them for Exchange, I recommend reading the following KB article first:

https://support.microsoft.com/kb/953652 : A physical disk resource may unexpectedly fail or go offline when the IsAlive function is executed on a Windows Server 2008 cluster node


Scenario

Our server is an Exchange 2007 cluster (SCC) on a Windows 2008 platform. This server has 35 storage groups set up with one database in each SG. Users begin complaining that they can't access their email; a review of the server finds that Storage Group 8's database is in an `Online Pending` state. The database will not come online. Attempts to bring the database online or offline from the cluster failover console produce no results; cluster cannot manage the database at all. Attempts to fail back to the original node also go nowhere as the db resource cannot be controlled.

We wound up killing store.exe with Task Manager. From there we were able to fail over between our cluster nodes. Storage Group 8 came back online without a problem on our passive node. Root cause analysis (RCA) was requested for the database failure.

 

The Application Event Log shows the following events chronologically:

Source: ESE
Event 482 -
Attempt to write E07.log failed. System error 21 / -1022 (device not ready).

Event 414 -
Unable to write Section 0 while flushing logfile G:\MySG8_Logs\E07.log. Error -1022 (0xfffffc02)

Event 492 -
Logfile sequence for G:\MySG8_Logs halted due to a fatal error. No further updates possible for databases on this sequence. Please correct or restore from a backup.

Source: MSExchangeIS
Event 9558 -
Error occurred while writing to database log of SG "MySG8", attempting to unmount.

Source: ESE
Event 471 -
MSExchangeIS (7732) MySG8 unable to rollback operation #461177896 on database G:\MySG8_DB\MySG8_DBFile.EDB error -510. All further database updates will be rejected.

--

1:01:22pm
Source: MSExchangeCluster
1021 -
Failed to terminate the cluster resource MySG8_DB due to a timeout.


Additional events in the log show that an online backup had completed on Storage Group 8 minutes before the failure.  

Event breakdown:
The current transaction log file for Storage Group 8 (E07.log) could not be written to (see Event 482). At the same time, E07.log cannot be properly flushed (Event 414). Both operations fail with error -1022 / device not ready. At this point Exchange halts the transaction log seqeuence and attempts to perform a rollback. When that fails, Exchange takes the database offline. Cluster tries to bring it back online but that process hangs forever.

Root cause:
The customer's had configured mount points for their transaction logs. These were stored on a single physical disk resource (G:). Each mount point went to the root of a separate physical disk LUN: for example,  `G:\MySG8_Logs` maps to ` \\?\GLOBALROOT\Device\Harddisk14\Partition1\`. This runs directly into KB 953652.

The Windows 2008 RTM's Cluster IsAlive check works by opening a random file handle for an object at the root of a LUN. In this case, an Online Backup had truncated the transaction log files. Cluster was performing an IsAlive check on two of the transaction log files that were truncated. Log truncation caused Cluster to register an IsAlive Sanity Check failure. Cluster, believing this is catastrophic, moves the physical disk offline, then back online.

All Exchange knows is that access to its current transaction log (Enn.log) has failed. Exchange assumes the worst, tries to roll back transactions, but fails because it can't access its own log to see what's going on. Exchange believes something terrible has happened to the transaction log disk so halts database updates to protect the data in the .edb.

Analysis:
If you open up your cluster log you may see statements like these (date/time codes omitted):

ERR [RES] Physical Disk <MYSG8LOG>: VerifyFS: Failed to open file \\?\GLOBALROOT\Device\Harddisk14\Partition1\E0200025C63.log Error: 2.

ERR [RES] Physical Disk <MYSG8LOG>: VerifyFS: Failed to open file \\?\GLOBALROOT\Device\Harddisk14\Partition1\E0200025C93.log Error: 2.

ERR [RES] Physical Disk <MYSG8LOG>: IsAlive sanity check failed!, FS check failed for partition 1, Error 2.

WARN [RHS] Resource MYSG8LOG IsAlive has indicated failure.

Error 2 is ERROR_FILE_NOT_FOUND. Cluster is trying to read a log file that no longer exists. This example occurred when a backup truncated logs that Cluster wanted to use for it's IsAlive check. These logs are far down the transaction log sequence.

INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'MYSG8LOG', gen(0) result 1.
INFO [RCM] TransitionToState(MYSG8LOG) Online-->ProcessingFailure.
INFO [RCM] rcm::RcmGroup::ProcessStateChange: (MyServername, Online --> Failed)
ERR [RCM] rcm::RcmResource::HandleFailure: (MYSG8LOG)
INFO [RCM] resource MYSG8LOG: failure count: 1, restartAction: 1.
INFO [RCM] Will restart resource in 500 milliseconds.

When the IsAlive Sanity Check fails, Cluster assumes the physical disk resource is dead.

INFO [RCM] TransitionToState(MYSG8LOG) ProcessingFailure-->[Terminating to DelayRestartingResource].

INFO [RCM] rcm::RcmGroup::ProcessStateChange: (MyServername, Failed --> Pending)

INFO [RES] Physical Disk <MYSG8LOG>: Terminate request.

WARN [RES] Physical Disk <MYSG8LOG>: OfflineThread: Failed to offline volume \Device\Harddisk14\Partition1, Error 5

INFO [RCM] TransitionToState(MYSG8LOG) [Terminating to DelayRestartingResource]-->DelayRestartingResource.

INFO [RCM] Delay-restarting MYSG8LOG and any waiting dependents.

Following that assumption, Cluster calls for termination of the physical disk resource so that it can be restarted. Remember, the physical disk has nothing wrong with it at this time - Cluster takes the IsAlive Sanity Check at face value. The physical disk resource is forced offline, then brought online again immediately. The whole process takes around five seconds on a typical server. Exchange will not tolerate the loss of a disk resource in this manner.

This problem can also present where the IsAlive Sanity Check fails with Error 5 (ERROR_ACCESS_DENIED) if Cluster tries to read a file where the handle is locked by another process. There may be other codes, as well; this is not an inclusive list.

 

The good news:

A hotfix is available for KB 953652 that corrects this problem. Alternately you can work around this by making sure that your transaction logs are always located in a directory - even on the other side of a mount point. (For this case, G:\MySG8 was the mount point, so creating the folder G:\MySG8\SG8logs\ and using that as your transaction log/system file path would have avoided this issue.)

Comments

  • Anonymous
    January 01, 2003
    If you are using a Windows 2008 / Exchange 2007 Single Copy Cluster (SCC) you should read my co-workers