

w2k8 / w2k8r2: Cluster Disk May Fail and Run ChkDsk when a Backslash ‘\’ or Forward Slash ‘/’ is used in the Resource Name

Hi All

In rare circumstances on 2008 R2 clusters, a ‘Physical Disk’ resource may fail to come online with no events logged in the system event log.

Issue

The symptom of this issue is that the disk shows in a ‘failed’ state in Failover Cluster Manager, but there are no events in the system event log that correlate to the failure.


To see if you are running into this issue, you need to generate a cluster log on the node where the disk fails.

From a command prompt:

‘cluster log /gen’

In the cluster log file, located at \Windows\Cluster\Reports\cluster.log, look for an entry like the following:

ERR   [RES] Physical Disk <Cluster Disk X:\>: VerifyFS: Failed to open file \\?\GLOBALROOT\Device\Harddisk52\Partition1\Logfile.ldf Error: 5.
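If the cluster log is large, a small script can help pick out entries of this kind. The following is only a minimal sketch: the regular expression is modeled on the single log line quoted above, so other builds or log formats may need a different pattern, and the function name `find_verifyfs_failures` is hypothetical.

```python
import re

# Pattern modeled on the one log line quoted in this post (an assumption;
# adjust it if your cluster log formats the entry differently).
VERIFYFS_RE = re.compile(
    r"Physical Disk <(?P<resource>.+?)>: VerifyFS: Failed to open file "
    r"(?P<path>.+?) Error: (?P<code>\d+)"
)


def find_verifyfs_failures(lines):
    """Return (resource name, file path, error code) for each matching line."""
    hits = []
    for line in lines:
        match = VERIFYFS_RE.search(line)
        if match:
            hits.append(
                (match.group("resource"), match.group("path"), int(match.group("code")))
            )
    return hits


if __name__ == "__main__":
    sample = [
        r"ERR   [RES] Physical Disk <Cluster Disk X:\>: VerifyFS: Failed to open file "
        r"\\?\GLOBALROOT\Device\Harddisk52\Partition1\Logfile.ldf Error: 5.",
    ]
    for resource, path, code in find_verifyfs_failures(sample):
        print(f"resource={resource!r} file={path!r} error={code}")
```

The function accepts any iterable of text lines, so it can be fed an open file handle for cluster.log as well as a list of strings.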

The problem occurs if and only if both of the following conditions are present.

1. ‘Local System’ cannot open a handle to a file at the root of the drive (whether because the file is in use or because of permissions).

In my example, you can see from the cluster log snippet that the file the cluster is trying to open is Logfile.ldf and that the attempt fails with access denied, ‘Error: 5’.

AND

2. The name of the ‘Physical Disk’ resource contains a backslash or forward slash character. In my example, my disk name was ‘Cluster Disk X:\’.

Generally, we don’t recommend storing files at the root of a disk, because the cluster needs to open handles to files and folders as part of the health detection mechanism it uses to determine possible access issues to storage. Since the cluster service runs in the context of the ‘Local System’ account, the health check may fail if that account does not have permission to open files at the root of a drive.

Workaround

The simplest resolution is to remove the invalid character(s) in resource names for ‘Physical Disk’ resource types.

OR

Verify that the ‘Local System’ account has at least ‘Read’ permissions to files at the root of the drive.

In the above example, I renamed my resource from ‘Cluster Disk X:\’ to ‘Cluster Disk X:’. I could also have granted the ‘Local System’ account ‘Read’ permissions on the Logfile.ldf file.
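If you have many disk resources, you may want to review their names in bulk before renaming anything. The sketch below assumes you already have the resource names as plain strings (for example, copied out of Failover Cluster Manager). It only proposes new names and does not touch the cluster. The helper names `has_slash`, `suggest_name` and `plan_renames` are hypothetical.

```python
# The two characters this post identifies as problematic in resource names.
SLASHES = ("\\", "/")


def has_slash(name):
    """True if the resource name contains a backslash or a forward slash."""
    return any(ch in name for ch in SLASHES)


def suggest_name(name):
    """Return the name with every backslash and forward slash removed."""
    cleaned = name
    for ch in SLASHES:
        cleaned = cleaned.replace(ch, "")
    return cleaned.strip()


def plan_renames(names):
    """Map each affected name to a suggested replacement.

    This does not check whether a suggested name collides with an
    existing resource name; review the result before renaming.
    """
    return {name: suggest_name(name) for name in names if has_slash(name)}


if __name__ == "__main__":
    names = ["Cluster Disk X:\\", "Cluster Disk 2", "Data/Logs"]
    for old, new in plan_renames(names).items():
        print(f"{old!r} -> {new!r}")
```

For the disk in my example, this suggests the same rename I made by hand, from ‘Cluster Disk X:\’ to ‘Cluster Disk X:’.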

The chkdsk run does not indicate actual corruption on the disk. What happened is that the cluster set the dirty bit on the disk, so chkdsk runs to verify that the file system is intact.

Best Regards

Hugo Ferreira