

Deleting a Cluster resource? Do it the supported way

The purpose of this post is to explain what is and is not supported when removing resources from a cluster. We have recently seen an increase in users manually deleting resources from the cluster registry, and I want to make clear that this is not supported by Microsoft. Doing so can cause problems with your clusters, so below I describe those problems as well as how to get out of the predicament you can end up in.

First, the ONLY supported ways of deleting a resource are Cluster Administrator (Windows Server 2003), Failover Cluster Management (Windows Server 2008 and 2008 R2), CLUSTER.EXE, and PowerShell (Windows Server 2008 R2). The usual reason given for deleting a resource manually from the registry is that the UI cannot be opened. That is exactly where CLUSTER.EXE or PowerShell comes in. For example, to delete a resource named Johns Resource, you would use one of these commands:

CLUSTER.EXE

Cluster res "Johns Resource" /delete

PowerShell

Remove-ClusterResource "Johns Resource"

Either command removes the resource from every entry on all nodes, including the quorum.

To break this down further, a resource appears in several locations in the cluster hive, and in each of them it is referenced by a GUID:

HKEY_LOCAL_MACHINE\Cluster\Resources\C8d32427-7daa-4a94-ba85-850f5a920382   <<-- Johns Resource

HKEY_LOCAL_MACHINE\Cluster\Groups\28baec47-2589-49a9-aa7c-cc32b57e1875   <<-- the group name
    Contains   <<-- all resources in the group are listed here

What users have been doing is deleting the GUID under the Resources key only. The same GUID may also be listed under the HKEY_LOCAL_MACHINE\Cluster\Dependencies and HKEY_LOCAL_MACHINE\Cluster\Checkpoints keys. A manual deletion leaves those entries behind, so they would need to be checked as well. Even then, the resource is still listed in the group's Contains value. Users who do this also have to repeat the deletion on every node and on the quorum drive, and sometimes the Cluster Service must be restarted everywhere before the resource is finally gone. CLUSTER.EXE would have removed it immediately, with no restarts necessary.
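The following minimal Python sketch makes the difference concrete. It models only the two keys shown above (Resources and Groups\...\Contains) as in-memory dictionaries. `ClusterHive` and its helper functions are hypothetical names. The sketch does not read the real registry, and it ignores the Dependencies and Checkpoints keys. It is an illustration of the bookkeeping involved, not a tool for editing a live cluster; use CLUSTER.EXE or PowerShell for that.

```python
from dataclasses import dataclass, field

# GUIDs from the registry example above.
GROUP = "28baec47-2589-49a9-aa7c-cc32b57e1875"
RES = "C8d32427-7daa-4a94-ba85-850f5a920382"


@dataclass
class ClusterHive:
    """Toy model of two cluster hive keys (hypothetical; not the real registry)."""

    # Resources\<guid> -> resource name
    resources: dict[str, str] = field(default_factory=dict)
    # Groups\<guid>\Contains -> resource GUIDs listed for that group
    groups: dict[str, list[str]] = field(default_factory=dict)


def delete_resources_key_only(hive: ClusterHive, guid: str) -> None:
    """The unsupported 'hack': remove the Resources entry and nothing else."""
    hive.resources.pop(guid, None)


def remove_everywhere(hive: ClusterHive, guid: str) -> None:
    """A complete removal: drop the resource key and every group reference."""
    hive.resources.pop(guid, None)
    for contains in hive.groups.values():
        # Slice assignment updates each Contains list in place.
        contains[:] = [r for r in contains if r != guid]


def dangling_references(hive: ClusterHive) -> list[tuple[str, str]]:
    """(group, resource) pairs where Contains lists a GUID with no resource key."""
    return [
        (group, res)
        for group, contains in hive.groups.items()
        for res in contains
        if res not in hive.resources
    ]


def make_hive() -> ClusterHive:
    """One group containing Johns Resource, as in the example above."""
    return ClusterHive(resources={RES: "Johns Resource"}, groups={GROUP: [RES]})


if __name__ == "__main__":
    hacked = make_hive()
    delete_resources_key_only(hacked, RES)
    print(dangling_references(hacked))  # the (GROUP, RES) pair is left behind

    clean = make_hive()
    remove_everywhere(clean, RES)
    print(dangling_references(clean))  # []
```

The key-only deletion leaves the group still pointing at a GUID that no longer has a resource key. In the Windows 2003 cluster log, that leftover reference is what produces the `Failed to find resource ... for group ...` line.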

On a Windows Server 2003 cluster, you see this in the cluster log when you start the Cluster Service:

[FM] Group 28baec47-2589-49a9-aa7c-cc32b57e1875 contains Resource C8d32427-7daa-4a94-ba85-850f5a920382.
[FM] Creating resource C8d32427-7daa-4a94-ba85-850f5a920382
[FM] Initializing resource C8d32427-7daa-4a94-ba85-850f5a920382 from the registry.
[FM] Unable to open resource key C8d32427-7daa-4a94-ba85-850f5a920382, 2
[FM] DestroyResource: destroying C8d32427-7daa-4a94-ba85-850f5a920382
[DM] Deleting object C8d32427-7daa-4a94-ba85-850f5a920382
[FM] Failed to find resource C8d32427-7daa-4a94-ba85-850f5a920382 for group 28baec47-2589-49a9-aa7c-cc32b57e1875

When you open Cluster Administrator, no errors appear at first. However, if several resources in the same group were removed this way, you may receive Error 1130 (Not enough server storage is available to process this command) and be unable to create any more resources in that group.

On Windows Server 2008 and 2008 R2 clusters, the results are very different. The Cluster Service will show as started; however, the cluster will not form. The System event log shows these errors:

Event ID: 7024
Source: Service Control Manager
Description: The Cluster Service terminated with service-specific error 2 (0x2).

Event ID: 1092
Source: FailoverClustering
Description: Failed to form Cluster ‘clustername’ with error code 2. Failover cluster will not be available.

In the Windows 2008 Cluster Log, you will see this:

WARN [DM] Key \Registry\Machine\Cluster does not appear to be loaded (status STATUS_OBJECT_NAME_NOT_FOUND(c0000034)
INFO [DM] Loading Hive, Key Cluster, FilePath C:\Windows\Cluster\CLUSDB
ERR [CORE] Node 1: exception caught ERROR_FILE_NOT_FOUND(2)' because of 'OpenSubKey failed.'
ERR Exception in the InstallState is fatal (status = 2)
ERR FatalError is Calling Exit Process.
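The contrast between the two versions can be summarized with a simplified Python model. The model is inferred only from the log lines above and is not the Cluster Service's actual logic; `load_groups` and `FatalClusterError` are hypothetical names. In both cases a group's Contains value lists a GUID whose resource key is missing, and opening that key fails with error 2 (ERROR_FILE_NOT_FOUND). The 2003-style path logs the failure and skips the resource. The 2008-style path treats the same error as fatal, so the cluster never forms.

```python
ERROR_FILE_NOT_FOUND = 2  # the error code reported in both logs above

# GUIDs from the registry example earlier in this post.
GROUP = "28baec47-2589-49a9-aa7c-cc32b57e1875"
RES = "C8d32427-7daa-4a94-ba85-850f5a920382"


class FatalClusterError(Exception):
    """Stands in for the 2008 'FatalError is Calling Exit Process' outcome."""

    def __init__(self, status: int) -> None:
        super().__init__(f"Exception in the InstallState is fatal (status = {status})")
        self.status = status


def load_groups(
    resources: dict[str, str],
    groups: dict[str, list[str]],
    fatal_on_missing: bool,
) -> list[str]:
    """Walk each group's Contains list and try to open each resource key.

    fatal_on_missing=False mimics the 2003 log (log the error, then continue);
    fatal_on_missing=True mimics the 2008 log (abort cluster formation).
    """
    log: list[str] = []
    for group, contains in groups.items():
        for res in contains:
            if res in resources:
                log.append(f"[FM] Initializing resource {res} from the registry.")
                continue
            if fatal_on_missing:
                raise FatalClusterError(ERROR_FILE_NOT_FOUND)
            log.append(f"[FM] Unable to open resource key {res}, {ERROR_FILE_NOT_FOUND}")
            log.append(f"[FM] Failed to find resource {res} for group {group}")
    return log


if __name__ == "__main__":
    broken_groups = {GROUP: [RES]}  # Contains still lists RES...
    broken_resources: dict[str, str] = {}  # ...but its Resources key was deleted

    # 2003-style: the service starts and only logs the missing resource.
    print("\n".join(load_groups(broken_resources, broken_groups, fatal_on_missing=False)))

    # 2008-style: the same inconsistency stops the cluster from forming.
    try:
        load_groups(broken_resources, broken_groups, fatal_on_missing=True)
    except FatalClusterError as exc:
        print(exc)
```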

 

These are the problems you can run into when you manually remove, or “hack,” a resource out of the registry without removing it from every location in the hive. They are also one of the reasons this method of removing a resource from a cluster is unsupported. The whole point of a failover cluster is high availability. The unsupported method described above can cause exactly the downtime a cluster exists to prevent.

John Marlin
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

Comments

  • Anonymous
    January 01, 2003
    @Yuri
    If you bring up the properties of the 'Bck_SG1DB' resource, you will see the Possible Owners of the resource. Make sure that all nodes are listed here.
  • Anonymous
    January 01, 2003
    @Ravi,
    In Windows 2003, unfortunately, you would need to do this or assign the drives to only one node at a time. Assign the drives to the first node only, format them, and add them to the cluster. You can then add the assignment to the other node(s) and test your failovers. In Windows 2008 and beyond, this is not necessary because all new drives added to the systems are in an offline state, so you can bring them online manually in Disk Management (or Server Manager) on each node individually. The reason is that we do not want multiple machines to have direct access to the drives while they are not clustered, because that could lead to corruption of the drive(s).

    @Dan,
    Client Access Points are simply an IP Address resource and a Network Name resource. So you would delete the Network Name and then the IP Address, or delete the IP Address and the cluster will automatically delete the name because of the dependency.
  • Anonymous
    February 08, 2011
    Hi John, thanks for this very useful information. I have a question about this: in a Windows Server 2003 two-node SQL Server cluster, if I am doing an activity like adding new disks to the servers, I will have to bring the servers down one at a time. Can a situation like the one you described above occur after that upgrade activity?
  • Anonymous
    October 16, 2014
    Hi,

    How do I go about removing client access points on 2008 R2 clusters?
  • Anonymous
    April 22, 2015
    Hi John,
    Currently I have a problem with a two-node cluster on Windows 2008. This cluster has two SQL instances (Services and Applications), SG1DB and SG2DB. Instance 1 was no longer needed, so I used SQL Setup to remove it (using the Maintenance option and the Remove Node option as instructed by Microsoft help). I did that on the passive node first and it completed successfully; the instance no longer shows up in SQL Server Configuration Manager. Then I did the same on the active node for Instance 1, but Setup ended unexpectedly almost at the end of the process. The SQL error screen says something like:
    The resource 'Bck_SG1DB' could not be moved from cluster group 'SQL Server (SG1DB)' to cluster group 'Available Storage'.
    Error: There was a failure to call cluster code from provider. Exception message: Generic Failure.
    Status code: 5015.
    Description: The operation failed because either the specified cluster node is not the owner of the resource or the node is not a possible owner of the resource

    So I tried to delete it from the Failover Cluster Manager console by selecting it within Services and Applications, but I was unable to because I got another error saying:
    Could not move the resource to available storage. The operation failed because either the specified cluster node is not the owner of the resource or the node is not a possible owner of the resource.

    I noticed that on both nodes, the SQL instance was removed from SQL Server Configuration Manager. It looks to me like it only got stuck when it was being removed from Failover Cluster Manager.

    Can you help me to find a way to remove it?
    Thanks
  • Anonymous
    June 30, 2015
    Hey, thanks for writing; this is a very good article. Unfortunately I ended up here because I am trying to fix an issue someone else created, and fixing it with the supported methods seems impossible.

    I have a failover cluster with a virtual machine role, and the virtual machine resource no longer exists (the XML is gone along with all the VHDs).

    Hence, the virtual machine is not present in Hyper-V Manager on any node, but it still exists in the cluster GUI and is in a "Stopped" state. I am able to move it between nodes, but I cannot remove it. This is causing other issues in the cluster: it no longer passes validation tests, so I cannot add another node.

    the error when I try to do anything with it is "a virtual machine resource was not found in clustered virtual machine "

    Any idea? At this stage I am looking at deleting the registry keys I find by searching for it in the cluster registry hive, and then restarting each cluster node. If that fails, I am considering shutting down the cluster (with its ~50 virtual machines).

    ADDITIONAL ERROR:
    "'Virtual Machine Configuration ' failed to register the virtual machine with the virtual machine management service. The Virtual Machine Management Service failed to register the configuration for the virtual machine 'A3734967-BF1F-429C-B27E-E7382313DD01' at 'C:\ClusterStorage\Volume2\Hyper-V Replica': The system cannot find the file specified. (0x80070002). If the virtual machine is managed by a failover cluster, ensure that the file is located at a path that is accessible to other nodes of the cluster.

    (Of course, the above path does not exist. I have tried to recreate it, copying the GUID, but have not had any luck.)

    Any suggestions? Thanks.
  • Anonymous
    October 16, 2015
    I have a Windows 2012 R2 cluster (1 + 1 node) and a SQL Server 2012 Enterprise cluster (1 + 1 node). I want to remove the Windows cluster (break the cluster). I have removed the passive SQL cluster node; do I need to uninstall the active SQL node before removing the Windows cluster?

    Please give advice.

    Thanks