SQL SERVER 2005 RESOURCE FAILS TO COME ONLINE ON CLUSTER AFTER CHANGING THE SAN DRIVE TO ADD MORE DISK SPACE
We have observed a number of issue now where in clustered instance of sql server 2005 fails to come online after performing changing on shared disk on a cluster.
As mentioned in my previous posts if the sql server fails to come online we need to start troubleshooting by looking at the sql server error log. So we checked the sql server error log and observed the following error
2009-01-11 21:10:06.96 Server Error: 26054, Severity: 16, State: 1.
2009-01-11 21:10:06.96 Server Could not find any IP address that this SQL Server instance depends upon. Make sure that the cluster service is running, that the dependency relationship between SQL Server and Network Name resources is correct, and that the IP addresses on which this SQL Server instance depends are available. Error code: 0x103.
2009-01-11 21:10:06.96 Server Error: 17182, Severity: 16, State: 1.
2009-01-11 21:10:06.96 Server TDSSNIClient initialization failed with error 0x103, status code 0xa.
2009-01-11 21:10:06.96 Server Error: 17182, Severity: 16, State: 1.
2009-01-11 21:10:06.96 Server TDSSNIClient initialization failed with error 0x103, status code 0x1.
2009-01-11 21:10:06.96 Server Error: 17826, Severity: 18, State: 3.
2009-01-11 21:10:06.96 Server Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.
2009-01-11 21:10:06.96 Server Error: 17120, Severity: 16, State: 1.
2009-01-11 21:10:06.96 Server SQL Server could not spawn FRunCM thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.
As seen from the above Error 26054 message which says that cluster service is unable to find the dependency between the sql server instance and the sql network resource name.
We checked the Cluster Administrator for the sql server resource and found that the dependency exists between the sql server resource and sql network name but still we receive the error.
Cause
=====
We checked the Registry Location HKLM\Cluster\Resources\<SQL Server GUID>\Parameters
The values InstanceName and VirtualServerName were missing from the keys below:
HKLM\Cluster\<SQL Server GUID>\Parameters
HKLM\Cluster\<SQL Agent GUID>\Parameters
The above registry keys are used by the cluster administrator to connect and start the sql server service. So the Cluster Administrator was not able to connect and start the sql server resource and hence we receive the above error
Resolution
========
IMPORTANT : This resolution contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore the registry if a problem occurs. For information about how to back up, restore, and edit the registry, click the following article number to view the article in the Microsoft Knowledge Base:
256986 Description of the Microsoft Windows Registry
We added the following registry values in the Parameters registry key:
For a named instance of SQL Server:
• InstanceName
Value Name: InstanceName
Value Type: REG_SZ
Value Data: <instance name of the named instance>
• VirtualServerName
Value Name: VirtualServerName
Value Type: REG_SZ
Value Data: <Virtual server name of the instance>
Once we added the registry keys we were able to bring the sql server online on Node1.
We failed the instance over to Node2 we checked the above registry keys and found they were missing, so we added missing keys and rebooted both the servers of the clusters.
After rebooting the server the sql server resources came online fine without any errors
Note:
This post should not be treated as the Microsoft’s Recommendation or Resolution to the problem, it is only a workaround which worked in our environment and so we would like to share it.
Parikshit Savjani,
SE, Microsoft SQL Server
Comments
Anonymous
February 06, 2009
The comment has been removedAnonymous
April 03, 2009
Excellent Post. Saved me multiple hour SQL Cluster reinstall.Anonymous
April 20, 2009
Excellent Post. Thankyou, this saved me having to re-install the SQL 2005 cluster.Anonymous
October 14, 2009
Helped me out too. I removed the SQL Server resource from the cluster to prevent it from being brought up as we "retired" this instance. As usual, later some of the old data was needed. Thanks.Anonymous
January 07, 2010
This was a life saver! Thought I was going to have to rebuild the cluster!Anonymous
November 07, 2010
I see that sql server resource and sql server agent resource are missing in cluster administratorfor one of my instance. Can any of you provide me with the info to recover these resource.Anonymous
July 30, 2011
Excellent solution. Saved me a lot of hours & effort.Anonymous
September 16, 2011
Hi! Just a quick question, is there any reason why these values were originally missing? Thanks!Anonymous
November 23, 2011
I am not able locate. I am installing SQL Server 2005 On Windows 2008 R2. Having the same issue as mentioned above. How to find the SQL Server GUID as mentioned above. ?Anonymous
November 23, 2011
Hi Hema, Ideally you shouldn't face this issue while installing SQL Server. Since we have observed this issue specifically when you already have SQL installed as a clustered resource which is up and running and You had to bring down the sql instance to perform some SAN activity and thereafter the SQL resource doesn't come online. However still to answer your question. In regedit, once you go to the location HKLMCluster you can search (Ctrl+F) for InstanceName (case sensitive) or VirtualServerName (case sensitive) and locate your SQL instance. if you are seeing these DWORDs with missing values then you might be facing this issue. If the value are present and they are correct then it might be another issue and you might want to raise a ticket with CSS to help you troubleshoot the issue.