Compartilhar via


Unable to correct invalid SQL Server Network Configuration on clustered SQL Server causes clustered SQL Server fail to start “permanently”

Recently, we encountered an issue reported by a customer on a clustered SQL Server. What happens is the following.

 

The Customer needs to restrict the number of IP addresses SQL 2005 Server is listening. He uses the following procedure for that: open “SQL Server Configuration Manager” (SSCM) -> “SQL Server 2005 Network Configuration” -> “Protocols for XX” , right clicked TCP/IP choosing Protocols and set “Listen All” to “NO”. After that he wants to set the proper IP-addresses. This is working fine on a non-clustered system. On a clustered system the following happens when he is restarting SQL Server: SQL Server does not come online anymore. And the SQL Server error log reports the following.

SQL Server 2005 errorlog

2005-11-28 10:53:29.02 Server Error: 17182, Severity: 16, State: 1.

2005-11-28 10:53:29.02 Server TDSSNIClient initialization failed with error 0x32, status code 0x1c.

2005-11-28 10:53:29.02 Server Error: 17182, Severity: 16, State: 1.

2005-11-28 10:53:29.02 Server TDSSNIClient initialization failed with error 0x32, status code 0x1.

2005-11-28 10:53:29.03 Server Error: 17826, Severity: 18, State: 3.

2005-11-28 10:53:29.03 Server Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.

2005-11-28 10:53:29.03 Server Error: 17120, Severity: 16, State: 1.

2005-11-28 10:53:29.03 Server SQL Server could not spawn FRunCM thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.

Configuring the server to listening on individual IP-addresses is not supported on Clustered SQL Server, thus the failure indicated by the error log is accurate. However, the customer can NOT set “Listen ALL” to “Yes” any more to bring the server online. To be more accurate, even though the customer can set “Listen ALL” to “Yes” using SSCM on each physical node, each time he try to bring SQL Server online, the value will be overwritten to “NO” and the clustered SQL Server will end up failing. This is a serious issue that causes frustration of our customers.

The root cause of this issue is because of cluster checkpoint service behavior. If a setting is changed while the resource is online, that change will get check-pointed to the CPT hive file in the cluster quorum disk. If the resource is offline while you make the parameter change, then it will never be check-pointed. Each time you bring up the resources, the check-pointed value will overwrite the local value. SQL Server network configuration is one resource that is check-pointed. So, if you put the SQL Server resource offline while you change the “Listen All” from “NO” to “YES”, and then you try to put the resource back online, it will fail as the "local" change was overwritten (during resource startup) with what was persisted in the checkpoint file.

Because of this check-pointing behavior, any time that SQL Server network configurations are modified into invalid values while the server is online, restarting the clustered SQL Server will cause the server fail to start “permanently”.

To get out of such BAD state, one workaround is to disable the check-pointing for SQL Server network configuration, described as following.

1. While SQL Server instance is in offline/failed state, disable cluster checkpointing for network configuration by:

      cluster res "SQL Server" /removecheck:”SoftwareMicrosoftMicrosoft SQL ServerMSSQL.XXXMSSQLSERVER”

2. Correct the configuration by using SSCM. Verified the key was corrected on both nodes.

3. Bring SQL cluster back online.

4. Re-enabled cluster checkpointing for network configuration by:

      cluster res "SQL Server" /addcheck: ”SoftwareMicrosoftMicrosoft SQL ServerMSSQL.XXXMSSQLSERVER”

Note that, for named instance, the resource display name "SQL Server" should be replaced with "SQL Server (<instance name>)".

 

If the workaround does not resolve the issue described as above in your case, please let us know.

 

Do you know that you can post question w.r.t SQL Server data access, connectivty issues at https://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=87&SiteID=1?

 

Nan Tu, Software Design Engineer, SQL Protocols

Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights

Comments

  • Anonymous
    December 14, 2005
    there is a similar story in sql 2000

    a resource kit tool clusterrecovery.exe can do the same thing (refresh the registry checkpoint) with a GUI, without having to know the registry keys

  • Anonymous
    February 05, 2006
    I am trying to connect to a server which is installed with SQL server 2000, using SQL server 2005 client. I am getting the following error

    When connecting to SQL server 2005, his failure may be caused by the fact that under the default settingsSQL server does not allow remote connections. (Provider: Named pipes Provider, error: 40 - could not open a connection to SQL server)(Microsoft SQL server, Error:53)

  • Anonymous
    March 02, 2006
    Manik,

    This is a connectivity issue not pertain only to Cluster Scenario. In your case, you need to check:

    (1) Remote connectivity is enabled. If not, you can use SQL Server Surface Area Configuration to enable that.

    (2) Make sure that the listening port of SQL Server is not blocked by Windows Firewall

  • Anonymous
    June 05, 2006
    When connecting to SQL server 2005, his failure may be caused by the fact that under the default settingsSQL server does not allow remote connections. (Provider: Named pipes Provider, error: 40 - could not open a connection to SQL server)(Microsoft SQL server, Error:53)

  • Anonymous
    January 17, 2007
    I get the error message above. I have made the exception in the firewall for both Database engine and the Browser processes.  I configured Surface Area config to allow remote with TCP and Named Pipes.  I turned on the service "Browser".  I still get the error above.

  • Anonymous
    January 18, 2007
    Please try to answer questions in http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=362498&SiteID=1, If it doesn't solve your problem, please post your question on the forum. It'll be answered in a timely manner. Thanks, Nan

  • Anonymous
    June 06, 2007
    Thanks for your blog Nan. I was expecting to have to make a call but was saved hours of heartache by being able to stop the cluster checkpoint. I found your explanation doing a search on the network error. May I suggest that KB docs 823938 and 265808 be updated to warn folks like me that do not realize static ports are not supported on a cluster? I struggled a bit with the syntax but eventually got it. Fortunately, it was not a production machine. Ralph

  • Anonymous
    February 15, 2008
    Hi all, I did use this but got error as below: C:>cluster res "SQL Server" /removecheck:"SoftwareMicrosoftMicrosoft SQL ServerMSSQL.1MSSQLSERVER" The 'removecheck' option does not require any parameters to be specified. See "CLUSTER RESOURCE /?" for correct syntax. So, I open a case with support team and the soluction was: 1.With SQL offline on cluster C:>cluster res "SQL Server" /checkpoints --- List all checkpoint to SQL Server

  1. Run /removecheckpoints C:>cluster res "SQL Server" /removecheckpoints:"SoftwareMicrosoftMicrosoft SQL ServerMSSQL.1MSSQLServer" 3.Correct the configuration by using SSCM. 4.run /addcheckpoints C:>cluster res "SQL Server" /addcheckpoints:"SoftwareMicrosoftMicrosoft SQL ServerMSSQL.1MSSQLServer"
  2. Bring SQL cluster back online. Yes....it´s ok now !! regards Nilton Pinheiro
  • Anonymous
    March 26, 2008
    Worked like a champ.  Thanks for the post - we looked everywhere trying to fix this.

  • Anonymous
    May 09, 2008
    Thanks heaps, worked perfectly - just the commands in the original post weren't right but Nilton Pinheiro's correctios were.  It's /removecheckpoints not /removecheck and /addcheckpoints not /addcheck

  • Anonymous
    June 25, 2008
    Nilton, you save me from a new support call... txs!

  • Anonymous
    February 11, 2009
    This post just saved me from insanity, thank you so much!

  • Anonymous
    March 30, 2009
    Thanks Nan and Nilton, worked perfectly

  • Anonymous
    November 09, 2010
    In my case (this took some time to figure out) SQL 2005 32 bit on Win 2008 64 bit, you need to remove/add the WOW6432 checkpoint. SoftwareWow6432NodeMicrosoftMicrosoft SQL ServerMSSQL.1MSSQLSERVER ...and as to the conflicting final 2 steps: Nan Tu 3) start up cluster 4) /addcheckpoint vs. Nilton 4)  /addcheckpoint 5) bring cluster back online ...I went with Nan Tu's approach and that worked for me. Thanks!