共用方式為


Why is my SQL Clustered Instance changing authentication modes?

We get our fair share of cases related to SQL Server running ( or not running) on a Windows Cluster.  I had one of them recently where the customer was seeing different authentication modes for SQL Server depending on which node of the 2-node cluster that it was online on. The Errorlogs document this behavior clearly as follows

We see that Sql came online on node P1 it was in Mixed Mode

2010-05-08 05:17:41.13 Server Authentication mode is MIXED.
2010-05-08 05:17:41.70 spid4s The NETBIOS name of the local node that is running the server is 'P1'.
2010-05-08 23:26:05.69 spid4s SQL Trace was stopped due to server shutdown. Trace ID = '1'.

Afer a failover Sql came online on node P2 in Windows Authentication mode

2010-05-08 23:26:52.82 Server Authentication mode is WINDOWS-ONLY.
2010-05-08 23:26:53.60 spid5s The NETBIOS name of the local node that is running the server is 'P2'.
2010-05-09 10:16:18.55 spid5s SQL Trace was stopped due to server shutdown. Trace ID = '1'

Finally when they fail back to P1, SQL came online in Mixed Mode

2010-05-09 10:16:34.50 Server Authentication mode is MIXED.
2010-05-09 10:16:45.42 spid4s The NETBIOS name of the local node that is running the server is 'P1’

The only recent change to their SQL Server environment was that they applied Service Pack 3 a couple of weeks back but they had not tested failover till this weekend – which is when they ran into this issue.

The behavior clearly indicates that there is something amiss between the two nodes of the cluster. I had just started to compare the registry keys between the nodes, when my colleague Adam Saxton suggested that this might be a problem with checkpoint not getting applied.

After some additional research I found that Service Pack 3 relies on the checkpoint to be applied to the passive node on the first failover- hence making it a non -issue in most cases. In this case I needed to first determine whether the checkpoint file was even there to be applied to the passive node or not.

Ok so let us see how to determine where the checkpoint files for SQL Server instance are located in a cluster. We start by first finding the GUID under HKEY_LOCAL_MACHINE\Cluster\Resources that corresponds to our SQL Server instance i.e. where the name=SQL Server

image[6]

Now expand the GUID folder and click on the RegSync key under this GUID and you should see seven keys  in it from 00000001 thru 00000007 as shown below.

clip_image004

If you are curious as to where these files actually reside, they are under the Quorum drive in the folder Q:\MSCS\<SQL Server GUID>

clip_image006

In this customer’s case we only had one key :- 00000007. Since the customer was clearly missing the checkpoint files, we had to recreate them by following these steps:-

a. Bring SQL Server online on working node with Mixed mode authentication

b. Take SQL Server resource group offline from within Cluster Administrator

c. Open regedit and add the following keys under HKEY_LOCAL_MACHINE\Cluster\Resources\<SQL Server GUID>\RegSync ( via Right click à New String Value in the right hand pane)

Value name Value Data
--------- -----------
00000001 Software\Microsoft\Microsoft SQL Server\MSSQL.1\Replication
00000002 Software\Microsoft\Microsoft SQL Server\MSSQL.1\SQLserverAgent
00000003 Software\Microsoft\Microsoft SQL Server\MSSQL.1\Cluster
00000004 Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLSERVER
00000005 Software\Microsoft\Microsoft SQL Server\MSSQL.1\PROVIDERS
00000006 Software\Microsoft\Microsoft SQL Server\MSSQL.1\SQLServerSCP

d. Bring SQL Server resource group online from within Cluster Administrator

At this time you should see some new .CPT files created under Q:\MSCS\<GUID>\ indicating that the checkpointing is now working as expected. The next step is to attempt failover to the passive node, during which the checkpoint file shall get applied there and then SQL Server shall come up with Mixed Mode authentication on it as well.

Rohit Nayak | Microsoft SQL Server Support

Comments

  • Anonymous
    July 09, 2010
    Hi Rohit, I have the same issue, but I don't see the RegSync in any resource (I have Windows 2008) Does it apply also in Windows 2008?

  • Anonymous
    July 14, 2010
    Hello Jose, This technique only applies to Sql Server running on Windows Server 2003. Windows Server 2008 no longer has any checkpoint files and all the checkpointing is done via the registry hive on the quorum drive.

  • Anonymous
    July 22, 2010
    I've seen a similar issue with named sql instances using fixed ports.  If you upgrade or attempt an upgrade and then failover between nodes the fixed tcp/ip port gets wiped.  Then chaos ensues.  You see your server up and you can connect to it when you're on the node (because you're using shared memory) but your app keeps saying the sql server is unavailable so you tell someone to reset the web server.  Then while you're reading your sql server error log you see that it came up on some dynamic port.  Sounds like it could be the same issue?

  • Anonymous
    January 28, 2011
    @Mike Abair - Your issue is not related to the /passive method of patching. Which is what this blog talks about. In your case, its more than likely that the checkpoints (for SQL server resource) were missing and you did a port change at SQL server. Fix the checkpoints and that should work.

  • Anonymous
    August 13, 2011
    The real question is: How come checkpoint files are missing? PSSSQL MUST focus on the quality of the blog! Let only EEs post here!!

  • Anonymous
    November 30, 2011
    @Anonymous - The most common cause of checkpoints (for SQL resources) going missing is re-creation of SQL resources (After these have been accidentally deleted).. there could possibly be some more causes but a detailed discussion around that is outside the purview of this blog. HTH - Vijay