Note to self on AlwaysOn...

I came up with the idea that perhaps we could let clients connect to a database in an AlwaysOn Availability Group (AG) by the current instance name instead of the virtual network name (VNN) if the cluster service crashed. This idea does not work.

Microsoft Consultant Don Scott set up a very simple 2-node cluster with a stand-alone instance of SQL Server 2012 on each node and 1 availability group with 1 database in it, and we could connect to the database by it's VNN or by connecting to the current instance name, as expected. However, when we turned off the cluster service on both nodes to simulate a cluster service failure, we could no longer connect to the database by it's virtual name, as expected, but we could also not connect to the database through the current instance name. In SQL Server Management Studio (SSMS), the database icon listed "Recovery pending" after the name of the database, even though this was the primary replica. The secondary replica didn't even show up in SSMS with the cluster service off.

 

Even though the "recovery pending" status didn't make sense to us, we tried "RESTORE DATABASE <dbname> WITH RECOVERY" and got a very strange error message: "Msg 3148, Level 16, State 3; This restore statement is invalid in the current context. The 'Recover Data Only' option is only defined for secondary filegroups when the database is in an online state. When the database is in an offline state filegroups cannot be specified." This is strange because this database only had the default primary filegroup and it was online. We checked SSMS to confirm the database was online, and the "Bring Online" option was not available in SSMS, but the "Take Offline" option was available, confirming the database was still online despite the error message.

 

Moral of the story: When using AlwaysOn, keep at least half your cluster nodes/witness healthy, because if the cluster goes down completely, AlwaysOn goes down.

 

Other notes:

  • With this test configuration (2-node cluster, each with Non-FCI, 1 availability group, all functioning correctly): When we evicted the primay replica node from the cluster and restarted the node, we expected the availability group to be disabled, but instead it completely ceased to exist (no trace of it remained to be displayed in SSMS). The database was then a normal database without an availability group.
  • When creating or altering an Availability Group, SQL Server interacts with the Windows cluster service to automatically create the Virtual Network Name for the AG in the cluster service.

Comments

  • Anonymous
    January 01, 2003
    Considering SQL in a cluster, and this is excellent advice! THanks!

  • Anonymous
    September 08, 2014
    I had this problem, too. I removed the database from the AG and still had the error. I took the database offline and brought it back online and it worked.

  • Anonymous
    March 14, 2016
    The comment has been removed