Partilhar via


SQL Server 2012 AlwaysOn Availability Group and Listener in Azure VMs: Notes, Details and Recommendations

If you are running, or going to run in the future, SQL Server inside Azure Virtual Machines, your favorite high availability solution (HA) is AlwaysOn Availability Groups (AG). Focusing on SQL Server 2012, why AG is or should be your favorite solution? Well, at least for these reasons:

  • AG provides the fastest failover time compared to all other HA (and DR) solutions;
  • SQL Server Mirroring is going to be deprecated in favor of AlwaysOn AG;
  • SQL Server Mirroring is limited to only two replicas with no direct (read-only) access to secondary replicas;
  • Windows Server Failover Clustering (WSFC) is not supported (at the moment J) along with shared storage (no SAN in Azure!);
  • Looking at SQL Server 2014, especially to Hekaton In-Memory Database, it’s pretty clear that Microsoft is putting more and more effort and investments on AG;

If you want to review and/or compare all possible HA solutions, see the article below:

High Availability and Disaster Recovery for SQL Server in Windows Azure Virtual Machines

http://msdn.microsoft.com/en-us/library/jj870962.aspx

If you remember the days in last April when Azure IaaS went in GA, AlwaysOn AG technology was immediately supported, but with a “small” drawback: no support for AG Listener!!!

What did it mean? Practically, it was possible to configure a AlwaysOn AG for SQL Server 2012 inside VMs, but without automatic and transparent redirection on the application side: the nice workaround to this missing functionality was to use Mirroring connection string to “emulate”, on the application side,  the AG failover behavior but obviously this worked only with two SQL Server VMs. Starting last July, Microsoft filled the gap and AG Listeners can be used, thanks to the hotfix below:  

Update enables SQL Server Availability Group Listeners on Windows Server 2012-based Windows Azure virtual machines 

http://support.microsoft.com/kb/2854082

I would like to emphasize that this is a Windows Server 2012 hotfix you have to install inside each Azure VM hosting an AG replica in the Cluster, not an Azure hotfix!

How does it work? In short, the Windows Server Cluster modified the logic of the classic “IP Address” resource, in order to respond to Azure Load Balancer (LB) custom probes and then redirecting the incoming traffic only to the active node/VM hosting the AG Primary replica of the database. That’s why the affected binary file is CLUSRES.DLL that my Cluster experts friends know very well J. If you want to learn more about Load Balancer, Custom Probes and Endpoints in Azure, look at the Michael Washam blog series on Azure IaaS at http://michaelwasham.com/category/iaas.

To give you a better understanding of this mechanism, look at how the clustered IP resource appears after the STEP(4) – POINT(8) in the second tutorial reported later in this post:

NOTE:It is worth noting that the static IP address assigned to the clustered IP address resource is static and the same as the Cloud Service public internet IP address containing the SQL Server VMs!

Now, let me report below the official announcement from the SQL Server blog, there is a nice picture I want to present you that I will use later in my notes and feedbacks:

AlwaysOn Availability Groups Fully Supported on Windows Azure Infrastructure Services

http://blogs.technet.com/b/dataplatforminsider/archive/2013/08/12/alwayson-availability-groups-fully-supported-on-windows-azure-infrastructure-services.aspx

If you want to run over the entire configuration experience, you can follow the nice two tutorials reported below (PowerShell versions also exists), in this precise order:

 

After completing the second tutorial, this is how the AwaysOn AG Listener appears in SQL Server Management Studio:

Before running through the tutorials, I would recommend you to read my notes and feedbacks reported below, I’m sure this will save your precious time:

  • Keep in mind that even if your application resides in a Cloud Service in Azure, you will access AlwaysOn AG replicas through the Azure Load Balancer as external Internet traffic, then paying a small additional latency;
  • Even if AlwaysOn AG, under the hood, uses Windows Server Failover Clustering (WSFC) feature, it’s not supported outside the context of AG, at least today: this means that you should not use WSFC for other contexts or purposes;
  • Keep in mind that your application, in case is deployed in Azure, cannot reside in the same Cloud Service with SQL Server VMs otherwise routing in Azure will fail: it’s highly recommended to use a dedicated Cloud Service containing only the SQL Server VM; 
  • You can create only one AG Listener per AG group of SQL Server instances: this limitation comes from the fact that the Cluster Network Name used by the AG Listener *is* the public Internet Facing IP Address of the Cloud Service, then obviously you can only use one
    time;
  • Remember that usage of AlwaysOn AG requires prior WSFC configuration for all the AG replica nodes, that in turn requires Active Directory Domain Controllers to be accessible from other VMs, that in turn requires Azure Virtual Network to be used for all the VM of the
    configuration topology;
  • If you want to use AlwaysOn AG not only for HA, but also for read-only workload offloading (readable secondary replicas), you have to do more work and configuration steps, compared to what is explained in the tutorials mentioned above. Also consider that you have 2 different configuration choices that I’m going to explain you. In order to better explain, let me show you a nice architectural diagram, and remember that your application is in a different Cloud Service or on-premise:

    • OPTION(1): Leveraging the Azure Load Balancer (B). As you can here, there is a second Azure input (balanced) endpoint that must be defined to permit access to readable secondary instances: here, the AlwaysOn Listener is not used at all, it’s the Azure LB that will redirect you to one of the SQL Server secondary replica; you will point this new endpoint (port) in your application. In order to ensure that you will be redirected to an on-line instance, you should define an Azure “Custom Probes” for SQL Server port;
    • OPTION(2): Leveraging the AlwaysOn AG URL redirection. Your application will still connect to the original endpoint (port) defined for the AG Listener described in the second tutorial mentioned above, and if you have correctly configured the “Read-only Routing” of AG (see below), you will be redirected automatically to one of the secondary instances; in this case, your application connection string should specify an “Application Intent” of “Read-Only” (see below). Be aware that this mechanism can only works if the Cloud Services of SQL Server and your application are in the same Azure Virtual Network: the reason is simple, the “Read-only Routing” AG mechanism must use internal SQL VM names/IPs that will be not available otherwise.

 Read-Only Routing with SQL Server 2012 Always On Database Availability Groups

http://social.technet.microsoft.com/wiki/contents/articles/13503.read-only-routing-with-sql-server-2012-always-on-database-availability-groups.aspx?Sort=MostUseful&PageIndex=1

  • Since SQL Server in Azure VM with AlwaysOn AG requires a different Cloud Service with a mandatory public input endpoint, you should secure/control incoming traffic from the Internet: every time you create an endpoint, you are opening/exposing your Cloud Service resources to the world, then it’s highly recommended to use another nice recently introduced Azure feature called “Virtual Network Endpoint ACLs”: with this mechanism, you can restrict the public Internet IP addresses that can access a specific endpoint, in this case the AlwaysOn AG Listener, to only your application.  

About Network Access Control Lists (ACLs)

http://msdn.microsoft.com/en-us/library/windowsazure/dn376541.aspx

    • Here is a simple example to restrict traffic only from a remote Cloud Service (where my application resides):

 

$ApplicationCloudServiceIPsubnet = "<<<Public Cloud Service IP Address of your application>>>/32"

$ServiceName = "<<<cloud service name containing SQL Server VMs>>>"

$LBSetName = "<<<Load Balancer Set name used for AG Listener configuration"

$acl = New-AzureAclConfig

Set-AzureAclConfig –AddRule –ACL $acl –Order 100 –Action Permit `

–RemoteSubnet $ApplicationCloudServiceIPsubnet
–Description "Remote App ACL config"

 Set-AzureLoadBalancedEndpoint –ServiceName $ServiceName –LBSetName $LBSetName `

-Protocol tcp –LocalPort 1433 –PublicPort 1433 –ProbePort 59999 `
-ProbeProtocolTCP -DirectServerReturn $true –ACL $acl

 

    • Power Shell detailed instructions can be found here but be aware that there is a mistake in the last PS example with “Set-AzureLoadBalancedEndpoint” cmdlet that you should use: the last pipe to “Update-AzureVM” is incorrect and must be removed!

Network Access Control List Capability in Windows Azure PowerShell

http://michaelwasham.com/windows-azure-powershell-reference-guide/network-access-control-list-capability-in-windows-azure-powershell

  • Once you configured the entire environment, how you can connect to your primary AG replica?
    • From Internet using the Cloud Service Name XXX.cloudapp.net, eventually specifying the TCP port corresponding to the input endpoint, if the port used is not the default SQL Server TCP 1433, or also specifying the Cloud Service public Virtual IP:

 

    • From another Cloud Service joined to the same VNET: same options as above, in addition you can also connect using:
      • The listener network name: be aware that also in this case the traffic will pass through the Azure Load Balancer;
      • VM internal DNS names and IP addresses: in this case, you will be not redirected automatically to the primary since you are bypassing listener mechanism, but in this case you can reduce latency;

Additionally, I want to provide you some warnings on the two tutorials mentioned above:

  • For the first tutorial:
    • Be careful and use Windows Server 2012 and not 2008/2008R2 as mentioned: this tutorial has been written before the final support for AlwaysOn Listeners, remember that only Windows Server 2012 is supported for the AlwaysOn Listener configuration procedure;
    • On each SQL Server VM Guest OS, remember to open firewall ports for AlwaysOn replication port (default port 5022);
    • Be sure to install the hotfix below on all VMs before configuring the Cluster, otherwise you will encounter the problem described here: http://blogs.technet.com/b/askcore/archive/2013/01/14/error-in-failover-cluster-manager-after-install-of-kb2750149.aspx:

Failover Cluster Management snap-in crashes after you install update 2750149 on a Windows Server 2012-based failover cluster

http://support.microsoft.com/kb/2803748/en-us

  • For the second tutorial:
    • At STEP(4) – POINT(8), you must execute the Power Shell script locally on one of the SQL VM, otherwise the cmdlet “Get-ClusterResource” will not be recognized resulting in a failure: the reason is that the Power Shell module “FailoverClusters” can only be imported on a machine where the Cluster service is running, then even if you installed RSAT tools for Windows Server 2012 on your Windows 7/8 machine, this will not solve the problem.

http://social.technet.microsoft.com/Forums/windowsserver/en-US/a4f7fb67-dbc8-4c9a-926a-1b34cdecdc79/im-unable-to-addimport-the-failover-clustering-powershell-module-in-windows-7

Finally, let me bring to your attention an excellent resource on AlwaysOn Listener troubleshooting recently published:

Troubleshooting Availability Group Listener in Windows Azure
http://msdn.microsoft.com/en-us/library/windowsazure/dn495646.aspx

That’s all folks, have fun with AlwaysOn in Windows Azure!

Regards.

Comments

  • Anonymous
    September 05, 2013
    Would love to know what a good indicative number is too setup the above ? Would you be able share an approximation - I appreciate its subject to knowledge and gotchas etc - thanks

  • Anonymous
    September 11, 2013
    Hi Mark, what you exactly mean with "good indicative number"? thanks.

  • Anonymous
    October 14, 2013
    The comment has been removed

  • Anonymous
    October 20, 2013
    The comment has been removed

  • Anonymous
    October 23, 2013
    The comment has been removed

  • Anonymous
    October 23, 2013
    The comment has been removed

  • Anonymous
    December 24, 2013
    How to achieve this in Windows Server 2012 R2 the powershell scripts is trowing error when i try to setup 255.255.255.255 and my hosted service vip address

  • Anonymous
    January 02, 2014
    Hi Igor, Great article. I wish I had come across it before I worked through most of the issues you provided answers for. I still have one question though. Is there any way to connect to the secondary replica (S1 in your 'Availability Group' diagram) directly or via ApplicationIntent from the Internet? Option1 seems to require at a minimum the S2 server and Option2 only works from within the Azure virtual network. I was hoping to only implement a 2 server solution and use the secondary replica for readonly reporting, but maybe that isn't possible in Azure... Is there something I am missing or do I have to have at least a 3rd server to have a readable secondary replica from the Internet? Thanks, Troy

  • Anonymous
    January 10, 2014
    Hi Troy, If you want to leverage ApplicationIntent, then AG automatic routing, you cannot connect from Internet since the address references returned are internal DIPs, then option #2 is not a viable solution. But also option #1 is not possible to access S1 since it must be configured as not accessible in read-only mode. Regards.

  • Anonymous
    September 05, 2014

  1. Is the update required on server 2012 R2?
  2. If I gave a three node cluster ... Two SQL servers, o e server for quorum, all in different regions / vnets.  Do they all need to be in the same cloud service?  
  3.  If everything accessing the listener will either be internal site to site traffic, can we created a listener that is not public?
  • Anonymous
    September 28, 2015
    The comment has been removed

  • Anonymous
    September 29, 2015
    Hi Roy, Yes, ILB IP is the IP used  by Listener. Are you able to connect using IP address instead of listener Netbios Name? If not, please check in the Cluster console that Listener is online and OS Event Log of the node owning the resource, you should be able to find some errors in case of problems. Regards.

  • Anonymous
    March 09, 2016
    Thanks for article. I would like to know how did you manage SQL agent jobs and SSRS during fail over? I mean do you manually create jobs in all the non-primary nodes? Please provide article on this. This helps the community.

    • Anonymous
      March 13, 2016
      Hi Deepak, What you ask is already documented somewhere around in the SQL documentation, and does not related to the context of this blog post since nothing is different here compared to on-premises.Regards.