Bewerken

Delen via


Troubleshoot Always On Availability Groups Configuration (SQL Server)

Applies to: SQL Server

This topic provides information to help you troubleshoot typical problems with configuring server instances for Always On availability groups. Typical configuration problems include Always On availability groups is disabled, accounts are incorrectly configured, the database mirroring endpoint doesn't exist, the endpoint is inaccessible (SQL Server Error 1418), network access doesn't exist, and a join database command fails (SQL Server Error 35250).

Note

Ensure that you are meeting the Always On availability groups prerequisites. For more information, see Prerequisites, Restrictions, and Recommendations for Always On Availability Groups (SQL Server).

In This Topic:

Section Description
Always On Availability Groups Isn't Enabled If an instance of SQL Server is not enabled for Always On availability groups, the instance doesn't support availability group creation and can't host any availability replicas.
Accounts Discusses requirements for correctly configuring the accounts under which SQL Server is running.
Endpoints Discusses how to diagnose issues with the database mirroring endpoint of a server instance.
Network access Documents the requirement that each server instance that is hosting an availability replica must be able to access the port of each of the other server instances over TCP.
Listener Documents how to establish the IP address and port of the listener and make sure it is running and listening for incoming connections
Endpoint Access (SQL Server Error 1418) Contains information about this SQL Server error message.
Join Database Fails (SQL Server Error 35250) Discusses the possible causes and resolution of a failure to join secondary databases to an availability group because the connection to the primary replica isn't active.
Read-Only Routing is Not Working Correctly
Related Tasks Contains a list of task-oriented topics in SQL Server Books Online that are relevant to troubleshooting an availability group configuration.
Related Content Contains a list of relevant resources that are external to SQL Server Books Online.

Always On Availability Groups Is Not Enabled

The Always On availability groups feature must be enabled on each of the instances of SQL Server.

If the Always On Availability Groups feature isn't enabled, you'll get this error message when you try to create an Availability group on SQL Server.

The Always On Availability Groups feature must be enabled for server instance 'SQL1VM' before you can create an availability group on this instance. To enable this feature, open the SQL Server Configuration Manager, select SQL Server Services, right-click on the SQL Server service name, select Properties, and use the Always On Availability Groups tab of the Server Properties dialog. Enabling Always On Availability Groups may require that the server instance is hosted by a Windows Server Failover Cluster (WSFC) node. (Microsoft.SqlServer.Management.HadrTasks)

The error message clearly indicates that the AG feature isn't enabled and also directs you how to enable it. There are two scenarios where you can get in this state besides the obvious one where AG wasn't enabled in the first place.

  1. If SQL Server was installed and the Always On Availability Groups feature was enabled before you installed the Windows Failover Clustering feature, you may get this error when you attempt to create an Always On AG.
  2. If you remove an existing Windows Failover Clustering feature and rebuild it while SQL Server still has Always On configured, when you attempt to use AG again this error may occur.

In such cases you can take the following steps to resolve it:

  1. Disable the AG feature
  2. Restart SQL Server service
  3. Enable the AG feature back
  4. Again restart the SQL Service

For more information, see Enable and Disable Always On Availability Groups (SQL Server).

Accounts

The accounts under which SQL Server is running must be correctly configured.

  1. Do the accounts have the correct permissions?

    1. If the partners run under the same domain account, the correct user logins exist automatically in both master databases. This simplifies the security configuration and is recommended.

    2. If two server instances run under different accounts, then each account must be created in master on the remote server instance, and that server principal must be granted CONNECT permissions to connect to the database mirroring endpoint of that server instance. For more information, see Set Up Login Accounts for Database Mirroring or Always On Availability Groups (SQL Server). You can use the following query on each instance to check if the logins have CONNECT permissions:

    SELECT 
      perm.class_desc,
      prin.name,
      perm.permission_name,
      perm.state_desc,
      prin.type_desc as PrincipalType,
      prin.is_disabled
    FROM sys.server_permissions perm
      LEFT JOIN sys.server_principals prin ON perm.grantee_principal_id = prin.principal_id
      LEFT JOIN sys.tcp_endpoints tep ON perm.major_id = tep.endpoint_id
    WHERE 
      perm.class_desc = 'ENDPOINT'
      AND perm.permission_name = 'CONNECT'
      AND tep.type = 4    
    
  2. If SQL Server is running under a built-in account, such as Local System, Local Service, or Network Service, or a nondomain account, you must use certificates for endpoint authentication. If your service accounts are using domain accounts in the same domain, you can choose to grant CONNECT access for each service account on all the replica locations or you can use certificates. For more information, see Use Certificates for a Database Mirroring Endpoint (Transact-SQL).

Endpoints

Endpoints must be correctly configured.

  1. Make sure that each instance of SQL Server that is going to host an availability replica (each replica location) has a database mirroring endpoint. To determine whether a database mirroring endpoint exists on a given server instance, use the sys.database_mirroring_endpoints catalog view:

    SELECT name, state_desc FROM sys.database_mirroring_endpoints  
    

    For more information on creating endpoints, see either Create a Database Mirroring Endpoint for Windows Authentication (Transact-SQL) or Allow a Database Mirroring Endpoint to Use Certificates for Outbound Connections (Transact-SQL).

  2. Check that the port numbers are correct.

    To identify the port currently associated with database mirroring endpoint of a server instance, use the following Transact-SQL statement:

    SELECT type_desc, port FROM sys.tcp_endpoints;  
    GO  
    
  3. For Always On availability groups setup issues that are difficult to explain, we recommend that you inspect each server instance to determine whether it's listening on the correct ports.

  4. Make sure that the endpoints are started (STATE=STARTED). On each server instance, use the following Transact-SQL statement:

    SELECT state_desc FROM sys.database_mirroring_endpoints  
    

    For more information about the state_desc column, see sys.database_mirroring_endpoints (Transact-SQL).

    To start an endpoint, use the following Transact-SQL statement:

    ALTER ENDPOINT Endpoint_Mirroring   
    STATE = STARTED   
    AS TCP (LISTENER_PORT = <port_number>)  
    FOR database_mirroring (ROLE = ALL);  
    GO  
    

    For more information, see ALTER ENDPOINT (Transact-SQL).

    Note

    In some cases, if the endpoint is started but the AG replicas are not communicating, you may try to stop and restart the endpoint. You can use ALTER ENDPOINT [Endpoint_Mirroring] STATE = STOPPED followed by ALTER ENDPOINT [Endpoint_Mirroring] STATE = STARTED

  5. Make sure that the login from the other server has CONNECT permission. To determine who has CONNECT permission for an endpoint, on each server instance use the following Transact-SQL statement:

    SELECT 'Metadata Check';  
    SELECT EP.name, SP.STATE,   
       CONVERT(nvarchar(38), suser_name(SP.grantor_principal_id))   
          AS GRANTOR,   
       SP.TYPE AS PERMISSION,  
       CONVERT(nvarchar(46),suser_name(SP.grantee_principal_id))   
          AS GRANTEE   
       FROM sys.server_permissions SP , sys.endpoints EP  
       WHERE SP.major_id = EP.endpoint_id  
       ORDER BY Permission,grantor, grantee;   
    
  6. Ensure correct server name is used in the endpoint URL

    For server name in an endpoint URL, it's recommended to use fully qualified domain name (FQDN), although you can use any name that uniquely identifies the machine. The server address can be a Netbios name (if the systems are in the same domain), a fully qualified domain name (FQDN), or an IP address (preferably, a static IP address). Using the fully qualified domain name is the recommended option.

    If you've already defined an Endpoint URL, you can query it by using:

    select endpoint_url from sys.availability_replicas
    

    Next, compare the endpoint_url output to the server name (NetBIOS name or FQDN). To query the server name, run the following commands in a PowerShell on the replica locally:

    $env:COMPUTERNAME
    [System.Net.Dns]::GetHostEntry([string]$env:computername).HostName
    

    To validate the server name on a remote computer, run this command from PowerShell.

    $servername_from_endpoint_url = "server_from_endpoint_url_output"
    
    Test-NetConnection -ComputerName $servername_from_endpoint_url
    

    For more information, see Specify the Endpoint URL When Adding or Modifying an Availability Replica (SQL Server).

Note

To use Kerberos authentication for the communication between availability group (AG) endpoints, register a Service Principal Name for Kerberos Connections for the database mirroring endpoints used by the AG.

Network Access

Each server instance that is hosting an availability replica must be able to access the port of each of the other server instance over TCP. This is especially important if the server instances are in different domains that don't trust each other (untrusted domains). Check if you can connect to the endpoints by following these steps:

  • Use Test-NetConnection (equivalent to Telnet) to validate connectivity. Here are examples of commands you can use:

    $server_name = "your_server_name"
    $IP_address = "your_ip_address"
    $port_number = "your_port_number"
    
    Test-NetConnection -ComputerName $server_name -Port $port_number
    Test-NetConnection -ComputerName $IP_address -Port $port_number
    
  • If the Endpoint is listening and connection is successful, you will see "TcpTestSucceeded : True". If not, you'll receive a "TcpTestSucceeded : False".

  • If Test-NetConnection (Telnet) connection to the IP address works but to the ServerName it doesn't, there's likely a DNS or name resolution issue

  • If connection works by ServerName and not by IP address, then there could be more than one endpoint defined on that server (another SQL instance perhaps) that is listening on that port. Though the status of the endpoint on the instance in question shows "STARTED", another instance may actually have the port bound and prevent the correct instance from listening and establishing TCP connections.

  • If Test-NetConnection fails to connect, look for Firewall and/or Anti-virus software that may be blocking the endpoint port in question. Check the firewall setting to see if it allows the endpoint port communication between the server instances that host primary replica and the secondary replica (port 5022 by default). Run the following PowerShell script to examine for disabled inbound traffic rules

  • If you're running SQL Server on Azure VM, additionally you would need to ensure Network Security Group (NSG) allows the traffic to endpoint port. Check the firewall (and NSG, for Azure VM) setting to see if it allows the endpoint port communication between the server instances that host primary replica and the secondary replica (port 5022 by default)

    Get-NetFirewallRule -Action Block -Enabled True -Direction Inbound |Format-Table
    
  • Capture the output from Get-NetTCPConnection cmdlet (equivalent of NETSTAT -a) and verify the status is a LISTENING or ESTABLISHED on the IP:Port for the endpoint specified

    Get-NetTCPConnection 
    

Listener

For correct configuration of an Availability Group listener follow "Configure a listener for an Always On availability group"

  1. Once the listener is configured you can validate the IP address and port it is listening on by using the following query:

    $server_name = $env:computername  #replace this with your sql instance "server\instance"
    
    sqlcmd -E -S$server_name -Q"SELECT dns_name AS AG_listener_name, port, ip_configuration_string_from_cluster 
    FROM sys.availability_group_listeners"
    
  2. You can also find the listener information together with the SQL Server ports using this query:

    $server_name = $env:computername      #replace this with your sql instance "server\instance"
    
    sqlcmd -E -S($server_name) -Q("SELECT  convert(varchar(32), SERVERPROPERTY ('servername')) servername, convert(varchar(32),ip_address) ip_address, port, type_desc,state_desc, start_time 
    FROM sys.dm_tcp_listener_states 
    WHERE ip_address not in ('127.0.0.1', '::1') and type <> 2")
    
  3. If you need to establish connectivity to the listener and suspect a port is blocked, you can perform a test using the PowerShell Test-NetConnection cmdlet (equivalent to telnet).

    $listener_name = "your_ag_listener"
    $IP_address = "your_ip_address"
    $port_number = "your_port_number"
    
    Test-NetConnection -ComputerName $listener_name -Port $port_number
    Test-NetConnection -ComputerName $IP_address -Port $port_number
    
  4. Finally, check if the listener is listening on the specified port:

    $port_number = "your_port_number"
    
    Get-NetTCPConnection -LocalPort $port_number -State Listen
    

Endpoint Access (SQL Server Error 1418)

This SQL Server message indicates that the server network address specified in the endpoint URL can't be reached or doesn't exist, and it suggests that you verify the network address name and reissue the command.

Join Database Fails (SQL Server Error 35250)

This section discusses the possible causes and resolution of a failure to join secondary databases to the availability group because the connection to the primary replica isn't active. This is the full error message:

Msg 35250 The connection to the primary replica is not active. The command cannot be processed.

Resolution:

Summary of steps is outlined below.

For detailed step-by-step instructions, refer to Engine error MSSQLSERVER_35250

  1. Ensure the endpoint is created and started.
  2. Check if you can connect to the endpoint via Telnet and ensure no firewall rules are blocking connectivity
  3. Check for errors in the system. You can query the sys.dm_hadr_availability_replica_states for the last_connect_error_number that may help you diagnose the join issue.
  4. Ensure the endpoint is defined so it correctly matches the IP/port that AG is using.
  5. Check whether the network service account has CONNECT permission to the endpoint.
  6. Check for possible name resolution issues
  7. Ensure your SQL Server is running a recent build (preferably the latest build to protect from running into fixed issues.

Read-Only Routing is Not Working Correctly

  1. Ensure that you have set up read-only routing by following Configure read-only routing document.

  2. Ensure Client Driver Support

    The client application must use a client provider that support ApplicationIntent parameter. See Driver and client connectivity support for availability groups

    Note

    If you are connecting to a distributed network name (DNN) Listener, the provider must also support MultiSubnetFailover parameter

  3. Ensure connection string properties are set correctly

    For read-only routing to work properly, your client application must use these properties in the connection string:

    • A database name that belongs to the AG
    • An availability group listener name
      • If you're using DNN, you must specify DNN listener name and DNN port number <DNN name,DNN port>
    • ApplicationIntent set to ReadOnly
    • MultiSubnetFailover set to true is required for Distributed network name (DNN)

    Examples

    This example illustrates the connection string for .NET System.Data.SqlClient provider for a virtual network name (VNN) listener:

    Server=tcp:VNN_AgListener,1433;Database=AgDb1;ApplicationIntent=ReadOnly;MultiSubnetFailover=True
    

    This illustrates the connection string for .NET System.Data.SqlClient provider for a distributed network name (DNN) listener:

    Server=tcp:DNN_AgListener,DNN_Port;Database=AgDb1;ApplicationIntent=ReadOnly;MultiSubnetFailover=True
    

    Note

    If you are using command line programs like SQLCMD, ensure that you specify the correct switches for server name. For instance, in SQLCMD you must use the upper case -S switch that specifies server name, not the lower case -s switch which is used for column separator.
    Example: sqlcmd -S AG_Listener,port -E -d AgDb1 -K ReadOnly -M

  4. Ensure that the availability group listener is online. To ensure that the availability group listener is online run the following query on the primary replica:

    SELECT * FROM sys.dm_tcp_listener_states;
    

    If you find the listener is offline, you can attempt to bring it online using a command like this:

    ALTER AVAILABILITY GROUP myAG RESTART LISTENER 'AG_Listener';
    
  5. Ensure READ_ONLY_ROUTING_LIST is correctly populated. On Primary replica, ensure that the READ_ONLY_ROUTING_LIST contains only server instances that are hosting readable secondary replicas.

    To view the properties of each replica you can run this query and examine the connectivity endpoint (URL) of the read only replica.

    SELECT replica_id, replica_server_name, secondary_role_allow_connections_desc, read_only_routing_url 
    FROM sys.availability_replicas;   
    

    To view a read-only routing list and compare to the endpoint URL:

    SELECT * FROM sys.availability_read_only_routing_lists;
    

    To change a read-only routing list you can use a query like this:

    ALTER AVAILABILITY GROUP [AG1]   
    MODIFY REPLICA ON  
    N'COMPUTER02' WITH   
    (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=('COMPUTER01','COMPUTER02')));  
    

    For more information see Configure read-only routing for an availability group - SQL Server Always On

  6. Check that READ_ONLY_ROUTING_URL port is open. Ensure that the Windows firewall is not blocking the READ_ONLY_ROUTING_URL port. Configure a Windows Firewall for database engine access on every replica in the read_only_routing_list and any for clients that will be connecting to those replicas.

    Note

    If you are running SQL Server on Azure VM, you must take additional configuration steps. Ensure that the network security group (NSG) of each replica VM allows traffic to the endpoint port and the DNN port, if you are using DNN listener. If you are using VNN listener, you must ensure the load balancer is configured correctly.

  7. Ensure that the READ_ONLY_ROUTING_URL (TCP://system-address:port) contains the correct fully qualified domain name (FQDN) and port number. See:

  8. Ensure proper SQL Server Networking configuration in the SQL Server Configuration Manager.

    Verify on every replica in the read_only_routing_list that:

    • SQL Server remote connectivity is enabled
    • TCP/IP is enabled
    • The IP addresses are configured correctly

    Note

    You can quickly verify all of these are properly configured if you can connect from a remote machine to a target secondary replica's SQL Server instance name using TCP:SQL_Instance syntax.

See: Configure a Server to Listen on a Specific TCP Port (SQL Server Configuration Manager) and View or Change Server Properties (SQL Server)

Related Tasks

Related Content

See Also

Transport Security for Database Mirroring and Always On Availability Groups (SQL Server)
Client Network Configuration
Prerequisites, Restrictions, and Recommendations for Always On Availability Groups (SQL Server)