Udostępnij za pośrednictwem


Create Availability Group Fails With Error 35250 'Failed to join the database'

You can create an availability group using the New Availability Group wizard or using a series of transact-sql commands. A common failure can occur when attempting to create an availability group using either method. The error is as follows and is caused because the target primary and secondary replicas are unable to communicate:

Msg 35250, Level 16, State 7, Line 1
The connection to the primary replica is not active.  The command cannot be processed.

If creating the availability group using the New Availability Group wizard, the 'Joining <availability database> to availability group <availability group> at <replica>' will fail. Clicking the Error link, message 35250 is reported: 

 

If you used transact-sql to create the availability group instead of the New Availability Group wizard, the ALTER DATABASE...SET HADR AVAILABILITY GROUP... command fails with message 35250:

Following is a list of common causes for this failure and steps to diagnose each.

Reason - Inbound Port 5022 Traffic is Blocked

IMPORTANT: By default, AlwaysOn configures the database mirroring endpoints to use port 5022, and the following document will use 5022 when discussing the database mirroring endpoint in use for availability groups. However, this may not be the configured port for the database mirroring endpoints in your environment.  Query sys.tcp_endpoints on each replica to confirm which port is being used:

select name, type, port from sys.tcp_endpoints where type_desc='database_mirroring'
go

By default inbound traffic is blocked in Windows firewall.

Port 5022 is used by the primary and secondary replicas for synchronization and communication purposes. Traffic must be allowed inbound on this port. Testing has shown that if port 5022 inbound traffic is blocked at the primary, at the secondary or at both, you will be unable to create the availability group and message 35250 will be reported.

Reason - Endpoint is not created or started

Ensure the mirroring endpoints are created and started on the primary and the secondary replicas.

To detect if the endpoints are created, query for database_mirroring endpoints at the primary and the secondary.

:Connect SQLNODE1
select name, state_desc, port from sys.tcp_endpoints where type_desc='DATABASE_MIRRORING'
go
:Connect SQLNODE2
select name, state_desc, port from sys.tcp_endpoints where type_desc='DATABASE_MIRRORING'
go

If you find an endpoint does not exist, create it:

:Connect SQLNODE1
create endpoint [Hadr_endpoint]
 state=started
 as tcp (listener_port = 5022, listener_ip = all)
 for database_mirroring (role = all, authentication = windows negotiate, encryption = required algorithm aes)
go

To detect if the endpoints are started, query them at the primary and the secondary.

:Connect SQLNODE1
select name, state_desc, port FROM sys.tcp_endpoints where name='hadr_endpoint'
go
:Connect SQLNODE2
select name, state_desc, port from sys.tcp_endpoints where name='hadr_endpoint'
go

If you find an endpoint is not running, start it:

 :Connect SQLNODE1
alter endpoint [Hadr_endpoint] state = started

IMPORTANT: There is a known issue in which sys.tcp_endpoints.state_desc may incorrectly report the endpoint as STARTED when it is not started. It is a good idea to execute the ALTER ENDPOINT command for the endpoint at each replica, despite the report given by sys.tcp_endpoints, queried above.

Reason - Endpoint permissions

If database mirroring endpoints are configured to use Windows authentication, ensure that the SQL Server instances hosting your availability replicas run with a SQL Server startup account are domain accounts.

The New Availability Group wizard catches this misconfiguration, and reports it. Note that the Endpoints tab is displayed and the SQL Server Service Account reports that one of the replica's startup account is configured for LocalSystem.

If you proceed by clicking Yes and complete the wizard, it will fail during the 'Joining <availability database> to availability group <availability group> at <replica>.'

If you are attempting to create an availability group using transact-sql, this configuration problem will not be detected and command ALTER DATABASE...SET HADR AVAILABILITY GROUP...  will fail with 35250. 

Reason - SQL Server is not listening on port 5022

If SQL Server is unable to listen on port 5022, the New Availability Group wizard and the transact-sql command ALTER DATABASE...SET HADR AVAILABILITY GROUP... will fail with 35250. 

 To determine if SQL Server is listening on port 5022, review the SQL Server error log. You should find the following message(s) in the SQL Server error log:

2013-12-09 08:52:25.47 spid23s     Server is listening on [ 'any' <ipv6> 5022].
2013-12-09 08:52:25.47 spid23s     Server is listening on [ 'any' <ipv4> 5022]. 

SQL Server may not be able to listen on port 5022 if another application is already listening on the port. If you find that SQL Server is not listening on port 5022 because it is already being used, run 'netstat -a' to determine what application is using the port:

Additional diagnostics - query sys.dm_hadr_availability_replica_states on the secondary

 If the above settings check out, query the sys.dm_hadr_availability_replica_states for the last_connect_error_number which may help you diagnose the join issue. Depending on which replica was having difficulty communicating, you should query both the primary and secondary for the local replica:

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1

For example, if the secondary was unable to communicate with the DNS server or if a replica's endpoint_url was configured incorrectly when creating the availability group, you may get the following results:

 

Comments

  • Anonymous
    May 21, 2014
    I am seeing the same exact issue, but unable to determine the root cause or solution.  Both SQL 2014 nodes are Windows 2012 R2 VMs (created from the same template).  I've confirmed the following:1) Both on the primary and secondary replica are listening to port 5022:oclv0244 (primary)05/21/2014 11:15:12,spid24s,Unknown,Server is listening on [ 'any' <ipv4> 5022].05/21/2014 11:15:12,spid24s,Unknown,Server is listening on [ 'any' <ipv6> 5022].oclv0245 (secondary)05/21/2014 11:14:11,spid21s,Unknown,Server is listening on [ 'any' <ipv4> 5022].05/21/2014 11:14:11,spid21s,Unknown,Server is listening on [ 'any' <ipv6> 5022].2)  Firewall has been disabled on both nodes3) End-points on both nodes have been created and startedoclv0244 (primary)select name, type, port from sys.tcp_endpoints where type_desc='database_mirroring'Hadr_endpoint STARTED 5022oclv0245 (secondary)select name, type, port from sys.tcp_endpoints where type_desc='database_mirroring'Hadr_endpoint STARTED 5022When I performed this query on the secondary, I got:select r.replica_server_name, r.endpoint_url,rs.connected_state_desc, rs.last_connect_error_description,rs.last_connect_error_number, rs.last_connect_error_timestampfrom sys.dm_hadr_availability_replica_states rs join sys.availability_replicas ron rs.replica_id=r.replica_idwhere rs.is_local=1replica_server_name endpoint_url connected_state_desc last_connect_error_description last_connect_error_number last_connect_error_timestampOCLV0245 TCP://oclv0245.win2012r2svtdom.usca.ibm.com:5022 DISCONNECTED An error occurred while receiving data: '10054(An existing connection was forcibly closed by the remote host.)'. 10054 2014-05-21 19:19:44.697The following message is logged in the sql log on the secondary (oclv0245)05/21/2014 11:45:10,spid57s,Unknown,A connection timeout has occurred while attempting to establish a connection to availability replica 'OCLV0244' with id [6A6ECC71-1589-4E21-9FC1-E1C96A95069E]. Either a networking or firewall issue exists<c/> or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
  • Anonymous
    June 11, 2014
    Thank you for your post! This had the best troubleshooting steps I have found. I got this error when migrating one of two instances from one host to a new host. Permissions were the same, logins were the same, but I still got the same error.After running the scripts, I found that the endpoint_url in sys.availability_replicas:select r.replica_server_name, r.endpoint_url,rs.connected_state_desc, rs.last_connect_error_description,rs.last_connect_error_number, rs.last_connect_error_timestamp, rs.replica_id,r.replica_idfrom sys.dm_hadr_availability_replica_states rs join sys.availability_replicas ron rs.replica_id=r.replica_idwhere rs.is_local=1Was set to port 5022. When I ran the sys.tcp_endpoints:select name, state_desc, port FROM sys.tcp_endpoints where name='hadr_endpoint'It was set to 5023 (from the previous server where it was the second hadr instance). After rebuilding the Availability group, with the proper endpoint ports, everything started working.
  • Anonymous
    November 23, 2014
    Managed to fix the installation error "Joining database on secondary replica resulted in an error".  If you are getting this error then make sure SQL Service account has CONNECT permission on the end_point.  use [master]GOCREATE LOGIN [DOMAINCOMPUTERNAME$] FROM WINDOWSGOGRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DOMAINCOMPUTERNAME$]GO
  • Anonymous
    March 04, 2017
    Error : Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'. Added a inbound rule on Windows Firewall to allow port 5022 on both the nodes Resolved the issue.
  • Anonymous
    April 13, 2017
    The comment has been removed