Troubleshoot link - Azure SQL Managed Instance

Applies to: Azure SQL Managed Instance

This article teaches you how to monitor and troubleshoot issues with a link between SQL Server and Azure SQL Managed Instance.

You can check the state of the link with Transact-SQL (T-SQL), Azure PowerShell or the Azure CLI. If you encounter issues, you can use the error codes to troubleshoot the problem.

Many issues with creating the link can be resolved by checking the network between the two instances, and validating the environment has been properly prepared for the link.

If you run into issues with a link, you can use Transact-SQL (T-SQL), Azure PowerShell or the Azure CLI to get information about the current state of the link.

Use T-SQL for a quick status details of the link state, and then use Azure PowerShell or the Azure CLI for a comprehensive information about the current state of the link.

Use T-SQL to determine the state of the link during the seeding phase, or after data synchronization begins.

Use the following T-SQL query to determine the status of the link during the seeding phase on the SQL Server or SQL Managed Instance that hosts the database seeded through the link:

SELECT
    ag.local_database_name AS 'Local database name',
    ar.current_state AS 'Current state',
    ar.is_source AS 'Is source',
    ag.internal_state_desc AS 'Internal state desc',
    ag.database_size_bytes / 1024 / 1024 AS 'Database size MB',
    ag.transferred_size_bytes / 1024 / 1024 AS 'Transferred MB',
    ag.transfer_rate_bytes_per_second / 1024 / 1024 AS 'Transfer rate MB/s',
    ag.total_disk_io_wait_time_ms / 1000 AS 'Total Disk IO wait (sec)',
    ag.total_network_wait_time_ms / 1000 AS 'Total Network wait (sec)',
    ag.is_compression_enabled AS 'Compression',
    ag.start_time_utc AS 'Start time UTC',
    ag.estimate_time_complete_utc as 'Estimated time complete UTC',
    ar.completion_time AS 'Completion time',
    ar.number_of_attempts AS 'Attempt No'
FROM sys.dm_hadr_physical_seeding_stats AS ag
    INNER JOIN sys.dm_hadr_automatic_seeding AS ar
    ON local_physical_seeding_id = operation_id

-- Estimated seeding completion time
SELECT DISTINCT CONVERT(VARCHAR(8), DATEADD(SECOND, DATEDIFF(SECOND, start_time_utc, estimate_time_complete_utc) ,0), 108) as 'Estimated complete time'
FROM sys.dm_hadr_physical_seeding_stats

If the query returns no results, then the seeding process hasn't started or has already completed.

Use the following T-SQL query to check the health of the link once data synchronization begins:

DECLARE @link_name varchar(max) = '<DAGname>'
SELECT
   rs.synchronization_health_desc [Link sync health]
FROM
   sys.availability_groups ag 
   join sys.dm_hadr_availability_replica_states rs 
   on ag.group_id = rs.group_id 
WHERE 
   rs.is_local = 1 and ag.is_distributed = 1 and ag.name = @link_name 
GO

The query returns the following possible values:

  • HEALTHY: The link is healthy, and data is being synchronized between the replicas.
  • NOT_HEALTHY: The link is unhealthy, and data is not synchronizing between the replicas.

The replicaState value describes the current link. If the state also includes Error then an error occurred during the operation listed in the state. For example, LinkCreationError indicates that an error occurred while creating the link.

Some possible replicaState values are:

  • CreatingLink: Initial seeding
  • LinkSynchronizing: Data replication is in progress
  • LinkFailoverInProgress: Failover is in progress

For a complete list of link state properties, review the Distributed Availability Groups - GET REST API command.

There are two distinct categories of errors you can encounter when using the link - errors when you try to initialize the link, and errors when you try to create the link.

The following error can occur when initializing a link (Link state: LinkInitError):

The following error can occur when creating a link (Link state: LinkCreationError):

  • Error 41977: The target database isn't responsive. Check link parameters and try again.

Inconsistent state after forced failover

Following a forced failover, you might encounter a split-brain scenario where both replicas are in the primary role, leaving the link in an inconsistent state. This can happen if you fail over to the secondary replica during a disaster, and then the primary replica comes back online.

First, confirm you're in a split-brain scenario. You can do so by using SQL Server Management Studio (SSMS) or Transact-SQL (T-SQL).

Connect to both SQL Server and SQL managed instance in SSMS, and then in Object Explorer, expand Availability replicas under the Availability group node in Always On High Availability. If two different replicas are listed as (Primary), you're in a split-brain scenario.

Alternatively, you can run the following T-SQL script on both SQL Server and SQL Managed Instance to check the role of the replicas:

-- Execute on SQL Server and SQL Managed Instance 
USE master
DECLARE @link_name varchar(max) = '<DAGName>'
SELECT
   ag.name [Link name], 
   rs.role_desc [Link role] 
FROM
   sys.availability_groups ag 
   JOIN sys.dm_hadr_availability_replica_states rs 
   ON ag.group_id = rs.group_id 
WHERE 
   rs.is_local = 1 AND ag.is_distributed = 1 AND ag.name = @link_name 
GO

If both instances list PRIMARY in Link role column, you're in a split-brain scenario.

To resolve the split brain state, first take a backup on whichever replica was the original primary. If the original primary was SQL Server, then take a tail log backup. If the original primary was SQL Managed Instance, then take a copy-only full backup. After the backup completes, set the distributed availability group to the secondary role for the replica that used to be the original primary but will now be the new secondary.

For example, in the event of a true disaster, assuming you've forced a failover of your SQL Server workload to Azure SQL Managed Instance, and you intend to continue running your workload on SQL Managed Instance, take a tail log backup on SQL Server, and then set the distributed availability group to the secondary role on SQL Server such as the following example:

--Execute on SQL Server 
USE MASTER

ALTER availability group [<DAGName>] 
SET (role = secondary) 
GO 

Next, execute a planned manual failover from SQL Managed Instance to SQL Server by using the link, such as the following example:

--Execute on SQL Managed Instance 
USE MASTER

ALTER availability group [<DAGName>] FAILOVER 
GO 

Test network connectivity

Bidirectional network connectivity between SQL Server and SQL Managed Instance is necessary for the link to work. After you open ports on the SQL Server side and configure an NSG rule on the SQL Managed Instance side, test connectivity by using either SQL Server Management Studio (SSMS) or Transact-SQL.

Test the network by creating a temporary SQL Agent job on both SQL Server and SQL Managed Instance to check the connection between the two instances. When you use Network Checker in SSMS, the job is automatically created for you, and deleted after the test completes. You need to manually delete the SQL Agent job if you test your network by using T-SQL.

Note

Executing PowerShell scripts by the SQL Server Agent on SQL Server on Linux is not currently supported, so it's not currently possible to execute Test-NetConnection from the SQL Server Agent job on SQL Server on Linux.

To use the SQL Agent to test network connectivity, you need the following requirements:

  • The user doing the test must have permissions to create a job (either as a sysadmin or belongs to the SQLAgentOperator role for msdb) for both SQL Server and SQL Managed Instance.
  • The SQL Server Agent service must be running on SQL Server. Since the Agent is on by default on SQL Managed Instance, no additional action is necessary.

To test network connectivity between SQL Server and SQL Managed Instance in SSMS, follow these steps:

  1. Connect to the instance that will be the primary replica in SSMS.

  2. In Object Explorer, expand databases, and right-click the database you intend to link with the secondary. Select Tasks > Azure SQL Managed Instance link > Test Connection to open the Network Checker wizard:

    Screenshot of object explorer in S S M S, with test connection selected in the database link right-click menu.

  3. Select Next on the Introduction page of the Network Checker wizard.

  4. If all requirements are met on the Prerequisites page, select Next. Otherwise resolve any unmet prerequisites, and then select Re-run Validation.

  5. On the Login page, select Login to connect to the other instance that will be the secondary replica. Select Next.

  6. Check details on the Specify Network Options page and provide an IP address, if necessary. Select Next.

  7. On the Summary page, review the actions the wizard takes and then select Finish to test the connection between the two replicas.

  8. Review the Results page to validate connectivity exists between the two replicas, and then select Close to finish.

Caution

Proceed with the next steps only if you've validated network connectivity between your source and target environments. Otherwise, troubleshoot network connectivity issues before proceeding.

For more information on the link feature, review the following resources: