Jaa


SQL Server detected a DTC/KTM in-doubt transaction

Few days back, one of my clients' faced an issue with their Availability group getting into the resolving state. On checking the logs found that the issue was due to a split brain scenario and as it was a business critical issue, we resolved the issue by restarting the SQL server services on the Secondary replica.

Post which the AG information with regards to its owners were clearly updated on the Failover cluster manager and thus the issue was resolved regarding the AG state. It was running as expected on the owner node and the secondary replica was also in a good shape.

As a verification point, I checked the states of the stand alone databases on both the replicas and happened to see the state of one of the databases in the (Not Synchronising/ Suspect) state on the secondary replica. On checking the error logs found the below errors :

2018-07-31 16:54:47.250 spid37s SQL Server detected a DTC/KTM in-doubt transaction with UOW {DD0xxxxx-BCB0-4xx4-9B52-2xxxxF88xxxA}.Please resolve it following the guideline for Troubleshooting DTC Transactions.
2018-07-31 16:54:47.250 spid37s Error: 3437, Severity: 21, State: 3.
2018-07-31 16:54:47.250 spid37s An error occurred while recovering database 'xxx'. Unable to connect to Microsoft Distributed Transaction Coordinator (MS DTC) to check the completion status of transaction (0:857660049). Fix MS DTC, and run recovery again.
2018-07-31 16:54:47.250 spid37s Error: 3414, Severity: 21, State: 2.
2018-07-31 16:54:47.250 spid37s An error occurred during recovery, preventing the database 'xxx' (9:0) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

On checking the environment further found that they did have MSDTC configured and checked with the client if we could proceed with losing the transaction.

On his agreement, tried to abort the transaction from the MSDTC console, but failed to get the state of the database back to the online state. Thus, used the below script to find the UoW of the list of uncommitted MSDTC transactions :

USE Master;
GO

SELECT
DISTINCT(request_owner_guid) as UoW_Guid
FROM sys.dm_tran_locks
WHERE request_session_id =-2
GO

Then, killed the UoW {DD0xxxxx-BCB0-4xx4-9B52-2xxxxF88xxxA} and restarted the SQL instance which helped us to resolve the issue and both the database and the AG were up and running as before.

Hope this helps !! Happy Solving !!