CSI: Replication–the mysterious vanishing row
I have been working with a customer on an interesting situation. On occasion they have replication issues. Examining the error we see many errors that state:
The row was not found at the Subscriber when applying the replicated command
So we need to understand why the row was not found. (cue TV show music)…. CSI: Replication
We fire up SQL Server Management Studio and connect to the distributor. We then issue the following T-SQL code
use distribution;
go
select * from MSRepl_Errors
This returns us the transaction sequence number of the transaction in our publisher which cannot be replicated. We want to see what table is being involved so we are going to have to use an undocumented function to examine the transaction log in order to uncover that clue…
There are many great posts on the web about ::fn_dblog which explain this undocumented function. Most notably Paul Randal’s blog post here.
In Paul’s post he provides code to go from the result set back to a hexadecimal representation of the xact_seqno. However, we want to search the output of the call to fn_dblog and in order to do so, we have to convert the format returned by MSRepl_Errors (0x0000004E00000048000400000000) into the colon separated hexadecimal format that fn_dblog returns (0000004e:00000048:0004).
To do this we can use the following code
use distribution
go
declare @LSN varbinary(16)
select TOP 1 @LSN = xact_seqno
from MSrepl_errors
group by xact_seqno, time
order by time desc
select @LSN
-- Now we have to convert it to the Current LSN format that fn_dblog uses
declare @convertedLSN varchar(max)
select @convertedLSN = [sys].[fn_varbintohexstr](@LSN)
SELECT @convertedLSN =
SUBSTRING(@convertedLSN, 3,8 ) +':'+
RIGHT( '0000000000' + SUBSTRING( CAST(@convertedLSN as varchar), 11, 8 ), 8 ) + ':' +
RIGHT( '00000' + SUBSTRING( CAST(@convertedLSN as varchar), 19, 4), 4 )
--fn_dblog uses the current context to decide which transaction log to read.
use AdventureWorks2008R2;
select * from
fn_dblog (NULL,NULL) Where [Transaction ID] in
(select [Transaction ID] from fn_dblog(NULL,NULL)
where [Current LSN] = @convertedLSN)
In my case we can see that a row in the Sales.SalesTaxRate table has been modified at the publisher, but cannot be replicated to the subscriber.
.
Current LSN Operation Transaction ID AllocUnitName
0000004e:00000048:0001 LOP_BEGIN_XACT 0000:000017ec NULL
0000004e:00000048:0003 LOP_MODIFY_ROW 0000:000017ec Sales.SalesTaxRate.PK_SalesTaxRate_SalesTaxRateID
0000004e:00000048:0004 LOP_COMMIT_XACT 0000:000017ec NULL
Now we must locate witnesses on the subscriber who might have seen what happened to our row.
So, we get our forensics kit and head out to the subscriber
We need to use our new friend fn_dblog to examine the transaction log of the subscriber using the clues we obtained above.
declare @tablename nvarchar(256) = '%Sales.SalesTaxRate%' -- NOTE: the % because the AllocUnitName will also contain the object name (i.e. Index name)
-- fn_dblog uses the current context to pick the log to view
-- So, set your database name here
USE AdventureWorks2008R2;
-- Now lets see the log records for deleted rows for this table
select
SUSER_SNAME([Transaction SID]) as [User]
,SPID
,ISNULL([AllocUnitName],'Delete Transaction') +
case when Operation='LOP_BEGIN_XACT' then ' Start'
when Operation='LOP_ABORT_XACT' then ' Aborted'
when Operation='LOP_DELETE_ROWS' then ' Deleting'
else ' Committed' end as [Description]
,[Begin Time]
,[Current LSN]
,[Operation]
,[Context]
,[Transaction ID]
,[Description]
,[Transaction ID]
,* -- Leave this here in case we want other columns
from ::fn_dblog(NULL, NULL)
where [Transaction ID] IN
(select [Transaction ID] from ::fn_dblog(NULL,NULL)
where [Operation]='LOP_DELETE_ROWS' and [AllocUnitName] LIKE @tablename )
GO
We are getting closer…
After interrogating our suspects we obtain the following evidence (output abbreviated for clarity)
User SPID Description Begin Time
REDMOND\bobtay 51 Delete Transaction Start 2013/09/12 09:54:10:563
NULL NULL Sales.SalesTaxRate.AK_SalesTaxRate_StateProvinceID_TaxType Deleting NULL
NULL NULL Sales.SalesTaxRate.AK_SalesTaxRate_rowguid Deleting NULL
NULL NULL Sales.SalesTaxRate.PK_SalesTaxRate_SalesTaxRateID Deleting NULL
NULL NULL Delete Transaction Committed NULL
It appears that boB Taylor is our suspect and deleted the row that we needed to replicate!
But wait, our suspect is really quite clever. After he deleted the row, the issued a checkpoint command and backed up the transaction log.
In this case, the output of the fn_dblog query above will return zero rows.
Time for advanced forensic tools – fn_dump_dblog. Paul Randal describes all the parameters in hisblog post here so I won’t duplicate that but essentially it performs a similar analysis for the transaction log that has been backed up.
DECLARE @tranlog nvarchar(1024) = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2K12SP1\MSSQL\Backup\MyTestDB.trn'
-- Also, if you are appending to your backup set with each tran
-- log backup you will need to view the file with HEADER_ONLY to determine which file in the sequence to use.
-- you can examine the time frames by using the restore UI to see the files in the backup set and the time frames that
-- they cover.
DECLARE @FileNumber int = 1
select
SUSER_SNAME([Transaction SID]) as [User]
,SPID
,ISNULL([AllocUnitName],'Delete Transaction') +
case when Operation='LOP_BEGIN_XACT' then ' Start'
when Operation='LOP_ABORT_XACT' then ' Aborted'
when Operation='LOP_DELETE_ROWS' then ' Deleting'
else ' Committed' end as [Description]
,[Begin Time]
,[Current LSN]
,[Operation]
,[Context]
,[Transaction ID]
,[Description]
,*
from
fn_dump_dblog (
NULL, NULL, N'DISK', @FileNumber, @tranlog,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT),
(select
distinct [Transaction ID] AS [tid]
from
fn_dump_dblog (
NULL, NULL, N'DISK', @FileNumber, @tranlog,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)
where
[Operation] = 'LOP_DELETE_ROWS') [fd]
where [Transaction ID] = [fd].[tid];
GO
We get the same information from the physical log
User SPID Description Begin Time Current LSN AllocUnitID
REDMOND\bobtay 59 Delete Transaction Start 2013/09/16 15:54:06:910 00000026:00000298:0001
NULL NULL Deleting NULL 00000026:00000298:0003 72057594044153856
NULL NULL Delete Transaction Committed NULL 00000026:00000298:0005
Now in this example we know based on the time frame that this is our record, but if there are a larger number of deletes occurring we need to figure out if this is our table. To do this we must then take the AllocUnitID (which is conveniently translated for us in fn_dblog) and get the table name. The following code will do that for you.
-- set this from AllocUnitID in the output of fn_dump_dblog
declare @allocUnitID bigint = 72057594044153856
declare @allocationType tinyint
declare @containerID bigint
select @containerID = Container_Id, @allocationType = type
from sys.allocation_units
where allocation_unit_id = @allocUnitID
if @allocationType IN (1,3)
begin
select object_name(object_id) as [Object Name]
from sys.partitions
where hobt_id = @containerID
end
else
begin
if @allocationType= 2
begin
select object_name(object_id) as [Object Name]
from sys.partitions
where partition_id = @containerID
end
else
select 'Objected is marked for deferred drop.'
end
Just as we expected the output gives us
Object Name
SalesTaxRate
This case is closed!