Udostępnij za pośrednictwem


Change Data Capture - What is it and how do I use it?

In the All Good Things Come to an End - Why You Should Upgrade article we discussed some of the new features that applications can take advantage of by upgrading to SQL Server 2008.

CDC (Change Data Capture) was introduced in SQL Server 2008 as a way to capture the data that is inserted, updated, or deleted for a specified table. As inserts, updates, and deletes are applied to a table (that has CDC enabled) the "Capture Process" of CDC gathers those changes from the transaction log and then adds the information to the associated change table.

CDC does NOT use triggers!

CDC piggybacks on logic that has been used for a long time within transaction replication - sp_replcmds

How to Enable CDC

To enable CDC on a database you will need to use the system stored procedure: sys.sp_cdc_enable_db

USE <<DBName>>
GO
EXEC sys.sp_cdc_enable_db

Is it already enabled?

You can look at your system tables and see if you have any cdc schema sytem tables present but if you want a quick and clear indication if CDC is enabled and on what databases you can run this:

SELECT name, is_cdc_enabled FROM sys.databases

Enabling CDC on a Table

To enable CDC on a table you will need to use the system stored procedure: sys.sp_cdc_enable_table

Basic CDC Enable Table Statement:

USE <<DBName>>
GO
EXECUTE sys.sp_cdc_enable_table
@source_schema = N'<<Schema>>'
, @source_name = N'<<TableName>>'
, @role_name = N'<<RoleName>>';
GO

The only value that is not as obvious is the @role_name.  This is the name of the database role used to gate access to change data. role_name is sysname and must be specified. If explicitly set to NULL, no gating role is used to limit access to the change data.

Read more about the other CDC Enable Table parameters here

When you enable CDC on a table two SQL Server Agent jobs will get created:

cdc.<<DBName>>_capture
cdc.>>DBName>>_cleanup

The cdc.<<DatabaseName>>_capture job has 2 steps:

1.  Starting Change Data Capture Collection Agent
2.  Change Data Capture Collection Agent

Basically what it's doing here is raising an error event to start the change data capture session.  After this is complete it runs a system stored procedure (sp_MScdc_capture_job) to start the Change Data Capture Collection Agent.

Because CDC piggybacks on transactional replication functionality the system stored procedure (sp_MScdc_capture_job) checks to see if Transactional Replication is already scanning the transaction log and if it is then the transactional log reader is used so a seperate capture job is not needed (or allowed).

The cdc.<<DBName>>_cleanup job runs a system stored procedure (sp_MScdc_cleanup_job) that purges the change tables periodically (default setting is 4320 minutes - 3 days).

NOTE:   If a table that you are looking to enable CDC on has BLOB fields you will need to modify the "max text repl size" to it's maxiumum allowed value (2147483647) otherwise you will get truncation errors.  This is a replication setting but because CDC piggybacks on the functionality this is where you would need to make the change.

Where Are My Changes?

OK so you've enabled CDC and now you want to go and see your changes.  There are a couple different ways you can go about doing this.  You can query the CDC tables directly:

SELECT * FROM cdc.<<SchemaName>>_<<TableName>>_CT

The __$operation codes are:
1 - Delete
2 - Insert
3 - Update (Before Values)
4 - Update (After Values)

You can link your change data capture table to a table called cdc.lsn_time_mapping in order to make the LSN values translate to something meaningful (date/time)

SELECT LSN.tran_begin_time, LSN.tran_end_time,
CASE CDCTable.__$operation
WHEN 1 THEN 'Delete'
WHEN 2 THEN 'Insert'
WHEN 3 THEN 'Update - Before'
WHEN 4 THEN 'Update - After'
END AS 'OperationChangeType',
CDCTable.*
FROM
cdc.<<SchemaName>>_<<TableName>>_CT AS CDCTable INNER JOIN
cdc.lsn_time_mapping AS LSN ON CDCTable.__$start_lsn = LSN.start_lsn

Now you can add a WHERE clause and filter by date.

The alternative to querying the tables directly is to make use of the cdc.fn_cdc_get_all_changes_<<SchemaName>>_<<TableName>> and cdc.fn_cdc_get_net_changes_<<SchemaName>>_<<TableName>> functions.

Steffen Krause from the SQLCAT team published an amazing article on Tuning the Performance of Change Data Capture in SQL Server 2008.  If you're considering using the Change Data Capture feature this should be considered a "must read"

Also for more information on Change Data Capture please check out the Change Data Capture article on MSDN.