SQL Server Change Data Capture with SAP
Hello I am pleased to introduce Clas Hortien as new contributor to this blog site. He is located in the SAP development facilities in Germany. For many years
his focus is to support SAP customers running on SQL Server. Some of you might know him by his contributions to the SDN SQL Server forum. Clas will
start a series of articles looking at new features of SQL Server 2008 in conjunction with SAP. As a first feature we start with “Change Data Capture” and
how it could be leveraged in combination with an SAP system.
Change Data Capture (CDC) provides information about DML changes on a table and a database. By using change data capture, you eliminate expensive
techniques such as user triggers, timestamp columns, and join queries in order to detect changes. Change Data Capture records modifications on tables on
which CDC got enabled. The details of the changes are available in a normal relational table for selection. CDC only tracks the columns which got modified as
we see later on in this article. Change Data Capture functionality is available only in SQL Server 2008 Enterprise, Developer, and Evaluation editions.
So far SAP does not provide any GUI or interface to setup or control the CDC feature. Since Basis release 720, the DBA cockpit provides some monitoring
functionality, mainly to check the status and to see which tables are monitored through CDC.
You can find this new screen in the NON-SAP Features section of the Configuration branch in the DBA Cockpit (see the screenshot).
When you plan on using CDC, you should consider the following limitations and impacts:
- You may not use CDC, when your SAP System ID (SID) is CDC, because the Change Data Capture feature will interfere massively with your SAP system.
- CDC will have an impact on CPU and space usage in your system. Especially the space usage can be tremendously, when you monitor multiple tables which a high change rate.
- Please never ever try to monitor the SAP queuing tables like VBDATA, VBHDR and VBMOD as the amount of data is enormous and not useful anyway.
- Do not monitor the SAP RFC tables like ARFC*, QRFC* and TRFC*.
- Never monitor all tables of an SAP system simultaneous.
- Be extremely selective what table to capture changes on and keep the number of tables limited.
To use CDC you have to manually enable it in a SQL Query Window with the following T-SQL commands:
USE PAX
EXEC sp_cdc_enable_db – enabling it on the DB
-- start it for some tables
EXEC sp_cdc_enable_table 'pax',
'REPOSRC', @role_name = null, @supports_net_changes =1;
EXEC sp_cdc_enable_table 'pax',
'SNAP', @role_name = null, @supports_net_changes =1;
EXEC sp_cdc_enable_table 'pax',
'sap_tabstats', @role_name = null, @supports_net_changes =1;
GO
The first command (sp_cdc_enable_db) enables CDC for the current database. All the necessary structures will be created and the database gets prepared
for the CDC usage.
In the example we started the monitoring for three tables (SNAP – Short dumps, REPOSRC – SAP Report sources, sap_tabstats – DBA Cockpit history
data). With the option @support_net_changes an additional function gets created, which shows the net changes on the table. By setting the @role_name
option to null the security of the tables which contains the changes is not restricted.
Change Data Capture will create a new schema called ‘cdc’ in the SAP database (therefore never use CDC with a SAP system using the SID of CDC). At least
three additional objects are created in the new cdc schema in the SAP database for each table which is in the scope of CDC:
Object name |
Type |
Example |
fn_cdc_get_all_changes_<schema>_<table name> |
function |
fn_cdc_get_all_changes_pax_SNAP |
fn_cdc_get_net_changes_<schema>_<table name> |
function |
fn_cdc_get_net_changes_pax_SNAP |
<schema>_<table name>_CT |
table |
pax_SNAP_CT |
The table <schema>_<table_name>_CT contains the data of the changed rows. It is an exact copy of the table structure of the source table (e.g. SNAP),
with some additional fields (__$start_lsn, __$seqval, __$operation, __$update_mask) at the beginning of the table.
From SQL Server Books online:
Column name |
Data type |
Description |
__$start_lsn |
binary(10) |
Log sequence number (LSN) associated with the commit transaction for the change. All changes committed in the same transaction share the same commit LSN. For example, if a delete operation on the source table removes two rows, the change table will contain two rows, each with the same __$start_lsn value. |
__$end_lsn |
binary(10) |
Identified for informational purposes only. Not supported. Future compatibility is not guaranteed. In SQL Server 2008, this column is always NULL. |
__$seqval |
binary(10) |
Sequence value used to order the row changes within a transaction. |
__$operation |
int |
Identifies the data manipulation language (DML) operation associated with the change. Can be one of the following: 1 = delete 2 = insert 3 = update (old values) Column data has row values before executing the update statement. 4 = update (new values) Column data has row values after executing the update statement. |
__$update_mask |
varbinary(128) |
A bit mask based upon the column ordinals of the change table identifying those columns that changed. |
<captured source table columns> |
varies |
The remaining columns in the change table are the columns from the source table that were identified as captured columns when the capture instance was created. If no columns were specified in the captured column list, all columns in the source table are included in this table. |
The additional objects can be seen in SQL Server Management Studio under System Tables of the SAP database:
You can use the two functions to show the changes on the table. The function fn_cdc_get_all_changes_<schema>_<table name> returns one row for every change on a table:
DECLARE @begin_time datetime;
DECLARE @end_time datetime:
DECLARE @begin_lsn binary(10);
DECLARE @end_lsn binary(10);
SET @begin_time = '2010-03-01 12:00:00.000';
SET @end_time = '2010-03-10 12:00:00.000';
-- Map the dates
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn ('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn ('largest less than or equal', @end_time);
SELECT __$start_lsn, __$seqval, __$operation, __$update_mask,PROGNAME, R3STATE, TYPE FROM cdc.fn_cdc_get_all_changes_pax_REPOSRC( @begin_lsn, @end_lsn, 'all');
GO
Above you can see the changes of the column TYPE in the table REPOSRC from space to ‘A’ to ‘B’.
The function fn_cdc_get_net_changes_<schema>_<table name> shows only the very last version of the row:
With this information you will be able to manually revert or replay the changes against data in a table. For long term auditing or monitoring purposes, it is
essential, that the data is moved outside the CDC monitoring tables (e.g. pax_REPOSRC_CT), because the data is kept in those tables for a short period of
time only.
The data in the CDC system tables is automatically purged by an SQL Agent job ( cdc.<DB Name>_Cleanup, e.g. cdc.PAX_cleanup) every night at 2:00 am.
In the default configuration the retention period is 4320 minutes (3 days). Means, only changes younger than 3 days will be available in the cdc monitoring
tables. However one cleanup run will only delete up to 5000 entries per table. You can view this setting in msdb.dbo.cdc_job by running
SELECT retention, threshold, * FROM msdb.dbo.cdc_jobs WHERE job_type = 'cleanup'
Deleting 5000 rows in one cleanup might not be good enough to keep the volume of the CDC tables in check. Therefore if CDC tables grow day after day,
you might want to change these settings. To switch off CDC completely for a given table, you have to run the sp_cdc_disable_table procedure for the
created capture instance:
EXECUTE sys.sp_cdc_disable_table @source_schema = N'pax', @source_name = N'REPOSRC', @capture_instance = N'pax_REPOSRC'
GO
To de-activate CDC for an entire database you have to use the sp_cdc_disable_db procedure.
EXEC sp_cdc_disable_db
GO
This will remove all created objects in the cdc schema within the database.
Since the CDC functionality is based on the SQL Server Replication Log Sniffing Framework, you have to set the configuration option “max text repl size” to
its maximum of 2147483647, otherwise you will get truncation errors for tables with blob fields(e.g. REPOSRC).
The analysis and storage of the collected data has to be done outside of the SAP system. CDC can be used to get historical data about all changes on critical
tables within the system. In contradiction to other data monitoring solutions, CDC offers you the ability to get and store the information about the changes,
not only about the old and new values of columns.
Some useful Microsoft Knowledgebase articles for CDC: