Share via


Data Replication Using SSIS and MS "Change Track"

Introduction

Our customer required near real-time data from our OLTP database. There were many approaches we could have taken to replicate the data. We decided to use this approach, which uses Microsoft's SSIS application and Microsoft's Change Track option in SQL Server.

There are three components of this process: 
    The Source (OLTP) database
    An SSIS package
    The Target DB

Approach

Source DB
On the source OLTP database, Microsoft "Change track" was enabled on 70 tables. We then created a read-only access account. The two requirements for Change Track was that the candidate tables needed to have a primary key, and the database had to have Read Commit Snapshot Isolation set. That was all that was needed on source DB side.

Target DB
One the Target DB, we created exact duplicates of the source tables, called target tables. We also create 2 additional supporting tables for each target table. One of the supporting tables was used as a temporary holding area for changed data. The other was a history table, an optional component that tracks the history of changes. Both of these tables contain the same columns as the target table, and, in addition, contain the following column(s):

target_table_tmp:
    sys_change_opration nchar(2) - contains the type of change for the record (I - insert, U- update, D - Delete)
target_table_history:
    sys_change_opration nchar(2) - contains the type of change for the record (I - insert, U- update, D - Delete)
    sync_version - Change Track version number at the time of capture
    extraction_date - date/time of the extraction

Two other tables were created:
A table (T_REP) that contains a record for each table was being replicated. 
    Table_name
    last_sync_version bigint - last version that was changed for that table
    last_extraction_date datetime - last date/time that the table was updated. 
    last_change_count - number of rows changed from the last capture for that table.

A table (T_REP_SUMMARY) that contains one row of summary information
    last_change_tracked bigint - last value of Change Track that was captured for all tables
    last_check_date datetime - last checked date/time
    last_extraction_date datetime - last extraction datetime
    history_flag - allows the abilty to set history off/on.

SSIS Package
We then created an SSIS Package than contains the ETL code.
The package is very simple:

Step 1: Connect to the Source DB:
        1.1 Determine if changes occurred to any of the 70 tables since the last check. This is performed using a Change Track Function Call using the last_change_tracked number value from T_REP_SUMMARY. (use 0 if this is the first time checking)
        1.2 If changes occured, proceed to Step 2. Otherwise, update the last_check_date field in the T_REP_SUMMARY, and wait until the next checktime

Step 2: ETL
       2.1 Using the T_REP table and a loop, use the MIN_VALID_VERSION function to find which table(s) were changed
            2.1 For each changed table:
                2.1.1 Truncate the target_tmp table associated with selected table
                2.1.2. By Joining the internal Change Track table with the actual source table, populate the rows in the the target_tmp table with the source table along with the sys_change_operation number. 
                2.1.3 Using the MERGE statement, merged the changes with the target table to update the replicated table
                2.1.4 Update the T_REP table with last_sync_version ,last_extraction_date , and last_change_count 
                2.1.5 Finally, copy all the changes from the target_tmp table to the target_history table along with the sync_version and extraction_date. 
        2.2 Update the T_REP_SUMMARY table with the last_updated timestamp and the last_sync_version. 

For testing, the application was stress tested using a tool that performed 120 DML changes per minute on the source. The ETL SSIS package was implemented using the SQL Server Job Scheduler with a 30 second interval. The average process time was 6 seconds. The interval was changed to 1 minute. The average process time remained the same.

Conclusion

By using the power of  Microsoft's Change Track and SSIS, our customer is able to view near real time OLTP data without having to access the production servers.