Introduction to Data Sync Service for SQL Azure
[This article was contributed by the SQL Azure team.]
Data Sync Service for SQL Azure allows you to perform bi-directional synchronization between two or more SQL Azure databases. This blog post will discuss the basics of Data Sync Services for SQL Azure.
Under the Covers
SQL Azure Data Sync Service runs on the Windows Azure making use of web and worker roles using the Microsoft Sync Framework. The Windows Azure account is one that is owned by the Microsoft Sync Team, and is managed by them; Data Sync Service for SQL Azure is not an application that you install within a role on you Windows Azure account. This service is hosted currently in the data center in South Central US. There is an online web interface that manages the configuration of your synchronization sets, allowing you to configure what databases you want to synchronize on what servers, in what data centers and how often you want them to synchronize. Currently, SQL Azure Data Sync Service is in CTP and is available in SQL Azure Labs.
Getting Started
In every synchronization scenario, there is a single hub database and many member databases. When you setup the synchronization group, you declare the hub database and assign as many empty member databases as you want.
On the first synchronization, Data Sync Service for SQL Azure provisions the member databases by copying the hub databases schema to them. In other words, you do not have to generate the schema on member databases; this is done for you by Data Sync Service for SQL Azure. Changes are also made to the hub database to efficiently track data changes.
When the schema is copied from the hub database to the member databases the foreign key constraints are not copied. This allows the table data to be inserted in any order on the member database. By not enforcing the foreign key constraints on member databases Data Sync Service for SQL Azure doesn’t have to convert the hub database to read-only on the first synchronization, plus it works with data schemas that have circular foreign key references (for more about circular references see this blog post).
After the member databases have been provisioned, on the first synchronization, all the data from the hub database is copied to the member databases table by table. The order of the inserts doesn’t matter, since there are no foreign key constraints on the member databases.
Data Transfer Pricing
The member database can be on servers that are not in the same database center as Data Sync Service (South Central US). If all your databases are in different data centers, when you synchronize your databases (transfer data) you will be charged for bandwidth for both directions for database that are not in South Central US.
The limitation of only being able to run Data Sync Service from South Central US is a limitation of the CTP. As with the other Windows Azure Platform services, if all your databases are in the same data center and the Data Sync Service is running there, there will be no fee for data exchange.
Only the data that has changed will be transferred on synchronization, not the whole database. However, since that data needs to be synchronized with all member databases and the hub, the number of members is a multiplier in the cost of the data transfer.
For more about pricing click here.
Partial Synchronization
Data Sync Service for SQL Azure allows you to pick the tables you would like to synchronize. You do not have to synchronize the whole database. This allows you to create a synchronization group where the member databases are a subset of the hub. This create an interesting scenario where you can give permissions to a member database that are different than the permission on the hub database, allowing users to access a subset of the data. I will discuss how to do this in a later blog post using PowerPivot.
Schema Changes
When Data Sync Service for SQL Azure creates the schema on the member databases it is a copy of the schema on the hub database. The hub databases schema cannot be changed once the synchronization group has been established. In order to change the schema on the hub database you need to remove the synchronization group and recreate a new synchronization group which will reallocate your member databases; re-generating their schemas and moving the data again.
On Demand or Scheduled Synchronization
Synchronization can be done on demand or on a set schedule, unlike merge replication there is no real-time synchronization. The smallest increment for schedule replication is currently an hour in the current CTP, more frequent synchronization will be coming in the release version. Synchronization can also be done on demand.
Bi-Direction Synchronization
Synchronization on Data Sync Service happens bi-directionally you can write to the member databases and when synchronization occurs, the hub and all member databases are updated. Or, you can write to the hub database and synchronize the member databases. In other words, all the databases are read write.
Data Sync Service for SQL Azure doesn’t allow you to program the handling of conflict resolution as compared to the Microsoft Sync Framework. For this reason it is best if you are inserting row into multiple member databases that you use uniqueidentifier data type as primary keys to your tables, instead of the int IDENTITY data type. This will prevent identity conflicts during the bi-directional synchronization. If you are working with a legacy database or do not want to use the uniqueidentifier data type, there are several other methods of ensuring a unique primary key across all the databases. I will blog more about this later.
Summary
Do you have questions, concerns, comments? Post them below and we will try to address them.