Notification to Pull
As developers start architecting their apps with Sync Services for ADO.NET, they're starting to ask about the gaps we didn't get to fill in this 1.0 release. One such question is: how can I send changes from the server to the client without the client having to ask? A few problems quickly surface:
- Addressability of each client - where do I send the notification if my client's address keeps changing?
- Maximize the scalability of central resources by minimizing the detailed knowledge of each client
- Minimize network traffic
Here's the problem: the client doesn't really know when something has changed on the server, so it's left to "ping" the server occasionally, asking "has anything changed in these objects?" This sounds like the electronic version of "are we there yet?" With Sync Services, we very specifically designed the system so the server doesn't actually know who the clients are. Clients may come and go over time, and requiring the server to track changes for each client imposes scalability limitations. This is one of the challenges with Merge Replication. It's not that we didn't want to solve this problem, but we wanted to layer the solution, as the problem isn't unique to Sync Services. This leaves us with a gap. So, what to do?
KISS
Keep It Silly Simple. The simplest option is to just have the client occasionally ask if anything has changed. Consider an application that continually asks the server for the same list of states or Order Status Codes. Without any caching, the app will simply ask the same question over and over again. These values don't change very often. I remember back in the late 80's Puerto Rico was considering becoming the 51st state. I often joke that California may fall off into the ocean after an earthquake. These are pretty dramatic events that I joke about. Let's say you were caching these values locally, and once a day the client asks the server what changed. Not all states, just what changed. Now, let's look at that 5,000 item product catalog. How often does it change? Maybe a few additions, price changes, and description edits a week. Here, you might query once an hour. But again, not the whole catalog, just what changed. While not as efficient as some alternatives, it's still much better than asking for all the values each time, which clogs up the network with redundant data and adds load on the server. So just moving to a routine sync pull operation may already improve things, and keeping it simple may be the way to go.
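To make "just what changed" concrete, here's a minimal sketch in Python. It is not the Sync Services API; the `Server` and `Client` classes, the version counter, and the anchor are all made up for illustration. It also ignores deletes to stay short.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory server that stamps every change with a version."""
    version: int = 0
    rows: dict = field(default_factory=dict)  # key -> (value, version of last change)

    def upsert(self, key, value):
        self.version += 1
        self.rows[key] = (value, self.version)

    def changes_since(self, anchor):
        """Return only the rows changed after `anchor`, plus the new anchor."""
        changed = {k: v for k, (v, ver) in self.rows.items() if ver > anchor}
        return changed, self.version


@dataclass
class Client:
    """Hypothetical client cache that remembers the last anchor it synced to."""
    anchor: int = 0
    cache: dict = field(default_factory=dict)

    def pull(self, server):
        """Ask the server for changes since the last pull and apply them."""
        changed, self.anchor = server.changes_since(self.anchor)
        self.cache.update(changed)
        return len(changed)  # rows that actually crossed the wire
```

The first `pull` downloads everything. After that, each `pull` only carries the rows touched since the previous one, so a daily or hourly pull against a list that rarely changes usually moves nothing at all.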
Notification to Pull
Notification to pull essentially means the server sends a small notification to the client indicating there's something the client may be interested in. Think of this as dialing a phone number. You dial the number and wait for a response. You don't dial the number and say "Hi Steve, how are things? I had a great day today... " If someone is home, they answer and say hello. This initiates the conversation. Now you can go further, and after 4 rings, voice mail picks up. While some people leave long messages, most know to simply leave a summary. Those who have friends that leave long messages usually figure out how to limit the length of a message. If the recipient is interested, they call back. Now, this may not be the best example, but it's what I've got for now. <g>
Getting back to our sync design: With any central system, you want to aggregate as many individual operations as possible. Let's continue our States example from above. Let's say we have 10,000 clients that are caching the list of states. Rather than having the central database track each client, the central database simply tracks whether anything has changed. The item being tracked could be called an Article. Another set of components tracks that something has changed and looks up who's interested in the change; we can call that a subscription. Now this gets complex quickly, so let me try a drawing:
Data Sources
As we quickly see, relational database changes aren't the only thing we may be interested in being notified about. Wouldn't it be nice to know files have changed, application updates are available, or any other type of resource you could imagine?
Listener
A listener listens or checks for changes to specific sources. You might plug in Service Broker, Query Notification, or might even use Sync Services here. When changes are available, the listener service looks up which clients are interested in this data source. The listener then queues up a notification. Now, since we're only notifying that a change has occurred, not the details of the change, the Listener service can perform an Upsert to any notifications that may be queued for each client. There's no reason to tell the same client the same thing has changed multiple times in a single notification.
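Here's a rough Python sketch of that upsert behavior. The `Listener` class and its method names are hypothetical; a real listener would sit on top of Service Broker, Query Notification, or similar, and would persist its queue.

```python
from collections import defaultdict


class Listener:
    """Hypothetical listener that turns source changes into queued notifications."""

    def __init__(self):
        self.subscriptions = defaultdict(set)  # article -> ids of interested clients
        self.pending = defaultdict(set)        # client id -> articles with unseen changes

    def subscribe(self, client_id, article):
        self.subscriptions[article].add(client_id)

    def on_change(self, article):
        """Called when a data source reports a change; the details are not recorded."""
        for client_id in self.subscriptions.get(article, ()):
            # Adding to a set is the upsert: a repeated change to the same
            # article leaves a single queued notification for that client.
            self.pending[client_id].add(article)

    def take_pending(self, client_id):
        """Hand the queued notifications to the distribution step and clear them."""
        return self.pending.pop(client_id, set())
```

Because only the article name is queued, ten changes to the States article between deliveries still cost each subscribed client one notification.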
Notification Distribution
Another service looks for notifications to be sent. But where does it send them? Remember, while servers tend to be stable, with known IP addresses, clients are constantly moving. Not only does their IP address change, their method of communication may change as well. We're in a chicken & egg problem, so let me just defer this problem to the Here I Am service below. Assuming we know the last known location for a particular client, we can now attempt to ping the client. In the picture, I've labeled it "Hello", or step A. If the appropriate client is still there, it would Ack back with the appropriate authentication, step B. If another client has roamed in and now occupies the address, the Ack would fail, and the Notification Distribution service can update the Last_Known_Location as out of range, since the client didn't Ack back with the appropriate credentials. Now, assuming the Ack was successful, the Notification is sent to the client, step C, simply saying, "There was a change to Article X". The client can then choose what to do with that info. It may decide to ignore it, as it just did a sync and has a policy of only syncing once an hour, or it could sync immediately. Of course, it could also check for application updates.
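The Hello / Ack / Notify exchange could be sketched like this in Python. Everything here is an assumption for illustration: the post doesn't say how the Ack is authenticated, so the sketch uses an HMAC over a random nonce with a per-client shared secret, and the `network` dictionary stands in for whatever transport reaches an address.

```python
import hashlib
import hmac
import os


class ClientEndpoint:
    """Hypothetical client sitting at some address, holding its own secret."""

    def __init__(self, secret):
        self._secret = secret
        self.inbox = []

    def hello(self, nonce):
        # Step B: Ack by signing the server's nonce with the shared secret.
        return hmac.new(self._secret, nonce, hashlib.sha256).digest()

    def notify(self, article):
        # Step C: only the article name arrives, never the changed data.
        self.inbox.append(article)


def distribute(client_id, secret, article, network, locations):
    """Try to deliver one notification to the client's last known location."""
    occupant = network.get(locations.get(client_id))
    if occupant is None:
        locations[client_id] = None  # nobody there: out of range
        return False
    nonce = os.urandom(16)
    ack = occupant.hello(nonce)  # steps A and B
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()
    if not hmac.compare_digest(ack, expected):
        locations[client_id] = None  # someone else roamed into this address
        return False
    occupant.notify(article)  # step C
    return True
```

If a different client now occupies the address, its Ack is signed with the wrong secret, the notification is never sent, and the last known location is cleared until the real client checks in again.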
Here I Am Service
This is a standard problem the cellular network had to solve. When a call is made to your cell phone, attempting to broadcast to the entire planet doesn't scale. Instead, your cell phone is continually interacting with a clearing house, saying "here I am; I'm currently in Sammamish WA, USA." As I roam to different locations, my phone "phones home" and says, "here I am; I'm in NYC NY, USA." Now, I like to go skiing and snowmobiling in the great white north. If I'm not careful, my phone could drain while I'm out of range. This is because transmitting a signal consumes a lot more power than receiving. My phone is constantly trying to find a tower. Once it does, it says Here I Am, and can then just go into receive mode and reduce battery usage. Only when it goes out of range does it need to transmit again to establish a last known location. In the mountains, I'm constantly in and out of range, thus the battery drain.
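A clearing house like this can be very small. The Python sketch below is hypothetical: the class name, the method names, and especially the time-to-live are my own assumptions (the cell phone analogy doesn't say how long a last known location stays valid).

```python
import time


class HereIAmRegistry:
    """Hypothetical clearing house that remembers each client's last check-in."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # assumed staleness limit, not from the post
        self._clock = clock
        self._last_known = {}    # client id -> (address, time of check-in)

    def here_i_am(self, client_id, address):
        """Called by the client whenever it comes into range or moves."""
        self._last_known[client_id] = (address, self._clock())

    def last_known_location(self, client_id):
        """Return the address to try, or None if unknown or too old."""
        entry = self._last_known.get(client_id)
        if entry is None:
            return None
        address, seen_at = entry
        if self._clock() - seen_at > self._ttl:
            return None
        return address

    def mark_out_of_range(self, client_id):
        """Called by the distribution service after a failed Ack."""
        self._last_known.pop(client_id, None)
```

The client does the transmitting only when its location changes; the server side never searches for it, it just looks up the last check-in.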
Why send just the notification, not the data
You could argue the same infrastructure above could be used to send the actual changes. While that's true, it's not the most efficient. Depending on your infrastructure, your clients may well be out of range for multiple updates. Rather than sending blocks of updates the client may not even be interested in, systems tend to scale better by simply doing the notification to pull.
So, as you can see, a Notification to Pull model is very powerful, but quite complex. There are a lot of moving parts to build, configure, and maintain. We already have a number of the parts, including:
- SQL Server - Ability to store all this data on the server
- SQL Server Compact - Ability to store all this data easily on the client with minimal deployment issues
- Sync Services for ADO.NET - The pull operation to send/receive net changes between the server and the client
- SQL Server Service Broker - Ability to queue up operational changes within the data center
- WCF - communication stack between the client and the server over various protocols
- Query Notifications - Ability for the server to notify a listener service that something has changed
- Change Tracking - SQL Server 2008 introduces a more efficient way to track changes that can be used with Query Notifications to provide an efficient way to know when something changed, and what the details of the change are.
Building out this infrastructure may be more than you need right away. However, if you look closely, the configuration of Sync Services doesn't actually change. We're simply layering a notification system to determine when to pull.
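As a sketch of that layering, the Python below wraps an existing pull operation with a notification handler and a "no more than once an hour" policy. `NotifiedClient` and its parameters are hypothetical names; `pull` stands in for whatever sync call the client already makes.

```python
import time


class NotifiedClient:
    """Hypothetical wrapper: notifications decide when to pull, not how."""

    def __init__(self, pull, min_interval_seconds=3600.0, clock=time.monotonic):
        self._pull = pull                      # the existing, unchanged sync operation
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._last_sync = None

    def on_notification(self, article):
        """Pull now, or ignore the notification if we synced too recently."""
        now = self._clock()
        if self._last_sync is not None and now - self._last_sync < self._min_interval:
            return False
        self._pull(article)
        self._last_sync = now
        return True
```

The pull callable is passed in untouched, so swapping a timer-driven sync for a notification-driven one only changes who calls it.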
Steve
Comments
Anonymous
August 16, 2007
Hi Steve, I have a requirement where I need to let the users work offline. My initial thought was to use Express edition and create a database to house the data that they require to work offline - similar to Outlook emails. But I do not want to run a Service on the end user machine and thought I could use a SQL CE database. How do I actually populate a SQL CE database from a set of queries that will get the data from the SQL Server 2005 databases? The data resides in one or more tables. This offline data will be read-only and there will be no need to sync. Will a SQL CE database handle, say, 100k rows? No data manipulation will happen in this database and it will be read-only. Could you please suggest how SQL CE can be used? Thanks. Ram
Anonymous
August 17, 2007
Hi Ram, Yes, SQLce is well designed for this exact model. There are three sync technologies offered, Merge Replication, RDA and Sync Services. They all create the schema and download data for your scenario. For what you’re looking for, Sync Services would likely fit the need best. You have complete control over the commands, service definition, etc. Sync Services hasn’t yet shipped, but is in B2. It ships later this year with Visual Studio 2008. If you need something now, RDA would work well, but you won’t be able to get incremental changes. For more info, please see this Q&A post which describes the different sync technologies and their roadmap. Steve
Anonymous
August 20, 2007
Thanks Steve. But how could one easily populate a few tables in SQLCE from a SQL Server database in Management Studio? For example, I have an application that supports offline mode and requires data to be available in the same data structure as the main database, so I would like to populate the same tables in SQLCE with the rows needed. Or when my dataset is filled I could save that data to SQLCE just like saving it as XML. Is this something that the SQLCE team has thought about? I tried accessing the SQL Server from a SQLCE database connection, but as expected it did not work, as it would require a Linked server entry and that is not possible with SQLCE. Thanks, Raman
Anonymous
August 22, 2007
Hi Steve, Is it possible to BULK INSERT say 10000 rows into SQL Server CE database? Thanks, Ram
Anonymous
August 23, 2007
Hi Ram, We don't have a SQL Bulk API, but in reality, it may not be needed. If you open a connection, use the same parameterized command and simply replace the parameter values in a loop, SQLce is pretty darn fast. The main benefit of SQL Bulk Copy is that it bypasses the logging system of SQL Server. Because SQLce doesn't target the massive concurrent user scenarios of SQL Server, we have a simplified logging system that is quite fast. Of course, all the same rules apply. Lots of indexes can slow things down, so it may help to drop and re-create the indexes. I have tried the SqlCeResultSet API, but couldn't find any significant performance difference between that and a standard Insert statement with the SqlCeCommand object as noted above. Hope that helps, Steve
Anonymous
December 01, 2007
Hi Steve, Is there any plan to implement the notification services part you mentioned here in the next release or maybe as another framework? Thanks, Leonid
Anonymous
December 03, 2007
The comment has been removed
Anonymous
April 16, 2008
There are many limitations on query notification in 2005, e.g.
1. the options must be right for both the registering command and the triggering commands,
2. if a stored procedure is the registering command, then it must not contain any SET options statement.
and ... The restriction set by (1) probably will disallow the combined use of data replication and query notification, since the options set in the data replication process are pretty much pre-programmed by MS; I doubt they had the requirement of query notification in mind when they did it.
Anonymous
April 29, 2008
In our current thinking, we wouldn't actually use the same infrastructure as SQL Server. As an in-proc database, we can take advantage of things that SQL Server can’t, as it tends to run best in a stateless environment. Steve
Anonymous
October 17, 2008
There's been a lot of talk, momentum and questioning for where and how occasionally connected designs
Anonymous
December 02, 2008
we keep being told that "microsoft recommends using not more than 10 fat clients to register query notifications" is this true or just an urban legend, started from an initial fear that people would register notifications from thousands of asp.net clients, overloading the server? thanks! d
Anonymous
December 09, 2008
The comment has been removed