

Distributed Cache != Database

Even though this topic looks obvious, there have been some interesting customer scenarios centered around it. I hope this blog post helps with your decision making. As a basic tenet, in most solutions a distributed cache is used along with a database to optimize application performance. In most cases, one would not replace one of these technologies with the other, since they provide different sets of capabilities. However, for some of the new kinds of web workloads, there are key criteria under which a distributed cache can be chosen over a database.

This blog post is a synopsis of a recent presentation given at SQL PASS.

Considerations

Here is a set of questions to ask yourself when making this decision:

  • Are there expensive 'key-based lookup' operations?
  • Are there rarely changing data items that are accessed frequently?
  • Are there a lot of temporal writes?
  • Do you need a scalable ASP.NET session store?
  • Is highly available in-memory storage enough, or do you require durability?

The most benefit comes when the objects cached in AppFabric are frequently accessed aggregated objects, created by executing a JOIN across several tables in a stored procedure, by making a set of Web service calls, or by a combination of both. For example, consider a popular forums website with over 300M page views per month. Each time a user visits the forums home page, the ASP.NET application might have to run a stored procedure to aggregate the set of Posts and the various related Threads, and to showcase stats such as the number of unanswered questions and the review ratings for the popular topics in each category. With 100s of categories and 1000s of Posts, each Post having 10s of Threads, this very quickly becomes a scaling problem. To make this efficient, the aggregated "ForumPost" object can be cached so that subsequent read requests are served from a distributed cache such as AppFabric Cache, freeing up the database server for transactional and durable data. So one will have both the distributed cache and the database, each doing a different job.

Storing an entire table or raw rowsets in AppFabric Cache is not going to be optimal, since there are serialization and deserialization costs on every GET or PUT, so request latency will suffer. Depending on how overloaded the database system gets and on your acceptable performance metrics, this approach might still be useful, but it is not the typical usage.
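To get a feel for the serialization cost, the sketch below uses Python's `pickle` as a stand-in for whatever serializer the cache client uses (an assumption; AppFabric's own serialization is not shown) and compares a hypothetical 10,000-row rowset against the small aggregated object a page actually needs.

```python
import pickle
import time

# Hypothetical raw rowset: 10,000 (post_id, thread_id, body) rows
rows = [(i, i % 100, f"body text for post {i}") for i in range(10_000)]

# Hypothetical aggregated object: only what the home page renders
aggregate = {"category": "sql-server", "post_count": len(rows), "unanswered": 42}


def round_trip_seconds(obj, repeat=20):
    """Average time to serialize and deserialize obj, as a cache PUT + GET would."""
    start = time.perf_counter()
    for _ in range(repeat):
        pickle.loads(pickle.dumps(obj))
    return (time.perf_counter() - start) / repeat


table_bytes = len(pickle.dumps(rows))
aggregate_bytes = len(pickle.dumps(aggregate))
print(f"whole table: {table_bytes} bytes, {round_trip_seconds(rows) * 1e3:.2f} ms per round trip")
print(f"aggregate:   {aggregate_bytes} bytes, {round_trip_seconds(aggregate) * 1e3:.4f} ms per round trip")
```

The printed timings vary by machine and serializer; the point is that the whole-table payload is far larger than the aggregate, and every GET of it pays that cost again.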

Setting up high availability is a configuration setting in AppFabric Cache; there is no need for high-end hardware or complex deployment techniques. It is available at the named-cache level, so it can be applied selectively.
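Conceptually, high availability at the named-cache level means each item in that cache also keeps a secondary copy on another host. The toy model below is an assumption-laden sketch, not how AppFabric actually places or promotes replicas; `CacheHost`, `NamedCache`, and the `secondaries` parameter are hypothetical names used only to show why HA can be switched on for one named cache and off for another.

```python
import zlib


class CacheHost:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}  # (cache_name, key) -> value


class NamedCache:
    """Toy model of a named cache whose items optionally keep one secondary copy."""

    def __init__(self, name, hosts, secondaries=0):
        if secondaries not in (0, 1):
            raise ValueError("this sketch supports 0 or 1 secondary copies")
        self.name = name
        self.hosts = hosts
        self.secondaries = secondaries

    def _replicas(self, key):
        # Primary is picked by hashing the key; the secondary (if any) is the next host.
        start = zlib.crc32(key.encode()) % len(self.hosts)
        return [self.hosts[(start + i) % len(self.hosts)] for i in range(1 + self.secondaries)]

    def put(self, key, value):
        for host in self._replicas(key):
            host.data[(self.name, key)] = value

    def get(self, key):
        # Read from the first live replica that holds the item.
        for host in self._replicas(key):
            if host.alive and (self.name, key) in host.data:
                return host.data[(self.name, key)]
        return None


hosts = [CacheHost(f"host{i}") for i in range(3)]
sessions = NamedCache("sessions", hosts, secondaries=1)  # HA turned on
catalog = NamedCache("catalog", hosts, secondaries=0)    # HA turned off
sessions.put("user42", "cart")
catalog.put("user42", "product list")

sessions._replicas("user42")[0].alive = False  # simulate losing the primary host
print(sessions.get("user42"))  # cart
print(catalog.get("user42"))   # None
```

Only the named cache that opted in pays for the extra copy (roughly double the memory for its items), which is why being able to apply HA selectively is useful.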

AppFabric Cache provides elastic scale, allowing your data or application tier to scale linearly. Nodes can be added or removed at run time as your needs change. This is possible because of its scale-out architecture, which builds on core platform components such as Fabric and CAS (Common Availability Substrate).
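One common way to get this kind of elasticity is to hash keys into a fixed number of partitions and assign partitions, rather than individual keys, to nodes. The sketch below illustrates that general technique under stated assumptions (64 partitions, a hypothetical `rebalance` that moves as few partitions as possible); it is not AppFabric's internal placement algorithm.

```python
import zlib

NUM_PARTITIONS = 64  # fixed up front, so a key never changes partition


def partition_of(key):
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


def assign_partitions(nodes):
    """Initial layout: deal partitions round-robin across the nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def rebalance(assignment, nodes):
    """Move only as many partitions as needed to give each node an equal share."""
    base, extra = divmod(NUM_PARTITIONS, len(nodes))
    quotas = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}
    owned = {n: [] for n in nodes}
    to_move = []
    for p, node in sorted(assignment.items()):
        if node in owned and len(owned[node]) < quotas[node]:
            owned[node].append(p)  # partition stays where it is
        else:
            to_move.append(p)  # its node is over quota or was removed
    for node in nodes:
        while len(owned[node]) < quotas[node]:
            owned[node].append(to_move.pop())
    return {p: node for node, parts in owned.items() for p in parts}


def node_for(key, assignment):
    return assignment[partition_of(key)]


before = assign_partitions(["a", "b", "c"])
after = rebalance(before, ["a", "b", "c", "d"])  # add node "d" at run time
moved = sum(1 for p in range(NUM_PARTITIONS) if before[p] != after[p])
print(moved)  # 16
```

Growing from three nodes to four moves 16 of the 64 partitions, and only the keys in those partitions. By contrast, a naive `hash(key) % node_count` scheme would relocate about three quarters of all keys for the same change.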

ASP.NET session state is one such scenario: its temporal reads and writes can stay in memory and be highly available, and they do not really need durable storage.
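A session store on top of a cache mostly needs fast reads and writes plus expiration. The sketch below is a hypothetical `SessionStore` with sliding expiration (each access extends the session's lifetime) and an injectable clock so the example is deterministic; it is not the ASP.NET session state provider API, and it keeps nothing durable.

```python
import time


class SessionStore:
    """Sketch of session state kept only in memory, with sliding expiration."""

    def __init__(self, timeout_seconds=20 * 60, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock  # injectable so the example can fake the passage of time
        self._sessions = {}  # session_id -> (state, expires_at)

    def save(self, session_id, state):
        self._sessions[session_id] = (state, self.clock() + self.timeout)

    def load(self, session_id):
        entry = self._sessions.get(session_id)
        if entry is None or self.clock() >= entry[1]:
            self._sessions.pop(session_id, None)  # expired or unknown
            return {}
        # Sliding expiration: each access pushes the deadline out again.
        self.save(session_id, entry[0])
        return entry[0]


now = [0.0]  # fake clock, in seconds
store = SessionStore(clock=lambda: now[0])
store.save("abc", {"cart": ["book"]})

now[0] = 15 * 60
print(store.load("abc"))  # {'cart': ['book']}
now[0] = 30 * 60          # still alive: the load at 15 min moved the deadline to 35 min
print(store.load("abc"))  # {'cart': ['book']}
now[0] = 51 * 60          # the deadline was 50 min
print(store.load("abc"))  # {}
```

If the process holding the data disappears, the sessions are gone, which is exactly the durability trade-off described above; in a real deployment, the cache's high-availability option is what covers host failures.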

Another related scenario is performing a lot of computations (reads and writes) that need a "centralized scratch pad", which again may not require durability. The final result can then be persisted in durable storage, such as a database server.
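The scratch-pad pattern can be sketched as several workers accumulating partial results in a shared, non-durable store, with only the final result written to durable storage. Here `ScratchPad` and `durable_store` are hypothetical stand-ins (a locked dictionary for the cache, a plain dictionary for the database), and the word-count job is just an illustrative workload.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ScratchPad:
    """Shared, non-durable working area (stand-in for a named cache)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount):
        # Atomic read-modify-write so concurrent workers don't lose updates.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount

    def snapshot(self):
        with self._lock:
            return dict(self._data)


def count_words(pad, chunk):
    for word in chunk.split():
        pad.add(word.lower(), 1)


durable_store = {}  # hypothetical stand-in for a database table

chunks = ["the cache", "the database", "cache the result"]
pad = ScratchPad()
with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(lambda chunk: count_words(pad, chunk), chunks))

# Only the final result is written to durable storage.
durable_store["word_counts"] = pad.snapshot()
print(durable_store["word_counts"])
```

The intermediate counts never touch the database; if the scratch pad were lost mid-run, the job would simply be rerun from its inputs.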

If your application needs rich querying, the relational model and its operators will serve you well. If you are dealing with complex event processing with real-time queries over time windows, a product like StreamInsight may be a better fit. AppFabric Cache supports tags and allows 'Bulk' operations, which may work as a basic workaround; however, this does not provide querying functionality.
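The difference between tag lookups and real querying is easiest to see in code. The toy `TaggedCache` below is a hypothetical sketch, not the AppFabric API: it can return the items carrying a given set of tags and fetch many keys in one call, but a question like "posts with a rating above 4" has no answer without a query engine.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache with tags: lookups match tags exactly; there is no query language."""

    def __init__(self):
        self._items = {}                 # key -> (value, tags)
        self._by_tag = defaultdict(set)  # tag -> keys carrying that tag

    def put(self, key, value, tags=()):
        self.remove(key)  # drop stale tag entries if the key is overwritten
        self._items[key] = (value, frozenset(tags))
        for tag in tags:
            self._by_tag[tag].add(key)

    def remove(self, key):
        entry = self._items.pop(key, None)
        if entry is not None:
            for tag in entry[1]:
                self._by_tag[tag].discard(key)

    def bulk_get(self, keys):
        """Fetch many keys in one call instead of one round trip per key."""
        return {k: self._items[k][0] for k in keys if k in self._items}

    def get_by_all_tags(self, tags):
        """Items carrying every tag in `tags` (set intersection, nothing more)."""
        key_sets = [self._by_tag.get(t, set()) for t in tags]
        keys = set.intersection(*key_sets) if key_sets else set()
        return self.bulk_get(keys)


cache = TaggedCache()
cache.put("post:1", "Indexing tips", tags=["sql", "unanswered"])
cache.put("post:2", "Backup basics", tags=["sql"])
cache.put("post:3", "Session state", tags=["aspnet", "unanswered"])

print(cache.get_by_all_tags(["sql", "unanswered"]))  # {'post:1': 'Indexing tips'}
print(cache.bulk_get(["post:2", "post:3"]))  # {'post:2': 'Backup basics', 'post:3': 'Session state'}
```

Tags only answer membership questions decided when the item was written; anything involving ranges, joins, or aggregates still belongs in the database.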

Transactions and durability are core tenets of database systems. AppFabric Cache does not support them out of the box. We have received requests for a write-behind feature that would persist the in-memory contents, and this feature is being evaluated and prioritized for a future release.
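Since write-behind is not an AppFabric feature (the post says it is only being evaluated), the sketch below is purely hypothetical: writes land in memory, dirty keys are tracked, and a later `flush` persists them in one batch to a dictionary standing in for the database.

```python
class WriteBehindCache:
    """Hypothetical write-behind wrapper: memory first, database later."""

    def __init__(self, store):
        self._data = {}
        self._dirty = set()  # keys changed since the last flush
        self._store = store  # hypothetical durable store; a dict stands in for a DB

    def put(self, key, value):
        self._data[key] = value
        self._dirty.add(key)  # repeated writes to one key coalesce into one

    def get(self, key):
        return self._data.get(key)

    def flush(self):
        """Persist all pending writes in one batch; a real system would run this on a timer."""
        batch = {k: self._data[k] for k in self._dirty}
        self._store.update(batch)
        self._dirty.clear()
        return len(batch)


db = {}
cache = WriteBehindCache(db)
for i in range(3):
    cache.put("counter", i)
cache.put("owner", "alice")

print(db)             # {} (nothing is durable yet)
print(cache.flush())  # 2 (two keys persisted, even though there were four writes)
```

Coalescing repeated writes reduces load on the database, but anything written after the last flush is lost if the cache goes down, so this offers eventual durability at best, not the transactional guarantees a database provides.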

Here is a scorecard that compares a database server with AppFabric Cache based on the criteria above.

Criteria | Database server | AppFabric Cache
<key, value> where value is an aggregated object | |
Ease of setting up HA | |
Ease of scale out | |
ACID properties | |
Temporal data | |
Read-only data | |
Rich query semantics | |

Finally, you would likely have both of them in your solution architecture, possibly warming up the cache with the aggregated objects and then either expiring them or explicitly keeping the cached objects in sync with changes in the backend.
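Warming the cache and keeping it in sync explicitly can be sketched as follows. `ForumService`, its dictionary-based "database", and the key format are hypothetical; the point is the ordering: write to the durable store first, then invalidate the cached aggregate so the next read rebuilds it.

```python
class ForumService:
    """Sketch: the database stays the source of truth; the cache is kept in sync."""

    def __init__(self, db):
        self.db = db     # hypothetical: category -> list of post titles
        self.cache = {}  # stand-in for a distributed cache

    def _aggregate(self, category):
        posts = self.db[category]
        return {"category": category, "post_count": len(posts), "latest": posts[-1]}

    def warm_up(self):
        # Pre-load aggregated objects so the first visitors don't hit the database.
        for category in self.db:
            self.cache[f"ForumPost:{category}"] = self._aggregate(category)

    def get(self, category):
        key = f"ForumPost:{category}"
        if key not in self.cache:
            self.cache[key] = self._aggregate(category)
        return self.cache[key]

    def add_post(self, category, title):
        self.db[category].append(title)                # 1. write to the durable store
        self.cache.pop(f"ForumPost:{category}", None)  # 2. invalidate the stale aggregate


svc = ForumService({"sql": ["Indexing tips"], "aspnet": ["Session state"]})
svc.warm_up()
svc.add_post("sql", "Backup basics")
print(svc.get("sql"))  # {'category': 'sql', 'post_count': 2, 'latest': 'Backup basics'}
```

Invalidating (rather than updating) the cached entry keeps the logic simple. With concurrent readers, though, a reader that loaded the old data just before the update can still write it back into the cache after the invalidation, so expirations remain a useful safety net.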

You may not agree with the star ratings, since they vary by scenario, but these aspects should be factored into your decision criteria.

Happy Caching!

Contribution from Todd Robinson is acknowledged.

Authored by: Rama Ramani
Reviewed by: Quoc Bui, Christian Martinez