Introducing Project Codename "Velocity"

The advances in processors, memory, storage, and connectivity have paved the way for next-generation applications that are data-driven. Their data could reside anywhere (e.g., on desktops, mobile devices, servers, and in the cloud), and they require access from anywhere (e.g., locally, remotely over the network, from mobile devices, and in both connected and disconnected modes). This trend has led to the development of distributed, multi-tiered, and composite application architectures for the web and for the enterprise. A typical enterprise application accesses data from multiple data sources, integrates that data, and re-shapes (or transforms) it into the form most suitable for the application (typically objects, such as C# or Java objects). It then executes application logic against that data. The same is true of web applications. Consider social networking apps or mashups: they access data from multiple web sources over the internet, aggregate it, execute application logic, and generate pages for web interaction. As these multi-tiered web and enterprise applications become mainstream, the demand for application performance and scale is increasing. End users become less tolerant and more frustrated when a web application cannot respond in milliseconds. Web applications that cannot scale as the number of concurrent accesses increases lose traffic and, with it, business. Fundamentally, we have all begun to expect high performance and scale from every application.

And let's not forget application availability. For reasons similar to those I describe above, an application cannot be down. We cannot imagine the MSN portal, the Amazon web site, or the corporate SAP financial application being down when we need it. We expect to access our personal information on MSN at any time, and consumers do business with Amazon at any time and from anywhere. Fundamentally, applications need to be available all the time to support access at any time and from anywhere.

Another major expectation, especially from application developers and application hosters, is that applications be scalable and available at low cost. A decade ago, only mission-critical businesses could afford to invest in the large and expensive infrastructure (both hardware and software) needed to support the scale and availability of their applications. Now, with web hosting, everyone expects and demands high scale and availability at low cost. Going even further, developers not only want cheap, scalable, and available applications; they also want the ability to develop (and deploy) such applications very rapidly.

To cope with competitive pressure, from both an innovation and a deployment perspective, application vendors must develop and deploy these applications rapidly. In turn, application developers are looking for application infrastructure that enables them to build highly performant, scalable, and available applications on commodity hardware and software, at a rapid pace. Traditional application platforms like .NET and Java are known for rapid development and deployment of multi-tier applications. They are now expected to provide this scalability and availability infrastructure as well.

The distributed cache is becoming the key application platform component for providing scalability and high availability. In-memory caching has traditionally been used primarily to meet the high-performance requirements of applications. By fusing the caches on multiple nodes into a single unified cache, however, distributed caches offer not only high performance but also scale. By maintaining copies of data on multiple cache nodes (in a mutually consistent manner), a distributed cache can also offer high availability to applications. Distributed caches are especially well suited to applications with the following characteristics:

  • Applications with a considerable number of data requests that are mostly reads (e.g., product catalogs)
    • Large concurrent access to such data can be supported by replicating the catalog data on multiple cache nodes. Since updates to such data are infrequent, maintaining consistency (synchronously or asynchronously) is not very expensive.
  • Applications that can tolerate some staleness of data
    • Such applications can provide better performance and scale by not requiring caches to be updated or refreshed immediately when the underlying data changes.
  • Applications that can work with highly partitioned data (e.g. session data, shopping cart)
    • High scale and performance can be supported by partitioning and distributing data across multiple cache nodes, and thereby distributing data processing across the cache nodes
  • Applications that can work well with eventual consistency
    • Consider a flight inventory application, which must satisfy a large number of concurrent reads and writes to the inventory of seats. To support large scale, the distributed cache may replicate the inventory value on multiple nodes. However, the inventory values on different nodes have to be made consistent in some fashion. Requiring immediate (also known as strong) consistency means that every update must be propagated synchronously to all the copies, which would hurt the overall performance and scale of the application. If instead the copies are allowed to become consistent eventually (in an asynchronous manner), the application gets low-latency, high-performance access to the inventory.
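The difference between strong and eventual consistency in the flight inventory example can be sketched in a few lines. This is a minimal, in-process illustration under stated assumptions: a single primary copy accepts writes, replicas are plain dictionaries, and the hypothetical `propagate()` method stands in for background replication. It is not how "Velocity" replicates data, which this post does not describe.

```python
from collections import deque


class AsyncReplicatedCache:
    """Hypothetical primary/replica cache illustrating eventual consistency.

    Writes land on the primary copy immediately; replica copies are updated
    only when propagate() runs (standing in for background replication).
    """

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # (key, value) updates not yet replicated

    def put(self, key, value, sync=False):
        self.primary[key] = value
        if sync:
            # Strong consistency: flush older queued updates first so they
            # cannot overwrite this write later, then update every copy now.
            self.propagate()
            for replica in self.replicas:
                replica[key] = value
        else:
            # Eventual consistency: return immediately, replicate later.
            self.pending.append((key, value))

    def read(self, key, replica=None):
        # Read from the primary, or from a specific replica (which may be stale).
        store = self.primary if replica is None else self.replicas[replica]
        return store.get(key)

    def propagate(self):
        # Apply queued updates to all replicas in the order they were written.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value

    def is_consistent(self, key):
        return all(r.get(key) == self.primary.get(key) for r in self.replicas)


cache = AsyncReplicatedCache(replica_count=2)
cache.put("flight-101/seats", 42)            # asynchronous write returns at once
print(cache.read("flight-101/seats", 0))     # prints None: replica not yet updated
cache.propagate()
print(cache.read("flight-101/seats", 0))     # prints 42: replica has converged
```

The `sync=True` path pays for a round of updates to every copy on each write. The default path returns as soon as the primary is updated and lets replicas lag briefly, which is the latency-versus-consistency trade-off described in the list above.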

As distributed caches become more widely deployed, I believe that over the next few years the distributed cache will be used as the first tier of all data access. Multi-tier application architectures will include the cache tier as a data access tier between the application server tier and the backend data tier.
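One common way application code uses such a cache tier is the cache-aside (read-through on miss) pattern. Here is a minimal sketch under several assumptions. A plain dictionary stands in for the backend data tier. The class name `CacheAsideTier` and its time-to-live policy are hypothetical illustrations, not the "Velocity" API. The clock is injectable only so the example behaves deterministically.

```python
import time


class CacheAsideTier:
    """Hypothetical cache tier between application code and a backend store.

    Reads are served from the cache when a fresh entry exists; otherwise the
    value is loaded from the backend and cached for ttl_seconds. Entries may
    therefore be up to ttl_seconds stale, which suits staleness-tolerant apps.
    """

    def __init__(self, backend, ttl_seconds, clock=time.monotonic):
        self.backend = backend        # stand-in for the backend data tier (any mapping)
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so the example is deterministic
        self.entries = {}             # key -> (value, expires_at)
        self.backend_reads = 0

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]           # cache hit
        value = self.backend[key]     # miss or expired: read through to the backend
        self.backend_reads += 1
        self.entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        # Drop a cached entry, e.g. after the application updates the backend.
        self.entries.pop(key, None)


now = [0.0]
catalog = {"sku-1": "Widget"}
tier = CacheAsideTier(catalog, ttl_seconds=30, clock=lambda: now[0])
tier.get("sku-1")                # miss: loads from the backend
catalog["sku-1"] = "Widget v2"   # backend changes behind the cache
print(tier.get("sku-1"))         # prints Widget (served from cache, stale)
now[0] = 31.0
print(tier.get("sku-1"))         # prints Widget v2 (entry expired, reloaded)
```

Only misses and expired entries reach the backend, so read-mostly data such as a product catalog is served almost entirely from the cache tier.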

Today, Microsoft is announcing the first Community Technology Preview (CTP) of a distributed caching product that gives the .NET application platform support for developing highly performant, scalable, and highly available applications. Project code named “Velocity” is a distributed cache that allows any type of data (CLR objects, XML documents, or binary data) to be cached. “Velocity” fuses large numbers of cache nodes in a cluster into a single unified cache and provides transparent access to cache items from any client connected to the cluster. https://msdn.microsoft.com/data provides additional information about project code named “Velocity”, as well as links to download our first CTP.
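To make "a single unified cache with transparent access from any client" concrete, the following sketch shows one way a client can route each key to one of many nodes. It uses consistent hashing, a widely used placement technique. This post does not say how "Velocity" places items, so the technique is an assumption. The class `UnifiedCacheClient`, its `virtual_nodes` parameter, and the in-process dictionaries standing in for remote nodes are likewise hypothetical.

```python
import bisect
import hashlib


def stable_hash(text):
    # Python's built-in hash() is randomized per process for strings,
    # so use a cryptographic digest to get the same ring on every client.
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class UnifiedCacheClient:
    """Hypothetical client that presents several cache nodes as one logical cache.

    Keys are mapped to nodes with a consistent-hashing ring, so every client
    that knows the same node list routes a given key to the same node.
    """

    def __init__(self, nodes, virtual_nodes=100):
        self.stores = {node: {} for node in nodes}   # in-process stand-ins for remote nodes
        ring = []
        for node in nodes:
            for i in range(virtual_nodes):
                ring.append((stable_hash(f"{node}#{i}"), node))
        ring.sort()
        self.ring_hashes = [h for h, _ in ring]
        self.ring_nodes = [n for _, n in ring]

    def node_for(self, key):
        # First ring position clockwise from the key's hash (wrapping around).
        index = bisect.bisect(self.ring_hashes, stable_hash(key)) % len(self.ring_hashes)
        return self.ring_nodes[index]

    def put(self, key, value):
        self.stores[self.node_for(key)][key] = value

    def get(self, key):
        return self.stores[self.node_for(key)].get(key)


client = UnifiedCacheClient(["node1", "node2", "node3"])
client.put("session:alice", {"cart": ["sku-1"]})
print(client.node_for("session:alice"), client.get("session:alice"))
```

The caller only ever uses `put` and `get` on keys. Which node holds an item is computed, not configured per key. With consistent hashing, adding or removing a node remaps only the keys in that node's ring segments rather than reshuffling the whole cache. That property matters for the very large clusters discussed below.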

Distributed caches are not new. During the last couple of years, several caching products have emerged to address the performance and scalability needs of applications. Most of these products are point products, primarily supporting key-based access. Other than memcached, which is an open source technology, most of them target enterprises and enterprise workloads and scale. I think web workloads require considerably larger scale, with thousands of cache nodes in a cluster. Web-scale distributed caches not only require mechanisms that can scale and provide availability in very large clusters; they must also be easy to manage or self-managing. In the future, we envision “Velocity” being an integral part of the .NET application stack, targeting both enterprise and web workloads (and scale). I also believe that as applications start using caches for data access, they will demand richer data services like query, transactions, analytics, and synchronization. For example, we believe .NET applications will require LINQ queries on the distributed cache, the same way they query the backend SQL Server database. We envision “Velocity” becoming such a comprehensive distributed caching platform. The performance, scale, and availability functionality of “Velocity”, along with its rich data services, will enable the development and deployment of rich web and enterprise applications.

Anil Nori
Microsoft Distinguished Engineer

Comments

  • Anonymous
    June 03, 2008
    Here are the highlights from the TechEd 2008 Keynote (as seen from afar by watching the TechEd 2008 Keynote

  • Anonymous
    June 03, 2008
    Did no one notice that there is an open source project, Velocity, that has been around for many years? http://velocity.apache.org/ is a templating engine written in pure Java. Perhaps Microsoft should check its project names before making announcements!!

  • Anonymous
    June 03, 2008
    I always enjoy reading Microsoft tech announcements simply because they just seem to "nail" business requirements and building the proper solutions / technology. To quote the announcement: "Distributed cache is becoming the key application platform

  • Anonymous
    June 04, 2008
    There is TechEd in Orlando, and this means, that there are a lot of new announcements. Let’s fill up

  • Anonymous
    June 05, 2008
    Microsoft Project Codename "Velocity" : Introducing Project Codename "Velocity" (News)

  • Anonymous
    June 06, 2008
    General The Design Is Never Right The First Time : This is hands down the best reason why an iterative approach to software development is the way to go...get something working in front of the decision makers early and often. Phil Haack presents an example

  • Anonymous
    June 15, 2008
    Author: Jonathan Allen. Translator: Hu Jian. Published June 11, 2008, 5:08 AM. Community: Architecture, .NET, SOA. Topics: .NET Framework, Enterprise Architecture. Tags...

  • Anonymous
    July 08, 2008
    Finally have a few minutes to play with Velocity , Microsoft's new distributed cache offering that's

  • Anonymous
    July 28, 2008
    The Flower of Seven Colors ... Velocity... or why do we need yet another Distributed Cache? http://dev.net.ua/blogs/leshchinsky/archive/2008/07/16/6492.aspx

  • Anonymous
    November 28, 2008
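    Over the past few years, distributed in-memory caching has become quite popular, from mainstream Java applications to fringe languages like Erlang. Continuing its frantic effort to catch up with the technologies that dominate the open source world, Microsoft has also introduced its own distributed cache. Velocity is specifically...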

  • Anonymous
    June 10, 2009
    “Velocity” is the codename of a distributed in-memory caching system that is very useful in the design of applications