Introducing Azure DocumentDB – Microsoft’s fully managed NoSQL document database service

Today is an extremely exciting day as we release Microsoft Azure DocumentDB, a fully managed, JSON document database service.

DocumentDB was built from the ground up in response to the increasing demands of applications being developed here at Microsoft and by Microsoft Azure customers. We heard from customers that they need a database that can keep pace with their rapidly evolving applications – something fast, flexible and scalable. Increasingly, NoSQL databases are becoming the tool of choice for many developers, but running and managing these databases can be costly, especially at scale. We also heard that customers wanted more of the capabilities inherent to relational database systems – rich queries and transactional processing are still important. Most data stores offer extreme choices to developers – strong or eventual consistency, schema-free with limited query capabilities or schematized with rich query capabilities, transactions or scale, and so on. The fact is that numerous real-world scenarios exist between these extremes and we want to address them.

So we considered what it would take to build a massively scalable, schema-free database with rich query and transaction processing using the most ubiquitous programming language (JavaScript), data model (JSON) and transport protocol (HTTP) – that is DocumentDB. 

We decided to build a database engine which makes a deep commitment to the JSON data model and JavaScript language. This singular design choice, in turn, enabled a set of distinctive capabilities, including the ability to automatically index documents without requiring any schema or secondary indices, the ability to issue SQL-based relational and hierarchical queries over heterogeneous JSON values, the ability to integrate database transactions with JavaScript exceptions and the ability to seamlessly operate over JSON documents. Because DocumentDB is a multi-tenant database service, we have built each component of the stack with robust resource governance to ensure tenant isolation and the elastic scale of throughput and storage. As engineers, we obsess relentlessly over site reliability, high availability, performance, and scale. Finally, we believe that databases should be blazingly fast and yet safe by default.

Meeting the promise of schema-free

We wanted DocumentDB to support SQL queries over arbitrary documents without forcing the developer to create explicit schema or secondary indices or views. We wanted to give developers the freedom to rapidly iterate on application schema while preserving the ability to execute ad hoc queries. We also felt that queries should yield consistent results even when write rates are high.

Through the deep commitment to the JSON data model, DocumentDB is able to efficiently index, query and process heterogeneous documents. We based the DocumentDB SQL language on the JavaScript type system and expression semantics, and gave it the ability to invoke JavaScript UDFs. DocumentDB’s query grammar adds document semantics and hierarchical and relational projections through a SQL dialect that is familiar to developers. This creates an efficient and natural way for you to query over JSON documents. The .NET SDK also includes a LINQ provider and we are considering native JavaScript mapping to our SQL query language.

We have designed the storage and indexing subsystem to serve consistent queries in the face of sustained high volumes of writes. This is accomplished using novel log-structured storage techniques for index maintenance and indexing algorithms that fully exploit SSDs. By default, all document properties are indexed and can be queried through the DocumentDB SQL query language.
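
To make "all document properties are indexed" a little more concrete, here is a minimal sketch in Python of indexing every path in a schema-free document. It is only an illustration of the general idea, not DocumentDB's storage or indexing implementation; the names iter_paths and ToyPathIndex, and the path notation, are hypothetical.

```python
from collections import defaultdict


def iter_paths(value, prefix=""):
    """Yield (path, scalar) pairs for every leaf in a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_paths(child, f"{prefix}/{key}")
    elif isinstance(value, list):
        for child in value:
            # All array elements share one wildcard path segment.
            yield from iter_paths(child, f"{prefix}/[]")
    else:
        yield prefix, value


class ToyPathIndex:
    """Hypothetical in-memory index: (path, value) -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, document):
        # No schema is declared up front; every leaf path is indexed.
        for path, scalar in iter_paths(document):
            self._postings[(path, scalar)].add(doc_id)

    def find(self, path, scalar):
        return set(self._postings.get((path, scalar), set()))


index = ToyPathIndex()
index.add("a", {"name": "Ada", "tags": ["math", "code"], "address": {"city": "London"}})
index.add("b", {"name": "Bob", "address": {"city": "London"}, "age": 41})

london = index.find("/address/city", "London")  # both documents match
coders = index.find("/tags/[]", "code")         # only document "a" has this tag
```

The two documents have different shapes, yet both can be looked up by any property without declaring a schema or a secondary index first.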

More on DocumentDB SQL Query

Crowning JavaScript as a modern day T-SQL

For years developers have been able to rely on relational databases for complex, transactional processing of data. As developers adopt NoSQL systems for their simplicity, speed and scale, they are often required to give up the transactional processing capabilities offered by traditional database systems. Database support for transactions provides a performant and robust programming model for dealing with concurrent changes. This can result in faster apps that are easy to maintain. We feel that support for application code execution within the database is important. But we don’t want to invent another procedural language. We want a broad set of developers to be able to write code that runs within the database, and we also want the mapping from the procedural language to JSON to be as seamless and natural as possible. So we chose JavaScript as the de facto language of DocumentDB – supported on all platforms, easy to understand, with intrinsic support for JSON.

DocumentDB has deeply integrated JavaScript execution directly into the database engine. All execution of application JavaScript logic is sandboxed, resource-governed and fully isolated. DocumentDB lets developers write stored procedures and triggers natively in JavaScript. This allows developers to write application logic which can be shipped using HTTP POST and executed directly on the storage partition within a transaction boundary. JSON can be materialized as JavaScript objects and transactions can be aborted by throwing an exception. This approach of “JavaScript as a modern day T-SQL” frees application developers from the complexity of object-relational (O/R) mapping technologies.
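
The idea that a transaction is aborted simply by throwing an exception can be sketched outside the service. The example below is a toy written in Python rather than JavaScript, and it assumes a single in-memory partition; ToyPartition, run_in_transaction and transfer are hypothetical names, and nothing here reflects the actual engine or its server-side API.

```python
import copy


class ToyPartition:
    """Hypothetical single-partition store; not the DocumentDB engine."""

    def __init__(self):
        self.documents = {}

    def run_in_transaction(self, procedure, *args):
        # Work on a private copy so partial writes never become visible.
        working_copy = copy.deepcopy(self.documents)
        result = procedure(working_copy, *args)  # an exception here skips the commit
        self.documents = working_copy            # commit: publish all writes at once
        return result


def transfer(documents, source, target, amount):
    documents[source]["balance"] -= amount
    documents[target]["balance"] += amount
    if documents[source]["balance"] < 0:
        raise ValueError("insufficient funds")   # aborts the whole transaction
    return documents[source]["balance"]


partition = ToyPartition()
partition.documents = {"alice": {"balance": 50}, "bob": {"balance": 10}}

remaining = partition.run_in_transaction(transfer, "alice", "bob", 20)

try:
    partition.run_in_transaction(transfer, "alice", "bob", 100)
    aborted = False
except ValueError:
    aborted = True
```

The second call debits and credits its working copy before raising, but because the commit step is never reached, neither balance changes in the stored documents.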

More on DocumentDB JavaScript Integration

More on DocumentDB Server Side JavaScript APIs – Stored Procedures, Triggers and UDFs

Tunable consistency and predictable performance

Eventually consistent systems can offer high availability and improved performance for applications. However, as a developer, it can be very challenging to build experiences in the face of eventually consistent data. There are no promises – data can be stale and out of order. While we are strong advocates of weaker consistency models (pun intended), we want to make sure that we provide a service that gives developers predictability, especially when it comes to data consistency. Why not give you the control to make smart and predictable tradeoffs when it comes to performance and consistency?

DocumentDB offers four distinct consistency levels for reads and queries – Strong, Bounded Staleness, Session, and Eventual. These well-defined consistency levels allow you to make sound tradeoffs between consistency, availability and latency. Bounded staleness guarantees both a total ordering of writes and a bound on maximum staleness, which makes it useful for applications dealing with time and ordered operations. Session consistency provides read-your-own-writes guarantees and can be a good match for user-centric apps. These consistency levels are backed by predictable performance levels, ensuring you can achieve consistent results for your application.
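
As a rough illustration of what a read-your-own-writes guarantee means, the Python sketch below has a client remember the sequence number of its latest write and only read from a replica that has caught up to it. This is a generic textbook-style model with hypothetical names (ToyReplica, ToySession, token); it makes no claim about how DocumentDB implements session consistency.

```python
class ToyReplica:
    """Hypothetical replica that applies writes in sequence-number order."""

    def __init__(self):
        self.applied = 0   # highest write sequence number applied so far
        self.data = {}

    def apply(self, sequence, key, value):
        self.data[key] = value
        self.applied = sequence


class ToySession:
    """Tracks the last write made by this client (its session token)."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.token = 0

    def write(self, key, value):
        sequence = self.primary.applied + 1
        self.primary.apply(sequence, key, value)
        self.token = sequence   # remember our own latest write
        return sequence

    def read(self, key):
        # Use the secondary only if it has caught up to our own writes.
        if self.secondary.applied >= self.token:
            return self.secondary.data.get(key)
        return self.primary.data.get(key)


primary, secondary = ToyReplica(), ToyReplica()
writer = ToySession(primary, secondary)
other = ToySession(primary, secondary)

sequence = writer.write("cart", ["book"])    # secondary has not replicated yet
own_read = writer.read("cart")               # served by the primary
stale_read = other.read("cart")              # another session may see a lagging replica
secondary.apply(sequence, "cart", ["book"])  # replication catches up
fresh_read = other.read("cart")
```

The writing session always sees its own write, while a different session may briefly read the lagging replica, which is the kind of tradeoff the weaker levels accept in exchange for latency and availability.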

More on DocumentDB Consistency Levels

Seamless scale and delivered as a service

We hear frequently from customers that they don’t want to be consumed by managing, scaling and maintaining their database infrastructure. This is true for customers using relational databases as well as NoSQL databases. We feel that part of as-a-Service delivery means that developers should get fine-grained control over how much of the service they consume and that scaling should be as simple as turning a dial. If you need more, turn the dial to increase your usage. If you need less, turn the dial back down. In either case, no downtime, no fuss, no problem. Continue to scale to as much as your application needs in both database storage and request throughput.

DocumentDB is a fully managed, multi-tenant Azure service and can be configured to scale with your user base. Database accounts can be easily created through the Azure portal with capacity to serve an application’s needs today. As these needs change, you can easily add or remove capacity. DocumentDB will allocate and reserve capacity exclusively for your application – this includes high performance database storage as well as dedicated request throughput capacity. This means that you get predictable performance with the ability to elastically scale by purchasing more capacity units.

Open and approachable

The world doesn’t necessarily need more data formats, procedural languages or protocols. The learning curve for new systems can be steep, and working with new and unfamiliar tools can slow you down. As we developed DocumentDB we firmly believed that we should resist the urge to be inventive where it didn’t deliver real value to you, the developer. Our goal with DocumentDB is to eliminate any friction associated with getting data in and using the service.

Programming against DocumentDB is simple, approachable and does not require you to buy into a specific tool-chain or require custom encodings or extensions to JSON or JavaScript. All functionality including CRUD, query and JavaScript processing is exposed over a RESTful HTTP interface. By offering simple, scalable JavaScript and JSON over HTTP, DocumentDB doesn’t invent in the area of data models, application models or protocols. DocumentDB’s uniqueness is in how it embraces these standards and offers distinctive, high value capabilities on top of them.

We have validated DocumentDB with first party applications at consumer scale. Today we are delighted to make DocumentDB available to you through the Azure portal. In the coming weeks we’ll post more on both how to use DocumentDB and the technical design of the various sub-systems that make up the service.

To get started, visit the Azure DocumentDB service page.

- Azure DocumentDB Team

Comments

  • Anonymous
    August 21, 2014
    2 questions that come to mind, that need answering for me to even consider this..
  1. Seeing as I will store ALL my documents in Azure for this to work, how much will it cost me  (ALL Costs from bandwidth, storage, etc)
  2. How do I get my documents out, could be terabytes, if I need to. Or if I build an archiving solution that needs to pull documents out.. How will this work.. I love this technology you've built BUT the biggest hurdles for me are what I mentioned above..
  • Anonymous
    August 21, 2014
    Looks like what I've been waiting for. Can't wait to try to replace my SQL Server/Table Storage mix data architecture with this one. Personally I don't like to use JavaScript more than absolutely necessary, but as far as I can see, also LINQ queries are supported against DocumentDB in .NET languages - so that's fine for me. Drawbacks are the costs (the preview price is OK but given this will be 100 % more expensive when out of Beta, it could be a hurdle for people needing lots of storage for historical data like me) and the document query limit of 2.000/Sec - if I'm right and this also means that a single query returning 2.000 docs already puts me to the limit if I don't throw more money at you :) However, it looks like it could be the solution I've been waiting for in Azure ...

  • Anonymous
    August 21, 2014
    Do you have document encryption on the road map?

  • Anonymous
    August 21, 2014
    My current system stores millions of documents quite efficiently using Couchbase Server (CS, hereafter). The downside in Azure is that I have to maintain VMs to run CS on myself. Issuing patches, etc. It seems to me that this could be a reasonable replacement for CS and save me some work. That said, there are features I use quite heavily that Azure DocumentDB does not seem to have yet. The first one is the concept of a view. It is essentially an index of documents that is built on the fly as data comes in. I can quickly pull large lists of data using it. The second one is an in-memory layer (memcached) of the most popular documents/views. This allow me very quick access for both pulling and pushing data. The speed of these features are absolutely vital to the performance of my very data intensive system. Do you expect Azure DocumentDB to have these features in the future? Thanks, Corey

  • Anonymous
    August 21, 2014
    Most of the applications which were going with Table storage (NoSQL storage) were facing transactional problems, which were a bottleneck for the transition to cloud. Azure DocumentDB will remove all those obstacles! Future is cloud! Thanks for the awesome alternative!

  • Anonymous
    August 22, 2014
    To second Corey's question: Lists/Views/Paging seems to be missing.  There is an example in the code samples that shows how to somewhat do this in a stored proc, but it seems as if it can't handle millions of items, and requires you to pull back all data to sort.  Ideally this could be controlled by a View or a special index of some kind. Also, is there a story on partial document updates? If I have a large document but just want to change one property or maybe add something to a collection, can I just do that? Lastly, the guidance on CUs and Collections is a little confusing.  I can't really tell if you are recommending a sharing approach (as it is mentioned that Collections are how data is partitioned for scale).  So, if I have 10 CUs, do I need to have 10 collections, or can I just have 1 collection that gets all of the resources of the 10 CUs?  

  • Anonymous
    August 22, 2014
    Hey - Just wanted to check if you have the typescript definition file for Javascript SDK

  • Anonymous
    August 22, 2014
    Is Azure DocumentDB covered under Microsoft Trust / HIPAA compliant? (as it appears there is governance setup robustly within a multi-tenant environment)? Is there a BA agreement on this product? thank you for your time and assistance on this question.

  • Anonymous
    August 23, 2014
    Its great to see the Azure team innovating in this space and will be great if their effort with DocumentDB can be released as a standalone db with a community (free) and self-hosting(use your own nodes at an affordable price point) version. This allows different projects and businesses to have a pick at what deployment (community, self-hosting, azure cloud) works for them.

  • Anonymous
    August 25, 2014
    @jose fajardo, to find out how much DocumentDB will cost, please take a look at this page azure.microsoft.com/.../documentdb. In order to export documents in bulk from DocumentDB, you can use the ReadFeed method from any of the client SDKs. The response from the method can then be streamed to the local file system or e.g., Azure Blob storage for archival. If you’d like to see archiving capabilities in the service please post your suggestions to feedback.azure.com/.../263030-documentdb

  • Anonymous
    August 25, 2014
    @Jeff, we do have encryption on the list of future feature work. Please help us prioritize the timing by voting for it on feedback.azure.com/.../263030-documentdb.

  • Anonymous
    August 25, 2014
    @Corey, these are great suggestions. Regarding views, note that views that are based on filters and projections can be accomplished by simple SQL queries since DocumentDB supports automatic indexing. We do understand that there are scenarios where views based on aggregates are quite useful. Please post your feedback at feedback.azure.com/.../263030-documentdb. Common documents will be read from memory in DocumentDB. You can also implement a caching layer in your application by using DocumentDB with Azure Cache. We will add more documentation and tooling on how to do this.

  • Anonymous
    August 25, 2014
    @Ryan LM, for paging, please take a look at the QueryWithPaging method in the MSDN samples in this file (code.msdn.microsoft.com/.../sourcecode). Sorting (order by) and partial document updates are planned for future updates. Please vote for these features at feedback.azure.com/.../263030-documentdb. In the preview offers, the maximum size of a collection is 10GB. To fully utilize 10 CUs, you should create at least 10 collections. With 10 CUs: • If you create 10 collections, they will be allocated 2000 request units each = total of 20,000 request units. • If you create 30 collections, they will be allocated about 667 request units each, which is also a total of about 20,000 request units. Hope that helps.

  • Anonymous
    August 25, 2014
    @Emmaneul Buah, thanks for the feedback. Please vote for standalone installations of DocumentDB at feedback.azure.com/.../263030-documentdb. Be sure to distinguish whether you want a stand-alone deployment option vs. a local emulator.

  • Anonymous
    August 25, 2014
    @Sushant, we don't have the Typescript definition file for the JavaScript SDK. It's a great suggestion - please propose it at feedback.azure.com/.../263030-documentdb

  • Anonymous
    August 26, 2014
    Will there be an on-premise variant as well?

  • Anonymous
    August 27, 2014
    @LeRenard242 If you would like to see this as a feature, please go vote for it on feedback.azure.com/.../6352899-on-premise-instance

  • Anonymous
    August 29, 2014
    Can I use DocumentDB on my local server instead of Azure? If the answer is NO, then sorry but I'm not interested in the db anymore.

  • Anonymous
    September 02, 2014
    "This allows developers to write application logic which can be shipped using HTTP POST and executed directly on the storage partition within a transaction boundary." I'm very surprised to read this. After all these years of education around SQL injection, we now encourage "procedure injection"? In principle, anonymouse T-SQL blocks could be sent to the database from clients already today, so this isn't even novel...

  • Anonymous
    September 04, 2014
    MSDN (msdn.microsoft.com/.../microsoft.azure.documents.client.connectionmode.aspx) says: "Direct and Gateway connectivity modes are supported. Direct is the default." But now, Gateway is the default.

  • Anonymous
    October 17, 2014
    What is the roadmap of Java SDK for DocumentDB

  • Anonymous
    October 21, 2014
    @Mallikarjun, we are entering the final stages of the Java SDK. You can expect the first version to be published within weeks. Follow us on Twitter at @DocumentDB or check back on the blog for the announcement when it happens.

  • Anonymous
    January 07, 2015
    Is there a date for general availability of DocumentDB? If not, can you give any kind of indication as to when this will happen?

  • Anonymous
    January 20, 2015
    @Paul we're hard at work to make this happen. All I can say here is expect something by the end of 2015 Q1.