Azure Table Service

I was recently helping a customer design a web application that would allow them to track donors and donations to their organization. They had tried going the Software-as-a-Service route, but found the available offerings too expensive and difficult to get started with.

They had a small in-house Agile team that was tasked with getting this into production. One of my first meetings with them was around the data model and the requirements document they needed to give to their SQL DBA to provision the databases for them.

  • At this point the project was just starting and the requirements weren't really that clear.
  • Developing the relational model was taking too long.
  • They wanted something that would be super fast and responsive from multiple platforms and locations.
  • They didn't really want to deal with backups.

Using a SQL database on their existing infrastructure seemed like the obvious choice, but it felt like overkill for an application like this.

  • The users weren't going to access the data outside the application (e.g. through SQL Server Management Studio).
  • The relationships between entities were going to be modelled in code anyway, so the SQL relational model wasn't really going to add any value.
  • Once the relational model is defined and implemented, any future change to the model or relationships takes a lot more effort.
  • Using SQL required a lot of upfront planning and forecasting to make sure the server and databases were sized properly, backups were scheduled, and so on.
  • SQL is great for a scale-up approach. However, this particular organization had decided the previous year that all future development would be based on a scale-out, low-cost commodity hardware approach.
  • Connectivity from the internet was going to be a challenge; ports and firewalls would need to be opened into the private datacenter.

We started having conversations about an alternate approach, and the best option that came to mind was Azure Tables. It met most of the customer's requirements, although it was a new paradigm and they weren't fully convinced at first. As we talked through the benefits, they quickly realized the savings in cost and effort.

  • Pay as you go: much lower licensing costs, and a cost per GB lower than they could achieve on their own.
  • No need to worry about backups.
  • Available directly over the internet. Shared Access Signatures can be used to selectively restrict who can access which data.
  • Faster development time. There is no need to define the data model upfront; new entities can be added as you go, and data can easily be re-partitioned as requirements are added and evolve (see the short sketch after this list).
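
To give a feel for how little upfront modelling this takes, here is a minimal sketch, assuming the current Python azure-data-tables SDK; the table name, keys and property names are invented for illustration and are not part of the customer's actual design:

    from azure.data.tables import TableServiceClient

    # Placeholder connection string for the storage account.
    service = TableServiceClient.from_connection_string("<storage-connection-string>")
    table = service.create_table_if_not_exists("Donations")

    # No schema to define up front: every entity just needs a PartitionKey and a
    # RowKey, and the remaining properties can differ from entity to entity.
    table.create_entity({
        "PartitionKey": "donor-00042",
        "RowKey": "2015-07-21T10-15-00",
        "Amount": 50.00,
        "Campaign": "Summer Drive",
    })
    table.create_entity({
        "PartitionKey": "donor-00042",
        "RowKey": "profile",              # a differently shaped entity in the same table
        "Name": "Jane Smith",
        "Email": "jane@example.com",
    })

Adding a new kind of entity later is just another create_entity call with a different shape; there is no schema migration step.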

https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/

What is the Table service? As you might expect from the name, the Table service uses a tabular format to store data. In the standard terminology, each row of the table represents an entity, and the columns store the various properties of that entity. Every entity has a pair of keys to uniquely identify it, and a timestamp column that the Table service uses to track when the entity was last updated.

The account name, table name and PartitionKey together identify the partition within the storage service where the table service stores the entity.

In the Table service, an individual node services one or more complete partitions and the service scales by dynamically load-balancing partitions across nodes. If a node is under load, the table service can split the range of partitions serviced by that node onto different nodes; when traffic subsides, the service can merge the partition ranges from quiet nodes back onto a single node.
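
To make those keys concrete, here is a small sketch continuing the invented Donations example above (Python azure-data-tables SDK). A point lookup that supplies both PartitionKey and RowKey goes straight to the partition that owns the entity, and a range query that stays within a single partition is served by that one partition:

    from azure.data.tables import TableClient

    table = TableClient.from_connection_string(
        "<storage-connection-string>", table_name="Donations"
    )

    # Point query: PartitionKey + RowKey together identify exactly one entity.
    entity = table.get_entity(
        partition_key="donor-00042", row_key="2015-07-21T10-15-00"
    )
    print(entity["Amount"])

    # Range query confined to a single partition (one donor's July donations).
    july = table.query_entities(
        "PartitionKey eq 'donor-00042' and RowKey ge '2015-07' and RowKey lt '2015-08'"
    )
    for e in july:
        print(e["RowKey"], e["Amount"])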

The article above also helps developers understand how to optimize the key/value store for read-heavy or write-heavy scenarios. The thing I like best about the Azure Table service is that it forces the developer to think about the queries needed to fulfill the application requirements...and then store the data based on that design. Table storage is cheap enough that you can de-normalize the data and store different views of the same data to help make the application faster.
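
For example, one way to apply that idea to this donations scenario (a sketch only, with made-up table names, again using the Python azure-data-tables SDK) is to write each donation twice, once keyed for "donations by donor" and once keyed for "donations by date", so that both screens in the application become cheap single-partition reads:

    from azure.data.tables import TableServiceClient

    service = TableServiceClient.from_connection_string("<storage-connection-string>")
    by_donor = service.create_table_if_not_exists("DonationsByDonor")
    by_date = service.create_table_if_not_exists("DonationsByDate")

    donation = {"Amount": 50.00, "Campaign": "Summer Drive", "DonorName": "Jane Smith"}

    # View 1: partitioned by donor, so "everything this donor has given" is one partition query.
    by_donor.upsert_entity(
        {"PartitionKey": "donor-00042", "RowKey": "2015-07-21T10-15-00", **donation}
    )

    # View 2: partitioned by date, so "everything donated on 21 July" is also one partition query.
    by_date.upsert_entity(
        {"PartitionKey": "2015-07-21", "RowKey": "10-15-00-donor-00042", **donation}
    )

The duplicate copy costs very little to store; the win is that neither query ever has to fan out across partitions.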

https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/#design-for-querying

The article above also has some great information about Design Patterns and modelling relationships in a lightweight, scalable way.

For your next application, do consider using Azure Table storage or a similar NoSQL persistent store.

  • Do you really need a full-fledged SQL DB for your application?
  • Are you really going to leverage the complex modelling and relationships that SQL provides? Maybe your application code is going to implement them anyway.
  • Managing the SQL backend and infrastructure is going to require additional work and operational overhead.

I don't think that SQL DBs are going away by any means. There is still a place for SQL in complex business applications where access to the data is required through multiple channels, for Business Intelligence needs, and so on. However, I would challenge you to really think through the need for SQL rather than using it as the default.

Some scenarios could also benefit from the Azure SQL service: you get all the SQL functionality you are used to, but you don't need to manage and tune the infrastructure yourself. Let the service do that for you.

I think the future of application development is going to tilt heavily in the No/Low SQL direction. Moving towards DataLayer-as-a-Service is definitely going to help minimize developer effort while providing higher scalability and reliability at a lower cost. Who could argue with that? :)

Comments

  • Anonymous
    July 21, 2015
    Huge fan of Table Storage...but I turn to Azure SQL when I know I'm going to need to let users page through data. The Skip, Take LINQ syntax really is a life saver. I even jumped on DocumentDB until I figured out I could sort, skip, take with that either. What other alternatives are there? Example: User adds another (contact, widget, entry, etc...).

  • Anonymous
    July 21, 2015
    Hi Brian...I'm not sure if I understand the question fully, but are you looking for something similar to Skip/Take for Table Storage? You should be able to use similar syntax using the EnumerableAsync method or similar. Take a look at this blog post and let me know if this is what you had in mind: azure.microsoft.com/.../better-support-for-paging-with-table-storage-in-azure-mobile-services-net-backend Otherwise, can you provide some more details about the question, and I can try to answer it.
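    Roughly, the pattern looks like this; a rough sketch with the current Python azure-data-tables SDK just to show the continuation-token flow (the .NET table SDK surfaces the same token through its segmented query calls):

        from azure.data.tables import TableClient

        table = TableClient.from_connection_string("<connection-string>", table_name="Donations")

        # Ask for pages of 10 entities; the page iterator carries the continuation token.
        pages = table.query_entities(
            "PartitionKey eq 'donor-00042'", results_per_page=10
        ).by_page()
        first_page = list(next(pages))
        token = pages.continuation_token      # stash this with the UI state

        # When the user clicks "next", resume from the token instead of re-reading page 1.
        pages = table.query_entities(
            "PartitionKey eq 'donor-00042'", results_per_page=10
        ).by_page(continuation_token=token)
        second_page = list(next(pages))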

  • Anonymous
    July 21, 2015
    In my first comment I actually meant "I even jumped on DocumentDB until I figured out I could NOT sort, skip, take, count". I understand that I could essentially get ALL the data and page through it myself, only displaying the relevant page number/size.  I guess my problem is a mindset issue.  In the past it was important to only send the information you want across the wire.  If you need page 100 and your page size is 10...I would only request those ten records from SQL, and that is what was sent across the wire. But using table storage proves to be more "difficult" (not really, just more work) to store some kind of "Count" row so that I can determine how many "pages" there are.  Then every time they page I need to download the previous items and get the "next batch".   I LOVE table storage and use it for fast Key/Value lookups. I've got millions of rows in some of my tables and with unique Partition/Row keys it's blazing fast.  I just wish I could use it in a more generic sense.  Again, maybe it's a mentality issue I need to get over, but not having Count(), Skip(), or Take() means I can't use it for a lot of common UI scenarios.

  • Anonymous
    July 27, 2015
    I hear the feedback, Brian. You are correct on that one. I am going to send a note to a dev on the Table Storage team and see what they have to say. They are more than likely going to point to SQL Azure as a solution if we need more server-side processing. That being said, I don't think implementing the feature you requested would be that big a task or introduce significant overhead. The Storage service is sending back the continuation token anyway, so it could send back empty tablesets for Skip(), or the actual results for Take(). Count() would be harder, since you would need to update the count on insert, or do a scan on read. With different kinds of objects possibly being stored in the same table, it could get more complicated and resource intensive to count a particular query. I do agree that an aggregate count of partition keys could be done quickly by walking through the storage tree. At this point, however, I think SQL Azure may be a better solution. Great feedback though...thanks for the comments!
