Essential Knowledge for Azure Table Storage
This post is devoted to Azure tables.
- Azure tables are used to store non-relational structured data at massive scale
What you get | How much
---|---
Compute | 750 small compute hours per month
Web sites | 10 web sites
Mobile services | 10 mobile services
Relational database | 1 SQL database
SQL reporting | 100 hours per month
Storage | 35GB with 50,000,000 storage transactions
Bandwidth | Unlimited inbound & 25GB outbound
CDN | 20GB outbound with 50,000 transactions
Cache | 128MB
Service bus | 1,500 relay hours and 500,000 messages
Azure Storage Options
- Microsoft's cloud offers several types of storage. We will focus on Azure tables.
- Here is what we'll cover:
- When to use Azure Tables
- When they are appropriate to consider
- Understanding that Azure Tables are a collection of entities
- Access Azure Tables directly or through a cloud application
- Key Features of Azure Tables
- Relationship between accounts, Tables, and entities
- Efficient Inserts and Updates
- Designing for scale
- Query Design and Performance
- Understanding Partition Keys
- How data is partitioned
- Coding considerations
- Azure Table Query Concepts
- Understanding TableServiceEntity/TableServiceContext
- Additional Resources
When to use Azure tables
- These are some typical use case scenarios for using Azure tables.
- Azure tables are optimized for capacity and performance (scale)
Azure Tables: When Appropriate
- SQL Database is currently limited to 150 GB per database without federation. Federation can be used to grow beyond 150 GB.
- If your code requires strong relational semantics, Azure tables are not appropriate: they do not support joins.
- You can think of an Azure table as nothing more than a collection of objects. Note that each entity (similar to a row in a table) can have different properties. In the diagram above, the second entity does not have a city property.
- One of the strengths of Azure tables is that your data can be replicated across data centers, which aids disaster recovery.
Tables: A collection of entities
- A table is a collection of entities.
- An entity is like an object. It has name/value pairs.
- An entity is kind of like a row in a relational database table, with the caveat that entities don't need to have the exact same attributes.
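To make the "collection of entities" idea concrete, here is a minimal Python sketch that models a table as a plain list of entities, each a dictionary of name/value pairs. It is an illustration only, not the Azure SDK; the customer data and the `property_names` helper are hypothetical.

```python
# Hypothetical in-memory stand-in for a table: a list of entities,
# where each entity is a dict of name/value pairs (not the Azure SDK).
customers = [
    {"PartitionKey": "WA", "RowKey": "001", "Name": "Contoso", "City": "Redmond"},
    # No "City" here: entities in the same table need not share properties.
    {"PartitionKey": "WA", "RowKey": "002", "Name": "Fabrikam"},
    # An extra "Employees" property is equally acceptable.
    {"PartitionKey": "OR", "RowKey": "001", "Name": "Litware",
     "City": "Portland", "Employees": 42},
]


def property_names(entity: dict) -> set:
    """Return the set of property names an entity carries."""
    return set(entity)


for entity in customers:
    print(entity["PartitionKey"], entity["RowKey"], sorted(property_names(entity)))
```

Every entity still carries PartitionKey and RowKey; everything else is free-form.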
Accessing Azure Table Storage From Any HTTP Client
- Any application that can speak HTTP can communicate with Azure tables, because Azure tables expose a REST interface. This means a Java or PHP application can directly perform CRUD (create, read, update, delete) operations on an Azure table.
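The mapping from CRUD operations to HTTP methods can be sketched as follows. The methods are those of the Table service REST API (Insert Entity uses POST, Query Entities uses GET, Update Entity uses PUT, Merge Entity uses MERGE, Delete Entity uses DELETE). The `request_line` helper and the `myaccount`/`Customers` names are hypothetical, and a real request also needs authentication, date, and version headers that are omitted here.

```python
# CRUD operation -> HTTP method used by the Table service REST API.
CRUD_TO_HTTP = {
    "create": "POST",    # Insert Entity
    "read": "GET",       # Query Entities
    "update": "PUT",     # Update Entity (replaces all properties)
    "merge": "MERGE",    # Merge Entity (updates only the supplied properties)
    "delete": "DELETE",  # Delete Entity
}


def request_line(operation: str, url: str) -> str:
    """Build the first line of the HTTP request for a CRUD operation.

    Hypothetical helper: authentication and other required headers are omitted.
    """
    return f"{CRUD_TO_HTTP[operation]} {url} HTTP/1.1"


entity = "https://myaccount.table.core.windows.net/Customers(PartitionKey='WA',RowKey='001')"
print(request_line("read", entity))
print(request_line("delete", entity))
```

Any language with an HTTP library (Java, PHP, and so on) can send the same requests.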
Accessing Azure Table Storage From Azure
- Azure cloud applications can be hosted in the same data center as the Azure Table Storage account. The compelling point here is that latency between the application and the tables is very low, so the application can read and update data at very high speeds.
Features: Azure Table Storage
- One of the key features of Azure tables is the low cost. You can use the Pricing Calculator to determine your predicted costs at https://www.windowsazure.com/en-us/pricing/calculator/
- It is important to remember that Azure tables are non-relational and therefore joins are not possible.
- Azure tables can automatically span multiple storage nodes, maintaining performance. This is based on the partition key that you define. It is very important to consider the partition key carefully as it determines performance.
- A transaction can include only entities that share the same partition key. This is another reason to choose partition keys carefully.
- The data is replicated three times within a data center, and it can also be geo-replicated to an alternate data center.
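Because a transaction (an entity group transaction, or batch) cannot cross partition keys, a client can check a batch before sending it. This Python sketch performs that check; `validate_batch` is a hypothetical helper, and the 100-operation cap reflects the Table service's documented batch limit.

```python
MAX_BATCH_OPERATIONS = 100  # Table service limit on operations per batch


def validate_batch(entities: list) -> str:
    """Return the shared PartitionKey, or raise if the batch cannot be one transaction."""
    if not entities:
        raise ValueError("a batch needs at least one entity")
    if len(entities) > MAX_BATCH_OPERATIONS:
        raise ValueError(
            f"batch has {len(entities)} entities; limit is {MAX_BATCH_OPERATIONS}")
    keys = {e["PartitionKey"] for e in entities}
    if len(keys) > 1:
        # Entities in different partitions cannot be updated atomically together.
        raise ValueError(f"batch spans partition keys {sorted(keys)}")
    return keys.pop()


same = [{"PartitionKey": "WA", "RowKey": "001"}, {"PartitionKey": "WA", "RowKey": "002"}]
print(validate_batch(same))  # WA

mixed = [{"PartitionKey": "WA", "RowKey": "001"}, {"PartitionKey": "OR", "RowKey": "001"}]
try:
    validate_batch(mixed)
except ValueError as err:
    print("rejected:", err)
```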
Relationships among accounts, tables, and entities
- Note that an account can have multiple tables and that each table can have one or more entities.
- Note the URL that is used to access your tables. This is the URL that any client that is http-capable can use.
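The addressing scheme can be captured in a couple of helpers. This is a minimal Python sketch assuming the `https://<account>.table.core.windows.net/<table>` endpoint form; the helper names and the `myaccount`/`Customers` values are hypothetical. Single quotes inside key values are doubled, following OData string-literal rules, and the result is percent-encoded.

```python
from urllib.parse import quote


def table_url(account: str, table: str) -> str:
    """URL of a table within a storage account."""
    return f"https://{account}.table.core.windows.net/{table}"


def _key_literal(value: str) -> str:
    # Double any single quotes (OData string literal), then percent-encode.
    return quote(value.replace("'", "''"), safe="")


def entity_url(account: str, table: str, partition_key: str, row_key: str) -> str:
    """URL addressing one entity by its PartitionKey and RowKey."""
    return (f"{table_url(account, table)}"
            f"(PartitionKey='{_key_literal(partition_key)}',"
            f"RowKey='{_key_literal(row_key)}')")


print(table_url("myaccount", "Customers"))
print(entity_url("myaccount", "Customers", "WA", "001"))
```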
Efficient Inserts and Updates
- Special semantics are available to make inserts and updates efficient. The bottom line is that you can insert or update an entity in a single operation (an "upsert"), without first checking whether it already exists.
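The Table service exposes upserts as two operations, Insert Or Replace and Insert Or Merge. The Python sketch below models their semantics with a dictionary keyed by (PartitionKey, RowKey); it is an in-memory illustration with hypothetical function names, not a client for the service.

```python
from typing import Dict, Tuple

Table = Dict[Tuple[str, str], dict]


def _key(entity: dict) -> Tuple[str, str]:
    return entity["PartitionKey"], entity["RowKey"]


def insert_or_replace(table: Table, entity: dict) -> None:
    """Create the entity, or overwrite every property of an existing one."""
    table[_key(entity)] = dict(entity)


def insert_or_merge(table: Table, entity: dict) -> None:
    """Create the entity, or update only the supplied properties of an existing one."""
    table.setdefault(_key(entity), {}).update(entity)


customers: Table = {}
insert_or_replace(customers, {"PartitionKey": "WA", "RowKey": "001",
                              "Name": "Contoso", "City": "Redmond"})
insert_or_merge(customers, {"PartitionKey": "WA", "RowKey": "001", "Phone": "555-0100"})
print(customers[("WA", "001")])  # City is kept, Phone is added
insert_or_replace(customers, {"PartitionKey": "WA", "RowKey": "001", "Name": "Contoso"})
print(customers[("WA", "001")])  # City and Phone are gone
```

Choose merge when a writer only knows some of the properties; choose replace when the writer owns the whole entity.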
Designing For Scale
- The PartitionKey and RowKey are required properties of every entity. They play a key role in how the data is partitioned and scaled, and they determine the performance of various queries. As mentioned previously, they also play a role in transactions (transactions cannot span partition keys).
- How to issue efficient queries will be addressed later in this post.
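Because every entity must carry both keys, a client can reject malformed entities before sending them. Here is a minimal Python sketch of such a check (`validate_keys` is hypothetical). The rules it enforces, that both keys are strings and may not contain `/`, `\`, `#`, `?`, or control characters, follow the Table service documentation; the documented size limit on key values is not checked here.

```python
FORBIDDEN = set("/\\#?")


def _is_control(ch: str) -> bool:
    # Control characters U+0000-U+001F and U+007F-U+009F are not allowed in keys.
    code = ord(ch)
    return code <= 0x1F or 0x7F <= code <= 0x9F


def validate_keys(entity: dict) -> None:
    """Raise if PartitionKey or RowKey is missing, not a string, or has a disallowed character."""
    for name in ("PartitionKey", "RowKey"):
        if name not in entity:
            raise ValueError(f"missing required property {name}")
        value = entity[name]
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a string, got {type(value).__name__}")
        bad = [ch for ch in value if ch in FORBIDDEN or _is_control(ch)]
        if bad:
            raise ValueError(f"{name} {value!r} contains disallowed characters {bad!r}")


validate_keys({"PartitionKey": "contoso.com", "RowKey": "jane"})  # passes silently
```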
Query Design & Performance
- Performance is always an important consideration. Query speed varies considerably depending on the type of query you issue. Specific examples are provided later in this post.
Understanding Partition Keys
- This slide illustrates how your entities get distributed across partition nodes. Note that the partition key determines how data is spread across storage nodes.
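The grouping itself is easy to picture in code. In this Python sketch (hypothetical data and helper), entities that share a PartitionKey land in the same partition, which the service serves as a unit and can move between storage nodes. Within a partition, entities are kept in RowKey order, which is also the order in which query results come back.

```python
from collections import defaultdict


def group_by_partition(entities: list) -> dict:
    """Map each PartitionKey to its RowKeys, sorted as strings."""
    partitions = defaultdict(list)
    for entity in entities:
        partitions[entity["PartitionKey"]].append(entity["RowKey"])
    return {pk: sorted(rks) for pk, rks in sorted(partitions.items())}


entities = [
    {"PartitionKey": "WA", "RowKey": "010"},
    {"PartitionKey": "OR", "RowKey": "001"},
    {"PartitionKey": "WA", "RowKey": "002"},
    {"PartitionKey": "WA", "RowKey": "9"},
]
print(group_by_partition(entities))  # {'OR': ['001'], 'WA': ['002', '010', '9']}
```

Keys compare as strings, so "9" sorts after "010"; this is why numeric row keys are usually zero-padded.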
How Data is Partitioned
- The key point here is that every entity is uniquely identified by the combination of partition key and row key. You can think of the partition key and row key together as similar to the primary key of a relational table.
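The composite-key rule can be sketched in a few lines of Python: the same RowKey may appear in different partitions, but a (PartitionKey, RowKey) pair may appear only once. The `insert` helper and `EntityAlreadyExists` exception are hypothetical stand-ins; the real service rejects a duplicate plain insert with HTTP 409 Conflict.

```python
class EntityAlreadyExists(Exception):
    """Raised on a duplicate key; the real service answers HTTP 409 Conflict."""


def insert(table: dict, entity: dict) -> None:
    """Plain insert: the (PartitionKey, RowKey) pair must not already be in use."""
    key = (entity["PartitionKey"], entity["RowKey"])
    if key in table:
        raise EntityAlreadyExists(key)
    table[key] = dict(entity)


table = {}
insert(table, {"PartitionKey": "WA", "RowKey": "001"})
insert(table, {"PartitionKey": "OR", "RowKey": "001"})  # same RowKey, other partition: fine
try:
    insert(table, {"PartitionKey": "WA", "RowKey": "001"})
except EntityAlreadyExists as err:
    print("conflict on", err.args[0])
```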
- Azure automatically manages both the partitioning and the replication of your entities. I want to emphasize how important it is to choose the partition key and row key carefully.
Coding Considerations
- Note that Query 1 is fast because it performs an exact match on both the partition key and the row key, so it returns a single entity.
- Query 2 is slower than Query 1 because it does a range-based query.
- Query 3 is slower than Query 2 because it doesn't leverage the row key.
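The original query examples are not reproduced here, but filters of the same three shapes can be written as OData `$filter` expressions, the syntax the Table service accepts. This sketch assumes Query 2 is a range over row keys within one partition; the Python helpers are hypothetical, and the `City` property and key values are made up for illustration.

```python
def odata_string(value: str) -> str:
    """Quote a string for an OData filter, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def point_query(pk: str, rk: str) -> str:
    # Query 1 shape: exact PartitionKey and RowKey, at most one entity.
    return f"PartitionKey eq {odata_string(pk)} and RowKey eq {odata_string(rk)}"


def row_range_query(pk: str, rk_from: str, rk_to: str) -> str:
    # Query 2 shape: one partition, a contiguous range of RowKeys.
    return (f"PartitionKey eq {odata_string(pk)} and "
            f"RowKey ge {odata_string(rk_from)} and RowKey lt {odata_string(rk_to)}")


def partition_scan(pk: str, prop: str, value: str) -> str:
    # Query 3 shape: one partition, but filtered on a non-key property,
    # so every entity in that partition has to be examined.
    return f"PartitionKey eq {odata_string(pk)} and {prop} eq {odata_string(value)}"


print(point_query("WA", "001"))
print(row_range_query("WA", "001", "100"))
print(partition_scan("WA", "City", "Redmond"))
```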
Azure Table Query Concepts
- Queries 4 and 5 are very slow because they don't use the partition key. This is equivalent to a full table scan in SQL Server. You want to avoid this at all costs. You may need to reconsider your partition keys and row keys if you find yourself issuing these types of queries.
- You may even want to keep duplicate copies of your data in other tables that are optimized for certain types of queries.
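The effect of keeping a second copy keyed for a different query can be shown with a small in-memory model. This Python sketch is hypothetical throughout (the `MiniTable` class and the sample data are invented): it counts how many partitions a query has to touch, and compares a city lookup against a table partitioned by state with the same lookup against a copy partitioned by city.

```python
from collections import defaultdict


class MiniTable:
    """Hypothetical in-memory table that reports how many partitions a query touches."""

    def __init__(self):
        self.partitions = defaultdict(dict)  # PartitionKey -> {RowKey: entity}

    def upsert(self, entity: dict) -> None:
        self.partitions[entity["PartitionKey"]][entity["RowKey"]] = entity

    def query(self, predicate, partition_key=None):
        """Return (matching entities, number of partitions scanned)."""
        if partition_key is not None:
            scanned = [self.partitions.get(partition_key, {})]  # one partition
        else:
            scanned = list(self.partitions.values())  # full table scan
        matches = [e for part in scanned for e in part.values() if predicate(e)]
        return matches, len(scanned)


rows = [
    {"State": "WA", "City": "Redmond", "Id": "001"},
    {"State": "WA", "City": "Seattle", "Id": "002"},
    {"State": "OR", "City": "Portland", "Id": "003"},
    {"State": "CA", "City": "San Jose", "Id": "004"},
]

# The same data stored twice, each copy keyed for a different query.
by_state, by_city = MiniTable(), MiniTable()
for row in rows:
    by_state.upsert({"PartitionKey": row["State"], "RowKey": row["Id"], **row})
    by_city.upsert({"PartitionKey": row["City"], "RowKey": row["Id"], **row})

slow, slow_scanned = by_state.query(lambda e: e["City"] == "Portland")
fast, fast_scanned = by_city.query(lambda e: True, partition_key="Portland")
print(len(slow), slow_scanned)  # 1 3
print(len(fast), fast_scanned)  # 1 1
```

The price is that the application must keep both copies in sync, and because they live in different tables, the two writes cannot be one transaction.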
Understanding TableServiceEntity/TableServiceContext
- The table above stores email addresses. The partition key is the domain part of the email address and the mailname is the row key.
- TableServiceEntity and TableServiceContext are used when programming with C# or Visual Basic. By deriving from TableServiceEntity you can define your own entities that get stored in tables. TableServiceContext is used when you wish to perform CRUD operations on tables and is not illustrated here.
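For readers outside .NET, the email-table design can be sketched with a Python dataclass standing in for a class derived from TableServiceEntity. The `EmailEntity` class and `from_address` method are hypothetical; lower-casing the domain is a design choice made here so that one domain always maps to one partition.

```python
from dataclasses import dataclass


@dataclass
class EmailEntity:
    """Python stand-in for a .NET entity class deriving from TableServiceEntity."""

    PartitionKey: str  # domain part, e.g. "contoso.com"
    RowKey: str        # mail name, e.g. "jane"

    @classmethod
    def from_address(cls, address: str) -> "EmailEntity":
        """Split an address at its last '@' into domain (partition) and mail name (row)."""
        mailname, sep, domain = address.rpartition("@")
        if not sep or not mailname or not domain:
            raise ValueError(f"not an email address: {address!r}")
        # Lower-case the domain so one domain maps to exactly one partition.
        return cls(PartitionKey=domain.lower(), RowKey=mailname)

    @property
    def address(self) -> str:
        return f"{self.RowKey}@{self.PartitionKey}"


entity = EmailEntity.from_address("jane@Contoso.com")
print(entity)          # EmailEntity(PartitionKey='contoso.com', RowKey='jane')
print(entity.address)  # jane@contoso.com
```

With this layout, listing every address in one domain is a single-partition query.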
Additional Resources
- The Windows Azure Training Kit is the best way to get up and running.
- One of the labs is called Exploring Windows Azure Storage. It provides excellent examples on using storage.
- It can be found here (once you install the training kit) C:\WATK\Labs\ExploringStorage\HOL.htm
Thanks
I appreciate that you took the time to read this post. I look forward to your comments.
Comments
Anonymous
December 20, 2012
Thank you, it was very useful.

Anonymous
June 07, 2013
Great Explanation and very useful to remember the Azure Storage concepts. Thanks!!

Anonymous
May 07, 2014
Simple and Clear... Very good explanation. Thanks!

Anonymous
July 02, 2014
Simple and crystal clear description. It would be really useful for the starters in the azure table storage. Thank you for the post.

Anonymous
July 25, 2014
Thanks for taking the time to put this together. It has been very helpful.

Anonymous
October 08, 2014
Brilliant pics