Windows Azure Storage Client for Java Tables Deep Dive
This blog post serves as an overview of the recently released Windows Azure Storage Client for Java, which includes support for the Azure Table service. Azure Tables is a NoSQL datastore. For detailed information on the Azure Tables data model, see the resources section below.
Design
There are three key areas we emphasized in the design of the Table client: usability, extensibility, and performance. The basic scenarios are simple and “just work”; in addition, we have also provided three distinct extension points to allow developers to customize the client behaviors to their specific scenario. We have also maintained a degree of consistency with the other storage clients (Blob and Queue) so that moving between them feels seamless. There are also some features and requirements that make the table service unique.
For more on the overall design philosophy and guidelines of the Windows Azure Storage Client for Java see the related blog post in the Links section below.
Packages
The Storage Client for Java is distributed in the Windows Azure SDK for Java jar (see below for locations). The Windows Azure SDK for Java jar also includes a “service layer” implementation for several Azure services, including storage, which is intended to provide a low level interface for users to access various services in a common way. In contrast, the client layer provides a much higher level API surface that is more approachable and has many conveniences that are frequently required when developing scalable Windows Azure Storage applications. For the optimal development experience avoid importing the base package directly and instead import the client sub package (com.microsoft.windowsazure.services.table.client). This blog post refers to this client layer.
Common
com.microsoft.windowsazure.services.core.storage – This package contains all storage primitives such as CloudStorageAccount, StorageCredentials, Retry Policies, etc.
Tables
com.microsoft.windowsazure.services.table.client – This package contains all the functionality for working with the Windows Azure Table service, including CloudTableClient, TableServiceEntity, etc.
Object Model
A diagram of the table object model is provided below. The core flow of the client is that a user defines an action (TableOperation, TableBatchOperation, or TableQuery) over entities in the Table service and executes these actions via the CloudTableClient. For usability, these classes provide static factory methods to assist in the definition of actions.
For example, the code below inserts a single entity:
tableClient.execute([Table Name], TableOperation.insert(entity));
Figure 1: Table client object model
Execution
CloudTableClient
Similar to the other Azure storage clients, the table client provides a logical service client, CloudTableClient, which is responsible for service wide operations and enables execution of other operations. The CloudTableClient class can update the Storage Analytics settings for the Table service, list all the tables in the account, and execute operations against a given table, among other operations.
TableRequestOptions
The TableRequestOptions class defines additional parameters which govern how a given operation is executed, specifically the timeout and RetryPolicy that are applied to each request. The CloudTableClient provides default timeout and RetryPolicy settings; TableRequestOptions can override them for a particular operation.
TableResult
The TableResult class encapsulates the result of a single TableOperation. This object includes the HTTP status code, the ETag, and a weakly typed reference to the associated entity.
Actions
TableOperation
The TableOperation class encapsulates a single operation to be performed against a table. Static factory methods are provided to create a TableOperation that will perform an insert, delete, merge, replace, retrieve, insertOrReplace, and insertOrMerge operation on the given entity. TableOperations can be reused so long as the associated entity is updated. As an example, a client wishing to use table storage as a heartbeat mechanism could define a merge operation on an entity and execute it to update the entity state to the server periodically.
Sample – Inserting an Entity into a Table
// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;
// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
tableClient.createTableIfNotExists("people");
// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.setEmail("Walter@contoso.com");
customer1.setPhoneNumber("425-555-0101");
// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.insert(customer1);
// Submit the operation to the table service.
tableClient.execute("people", insertCustomer1);
TableBatchOperation
The TableBatchOperation class represents multiple TableOperation objects which are executed as a single atomic action within the table service. There are a few restrictions on batch operations that should be noted:
- You can perform batch updates, deletes, inserts, merge and replace operations.
- A batch operation can have a retrieve operation, if it is the only operation in the batch.
- A single batch operation can include up to 100 table operations.
- All entities in a single batch operation must have the same partition key.
- A batch operation is limited to a 4MB data payload.
The CloudTableClient.execute overload that takes a TableBatchOperation returns an ArrayList of TableResults whose order corresponds to the entries in the batch itself. For example, the result of a merge operation that is first in the batch will be the first entry in the returned ArrayList of TableResults. In the case of an error, the server may return a numerical id as part of the error message that corresponds to the sequence number of the failed operation in the batch. If the failure is not associated with a specific command, such as ServerBusy, -1 is returned. TableBatchOperations, or Entity Group Transactions, are executed atomically: either all operations succeed, or, if one of the individual operations causes an error, the entire batch fails.
Sample – Insert two entities in a single atomic Batch Operation
// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableBatchOperation;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;
// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
tableClient.createTableIfNotExists("people");
// Define a batch operation.
TableBatchOperation batchOperation = new TableBatchOperation();
// Create a customer entity and add to the table
CustomerEntity customer = new CustomerEntity("Smith", "Jeff");
customer.setEmail("Jeff@contoso.com");
customer.setPhoneNumber("425-555-0104");
batchOperation.insert(customer);
// Create another customer entity and add to the table
CustomerEntity customer2 = new CustomerEntity("Smith", "Ben");
customer2.setEmail("Ben@contoso.com");
customer2.setPhoneNumber("425-555-0102");
batchOperation.insert(customer2);
// Submit the operation to the table service.
tableClient.execute("people", batchOperation);
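The batch restrictions listed earlier in this section can be checked on the client before a batch is submitted. The following is a minimal, library-free sketch of such a check; PendingOp and BatchRules are hypothetical names used only for illustration and are not part of the storage client, and the 4MB payload limit is not checked here.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchRules {
    static final int MAX_OPERATIONS = 100;

    // Returns the violated rules; an empty list means no violation was found.
    static List<String> violations(List<PendingOp> batch) {
        List<String> problems = new ArrayList<>();
        if (batch.size() > MAX_OPERATIONS) {
            problems.add("more than 100 operations");
        }
        boolean mixedKeys = false;
        boolean hasRetrieve = false;
        for (PendingOp op : batch) {
            if (!op.partitionKey.equals(batch.get(0).partitionKey)) {
                mixedKeys = true;
            }
            if ("retrieve".equals(op.kind)) {
                hasRetrieve = true;
            }
        }
        if (mixedKeys) {
            problems.add("mixed partition keys");
        }
        if (hasRetrieve && batch.size() > 1) {
            problems.add("retrieve is not the only operation");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<PendingOp> batch = new ArrayList<>();
        batch.add(new PendingOp("insert", "Smith"));
        batch.add(new PendingOp("insert", "Jones"));
        System.out.println(violations(batch)); // prints [mixed partition keys]
    }
}

// Hypothetical stand-in for one queued operation; not a library type.
final class PendingOp {
    final String kind;          // for example "insert", "merge", or "retrieve"
    final String partitionKey;

    PendingOp(String kind, String partitionKey) {
        this.kind = kind;
        this.partitionKey = partitionKey;
    }
}
```

A check like this only mirrors the documented rules; the service remains the authority on whether a batch is accepted.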
TableQuery
The TableQuery class is a lightweight query mechanism used to define queries to be executed against the table service. See “Querying” below.
Entities
TableEntity interface
The TableEntity interface is used to define an object that can be serialized and deserialized with the table client. It contains getters and setters for the PartitionKey, RowKey, Timestamp, Etag, as well as methods to read and write the entity. This interface is implemented by the TableServiceEntity and subsequently the DynamicTableEntity that are included in the library; a client may implement this interface directly to persist different types of objects or objects from 3rd-party libraries. By overriding the readEntity or writeEntity methods a client may customize the serialization logic for a given entity type.
TableServiceEntity
The TableServiceEntity class is an implementation of the TableEntity interface and contains the RowKey, PartitionKey, and Timestamp properties. The default serialization logic TableServiceEntity uses is based off of reflection where an entity “property” is defined by a class which contains corresponding get and set methods where the return type of the getter is the same as that of the input parameter of the setter. This will be discussed in greater detail in the extension points section below. This class is not final and may be extended to add additional properties to an entity type.
Sample – Define a POJO that extends TableServiceEntity
// This class defines one additional property of String type. Since it extends
// TableServiceEntity it will be automatically serialized and deserialized.
public class SampleEntity extends TableServiceEntity {
    private String sampleProperty;

    public String getSampleProperty() {
        return this.sampleProperty;
    }

    public void setSampleProperty(String sampleProperty) {
        this.sampleProperty = sampleProperty;
    }
}
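To make the getter/setter rule described above more concrete, here is a small sketch that uses plain Java reflection to find properties whose getter return type matches the setter parameter type. It is an illustration of the rule only, not the library's actual serializer; PropertyScanner and ScannedExample are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

final class PropertyScanner {
    // Maps property name to type for each getX/setX pair whose types agree.
    static Map<String, Class<?>> findProperties(Class<?> type) {
        Map<String, Class<?>> found = new TreeMap<>();
        for (Method getter : type.getMethods()) {
            String name = getter.getName();
            if (!name.startsWith("get") || name.length() == 3 || getter.getParameterCount() != 0) {
                continue;
            }
            String property = name.substring(3);
            try {
                // Look for a setter whose single parameter is the getter's return type.
                type.getMethod("set" + property, getter.getReturnType());
                found.put(property, getter.getReturnType());
            } catch (NoSuchMethodException e) {
                // No matching setter, so this getter is not treated as a property.
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findProperties(ScannedExample.class).keySet()); // prints [Note]
    }
}

// Hypothetical class used only to exercise the scanner.
class ScannedExample {
    private String note;
    private long total;

    public String getNote() { return note; }
    public void setNote(String note) { this.note = note; }

    // Getter without a setter: skipped.
    public int getCount() { return 0; }

    // Setter takes int but getter returns long: skipped.
    public long getTotal() { return total; }
    public void setTotal(int total) { this.total = total; }
}
```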
DynamicTableEntity
The DynamicTableEntity class allows clients to update heterogeneous entity types without the need to define base classes or special types. The DynamicTableEntity class defines the required properties for RowKey, PartitionKey, Timestamp, and Etag; all other properties are stored in a HashMap. Aside from the convenience of not having to define concrete POJO types, this can also provide increased performance by not having to perform serialization or deserialization tasks. We have also provided sample code that demonstrates this.
Sample – Retrieve a single property on a collection of heterogeneous entities
// You will need the following imports
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableQuery;
// Define the query to retrieve the entities, notice in this case we
// only need to retrieve the Count property.
TableQuery<DynamicTableEntity> query = TableQuery.from(tableName, DynamicTableEntity.class).select(new String[] { "Count" });
// Note the TableQuery is actually executed when we iterate over the
// results. Also, this sample uses the DynamicTableEntity to avoid
// having to worry about various types, as well as avoiding any
// serialization processing.
for (DynamicTableEntity ent : tableClient.execute(query)) {
    EntityProperty countProp = ent.getProperties().get("Count");
    // Users should always assume property is not there in case another
    // client removed it.
    if (countProp == null) {
        throw new IllegalArgumentException("Invalid entity, Count property not found!");
    }
    // Display Count property, however you could modify it here and persist it back to the service.
    System.out.println(countProp.getValueAsInteger());
}
EntityProperty
The EntityProperty class encapsulates a single property of an entity for the purposes of serialization and deserialization. The only time the client has to work directly with EntityProperties is when using DynamicTableEntity or implementing the TableEntity.readEntity and TableEntity.writeEntity methods. The EntityProperty stores the given value in its serialized string form and deserializes it on each subsequent get.
Please note, when using a non-String type property in a tight loop or performance critical scenario, it is best practice to not update an EntityProperty directly, as there will be a performance implication in doing so. Instead, a client should deserialize the entity into an object, update that object directly, and then persist that object back to the table service (See POJO Sample below).
The samples below show two approaches that can be used to update a player's score property. The first approach uses DynamicTableEntity to avoid having to declare a client side object and updates the property directly, whereas the second will deserialize the entity into a POJO and update that object directly.
Sample – Update of entity property using EntityProperty
// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableResult;
// Retrieve entity
TableResult res = tableClient.execute("gamers", TableOperation.retrieve("Smith", "Jeff", DynamicTableEntity.class));
DynamicTableEntity player = res.getResultAsType();
// Retrieve Score property
EntityProperty scoreProp = player.getProperties().get("Score");
if (scoreProp == null) {
throw new IllegalArgumentException("Invalid entity, Score property not found!");
}
scoreProp.setValue(scoreProp.getValueAsInteger() + 1);
// Store the updated score
tableClient.execute("gamers", TableOperation.merge(player));
Sample – Update of entity property using POJO
// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableResult;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;
// Entity type with a score property
public class GamerEntity extends TableServiceEntity {
private int score;
public int getScore() {
return this.score;
}
public void setScore(int Score) {
this.score = Score;
}
}
// Retrieve entity
TableResult res = tableClient.execute("gamers", TableOperation.retrieve("Smith", "Jeff", GamerEntity.class));
GamerEntity player = res.getResultAsType();
// Update Score
player.setScore(player.getScore() + 1);
// Store the updated score
tableClient.execute("gamers", TableOperation.merge(player));
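The performance note above follows from EntityProperty keeping its value in serialized string form and deserializing it on each get. The sketch below models that behavior with a hypothetical StringBackedInt class (not a library type) and counts how many parses a tight update loop triggers; a plain int field on a POJO would need none.

```java
final class StringBackedInt {
    private String raw;      // serialized form, as the post describes for EntityProperty
    private int parseCount;  // instrumentation for this sketch only

    StringBackedInt(int value) {
        this.raw = Integer.toString(value);
    }

    // Every read parses the stored string again.
    int get() {
        parseCount++;
        return Integer.parseInt(raw);
    }

    // Every write formats the value back to a string.
    void set(int value) {
        this.raw = Integer.toString(value);
    }

    int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        StringBackedInt score = new StringBackedInt(0);
        for (int i = 0; i < 1000; i++) {
            score.set(score.get() + 1); // one parse and one format per iteration
        }
        // prints 1000 after 1001 parses
        System.out.println(score.get() + " after " + score.parseCount() + " parses");
    }
}
```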
Serialization
There are three main extension points in the table client that allow a user to customize serialization and deserialization of entities. Although completely optional, these extension points enable a number of use-specific or NoSQL scenarios.
EntityResolver
The EntityResolver interface defines a single method (resolve) and allows client-side projection and processing for each entity during serialization and deserialization. This interface is designed to be implemented by an anonymous inner class to provide custom client side projections, query-specific filtering, and so forth. This enables key scenarios such as deserializing a collection of heterogeneous entities from a single query.
Sample – Use EntityResolver to perform client side projection
// You will need the following imports
import java.util.Date;
import java.util.HashMap;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.EntityResolver;
import com.microsoft.windowsazure.services.table.client.TableQuery;
// Define the query to retrieve the entities, notice in this case we
// only need to retrieve the Email property.
TableQuery<Customer> query = TableQuery.from(tableName, Customer.class).select(new String[] { "Email" });
// Define an EntityResolver to mutate the entity payload upon retrieval.
// In this case we will simply return a String representing the customer's Email
// address.
EntityResolver<String> emailResolver = new EntityResolver<String>() {
@Override
public String resolve(String PartitionKey, String RowKey, Date timeStamp, HashMap<String, EntityProperty> props, String etag) {
return props.get("Email").getValueAsString();
}
};
// Display the results of the query, note that the query now returns
// Strings instead of entity types since this is the type of
// EntityResolver we created.
for (String projectedString : tableClient.execute(query, emailResolver)) {
System.out.println(projectedString);
}
Annotations
@StoreAs
The @StoreAs annotation is used by a client to customize the serialized property name for a given property. If @StoreAs is not used, the property name itself is used in table storage. The @StoreAs annotation cannot be used to store PartitionKey, RowKey, or Timestamp; if a property is annotated as such, it will be ignored by the serializer. Two common scenarios are reducing the length of the property name for performance reasons and overriding the default name the property would otherwise have.
Sample – Alter a property name via the @StoreAs Annotation
// You will need the following imports
import com.microsoft.windowsazure.services.table.client.StoreAs;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;
// This entity will store the CustomerPlaceOfResidence property as "cpor" on the service.
public class StoreAsEntity extends TableServiceEntity {
private String cpor;
@StoreAs(name = "cpor")
public String getCustomerPlaceOfResidence() {
return this.cpor;
}
@StoreAs(name = "cpor")
public void setCustomerPlaceOfResidence(String customerPlaceOfResidence) {
this.cpor = customerPlaceOfResidence;
}
}
@Ignore
The @Ignore annotation is used on the getter or setter to indicate to the default reflection-based serializer that it should ignore the property during serialization and deserialization.
Sample – Use @Ignore annotation to expose friendly client side property that is backed by PartitionKey
// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;
// In this sample, the Customer ID is used as the PartitionKey. A property
// CustomerID is exposed on the client side to allow friendly access, but
// is annotated with @Ignore to prevent it from being duplicated in the
// table entity.
public class OnlineStoreBaseEntity extends TableServiceEntity {
@Ignore
public String getCustomerID() {
return this.getPartitionKey();
}
@Ignore
public void setCustomerID(String customerID) {
this.setPartitionKey(customerID);
}
}
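As a rough picture of how a reflection-based serializer could honor annotations like these, the sketch below defines two hypothetical annotations, Rename and Skip, that play the roles of @StoreAs and @Ignore, and writes an object's getter values into a map. This is an assumption-laden illustration and not the storage client's implementation; AnnotatedWriter and Shopper are made-up names.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

final class AnnotatedWriter {
    // Collects getter values into a name-to-value map, honoring the two annotations.
    static Map<String, Object> write(Object entity) {
        Map<String, Object> out = new TreeMap<>();
        for (Method m : entity.getClass().getDeclaredMethods()) {
            if (!m.getName().startsWith("get") || m.getParameterCount() != 0) {
                continue;
            }
            if (m.isAnnotationPresent(Skip.class)) {
                continue; // plays the role of @Ignore
            }
            Rename rename = m.getAnnotation(Rename.class); // plays the role of @StoreAs
            String key = (rename != null) ? rename.name() : m.getName().substring(3);
            try {
                m.setAccessible(true);
                out.put(key, m.invoke(entity));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not read " + m.getName(), e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(write(new Shopper())); // prints {Email=walter@contoso.com, cpor=Redmond}
    }
}

// Hypothetical annotations; the real client uses @StoreAs and @Ignore.
@Retention(RetentionPolicy.RUNTIME) @interface Rename { String name(); }
@Retention(RetentionPolicy.RUNTIME) @interface Skip { }

// Hypothetical entity used only to exercise the writer.
class Shopper {
    @Skip
    public String getCustomerID() { return "c-42"; }

    @Rename(name = "cpor")
    public String getCustomerPlaceOfResidence() { return "Redmond"; }

    public String getEmail() { return "walter@contoso.com"; }
}
```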
TableEntity.readEntity and TableEntity.writeEntity methods
Although they are part of the TableEntity interface, the TableEntity.readEntity and TableEntity.writeEntity methods provide the third major extension point for serialization. By implementing or overriding these methods in an object a client can customize how entities are stored, and potentially improve performance compared to the default reflection-based serializer. See the javadoc for the respective method for more information.
For more on the overall design object model of the Windows Azure Storage Client for Java see the related blog post in the Links section below.
Querying
There are two query constructs in the table client: a retrieve TableOperation which addresses a single unique entity, and a TableQuery which is a standard query mechanism used against multiple entities in a table. Both querying constructs need to be used in conjunction with either a class type that implements the TableEntity interface or with an EntityResolver which will provide custom deserialization logic.
Retrieve
A retrieve operation is a query which addresses a single entity in the table by specifying both its PartitionKey and RowKey. This is exposed via TableOperation.retrieve and TableBatchOperation.retrieve and executed like a typical operation via the CloudTableClient.
Sample – Retrieve a single entity
// You will need the following imports
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;
// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
// Retrieve the entity with partition key of "Smith" and row key of "Jeff"
TableOperation retrieveSmithJeff = TableOperation.retrieve("Smith", "Jeff", CustomerEntity.class);
// Submit the operation to the table service and get the specific entity.
CustomerEntity specificEntity = tableClient.execute("people", retrieveSmithJeff).getResultAsType();
TableQuery
Unlike TableOperation and TableBatchOperation, the TableQuery requires a source table name as part of its definition. TableQuery contains a static factory method, from, used to create a new query, and provides methods for fluent query construction. The code below produces a query to take the top 5 results from the customers table which have a RowKey greater than or equal to 5.
Sample – Query top 5 entities with RowKey greater than or equal to 5
// You will need the following imports
import com.microsoft.windowsazure.services.table.client.TableQuery;
import com.microsoft.windowsazure.services.table.client.TableQuery.QueryComparisons;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;
TableQuery<TableServiceEntity> query =
    TableQuery.from("customers", TableServiceEntity.class)
        .where(TableQuery.generateFilterCondition("RowKey", QueryComparisons.GREATER_THAN_OR_EQUAL, "5"))
        .take(5);
The TableQuery is strongly typed and must be instantiated with a class type that is accessible and contains a nullary constructor; otherwise an exception will be thrown. The class type must also implement the TableEntity interface. If the client wishes to use a resolver to deserialize entities, they may specify one via execute on CloudTableClient and specify the TableServiceEntity class type as demonstrated above.
The TableQuery object provides methods for take, select, where, and the source table name. Static methods such as generateFilterCondition and combineFilters construct filter strings. Note that generateFilterCondition provides overloads for all supported types; some examples are listed below:
// 1. Filter on String
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, "foo");
// 2. Filter on UUID
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, uuid);
// 3. Filter on Long
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50L);
// 4. Filter on Double
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50.50);
// 5. Filter on Integer
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50);
// 6. Filter on Date
TableQuery.generateFilterCondition("Prop", QueryComparisons.LESS_THAN, new Date());
// 7. Filter on Boolean
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, true);
// 8. Filter on Binary
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, new byte[] { 0x01, 0x02, 0x03 });
Sample – Query all entities with a PartitionKey="samplePK" and RowKey greater than or equal to "5"
// You will need the following imports
import com.microsoft.windowsazure.services.table.client.TableConstants;
import com.microsoft.windowsazure.services.table.client.TableQuery;
import com.microsoft.windowsazure.services.table.client.TableQuery.Operators;
import com.microsoft.windowsazure.services.table.client.TableQuery.QueryComparisons;
String pkFilter = TableQuery.generateFilterCondition(TableConstants.PARTITION_KEY, QueryComparisons.EQUAL, "samplePK");
String rkFilter = TableQuery.generateFilterCondition(TableConstants.ROW_KEY, QueryComparisons.GREATER_THAN_OR_EQUAL, "5");
String combinedFilter = TableQuery.combineFilters(pkFilter, Operators.AND, rkFilter);
TableQuery<SampleEntity> query = TableQuery.from(tableName, SampleEntity.class).where(combinedFilter);
Note: There is no logical expression tree provided in the current release, and as a result repeated calls to the fluent methods on TableQuery overwrite the relevant aspect of the query.
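Because a second call to where replaces the first, a multi-condition filter has to be assembled as a single string before it is handed to the query. The sketch below shows one way such strings could be built by hand. It assumes the filter syntax seen elsewhere in this post (for example "PartitionKey eq 'Washington'") and the usual OData rule of doubling embedded single quotes; the exact text produced by the library's own helpers may differ, and FilterStrings is a hypothetical name.

```java
final class FilterStrings {
    // Builds "property op 'value'" for a string operand; embedded single quotes are doubled.
    static String condition(String property, String op, String value) {
        return property + " " + op + " '" + value.replace("'", "''") + "'";
    }

    // Parenthesizes both sides so the combined filter keeps its intended grouping.
    static String combine(String left, String op, String right) {
        return "(" + left + ") " + op + " (" + right + ")";
    }

    public static void main(String[] args) {
        String pk = condition("PartitionKey", "eq", "samplePK");
        String rk = condition("RowKey", "ge", "5");
        // prints (PartitionKey eq 'samplePK') and (RowKey ge '5')
        System.out.println(combine(pk, "and", rk));
    }
}
```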
Scenarios
NoSQL
A common pattern in a NoSQL datastore is to store related entities with different schemas in the same table. A frequent example is customers and orders stored in the same table. In our case, the PartitionKey for both Customer and Order will be a unique CustomerID which will allow us to retrieve and alter a customer and their respective orders together. The challenge becomes how to work with these heterogeneous entities on the client side in an efficient and usable manner. We discuss this here, and you can also download sample code.
The table client provides an EntityResolver interface which allows client side logic to execute during deserialization. In the scenario detailed above, let’s use a base entity class named OnlineStoreEntity which extends TableServiceEntity.
// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;
public abstract class OnlineStoreEntity extends TableServiceEntity {
@Ignore
public String getCustomerID() {
return this.getPartitionKey();
}
@Ignore
public void setCustomerID(String customerID) {
this.setPartitionKey(customerID);
}
}
Let’s also define two additional entity types, Customer and Order which derive from OnlineStoreEntity and prepend their RowKey with an entity type enumeration, “0001” for customers and “0002” for Orders. This will allow us to query for just a customer, their orders, or both—while also providing a persisted definition as to what client side type is used to interact with the object. Given this, let’s define a class that implements the EntityResolver interface to assist in deserializing the heterogeneous types.
Sample – Using EntityResolver to deserialize heterogeneous entities
// You will need the following imports
import java.util.Date;
import java.util.HashMap;
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.EntityResolver;
EntityResolver<OnlineStoreEntity> webStoreResolver = new EntityResolver<OnlineStoreEntity>() {
@Override
public OnlineStoreEntity resolve(String partitionKey, String rowKey, Date timeStamp, HashMap<String, EntityProperty> properties, String etag) throws StorageException {
OnlineStoreEntity ref = null;
if (rowKey.startsWith("0001")) {
// Customer
ref = new Customer();
}
else if (rowKey.startsWith("0002")) {
// Order
ref = new Order();
}
else {
throw new IllegalArgumentException(String.format("Unknown entity type detected! RowKey: %s", rowKey));
}
ref.setPartitionKey(partitionKey);
ref.setRowKey(rowKey);
ref.setTimestamp(timeStamp);
ref.setEtag(etag);
ref.readEntity(properties, null);
return ref;
}
};
Now, on iterating through the results with the following code:
for (OnlineStoreEntity entity : tableClient.execute(customerAndOrderQuery, webStoreResolver)) {
System.out.println(entity.getClass());
}
It will output:
class tablesamples.NoSQL$Customer
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
….
For the complete OnlineStoreSample sample please see the Samples section below.
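The RowKey prefixes ("0001" for customers, "0002" for orders) also make it possible to query for one entity type with a range filter. The sketch below builds such a filter string by treating the prefix as an inclusive lower bound and the prefix with its last character bumped by one as an exclusive upper bound. It assumes keys compare lexically and uses the raw filter syntax shown in this post; RowKeyPrefix is a hypothetical helper, not part of the client.

```java
final class RowKeyPrefix {
    // Exclusive upper bound for keys starting with prefix: bump the last character by one.
    // Assumes a non-empty prefix whose last character is not Character.MAX_VALUE.
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Range filter that matches every RowKey beginning with the given prefix.
    static String filterFor(String prefix) {
        return "RowKey ge '" + prefix + "' and RowKey lt '" + upperBound(prefix) + "'";
    }

    public static void main(String[] args) {
        System.out.println(filterFor("0002")); // prints RowKey ge '0002' and RowKey lt '0003'
    }
}
```

Combined with a PartitionKey condition for the CustomerID, a filter of this shape would select only that customer's orders.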
Heterogeneous update
In some cases it may be required to update entities regardless of their type or other properties. Let’s say we have a table named “employees”. This table contains entity types for developers, secretaries, contractors, and so forth. The example below shows how to query all entities in a given partition (in our example the state the employee works in is used as the PartitionKey) and update their salaries regardless of job position. Since we are using merge, the only property that is going to be updated is the Salary property, and all other information regarding the employee will remain unchanged.
// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableBatchOperation;
import com.microsoft.windowsazure.services.table.client.TableQuery;
TableQuery<DynamicTableEntity> query = TableQuery.from("employees", DynamicTableEntity.class).where("PartitionKey eq 'Washington'").select(new String[] { "Salary" });
// Note: for brevity's sake this sample assumes there are 100 or fewer employees; the client should ensure batches are kept to 100 operations or fewer.
TableBatchOperation mergeBatch = new TableBatchOperation();
for (DynamicTableEntity ent : tableClient.execute(query)) {
EntityProperty salaryProp = ent.getProperties().get("Salary");
// Check to see if salary property is present
if (salaryProp != null) {
double currentSalary = salaryProp.getValueAsDouble();
if (currentSalary < 50000) {
// Give a 10% raise
salaryProp.setValue(currentSalary * 1.1);
} else if (currentSalary < 100000) {
// Give a 5% raise
salaryProp.setValue(currentSalary * 1.05);
}
mergeBatch.merge(ent);
}
else {
throw new IllegalArgumentException("Entity does not contain salary!");
}
}
// Execute batch to save changes back to the table service
tableClient.execute("employees", mergeBatch);
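The sample above assumes the partition holds no more than 100 entities. When that does not hold, the entities can be split into groups of at most 100 and each group submitted as its own batch. The helper below is a generic, library-free sketch of that splitting step; BatchChunker is a hypothetical name, and each resulting group would still need to be turned into a TableBatchOperation and executed separately. Note that atomicity then applies per batch, not across all of them.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchChunker {
    // Splits items into consecutive groups of at most maxPerBatch elements.
    static <T> List<List<T>> chunk(List<T> items, int maxPerBatch) {
        if (maxPerBatch < 1) {
            throw new IllegalArgumentException("maxPerBatch must be at least 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerBatch) {
            int end = Math.min(start + maxPerBatch, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> entities = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            entities.add(i);
        }
        for (List<Integer> batch : chunk(entities, 100)) {
            System.out.println(batch.size()); // prints 100, 100, 50 on separate lines
        }
    }
}
```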
Complex Properties
The Windows Azure Table service provides two indexed columns that together provide the key for a given entity (PartitionKey and RowKey). A common best practice is to include multiple aspects of an entity in these keys since they can be queried efficiently. Using the @Ignore annotation, it is possible to define friendly client-side properties that are part of this complex key without persisting them individually.
Let’s say that we are creating a directory of all the people in America. By creating a complex key such as [STATE];[CITY] we can enable efficient queries for all people in a given state or city using a lexical comparison while utilizing only one indexed column. This optimization is exposed in a convenient way by providing friendly client properties on an object that mutate the key appropriately but are not actually persisted to the service.
Note: Take care when choosing to provide setters on columns backed by keys which could cause failures for some operations (delete, merge, replace) since you are effectively changing the identity of the entity.
The sample below illustrates how to provide friendly accessors to complex keys. When only providing getters the @Ignore annotation is optional, since the serializer will not use properties that do not expose a corresponding setter.
Sample – Complex Properties on a POJO using the @Ignore Annotation
// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;
public class Person extends TableServiceEntity {
@Ignore
public String getState() {
return this.getPartitionKey().substring(0, this.getPartitionKey().indexOf(";"));
}
@Ignore
public String getCity() {
return this.getPartitionKey().substring(this.getPartitionKey().indexOf(";") + 1);
}
}
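The key handling in the sample can also be isolated into a small helper that both composes and parses the [STATE];[CITY] form, and fails clearly when the separator is missing. The following is a library-free sketch; PlaceKey is a hypothetical name, and rejecting values that contain the separator is a choice made here for simplicity.

```java
final class PlaceKey {
    static final String SEPARATOR = ";";

    // Builds a key of the form STATE;CITY.
    static String compose(String state, String city) {
        if (state.contains(SEPARATOR) || city.contains(SEPARATOR)) {
            throw new IllegalArgumentException("state and city must not contain '" + SEPARATOR + "'");
        }
        return state + SEPARATOR + city;
    }

    // Returns the part before the separator.
    static String state(String key) {
        return key.substring(0, separatorIndex(key));
    }

    // Returns the part after the separator.
    static String city(String key) {
        return key.substring(separatorIndex(key) + 1);
    }

    private static int separatorIndex(String key) {
        int index = key.indexOf(SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("key has no separator: " + key);
        }
        return index;
    }

    public static void main(String[] args) {
        String key = compose("Washington", "Seattle");
        System.out.println(key);        // prints Washington;Seattle
        System.out.println(state(key)); // prints Washington
        System.out.println(city(key));  // prints Seattle
    }
}
```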
Persisting 3rd party objects
In some cases we may need to persist objects exposed by 3rd party libraries, or those which do not fit the requirements of a TableEntity and cannot be modified to do so. In such cases, the recommended best practice is to encapsulate the 3rd party object in a new client object that implements the TableEntity interface, and provide the custom serialization logic needed to persist the object to the table service via TableEntity.readEntity and TableEntity.writeEntity.
Note: when implementing readEntity/writeEntity, TableServiceEntity provides two static helper methods (readEntityWithReflection and writeEntityWithReflection) that expose the default reflection based serialization which will use the same rules as previously discussed.
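As a rough illustration of the wrapper approach, the stdlib-only sketch below encapsulates java.util.Locale, which stands in for a third-party type that cannot be modified. SketchEntity is a hypothetical stand-in for the TableEntity contract: it exchanges plain String values rather than EntityProperty instances so that it compiles without the SDK, and its method signatures are simplified assumptions. The property names Language and Country are also made up for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the SDK's TableEntity contract (simplified signatures).
interface SketchEntity {
    void readEntity(Map<String, String> properties);
    Map<String, String> writeEntity();
}

// Wraps java.util.Locale, standing in for a third-party type we cannot change.
final class LocaleEntity implements SketchEntity {
    private Locale locale = Locale.ROOT;

    // Nullary constructor so the wrapper can be instantiated before readEntity runs.
    LocaleEntity() {}

    LocaleEntity(Locale locale) {
        this.locale = locale;
    }

    Locale getLocale() {
        return locale;
    }

    // Custom serialization: flatten the wrapped object into named properties.
    @Override
    public Map<String, String> writeEntity() {
        Map<String, String> properties = new HashMap<>();
        properties.put("Language", locale.getLanguage());
        properties.put("Country", locale.getCountry());
        return properties;
    }

    // Custom deserialization: rebuild the wrapped object from the properties.
    @Override
    public void readEntity(Map<String, String> properties) {
        // A property may be absent (removed by another client or not projected),
        // so fall back to a default instead of assuming it exists.
        String language = properties.getOrDefault("Language", "");
        String country = properties.getOrDefault("Country", "");
        locale = new Locale.Builder().setLanguage(language).setRegion(country).build();
    }
}
```

In a real implementation the same shape would apply, with the wrapper implementing TableEntity and converting to and from EntityProperty values instead of strings.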
Best Practices
- When persisting inner classes they must be marked static and provide a nullary constructor to enable deserialization.
- Consider batch restrictions when developing your application. While a single entity may be up to 1 MB and a batch can contain 100 operations, the 4 MB payload limit on a batch operation may decrease the total number of operations allowed in a single batch. All operations in a given batch must address entities that have identical PartitionKey values.
- Class types should initialize property values to null / default. The Table service does not send null or removed properties to the client, so when an entity is read, any value already set for such a property on the client-side object is not overwritten. As a result, the object may hold non-default values that do not exist in the received entity, which can be perceived as data loss.
- Take count on a TableQuery is applied to each request and is not rewritten between requests. If used in conjunction with the non-segmented execute method, it effectively alters the page size and not the maximum number of results. For example, if we define a TableQuery with take(5) and execute it via executeSegmented, we will receive 5 results (potentially fewer if there is a continuation token involved). However, if we enumerate results via the Iterator returned by the execute method, we will eventually receive all results in the table, 5 at a time. Please be aware of this distinction.
- When implementing readEntity or working with DynamicTableEntity the user should always assume a given property does not exist in the HashMap as it may have been removed by another client or not selected via a projected query. Therefore, it is considered best practice to check for the existence of a property in the HashMap prior to retrieving it.
- The EntityProperty class is utilized during serialization to encapsulate a given property for an entity and stores data in its serialized String form. Subsequently, each call to a get method will deserialize the data and each call to a setter / constructor will serialize it. Avoid repeated updates directly on an EntityProperty wherever possible. If your application needs to make repeated updates / reads to a property on a persisted type, use a POJO object directly.
- The @StoreAs annotation is provided to customize serialization. It can be used to provide friendly client side property names and potentially increase performance by decreasing payload size. For example, if there is an entity with many long named properties such as customerEmailAddress, we could utilize the @StoreAs annotation to persist this property under the name “cea”, which would decrease every payload by 17 bytes for this single property alone. For large entities with numerous properties the latency and bandwidth savings can become significant. Note: the @StoreAs annotation cannot be used to write the PartitionKey, RowKey, or Timestamp, as these properties are written separately; attempting to do so will cause the annotated property to be skipped during serialization. To accomplish this scenario, provide a friendly client side property annotated with the @Ignore annotation and set the PartitionKey, RowKey, or Timestamp property internally.
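The batch restrictions above (identical PartitionKey values and at most 100 operations per batch) can be respected by planning batches on the client before submitting them. The following is a minimal stdlib-only sketch of that planning step. BatchPlanner and its plan method are hypothetical names, not SDK API. The sketch does not attempt to enforce the 4 MB payload limit, which depends on the serialized size of each entity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Stdlib-only sketch; BatchPlanner is a hypothetical helper, not SDK API.
final class BatchPlanner {
    static final int MAX_OPERATIONS_PER_BATCH = 100;

    private BatchPlanner() {}

    // Splits entities into groups that share a PartitionKey and hold at most
    // MAX_OPERATIONS_PER_BATCH items each. Payload size is NOT checked here.
    static <T> List<List<T>> plan(List<T> entities, Function<T, String> partitionKeyOf) {
        // Group by PartitionKey, preserving the order in which keys first appear.
        Map<String, List<T>> byPartition = new LinkedHashMap<>();
        for (T entity : entities) {
            byPartition
                .computeIfAbsent(partitionKeyOf.apply(entity), k -> new ArrayList<>())
                .add(entity);
        }

        // Cut each partition group into chunks of at most 100 entities.
        List<List<T>> batches = new ArrayList<>();
        for (List<T> group : byPartition.values()) {
            for (int start = 0; start < group.size(); start += MAX_OPERATIONS_PER_BATCH) {
                int end = Math.min(start + MAX_OPERATIONS_PER_BATCH, group.size());
                batches.add(new ArrayList<>(group.subList(start, end)));
            }
        }
        return batches;
    }
}
```

Each resulting list would then map to one batch operation. If entities are large, a further split may be needed to stay under the payload limit.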
Table Samples
As part of the release of the Windows Azure Storage Client for Java we have provided a series of samples that address some common scenarios that users may encounter when developing cloud applications.
Setup
- Download the samples jar
- Configure the classpath to include the Windows Azure Storage Client for Java, which can be downloaded here.
- Edit the Utility.java file to specify your connection string in storageConnectionString. Alternatively, if you want to use the local storage emulator that ships as part of the Windows Azure SDK, you can uncomment the specified key in Utility.java.
- Execute each sample via Eclipse or the command line. Some blob samples require command line arguments.
Samples
- TableBasics - This sample illustrates basic use of the Table primitives provided. Scenarios covered are:
- How to create a table client
- Insert an entity and retrieve it
- Insert a batch of entities and query against them
- Projection (server and client side)
- DynamicUpdate – update entities regardless of types using DynamicTableEntity and projection to optimize performance.
- OnlineStoreSample – This sample illustrates a common scenario when using a schema-less datastore. In this example we define both customers and orders which are stored in the same table. By utilizing the EntityResolver we can query against the table and retrieve the heterogeneous entity collection in a type safe way.
Summary
This blog post has provided an in-depth overview of the table client in the recently released Windows Azure Storage Client for Java. We continue to maintain and evolve the libraries we provide based on upcoming features and customer feedback. Feel free to leave comments below,
Joe Giardino
Developer
Windows Azure Storage
Resources
Get the Windows Azure SDK for Java
Learn more about the Windows Azure Storage Client for Java
- Windows Azure Storage Client for Java Object Model Overview
- Windows Azure Storage Client for Java Storage Samples
- How to Use the Table Storage Service from Java
- Windows Azure SDK for Java Developer Center
Learn more about Windows Azure Storage
- Understanding the Table Service Data Model
- How to get most out of Windows Azure Tables
- Windows Azure Storage Abstractions and their Scalability Targets
- Windows Azure Storage, CDN, and Caching Forum
Comments
- Anonymous
August 02, 2012
Thanks for a great article, super informative, got me up and running with the Java API for Azure Tables in lightning quick time. Minor bug in text: public String setSampleProperty (String sampleProperty) { this.SampleProperty= sampleProperty; } Should be: public void setSampleProperty (String sampleProperty) { this.SampleProperty= sampleProperty; }