Windows Azure Storage Overview
I am at the Azure Firestarter event in Redmond today and just heard Brad Calder give a quick overview of Azure data. Here are my notes; slides and sample code are to be posted later, and I will update this post when they are available.
- Blobs
- REST APIs
- Can have a lease on the blob - allows for limiting access to the blob (used by drives)
- To create a blob…
- Use StorageCredentialsAccountAndKey to create the authentication object
- Use CloudBlobClient to establish a connection using the authentication object and a URI to the blob store (from the portal)
- Use CloudBlobContainer to create/access a container
- Use CloudBlob to access/create a blob
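A minimal sketch of those steps against the v1 StorageClient library; the account name, key, and container/blob names are placeholders:

```csharp
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Placeholder account name and key - use your own from the portal.
var credentials = new StorageCredentialsAccountAndKey("myaccount", "<key>");
var blobClient = new CloudBlobClient(
    new Uri("http://myaccount.blob.core.windows.net"), credentials);

// Create/access a container, then create a blob inside it.
CloudBlobContainer container = blobClient.GetContainerReference("photos");
container.CreateIfNotExist();

CloudBlob blob = container.GetBlobReference("vacation.jpg");
blob.UploadFile(@"C:\photos\vacation.jpg");
```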
- Two types of blobs
- Block blob - up to 200 GB
- Targeted at streaming workloads (e.g. photos, images)
- Blocks can be uploaded in any order (e.g. from multiple parallel streams)
- Page blob - up to 1 TB
- Targeted at random read/write workloads
- Used for drives
- Pages not stored are effectively initialized to all zeros.
- Only charged for pages you actually store.
- Can create a 100 GB blob, but write 1 MB to it - only charged for 1 MB of pages.
- Page size == 512 bytes
- Updates must be 512 byte aligned (up to 4 MB at a time)
- Can read from any offset
- ClearPages removes the content - not charged for cleared pages.
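A sketch of those page blob behaviors, continuing from the container in the previous sketch; the sizes are illustrative:

```csharp
using System.IO;

// Create a 100 GB page blob - nothing is billed until pages are written.
CloudPageBlob pageBlob = container.GetPageBlobReference("disk.vhd");
pageBlob.Create(100L * 1024 * 1024 * 1024);

// Writes must start on a 512-byte boundary and be a multiple of 512 bytes
// (at most 4 MB per call); only the pages actually written are charged.
byte[] onePage = new byte[512];
using (var stream = new MemoryStream(onePage))
{
    pageBlob.WritePages(stream, 0);
}

// Reads may start at any offset. ClearPages zeroes a range and stops
// billing for it.
pageBlob.ClearPages(0, 512);
```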
- CDN
- Storage account can be enabled for CDN.
- Will get back a domain name to access blobs - can register a custom domain name via CDN.
- Different from the base domain used to access blobs directly - requests to the main storage account URL are served straight from blob storage, bypassing the CDN.
- To use CDN
- Create a blob
- When creating a blob, specify "TTL" - time to live in the CDN in seconds.
- Reference the blob using the CDN URL and it will cache it in the nearest CDN store.
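The "TTL" here maps to the blob's Cache-Control header in the v1 library. A sketch, assuming an "assets" container made publicly readable (which the CDN requires); names are placeholders:

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=<key>");
CloudBlobClient client = account.CreateCloudBlobClient();

// The CDN can only serve blobs that are publicly readable.
CloudBlobContainer container = client.GetContainerReference("assets");
container.CreateIfNotExist();
container.SetPermissions(new BlobContainerPermissions
{
    PublicAccess = BlobContainerPublicAccessType.Blob
});

CloudBlob blob = container.GetBlobReference("logo.png");
blob.UploadFile(@"C:\assets\logo.png");

// The CDN honors Cache-Control as the blob's TTL - one hour here.
blob.Properties.CacheControl = "max-age=3600";
blob.SetProperties();
```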
- Signed URLs (Shared Access Signatures) for Blobs
- Can give limited access to blobs without giving out your secret key.
- Create a Shared Access Signature (SAS) that gives time-based access to the blob.
- Specify start-time and end-time.
- Specify the resource granularity (all blobs in a container, or just one blob in the container).
- Read/write/delete access permissions.
- Give out URL with signature.
- The signature can be validated against a signed identifier stored on the container; removing that identifier instantly revokes any signatures issued against it.
- Can also store time range and permissions with the signed identifier rather than in the URL.
- These can be changed after the URL is issued, and the signature in the URL remains valid.
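A sketch of both flavors - an ad hoc signature carrying the time window and permissions in the URL itself, and a revocable one backed by a signed identifier on the container. The account string, container, blob, and policy names are placeholders:

```csharp
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=<key>");
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("shared");
CloudBlob blob = container.GetBlobReference("report.pdf");

// Ad hoc signature: start time, expiry, and permissions live in the URL.
var policy = new SharedAccessPolicy
{
    SharedAccessStartTime = DateTime.UtcNow,
    SharedAccessExpiryTime = DateTime.UtcNow.AddHours(1),
    Permissions = SharedAccessPermissions.Read
};
string sasUrl = blob.Uri.AbsoluteUri + blob.GetSharedAccessSignature(policy);

// Revocable signature: store the policy under a signed identifier on the
// container. Removing the identifier instantly invalidates issued URLs,
// and the time range/permissions can be changed after the URL is given out.
var containerPermissions = new BlobContainerPermissions();
containerPermissions.SharedAccessPolicies.Add("readers", policy);
container.SetPermissions(containerPermissions);

string revocableUrl = blob.Uri.AbsoluteUri +
    blob.GetSharedAccessSignature(new SharedAccessPolicy(), "readers");
```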
- Windows Azure Drive
- Provides a durable NTFS drive using page blob storage.
- Actually a formatted single-volume NTFS VHD up to 1 TB in size (same limit as page blob)
- Can only be mounted by one VM instance at a time.
- Note that each role instance runs on a VM, so only one role instance can mount a drive read/write
- Could not have both a worker role and a web role mounting the same drive read/write
- One VM instance can mount up to 16 drives.
- Because a drive is just a page blob, can upload your VHD from a client.
- An Azure instance mounts the blob
- Obtains a lease
- Specifies how much local disk storage to use for caching the page blob
- APIs
- CloudDrive.InitializeCache - initialize how much local cache to use for drives
- CloudStorageAccount - to access the blob
- Create a CloudDrive object using CreateCloudDrive specifying the URI to the page blob
- Against CloudDrive…
- Create to initialize it.
- Mount to mount it - returns path on local file system and then access using normal NTFS APIs
- Snapshot to create backups
- Can mount snapshots as read-only VHDs
- Unmount to unmount it.
- The driver that mounts page blobs runs only in the cloud - not in the development fabric.
- In the development fabric, local VHDs are used instead.
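A sketch of the drive lifecycle using the APIs above (requires a reference to the Microsoft.WindowsAzure.CloudDrive assembly; the paths, sizes, and blob URI are placeholders - in a real role the cache path would come from a LocalResource):

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;   // CloudDrive lives here

var account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=<key>");

// Reserve local disk to cache pages of the blob (size in MB).
CloudDrive.InitializeCache(@"C:\Resources\drivecache", 64);

// Create a 1 GB NTFS drive backed by a page blob, then mount it.
CloudDrive drive = account.CreateCloudDrive(
    "http://myaccount.blob.core.windows.net/drives/data.vhd");
drive.Create(1024);                                     // size in MB
string path = drive.Mount(25, DriveMountOptions.None);  // e.g. "d:\"

// ... read and write under 'path' with normal NTFS file APIs ...
// drive.Snapshot() would capture a backup mountable as a read-only VHD.

drive.Unmount();
```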
- Tables
- Table can have billions of entities and terabytes of data.
- Highly scalable.
- WCF Data Services - LINQ or REST APIs
- Table row has a partition key and a row key
- Partition key:
- controls granularity of locality (all entities with same partition key will be stored and cached together)
- provides entity group transactions - as long as entities have same partition key, can do up to 100 insert/update/delete operations as a batch and will be atomic.
- enables scalability - Azure monitors usage patterns and scales out across different servers based on partition keys
- More granularity of partition key = better scalability options
- Less granularity of partition key = better ability to do atomic operations across multiple rows (because all must have same partition key)
- To create / use an entity
- Create a .NET class modeling an entity
- Specify the DataServiceKey attribute to tell WCF Data Services the primary keys (PartitionKey, RowKey)
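For example, a minimal entity class (the property names beyond the keys are illustrative); the StorageClient library also ships a TableServiceEntity base class that declares these keys for you:

```csharp
using System;
using System.Data.Services.Common;

// PartitionKey and RowKey together form the entity's primary key.
[DataServiceKey("PartitionKey", "RowKey")]
public class CustomerEntity
{
    public string PartitionKey { get; set; }   // e.g. a region
    public string RowKey { get; set; }         // e.g. a customer id
    public DateTime Timestamp { get; set; }    // maintained by the service
    public string Name { get; set; }
}
```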
- APIs
- CloudTableClient - establish URI and account to access table store
- TableServiceContext - get from CloudTableClient
- Add entities using the context AddObject method specifying the table name and the class with the data for the new entity
- SaveChangesWithRetries against context to save the object.
- To query, use LINQ with AsTableServiceQuery<T>, where T is the .NET class modeling the entity.
- Manages continuation tokens for you
- Then foreach over the results; you can use UpdateObject to update entities as you stream through them.
- Use SaveChangesWithRetries
- Pass SaveChangesOptions.Batch if there are <= 100 entities and all have the same partition key - they are saved as one atomic batch.
- Otherwise, a separate transaction is sent for each object.
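Putting those APIs together - a sketch that inserts two entities as one batch, then queries and updates them, using the CustomerEntity class from the previous sketch (table and account names are placeholders):

```csharp
using System.Data.Services.Client;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=<key>");
CloudTableClient tableClient = account.CreateCloudTableClient();
tableClient.CreateTableIfNotExist("Customers");

TableServiceContext context = tableClient.GetDataServiceContext();

// Same partition key, so both inserts can go as one atomic batch.
context.AddObject("Customers",
    new CustomerEntity { PartitionKey = "EU", RowKey = "42", Name = "Ada" });
context.AddObject("Customers",
    new CustomerEntity { PartitionKey = "EU", RowKey = "43", Name = "Grace" });
context.SaveChangesWithRetries(SaveChangesOptions.Batch);

// AsTableServiceQuery handles continuation tokens across large result sets.
var customers = (from c in context.CreateQuery<CustomerEntity>("Customers")
                 where c.PartitionKey == "EU"
                 select c).AsTableServiceQuery();

foreach (CustomerEntity customer in customers)
{
    customer.Name = customer.Name.ToUpper();
    context.UpdateObject(customer);
}
context.SaveChangesWithRetries();
```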
- Table tips
- Raise ServicePointManager.DefaultConnectionLimit (the .NET default is only 2 HTTP connections per host) - see the sketch after these tips.
- Use SaveChangesWithRetries and AsTableServiceQuery to get best performance
- Handle Conflict errors on inserts and NotFound errors on Delete
- Can happen because of retries
- Avoid append-only write patterns based on partition key values
- Can happen if the partition key is based on a timestamp.
- Continually appending to the tail of one key range defeats Azure's scale-out strategy.
- Choose partition keys that distribute writes rather than concentrating them in one range.
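A sketch of the first two tips; the connection-limit value and the helper method are hypothetical, and the Conflict check shows how a retried insert that actually succeeded on the first attempt can be treated as done:

```csharp
using System.Data.Services.Client;
using System.Linq;
using System.Net;
using Microsoft.WindowsAzure.StorageClient;

static class TableTips
{
    // Call once at startup: the .NET default of 2 HTTP connections per host
    // throttles parallel storage calls. 48 is just an illustrative value.
    public static void TuneConnections()
    {
        ServicePointManager.DefaultConnectionLimit = 48;
    }

    // Hypothetical helper: if a retry reports Conflict, the first attempt
    // already landed, so the insert can be treated as successful.
    public static void InsertIgnoringConflict(
        TableServiceContext context, string table, object entity)
    {
        try
        {
            context.AddObject(table, entity);
            context.SaveChangesWithRetries();
        }
        catch (DataServiceRequestException ex)
        {
            OperationResponse response = ex.Response.FirstOrDefault();
            bool duplicate = response != null &&
                response.StatusCode == (int)HttpStatusCode.Conflict;
            if (!duplicate) throw;   // only swallow the duplicate-insert case
        }
    }
}
```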
- Queues
- Provide reliable delivery of messages
- Allow loosely coupled workflow between roles
- Work gets loaded into a queue
- Multiple workers consume the queue
- When dequeuing a message, specify an "invisibility time" which leaves the message in the queue but makes it temporarily invisible to other workers
- Allows for reliability.
- APIs
- Create a CloudQueueClient using account and credentials
- Create a CloudQueue using the client and GetQueueReference - queue name
- CreateIfNotExist to create it if not there
- Create a CloudQueueMessage with content
- Use CloudQueue.AddMessage to add it to the queue
- Use CloudQueue.GetMessage to get it out (passing invisibility time)
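A sketch of those queue steps end to end (the queue name, message content, and 30-second invisibility window are placeholders):

```csharp
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=<key>");
CloudQueueClient queueClient = account.CreateCloudQueueClient();

CloudQueue queue = queueClient.GetQueueReference("workitems");
queue.CreateIfNotExist();

// Producer side.
queue.AddMessage(new CloudQueueMessage("resize photo 1234"));

// Consumer side: the message stays in the queue but is invisible to
// other workers for 30 seconds.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    // ... process the work item ...
    queue.DeleteMessage(msg);   // delete before the invisibility window ends
}
```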
- Tips on Queues
- Messages up to 8 KB in size
- Put in a blob if more and send blob pointer as message
- Remember that a message can be processed more than once.
- Make processing idempotent so that duplicate delivery is harmless.
- Assume messages are not processed in any particular order
- Queues can handle up to about 500 messages/second.
- For higher throughput, batch work items into a blob and send one message referencing a blob that contains e.g. 10 work items.
- Worker does 10 items at a time.
- Increased throughput by 10x.
- Use DequeueCount to remove "poison messages" that seem to be repeatedly crashing workers (see the sketch after this list).
- Monitor message count to increase/decrease worker instances using service management APIs.
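A sketch of a dequeue loop that applies the DequeueCount tip, continuing from the queue in the previous sketch; the threshold of 3 and the handler are hypothetical:

```csharp
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 3)   // arbitrary poison threshold
    {
        // This message has already crashed or outlived several workers -
        // drop it (optionally saving it to a "poison" queue or blob first).
        queue.DeleteMessage(msg);
    }
    else
    {
        ProcessWorkItem(msg.AsString);   // hypothetical handler
        queue.DeleteMessage(msg);
    }
}
```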
- Q: can you set priorities on queue messages?
- A: No - would have to create different queues
- Q: Are blobs stored within the EU complying with the EU privacy policies?
- A: Microsoft has a standard privacy policy which we adhere to.
Comments
- Anonymous
February 13, 2011
Hi Mike, Could you explain more on Create and CreateIfNotExits for Queue, I tried it manually but there doesn't seem much difference i.e. no change in the queue data members or the queue. Naveen