Elastic processing with PowerShell in Windows Azure HDinsight
Azure HDInsight is Hadoop in the cloud, built to handle any amount of relational or non-relational data, scaling from terabytes to petabytes on demand.HDInsight allows customers to deploy a variety of cluster types, for different data analytics workloads.Each cluster type consists of a set of nodes. Customers are billed for the usage of those nodes for the duration of the cluster’s life. Billing starts once a cluster is created and stops when the cluster is deleted .So to reduce cost of the cluster,automatically management is very necessary in HDinsight.
Powershell is a very useful tool to implement elastic processing.Here I want to share with you some useful PowerShell script:
Firstly,we need to set up variables used to create an HDinsight cluster:
Make sure the storage account exists:
Remove the cluster if it already exists:
Create a cluster
After creating the cluster,we'll use the cluster to analyse the file 'sample.log'.
The file is just like this:
Then we need to upload the log from local workstation to blob container:
Set up the pig job and the query:
Submit the pig job and download the result,display the result in our powershell:
The result should be like this:
Remove the cluster after we complete our job:
Finally,we completed our HDinsight job using PowerShell.
You can download the script from the attachment.Hope this demo will be helpful to you.