Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Azure HDInsight is Hadoop in the cloud, built to handle any amount of relational or non-relational data, scaling from terabytes to petabytes on demand.HDInsight allows customers to deploy a variety of cluster types, for different data analytics workloads.Each cluster type consists of a set of nodes. Customers are billed for the usage of those nodes for the duration of the cluster’s life. Billing starts once a cluster is created and stops when the cluster is deleted .So to reduce cost of the cluster,automatically management is very necessary in HDinsight.
Powershell is a very useful tool to implement elastic processing.Here I want to share with you some useful PowerShell script:
Firstly,we need to set up variables used to create an HDinsight cluster:
Make sure the storage account exists:
Remove the cluster if it already exists:
Create a cluster
After creating the cluster,we'll use the cluster to analyse the file 'sample.log'.
The file is just like this:
Then we need to upload the log from local workstation to blob container:
Set up the pig job and the query:
Submit the pig job and download the result,display the result in our powershell:
The result should be like this:
Remove the cluster after we complete our job:
Finally,we completed our HDinsight job using PowerShell.
You can download the script from the attachment.Hope this demo will be helpful to you.