Building custom clients

From: Developing big data solutions on Microsoft Azure HDInsight

In most cases, the execution of big data processing jobs forms part of a larger business process or BI solution. While you can execute jobs interactively using the Hadoop command line interface in a remote desktop connection to the cluster, it is more common to submit and coordinate individual jobs and Oozie workflows from client applications. These clients may be dedicated job management utilities and scripts, or complete applications that fully integrate big data processing into business processes.

On the Windows platform you can choose from a variety of technologies and APIs for submitting jobs to an HDInsight cluster. The primary APIs are provided through the Azure and HDInsight modules for Windows PowerShell, and the .NET SDK for HDInsight.

Building custom clients with Windows PowerShell scripts

PowerShell is a good choice for running HDInsight jobs when an individual data analyst wants to experiment interactively with data, or when you want to automate big data processing through command line scripts that can be scheduled to run when required.

You can run PowerShell scripts interactively in a Windows command line window or in a PowerShell-specific command line console. Additionally, you can edit and run PowerShell scripts in the Windows PowerShell Integrated Scripting Environment (ISE), which provides IntelliSense and other user interface enhancements that make it easier to write PowerShell code. You can schedule the execution of PowerShell scripts using Windows Task Scheduler, SQL Server Agent, or other tools as described in Building end-to-end solutions using HDInsight.

Before you use PowerShell to work with HDInsight you must configure the PowerShell environment to connect to your Azure subscription. To do this you must first download and install the Azure PowerShell module, which is available through the Microsoft Web Platform Installer. For more details see How to install and configure Azure PowerShell.
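As a rough sketch of that configuration step, assuming the Azure PowerShell module that provides these cmdlets is already installed, the commands below download and import a publish settings file and then select the subscription to work with. The file path and subscription name are placeholders chosen for illustration, not values from this guide.

```powershell
# Sketch only: one-time setup of the Azure PowerShell environment.
# Assumes the Azure PowerShell module is installed and loaded.

# Opens a browser so that you can sign in and download a .publishsettings file.
Get-AzurePublishSettingsFile

# Import the downloaded file. The path is a placeholder; use your own download location.
Import-AzurePublishSettingsFile 'C:\Temp\MySubscription.publishsettings'

# List the subscriptions that are now registered on this computer.
Get-AzureSubscription

# Make one of them the current subscription. The name is a placeholder.
Select-AzureSubscription -SubscriptionName 'my-subscription-name'
```

The publish settings file contains credentials for your subscription, so it is sensible to delete it or store it securely once it has been imported.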

The following examples demonstrate some common scenarios for submitting and running jobs using PowerShell:
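As one illustration, the sketch below wraps the typical submit, wait, and fetch-output sequence for a Hive job in a small function. The function name `Invoke-HiveQuery`, its parameters, and the default timeout are hypothetical and chosen for this example; it assumes the Azure PowerShell environment has already been connected to a subscription that contains the cluster.

```powershell
# Hypothetical helper (not part of the Azure module): submits a HiveQL query
# to an HDInsight cluster, waits for it to finish, and returns standard output.
function Invoke-HiveQuery {
    param(
        [Parameter(Mandatory = $true)][string]$ClusterName,
        [Parameter(Mandatory = $true)][string]$Query,
        [int]$TimeoutSeconds = 3600   # assumed default; adjust for long-running jobs
    )

    # Describe the Hive job to run.
    $jobDefinition = New-AzureHDInsightHiveJobDefinition -Query $Query

    # Submit the job to the cluster.
    $job = Start-AzureHDInsightJob -Cluster $ClusterName -JobDefinition $jobDefinition

    # Block until the job completes or the timeout expires.
    Wait-AzureHDInsightJob -Job $job -WaitTimeoutInSeconds $TimeoutSeconds | Out-Null

    # Retrieve whatever the job wrote to standard output.
    Get-AzureHDInsightJobOutput -Cluster $ClusterName -JobId $job.JobId -StandardOutput
}
```

A call might look like `Invoke-HiveQuery -ClusterName 'my-cluster' -Query 'SHOW TABLES;'`, where the cluster name is a placeholder. Keeping the sequence in one function makes it easy to reuse from a scheduled script.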

Building custom clients with the .NET Framework

The .NET SDK for HDInsight provides classes that enable developers to submit jobs to HDInsight from applications built on the .NET Framework. It is a good choice when you need to build custom data processing applications, or integrate HDInsight data processing into existing business applications that are based on the .NET Framework.

Many of the techniques used to initiate jobs from a .NET application require an Azure management certificate to authenticate the request. To create and install a suitable certificate you can:

  • Use the makecert command in a Visual Studio command line to create a certificate and upload it to your subscription in the Azure management portal as described in Create and Upload a Management Certificate for Azure.
  • Use the Get-AzurePublishSettingsFile and Import-AzurePublishSettingsFile Windows PowerShell cmdlets to generate, download, and install a new certificate from your Azure subscription as described in the section How to: Connect to your subscription of the topic How to install and configure Azure PowerShell. If you want to use the same certificate on more than one client computer you can copy the Azure publishsettings file to each one and use the Import-AzurePublishSettingsFile cmdlet to import it.

After you have created and installed your certificate, it will be stored in the Personal certificate store on your computer. You can view the details in the certmgr.msc console.
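If you prefer to check the store from a script rather than the console, the sketch below lists certificates in the current user's Personal store through the PowerShell certificate provider. The function name `Get-PersonalCertificate` and its `SubjectLike` parameter are hypothetical, and the comment shows the general form of a makecert command with the certificate name left as a placeholder.

```powershell
# To create a self-signed certificate first, a command of this general form can be
# run in a Visual Studio command line (the certificate name is a placeholder):
#   makecert -sky exchange -r -n "CN=<CertificateName>" -pe -a sha1 -len 2048 -ss My "<CertificateName>.cer"

# Hypothetical helper: lists certificates in the current user's Personal store.
function Get-PersonalCertificate {
    param(
        # Wildcard pattern matched against the certificate subject; '*' matches all.
        [string]$SubjectLike = '*'
    )

    # Cert:\CurrentUser\My is the Personal store for the signed-in user (Windows only).
    Get-ChildItem -Path Cert:\CurrentUser\My |
        Where-Object { $_.Subject -like $SubjectLike } |
        Select-Object Subject, Thumbprint, NotAfter
}
```

For example, `Get-PersonalCertificate -SubjectLike 'CN=MyCert*'` would show only certificates whose subject starts with that (placeholder) name. The thumbprint it returns is the value you would typically use to locate the certificate from application code.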

The following examples demonstrate some common scenarios for submitting and running jobs using .NET Framework code:
