HDInsight Storm Topology Submission Via VNet
1. Introduction
To submit a Storm topology to an HDInsight cluster, you can RDP to the head node of the cluster and run the storm command. This is not always convenient. It is also possible to submit a Storm topology from outside the HDInsight cluster: create an HDInsight Storm cluster inside a configured Virtual Network (VNet), and submit the topology from a machine that is connected to that VNet.
VNets come in several types, so with VNet support we can submit a topology from an Azure VM, other Azure services, on-premises infrastructure, or developer boxes.
To showcase the idea, I’ll show you how to use an Azure VM to submit a Storm topology via a VNet.
2. Step-by-step Instructions
1) Create a Cloud-Only VNet in the Azure portal. You can use the “QUICK CREATE” button.
2) Create a VM in the VNet you just created; this is the VM from which we will submit the Storm topology. Use the “FROM GALLERY” button, and on page 4 choose the VNet that we just created.
3) Create a Storm cluster in the same VNet. Note that you need to use “custom create” and specify the VNet name in the Region/Virtual Network section.
4) Find out the FQDN of the active head node of the HDInsight Storm cluster using the REST API.
This PowerShell script helps you get the FQDN of the active head node:
function Get-ActiveFQDN(
    [String]
    [Parameter( Position=0, Mandatory=$true )]
    $ClusterDnsName,
    [String]
    [Parameter( Position=1, Mandatory=$true )]
    $Username,
    [String]
    [Parameter( Position=2, Mandatory=$true )]
    $Password)
{
    # Build the public endpoint of the cluster, e.g. <clusterdnsname>.azurehdinsight.net
    $DnsSuffix = ".azurehdinsight.net"
    $ClusterFQDN = $ClusterDnsName + $DnsSuffix

    # Authenticate with the cluster's HTTP user name and password
    $webclient = New-Object System.Net.WebClient
    $webclient.Credentials = New-Object System.Net.NetworkCredential($Username, $Password)

    # Query the availability status and parse the JSON response
    $Url = "https://" + $ClusterFQDN + "/clusteravailability/status"
    $Response = $webclient.DownloadString($Url)
    $JsonObject = $Response | ConvertFrom-Json

    # Print the internal FQDN of the active head node
    Write-Host $JsonObject.LeaderDnsName
}
This script will print out something like this:
headnode1.<clusterdnsname>.b1.internal.cloudapp.net
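If you would rather keep the FQDN in a variable for the later steps than copy it from the console, a variant along the following lines may help. This is only a sketch: it assumes the same /clusteravailability/status endpoint and LeaderDnsName field as the script above, and the function names ConvertTo-LeaderDnsName and Get-ActiveHeadNode are made up for illustration. It also prompts for the credentials instead of taking the password as a plain-text argument.

```powershell
# Hypothetical helper: extracts the leader FQDN from the JSON text returned by
# the /clusteravailability/status endpoint used in the script above.
function ConvertTo-LeaderDnsName([string]$StatusJson) {
    $status = $StatusJson | ConvertFrom-Json
    if (-not $status.LeaderDnsName) {
        throw "LeaderDnsName was not found in the status response."
    }
    return $status.LeaderDnsName
}

# Hypothetical wrapper: prompts for the cluster login and returns the FQDN
# as a string instead of writing it to the console.
function Get-ActiveHeadNode([string]$ClusterDnsName) {
    $credential = Get-Credential -Message "HDInsight cluster login"
    $webclient = New-Object System.Net.WebClient
    try {
        $webclient.Credentials = $credential.GetNetworkCredential()
        $url = "https://{0}.azurehdinsight.net/clusteravailability/status" -f $ClusterDnsName
        $response = $webclient.DownloadString($url)
        return ConvertTo-LeaderDnsName $response
    }
    finally {
        $webclient.Dispose()
    }
}
```

With both functions loaded, something like $nimbus = Get-ActiveHeadNode "<clusterdnsname>" would leave the head node name in $nimbus.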
5) RDP to the Azure VM we just created. Copy the Storm bits from the HDInsight head node (c:\apps\dist\storm-xxx or %STORM_HOME%) to the Azure VM (let’s say to the c:\storm folder), and install a Java 1.7 runtime (either Oracle or OpenJDK is fine).
6) On the Azure VM, make sure the following settings (an environment variable and a Storm configuration entry) are correctly set:
Environment variable:
JAVA_HOME = "<your java installation path>"
storm.yaml (c:\storm\conf\storm.yaml):
nimbus.host: headnode1.<clusterdnsname>.b1.internal.cloudapp.net
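If you prefer to script these two settings, the following PowerShell is one possible sketch. The function names Set-NimbusHostLine and Set-StormClientConfig are hypothetical, the paths are the ones assumed in this post, and writing a machine-wide environment variable requires an elevated PowerShell session.

```powershell
# Hypothetical helper: returns the storm.yaml lines with nimbus.host set to
# $NimbusHost. An existing nimbus.host entry is replaced; otherwise one is appended.
function Set-NimbusHostLine([string[]]$Lines, [string]$NimbusHost) {
    $newLine = 'nimbus.host: {0}' -f $NimbusHost
    $result = New-Object 'System.Collections.Generic.List[string]'
    $found = $false
    foreach ($line in $Lines) {
        if ($line -match '^\s*nimbus\.host\s*:') {
            $result.Add($newLine)
            $found = $true
        }
        else {
            $result.Add($line)
        }
    }
    if (-not $found) { $result.Add($newLine) }
    return $result.ToArray()
}

# Hypothetical wrapper: applies both settings from step 6 on the Azure VM.
function Set-StormClientConfig([string]$JavaHome, [string]$StormYamlPath, [string]$NimbusHost) {
    # Persist JAVA_HOME for all users (needs an elevated session).
    [Environment]::SetEnvironmentVariable('JAVA_HOME', $JavaHome, 'Machine')

    # Read the current storm.yaml if it exists, then write back the updated lines.
    $lines = @()
    if (Test-Path $StormYamlPath) { $lines = @(Get-Content -Path $StormYamlPath) }
    $updated = @(Set-NimbusHostLine -Lines $lines -NimbusHost $NimbusHost)
    Set-Content -Path $StormYamlPath -Value $updated
}
```

A call would then look like Set-StormClientConfig "<your java installation path>" "c:\storm\conf\storm.yaml" "headnode1.<clusterdnsname>.b1.internal.cloudapp.net". Open a new command prompt afterwards so that it picks up the new JAVA_HOME value.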
7) On the Azure VM, submit a Storm topology with the storm.cmd command line, like this:
C:\storm\bin>storm jar ..\contrib\storm-starter\storm-starter-<version>-jar-with-dependencies.jar storm.starter.WordCountTopology wordcountSampleTopology
Then, on the Azure VM, you can monitor and manage the topology through the Storm UI web page (start IE and enter an address like this):
https://headnode1.<clusterdnsname>.b1.internal.cloudapp.net:8772/
This is how easy it is. Enjoy storming!