Interactive JavaScript Console on MDH
Getting Started
The Microsoft Distribution of Hadoop comes with a web-based interactive JavaScript console that is started along with the other Hadoop services. The console allows you to:
- Perform HDFS operations, including uploading files to and reading files from the HDFS
- Run MapReduce programs from .js scripts or JAR files, and monitor their progress
- Run a Pig job specified using a fluent query syntax in JavaScript, and monitor its progress
- Visualize data with graphs built using HTML5
To get started, you can open the console in your browser by going to http://localhost:8080/ after running "isotope start" in a local installation, or by clicking on the appropriate link on the Azure portal after you have signed in.
Walkthrough: Visualizing Word Count
Write the JavaScript MapReduce script
Using Notepad or your favorite text editor, create a text file with the following contents:
// Emit (word, 1) for every alphabetic word in the input line.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// Sum the counts emitted for each word.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

Save the text file as “WordCount.js” to your hard drive. Note that UTF-8 encoding, which Visual Studio often uses by default, causes an "illegal character" exception when the Pig job runs, so save the file with the encoding set to "US-ASCII – Codepage 20127".
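Before uploading, it can help to check the map and reduce logic locally. The following is a minimal sketch that runs the same two functions under Node.js or in a browser. The helpers makeContext, makeIterator, and runLocally are hypothetical stand-ins written for this example; only the write, hasNext, and next calls come from the script above, and the real objects supplied by the console may behave differently.

```javascript
// WordCount functions from the walkthrough.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical stand-in for the context object: collects written pairs.
function makeContext() {
    var pairs = [];
    return {
        pairs: pairs,
        write: function (key, value) { pairs.push([key, value]); }
    };
}

// Hypothetical stand-in for the values iterator passed to reduce.
function makeIterator(items) {
    var index = 0;
    return {
        hasNext: function () { return index < items.length; },
        next: function () { return items[index++]; }
    };
}

// Run map over each line, group the emitted values by key,
// then run reduce once per key.
function runLocally(lines) {
    var mapContext = makeContext();
    lines.forEach(function (line, lineNumber) {
        map(lineNumber, line, mapContext);
    });

    var groups = Object.create(null);
    mapContext.pairs.forEach(function (pair) {
        if (!(pair[0] in groups)) {
            groups[pair[0]] = [];
        }
        groups[pair[0]].push(pair[1]);
    });

    var reduceContext = makeContext();
    Object.keys(groups).forEach(function (word) {
        reduce(word, makeIterator(groups[word]), reduceContext);
    });
    return reduceContext.pairs;
}

var counts = runLocally(["The cat and the hat", "the end"]);
console.log(counts);
```

With the two made-up sample lines, the word "the" is counted three times and every other word once. This only exercises the script's logic; it says nothing about how the job is scheduled on the cluster.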
Upload the script and input data
Open the interactive JavaScript console and type:
fs.put()
Then select the WordCount.js file you created in the previous step and upload it to the HDFS.
Next, create a directory on the HDFS for the Gutenberg sample by typing:
#mkdir gutenberg
Finally, upload each of the Gutenberg files by typing:
fs.put("gutenberg")
and selecting a .txt file from the Gutenberg set (located in C:\Apps\dist\examples\data\gutenberg). Repeat this step for each of the text files.
To make sure the files were uploaded correctly, use the following commands:
#ls
#ls gutenberg
#cat WordCount.js
Run the query
Run the following to find the top 10 most frequent words in the Gutenberg sample texts:
pig.from("gutenberg").mapReduce("WordCount.js", "word, count:long").orderBy("count DESC").take(10).to("gbtop10")
Once the job completes, you can see the output files in the HDFS by typing:
#ls gbtop10
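To make clear what the orderBy("count DESC") and take(10) stages of the query compute, here is a plain JavaScript illustration of the same idea on an in-memory array. The topN function and the sample rows are made up for this sketch; this is not how Pig executes the query on the cluster.

```javascript
// Plain-JavaScript illustration only; not the Pig implementation.
// Returns the n rows with the highest count, in descending order.
function topN(rows, n) {
    return rows
        .slice() // copy so the caller's array is not reordered
        .sort(function (a, b) { return b.count - a.count; })
        .slice(0, n);
}

// Made-up sample rows in the shape "word, count:long".
var sample = [
    { word: "of", count: 7 },
    { word: "the", count: 12 },
    { word: "whale", count: 3 },
    { word: "and", count: 9 }
];

console.log(topN(sample, 2));
```

In the walkthrough, the rows fed into these stages are the (word, count) pairs written by WordCount.js, and n is 10.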
Visualize the results
(Note: if you are using Internet Explorer, this step requires IE9+)
Read the results into the JavaScript context by typing:
file = fs.read("gbtop10")
data = parse(file.data, "word, count:long")
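As a rough picture of what turning the file text into rows with a schema such as "word, count:long" involves, here is a minimal sketch. It assumes the output is tab-delimited with one record per line, which is Pig's default storage format, and that untyped fields should stay strings. The function name parseRows is hypothetical and the console's own parse function may work differently.

```javascript
// Hypothetical sketch; not the console's parse implementation.
// Assumes tab-delimited fields and one record per line.
function parseRows(text, schema) {
    // Turn "word, count:long" into [{name: "word", type: "string"}, ...].
    var fields = schema.split(",").map(function (spec) {
        var parts = spec.trim().split(":");
        return {
            name: parts[0].trim(),
            type: (parts[1] || "string").trim()
        };
    });

    return text
        .split("\n")
        .filter(function (line) { return line.trim() !== ""; })
        .map(function (line) {
            var cells = line.split("\t");
            var row = {};
            fields.forEach(function (field, i) {
                // Convert fields declared as long to numbers.
                row[field.name] = field.type === "long"
                    ? parseInt(cells[i], 10)
                    : cells[i];
            });
            return row;
        });
}

// Made-up sample text in the assumed format.
var rows = parseRows("the\t12\nand\t9\n", "word, count:long");
console.log(rows);
```

Each resulting row is an object with a word string and a numeric count, which is the kind of structure a bar graph can plot directly.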
Then make a bar graph of the data:
graph.bar(data)
Enjoy!
This article was written by David Zhang.