HDFS Operations on the Interactive JavaScript Console

 

Shell Commands

The web console runs HDFS commands through the JavaScript function “fs”. For example, you can run:

fs("ls")

This function exposes the same set of file system commands that “hadoop fs” provides at the command prompt. Because file system operations are so common, a shortcut is available. The following is equivalent to the call above:

#ls
 

Uploading Files

You can upload a file to HDFS by running:

fs.put()

A dialog opens in which you select a file from your local file system and, optionally, specify the destination HDFS directory.

This command is intended for uploading JARs and scripts to HDFS, and possibly sample input data of no more than a few megabytes. It should not be used to upload large data sets.

Reading Files

You can read a file from HDFS into the JavaScript context as follows:

file = fs.read("/path/to/file")

Once this command completes, you can store the file contents in a JavaScript variable:

data = file.data

Now data is a regular JavaScript string that holds the contents of the file.

You can also pass an HDFS directory to fs.read(), in which case the contents of all the files in that directory are concatenated and returned.
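
Since the contents arrive as an ordinary string, standard string methods apply to them. The following is a minimal sketch, assuming the file is newline-delimited text. Here sampleData stands in for file.data, and splitLines is a hypothetical helper, not part of the console:

```javascript
// Stand-in for file.data; in the console this string would come from fs.read().
var sampleData = "alpha\nbeta\r\ngamma\n";

// Hypothetical helper (not a console function): split file contents
// into lines, accepting both \n and \r\n endings and dropping empty lines.
function splitLines(text) {
  return text.split(/\r?\n/).filter(function (line) {
    return line.length > 0;
  });
}

var lines = splitLines(sampleData);
```

After this runs, lines is an array with one entry per non-empty line of the input.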

Parsing Structured Data

If the file you are reading contains structured data, it may be more useful to deserialize the file into a JavaScript array. You can do this with the parse function and a schema, for example:

data = parse(file.data, "name, count:int, expiry:date")

The schema is a string in the following format:

<column_name>[:<column_type>][, ...]

where

  • column_name is any valid JavaScript identifier for that column
  • column_type is one of
    • date
    • int (or long)
    • float (or double)
  • If column_type is omitted, or is not one of the above, the column is treated as a string.

The data string should consist of tab-separated columns, with one line per row. The parse function returns a JavaScript array of objects whose properties are the specified column names, each holding a value of the corresponding data type.
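
To make the schema and row format concrete, here is a rough sketch of the behavior described above, written in plain JavaScript. It is not the console's actual parse implementation: parseSketch, parseSchema, and convert are hypothetical names, the sample data is invented, and the date handling (passing the text to the Date constructor) is an assumption, since the accepted date formats are not stated here.

```javascript
// Hypothetical: turn "name, count:int" into [{name, type}, ...].
// A column with no type defaults to "string".
function parseSchema(schema) {
  return schema.split(",").map(function (col) {
    var parts = col.trim().split(":");
    return { name: parts[0].trim(), type: (parts[1] || "string").trim() };
  });
}

// Hypothetical: convert one text field according to its column type.
function convert(value, type) {
  switch (type) {
    case "int":
    case "long":
      return parseInt(value, 10);
    case "float":
    case "double":
      return parseFloat(value);
    case "date":
      return new Date(value); // assumption: text the Date constructor accepts
    default:
      return value; // omitted or unrecognized type: keep the string
  }
}

// Hypothetical stand-in for parse(): one object per non-empty line,
// with tab-separated fields matched to schema columns by position.
function parseSketch(data, schema) {
  var columns = parseSchema(schema);
  return data.split(/\r?\n/).filter(function (line) {
    return line.length > 0;
  }).map(function (line) {
    var fields = line.split("\t");
    var row = {};
    columns.forEach(function (col, i) {
      row[col.name] = convert(fields[i], col.type);
    });
    return row;
  });
}

// Invented sample input: two rows of tab-separated columns.
var sample = "apples\t3\t2013-01-15\npears\t7\t2013-02-01\n";
var rows = parseSketch(sample, "name, count:int, expiry:date");
```

With this sample, rows[0].name is the string "apples", rows[1].count is the number 7, and each expiry property is a Date object.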