Using RStudio Server with Microsoft R Server Parcel for Cloudera

In previous releases of Microsoft R Server (MRS), parcel installation required downloading two pre-built parcel files. The 9.1 release simplifies this by providing a parcel generator script, generate_mrs_parcel.sh, which produces a single MRS-9.1.0-*.parcel file. Complete instructions for installing the MRS parcel on a Cloudera cluster are published separately.

This article shows how to make RStudio Server (both the open source and the commercial edition) work with an MRS 9.1.0 parcel installation on Cloudera. RStudio Server is available under an open source license and under a commercial license; the commercial edition is called RStudio Server Pro, and RStudio publishes a comparison of the two.

The following steps assume that you already have a Cloudera cluster with the MRS-9.1.0 parcel installed and activated. Run them on an edge node (gateway node) of the cluster.

  • Download and install either RStudio Server (open source license) or RStudio Server Pro, using the matching set of commands below.

RStudio Server (open source license):

wget https://download2.rstudio.org/rstudio-server-rhel-1.0.143-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-rhel-1.0.143-x86_64.rpm
sudo rstudio-server verify-installation
sudo rstudio-server version

RStudio Server Pro:

wget https://download2.rstudio.org/rstudio-server-rhel-pro-1.0.143-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-rhel-pro-1.0.143-x86_64.rpm
sudo rstudio-server verify-installation
sudo rstudio-server version
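
The two command sets above differ only in the package file name, so the download step can be parameterized. The following is a minimal sketch, not an official installer: the helper name rstudio_rpm_url is hypothetical, and the URL pattern is simply the one used by the two 1.0.143 links above, so other versions may not follow it.

```shell
# Hypothetical helper: print the download URL for an edition ("open" or "pro").
# Assumption: the URL pattern matches the two 1.0.143 links shown in this article.
rstudio_rpm_url() {
  edition="$1"
  version="${2:-1.0.143}"
  case "$edition" in
    open) echo "https://download2.rstudio.org/rstudio-server-rhel-${version}-x86_64.rpm" ;;
    pro)  echo "https://download2.rstudio.org/rstudio-server-rhel-pro-${version}-x86_64.rpm" ;;
    *)    echo "unknown edition: $edition" >&2; return 1 ;;
  esac
}

# Example use (commented out so that sourcing this file changes nothing):
# url="$(rstudio_rpm_url open)"
# wget "$url"
# sudo yum install --nogpgcheck "$(basename "$url")"
# sudo rstudio-server verify-installation
```

Keeping the edition as a single argument means the rest of the installation commands stay identical for both licenses.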

  • Some environment variables must be set at the start of every R session. R reads them from its Renviron file, so append the following lines to /opt/cloudera/parcels/MRS/lib64/R/etc/Renviron
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/cloudera/parcels/MRS/lib64/R/lib
R_LIBS=/opt/cloudera/parcels/MRS/lib64/R/library
MRS_PARCEL_PATH=/opt/cloudera/parcels/MRS
  • Create a libjvm.so symlink in the MRS hadoop directory (adjust the JDK path if your node uses a different JDK)
sudo ln -s /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so /opt/cloudera/parcels/MRS/hadoop/libjvm.so
  • Copy the .RevoHadoopEnvVars.site file from your home directory into the MRS hadoop directory, then rename it to RevoHadoopEnvVars.site (without the leading dot). (NOTE: If .RevoHadoopEnvVars.site is not present in your home directory, run the R command once; this generates the site file in the home directory.)
sudo cp ~/.RevoHadoopEnvVars.site /opt/cloudera/parcels/MRS/hadoop
sudo mv /opt/cloudera/parcels/MRS/hadoop/.RevoHadoopEnvVars.site /opt/cloudera/parcels/MRS/hadoop/RevoHadoopEnvVars.site
  • Restart RStudio Server
sudo rstudio-server stop
sudo rstudio-server restart
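
If you expect to repeat the configuration steps above (for example on several edge nodes), it helps to make them safe to re-run. The sketch below is only an illustration under stated assumptions: append_once and find_libjvm are hypothetical helper names, and the paths in the commented usage are the ones from this article, which may differ on your cluster.

```shell
# append_once FILE LINE
# Append LINE to FILE unless an identical line is already there,
# so re-running the setup does not duplicate Renviron entries.
append_once() {
  file="$1"
  line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# find_libjvm JDK_DIR
# Print the first libjvm.so found under JDK_DIR instead of hard-coding
# the jre/lib/amd64/server sub-path.
find_libjvm() {
  find "$1" -name libjvm.so -type f 2>/dev/null | head -n 1
}

# Example use (run as root; commented out so sourcing this file changes nothing):
# renviron=/opt/cloudera/parcels/MRS/lib64/R/etc/Renviron
# append_once "$renviron" 'LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/cloudera/parcels/MRS/lib64/R/lib'
# append_once "$renviron" 'R_LIBS=/opt/cloudera/parcels/MRS/lib64/R/library'
# append_once "$renviron" 'MRS_PARCEL_PATH=/opt/cloudera/parcels/MRS'
# ln -s "$(find_libjvm /usr/java/jdk1.7.0_67-cloudera)" /opt/cloudera/parcels/MRS/hadoop/libjvm.so
```

The single quotes around the LD_LIBRARY_PATH line matter: they keep the shell from expanding ${LD_LIBRARY_PATH} so that the literal text ends up in Renviron, as in the step above.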

  • RStudio Server will be available at the following URL: http://<nodename>:8787 (use https instead only if SSL has been configured for the server). Make sure port 8787 is open.
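
Before opening the URL above in a browser, you can check from another machine whether port 8787 is reachable. This is a rough sketch that assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available; port_open is a hypothetical helper name.

```shell
# port_open HOST PORT
# Succeed if a TCP connection to HOST:PORT can be opened within 5 seconds.
# Assumption: bash with /dev/tcp support and coreutils `timeout` are installed.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example use (replace <nodename> with your edge node's host name):
# if port_open "<nodename>" 8787; then
#   echo "RStudio Server port is reachable"
# else
#   echo "port 8787 is closed or filtered; check the firewall" >&2
# fi
```

A failed check only says that the connection could not be made; it does not distinguish a firewall rule from a stopped rstudio-server service.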

Let us run Microsoft R Server examples in the local, Hadoop and Spark compute contexts using RStudio Server:

LOCAL COMPUTE CONTEXT

[Screenshot: example run in the local compute context]

LOCAL COMPUTE CONTEXT ON HDFS DATA

[Screenshot: example run in the local compute context on HDFS data]

HADOOP COMPUTE CONTEXT

[Screenshot: example run in the Hadoop compute context]

SPARK COMPUTE CONTEXT

[Screenshot: example run in the Spark compute context]
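
To try the compute contexts yourself, something like the following can serve as a starting point. It is a sketch rather than the exact example from the screenshots: the shell function write_mrs_example (a hypothetical name) writes a small RevoScaleR script that summarizes the AirlineDemoSmall sample data in the local compute context, and shows as comments how the compute context would be switched to Hadoop or Spark. The choice of sample file and of the ArrDelay column is an assumption made for illustration.

```shell
# write_mrs_example FILE
# Write a small RevoScaleR script to FILE. The R code itself is a sketch;
# it assumes the AirlineDemoSmall.csv sample that ships with RevoScaleR.
write_mrs_example() {
  cat > "$1" <<'EOF'
library(RevoScaleR)

# Local compute context on the bundled sample data.
rxSetComputeContext("local")
airCsv <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.csv")
airData <- RxTextData(airCsv, missingValueString = "M")
print(rxSummary(~ ArrDelay, data = airData))

# To run on the cluster instead, switch the compute context first.
# Both lines assume the data source points at a file already stored in HDFS,
# for example via RxTextData(..., fileSystem = RxHdfsFileSystem()).
# rxSetComputeContext(RxHadoopMR())
# rxSetComputeContext(RxSpark())
EOF
}

# Example use:
# write_mrs_example ~/mrs_example.R
# Then, in the RStudio Server console: source("~/mrs_example.R")
```

Starting with the local compute context is a quick way to confirm that RStudio Server picked up the MRS libraries before involving Hadoop or Spark.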