How to upload an R package to Azure Machine Learning
Azure Machine Learning (https://azure.com/ml) comes with a number of R packages installed by default. You can list them with the following sample experiment:
The R script is:
data.set <- data.frame(installed.packages());
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set");
You’ll find a little more than 400 packages.
Still, you may need a package that Azure ML does not provide. Here is how to upload it to the environment.
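Before uploading anything, it can be worth checking whether the packages you need are already in that list. The following is a minimal sketch that runs in any R session. The `needed` vector is an assumption for this post's example and should be adapted to your own case.

```r
# Names of all packages visible to the current R session.
installed <- rownames(installed.packages())
print(length(installed))

# Packages we would like to use (assumed list, taken from this post's example).
needed <- c("skmeans", "slam", "clue")

# Those that are not available yet and would have to be uploaded.
not_installed <- setdiff(needed, installed)
print(not_installed)
```

An empty result means nothing has to be uploaded for those packages.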
NB: This post takes skmeans (k-means with a cosine distance) as an example, but this works for other packages as well.
Let’s suppose you have this code running locally in RStudio.
NB: you can find information on how to set up your environment in this post. It’s in French, but Bing Translator is your friend.
library(skmeans)
set.seed(1234)
sample_data <- matrix(sample.int(1000, size = 20*500, replace = TRUE), nrow = 20, ncol = 500,
dimnames=list(1:20, 1:500))
fit <- skmeans(sample_data,5)
result <- data.frame(list(rownames(sample_data), fit$cluster), row.names=NULL)
colnames(result) <- c("sample data row", "cluster")
print(result)
This gives a result like the following:
If you try this in Azure ML, you’ll get the following result:
Here is how to make the script load all the necessary packages in the Azure ML environment.
So let’s now see how you construct the skmeans_packages.zip and know which lines to write here:
On my local environment (Windows in my case), I first remove the R packages that are installed in My Documents\R, so that installing skmeans downloads all of its dependencies again.
Then, in R, I install the skmeans package:
install.packages("skmeans")
This gives the following result:
So I know I have to install the following packages in order:
- slam
- clue
- skmeans
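If the package you need has more dependencies, working out the order by hand gets tedious. Here is a minimal sketch of how the order could be derived from a dependency map. The `deps` list is written by hand from the install output above and is an assumption, not something read from CRAN. `install_order` is a hypothetical helper that assumes there are no circular dependencies.

```r
# Hypothetical dependency map: each name is a package, each value lists the
# packages it needs. Written by hand for illustration, not read from CRAN.
deps <- list(
  skmeans = c("slam", "clue"),
  clue    = character(0),
  slam    = character(0)
)

# Depth-first walk: a package is appended only after all of its dependencies.
# Assumes the map contains no circular dependencies.
install_order <- function(deps) {
  ordered <- character(0)
  visit <- function(pkg) {
    if (pkg %in% ordered) return(invisible(NULL))
    for (d in deps[[pkg]]) visit(d)
    ordered <<- c(ordered, pkg)
  }
  for (pkg in names(deps)) visit(pkg)
  ordered
}

pkg_order <- install_order(deps)
print(pkg_order)
```

With this map, the dependencies come out before skmeans itself, which matches the order listed above.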
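Then I go to the temp folder where R downloaded the binary packages (its path is printed at the end of the install.packages output):
I zip the zips:
and rename this new zip file to skmeans_packages.zip
I can then upload it to Azure ML: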
NEW, DATASET, FROM LOCAL FILE
Then you’ll be able to find it as a saved dataset in your workspace:
Once the dataset is connected to the third dot (the script bundle input) of the Execute R Script module instance, you’ll find its content in the src/ folder:
So, in order to install skmeans and its two dependencies, and then load the skmeans library, you just have to enter the following lines:
install.packages("src/slam_0.1-32.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/clue_0.3-48.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/skmeans_0.2-6.zip", lib = ".", repos = NULL, verbose = TRUE)
library(skmeans, lib.loc=".", verbose=TRUE)
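The version numbers in those file names change whenever the packages are updated. As a hedged alternative, the paths could be looked up by package name instead of being hard-coded. `find_package_zips` below is a hypothetical helper, not part of Azure ML or R. It assumes the files follow the `<package>_<version>.zip` naming used above. The demo uses empty placeholder files in a temporary folder so that it runs anywhere.

```r
# Resolve the zip file for each package name inside a folder.
# Assumes file names follow the "<package>_<version>.zip" pattern.
find_package_zips <- function(dir, pkgs) {
  files <- list.files(dir, pattern = "\\.zip$")
  vapply(pkgs, function(pkg) {
    prefix <- paste0(pkg, "_")
    hit <- files[substr(files, 1, nchar(prefix)) == prefix]
    if (length(hit) != 1) {
      stop("expected exactly one zip for package ", pkg)
    }
    file.path(dir, hit)
  }, character(1))
}

# Demo with empty placeholder files in a temporary folder (not real packages).
demo_dir <- file.path(tempdir(), "src_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("slam_0.1-32.zip", "clue_0.3-48.zip", "skmeans_0.2-6.zip")
)))

zips <- find_package_zips(demo_dir, c("slam", "clue", "skmeans"))
print(basename(zips))

# In the Execute R Script module, the same idea would read:
# for (z in find_package_zips("src", c("slam", "clue", "skmeans"))) {
#   install.packages(z, lib = ".", repos = NULL, verbose = TRUE)
# }
# library(skmeans, lib.loc = ".", verbose = TRUE)
```

The package names are passed in dependency order, so the resulting paths can be installed one after the other.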
Azure ML has a pool of VMs with Docker-like containers (true Windows containers, named Drawbridge) where the experiments run. So each time the script runs, it starts from a blank, standard Azure ML environment. By bringing a zip, you add its files to that environment, which is why the packages have to be installed at the start of every run.
I hope this blog post helps you if you need R packages that are not among the 400+ preloaded ones in Azure Machine Learning!
Benjamin (@benjguin)