

How to retrieve R data visualization from Azure Machine Learning

As you may know, Azure Machine Learning can execute R scripts, and you can view the console output interactively. But how do you retrieve the result as part of a production call to the API generated by Azure ML?

Let’s test with a word cloud example in R. Mollie Taylor has posted one here (https://gist.github.com/mollietaylor/3671518) that we can reuse in Azure Machine Learning:

image

The details on how to create an Azure ML workspace, insert a dataset and an R script can be found here:

For R, just use this module:

image

 

The input of the Web API is set to the input dataset of the R Script and the output is set to the R Device port. As a reminder, here is how the inputs and outputs are positioned in an R Script module:

image

The details are available in the help documentation.

In our case the interesting ports to publish are the following:

image

and

image

 

image

After running the experiment, we can see the result in Azure ML Studio:

image

image

image

So, how can we retrieve the pictures from an API that is published like this?

image

 

image

Here is a sample Python script that shows how to do it. It is a modified version of the sample given on the API help page for Batch Execution, and it targets Python 2 (it uses urllib2). The idea is to get the base64-encoded pictures from the output file, decode them, and write them to local disk.

# -*- coding: utf-8 -*-

# How this works:
#
# 1. Assume the input is present in a local file
# 2. Upload the file to an Azure blob - you'd need an Azure storage account
# 3. Call BES to process the data in the blob. 
# 4. The results get written to another Azure blob.
# 5. Download the output blob to a local file
#
# Note: You may need to download/install the Azure SDK for Python.
# See: https://azure.microsoft.com/en-us/documentation/articles/python-how-to-install/

import base64
import json
import sys
import time
import urllib2

from azure.storage import *

storage_account_name = 'a****obfuscated***4'
storage_account_key = '/aV****obfuscated***vXA76w=='
storage_container_name = 'benjguin'

input_file = ur"C:\be****obfuscated***os\WordCloud\conventions.csv"
output_file = ur'C:\be****obfuscated***os\WordCloud\myresults.csv'
input_blob_name = 'conventions.csv'
api_key = r'Cczx****obfuscated***WemQ=='
url = 'https://ussouthcentral.services.azureml.net/workspaces/a7c****obfuscated***756/services/d328e03****obfuscated***5c2/jobs'
uploadfile=True
executeBES=True

blob_service = BlobService(account_name=storage_account_name, account_key=storage_account_key)

if uploadfile:
    print("Uploading the input to blob storage...")
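    # Read in binary mode so the bytes are uploaded unchanged, and close the file afterwards
    with open(input_file, 'rb') as f:
        data_to_upload = f.read()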
    blob_service.put_blob(storage_container_name, input_blob_name, data_to_upload, x_ms_blob_type='BlockBlob')

input_blob_path = '/' + storage_container_name + '/' + input_blob_name
debug_blob = blob_service.get_blob(storage_container_name, input_blob_name)

if executeBES:
    print("Submitting the BES job...")
    connection_string = "DefaultEndpointsProtocol=https;AccountName=" + storage_account_name + ";AccountKey=" + storage_account_key
    payload =  {
                "Input": {
                    "ConnectionString": connection_string,
                    "RelativeLocation": input_blob_path
                    }
                }

    body = str.encode(json.dumps(payload))
    headers = { 'Content-Type':'application/json', 'Authorization':('Bearer ' + api_key)}
    req = urllib2.Request(url, body, headers) 
    response = urllib2.urlopen(req)
    result = response.read()
    job_id = result[1:-1] # remove the enclosing double-quotes

    url2 = url + '/' + job_id

    while True:
        time.sleep(1) # wait a second
        authHeader = { 'Authorization':('Bearer ' + api_key)}
        request = urllib2.Request(url2, headers=authHeader)
        response = urllib2.urlopen(request)
        result = json.loads(response.read())
        status = result['StatusCode']
        if (status == 0):
            print("Not started...")
        elif (status == 1):
            print("Running...")
        elif (status == 2):
            print("Failed...")
            break
        elif (status == 3):
            print("Cancelled...")
            break
        elif (status == 4):
            print("Finished!")
            result_blob_location = result['Result']
            sas_token = result_blob_location['SasBlobToken']
            base_url = result_blob_location['BaseLocation']
            relative_url = result_blob_location['RelativeLocation']
            url3 = base_url + relative_url + sas_token
            response = urllib2.urlopen(url3)
            with open(output_file, 'w') as f:
                f.write(response.read())
            break

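# Read the downloaded result back from disk
with open(output_file) as outputdata:
    outputtxt = outputdata.read()

# The JSON document is wrapped in double quotes after the "R Output JSON" header:
# start just after the opening quote and stop at the last closing brace
s = outputtxt.index('"{')
e = outputtxt.rindex('}')
o1 = outputtxt[s + 1:e + 1]

jsonresult = json.loads(o1)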
# Each entry of 'Graphics Device' is one base64-encoded PNG
for i, gd in enumerate(jsonresult['Graphics Device'], 1):
    fname = output_file + "." + str(i) + ".png"
    print('writing png #' + str(i) + ' to ' + fname)
    with open(fname, 'wb') as f:
        f.write(base64.b64decode(gd))

print("Done!")

Here is a sample execution output:

Uploading the input to blob storage...
Submitting the BES job...
Running...
Running...
Running...
Running...
Running...
Running...
Running...
Finished!
writing png #1 to C:\be***obfuscated***os\WordCloud\myresults.csv.1.png
writing png #2 to C:\be***obfuscated***os\WordCloud\myresults.csv.2.png
Done!
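
The "Running..." and "Finished!" lines come from the script's polling loop. As a minimal sketch, written for Python 3 rather than the Python 2 of the script, that loop can be isolated into a function that raises on failure, cancellation, or timeout instead of falling through to the decoding step. The names `poll_job`, `fetch_status`, `result_url`, and `max_polls` are hypothetical and introduced only for illustration; the status codes and the fields of `Result` are the ones the script already uses.

```python
import time

# Status codes as handled by the script (0..4).
STATUS_NAMES = {
    0: "Not started",
    1: "Running",
    2: "Failed",
    3: "Cancelled",
    4: "Finished",
}


def result_url(result):
    """Build the download URL from the 'Result' object of a finished job."""
    return result["BaseLocation"] + result["RelativeLocation"] + result["SasBlobToken"]


def poll_job(fetch_status, interval=1.0, max_polls=600, sleep=time.sleep):
    """Call fetch_status() until the job reaches a terminal state.

    fetch_status is a hypothetical callable that returns the parsed JSON
    status document: a dict with 'StatusCode' and, once finished, 'Result'.
    Returns the download URL on success; raises RuntimeError on failure or
    cancellation and TimeoutError when max_polls is exhausted.
    """
    for _ in range(max_polls):
        doc = fetch_status()
        code = doc["StatusCode"]
        print(STATUS_NAMES.get(code, "Unknown status %r" % code) + "...")
        if code == 4:
            return result_url(doc["Result"])
        if code in (2, 3):
            raise RuntimeError("BES job ended with status: " + STATUS_NAMES[code])
        sleep(interval)
    raise TimeoutError("BES job did not finish after %d polls" % max_polls)
```

In real use, `fetch_status` would wrap the authenticated GET on the job URL that the script performs inside its `while` loop. Passing it in as a callable keeps the loop testable without network access.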

The output sent back by Azure ML looks like this:

R Output JSON

"{"Standard Output":"RWorker pushed \"port1\" to R workspace.\r\nBeginning R Execute Script\n\n[1] 56000\r\nLoading objects:\r\n  port1\r\n[1] \"Loading variable port1...\"\r\npng \r\n  2 \r\nnull device \r\n          1 \r\n","Standard Error":"R reported no errors.","visualizationType":"rOutput","Graphics Device":["iVBORw0K***(...)***RvX/wFzB5s8eym6ZgAAAABJRU5ErkJggg==","iVBORw0KGgo***(...)***dVorBuiQAAAABJRU5ErkJggg=="]}"

You can see the pictures ;-)

image

 

Well, Python does:

image

The resulting files are:

image

 

myresults.csv.1

and

myresults.csv.2
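
The decoding step that produces these files can be sketched on its own. This is a Python 3 sketch under stated assumptions, not the script's exact logic: it assumes the downloaded text contains exactly one JSON object, so the first `{` and the last `}` delimit it, and it checks the PNG signature before writing anything. The function names `extract_r_output`, `decode_graphics`, and `save_images` are hypothetical. The `iVBORw0KGgo` prefix visible in the output above is simply the base64 form of that 8-byte PNG signature.

```python
import base64
import json

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_r_output(text):
    """Return the R output JSON object embedded in the downloaded result text.

    Assumes the text holds exactly one JSON object; the header line and the
    double quotes wrapped around the object are skipped.
    """
    start = text.index("{")
    end = text.rindex("}")
    return json.loads(text[start:end + 1])


def decode_graphics(r_output):
    """Decode each base64 entry of 'Graphics Device' into raw PNG bytes."""
    images = []
    for encoded in r_output["Graphics Device"]:
        data = base64.b64decode(encoded)
        if not data.startswith(PNG_SIGNATURE):
            raise ValueError("decoded data does not start with the PNG signature")
        images.append(data)
    return images


def save_images(images, prefix):
    """Write images as <prefix>.1.png, <prefix>.2.png, ... and return the paths."""
    paths = []
    for i, data in enumerate(images, start=1):
        path = "%s.%d.png" % (prefix, i)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths
```

A call such as `save_images(decode_graphics(extract_r_output(text)), output_file)` would follow the same file naming as the script, where `text` is the content of the downloaded output file.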

 

R has tons of great data visualization capabilities. Have a look at these blogs, for instance:

 

:-)

Benjamin (@benjguin)