Multi-Column GroupBy Aggregate Using R in Azure ML Studio

Azure ML Studio is... well, if you don't know what it is, you had better go here and get some basics. To my knowledge, it is the most intuitive way to orchestrate each stage of the machine learning process.

Quite recently I was working on some data that required GroupBy aggregation across several columns. Consider data in this format:

 

| RegionCode | RegionName | StoreCode | Category | ProductCode | ProductName | Quantity | Size | Gender | Season | PricePointName | ProductCategoryName |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Michigan | 499 | SHOES | 487369 | KWANGO | 1 | 6 | Men | FW 09 | FTW < 10000 | FOOTWEAR |
| 3 | Michigan | 499 | TOPS | 498510 | ADIPURE BRA | 1 | L | Women | SS 14 | FTW 6000-6999 | APPAREL |
| 3 | Michigan | 499 | SANDALS/SLIPPERS | 499408 | ADI SUN | 1 | 8 | Men | FW 15 | FTW 1000-1999 | FOOTWEAR |
| 3 | Michigan | 500 | SANDALS/SLIPPERS | 499429 | ADI SUN | 2 | 8 | Men | FW 15 | APP 2000-2499 | FOOTWEAR |
| 3 | Michigan | 500 | SHOES | 500228 | DURAMO 6 LEA M | 1 | 11 | Men | SS 15 | FTW 6000-6999 | FOOTWEAR |
| 3 | Michigan | 500 | SHOES | 500284 | HOWZAT J V | 1 | 3 | Kids-Boys | FW 14 | FTW 3000-3999 | FOOTWEAR |
| 3 | Michigan | 500 | PANTS | 541832 | ESS 3S KN PANT | 3 | M | Women | SS 14 | APP 1500-1999 | APPAREL |
| 3 | Michigan | 499 | PANTS | 544313 | NEW FIREBIRD TP | 1 | 34 | Women | SS 15 | APP 1500-1999 | APPAREL |
| 3 | Michigan | 499 | PANTS | 544314 | NEW FIREBIRD TP | 2 | 40 | Women | SS 15 | APP 2500-2999 | APPAREL |

I wanted the SUM of Quantity grouped by multiple columns: RegionCode, StoreCode, and Category. In R this takes only a line of code. Just drag and drop the Execute R Script component onto the experiment and write this simple statement:

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame

# Sum Quantity per RegionCode/StoreCode/Category group; the sum column is named "x"
data.set <- aggregate(dataset1$Quantity,
                      by = list(RegionCode = dataset1$RegionCode,
                                StoreCode = dataset1$StoreCode,
                                Category = dataset1$Category),
                      FUN = sum)

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set")

Run the experiment, then right-click the "Execute R Script" component and click "Result Dataset --> Visualize".

Figure: Visualize Data
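
If you want to check the aggregation in a plain R session before pasting it into the Execute R Script component, the sketch below may help. It rebuilds only the four columns involved from the sample rows shown earlier. The local `dataset1` data frame is a hypothetical stand-in for what `maml.mapInputPort(1)` would return, and `totals` is a name chosen just for this illustration. It also uses the formula interface of `aggregate`, an alternative to the `by=` form in the post that keeps the summed column named `Quantity` instead of `x`.

```r
# Hypothetical local stand-in for the data arriving on input port 1;
# only the columns used by the aggregation are reproduced here.
dataset1 <- data.frame(
  RegionCode = rep(3, 9),
  StoreCode  = c(499, 499, 499, 500, 500, 500, 500, 499, 499),
  Category   = c("SHOES", "TOPS", "SANDALS/SLIPPERS", "SANDALS/SLIPPERS",
                 "SHOES", "SHOES", "PANTS", "PANTS", "PANTS"),
  Quantity   = c(1, 1, 1, 2, 1, 1, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Formula interface: sum Quantity within each
# RegionCode/StoreCode/Category combination.
totals <- aggregate(Quantity ~ RegionCode + StoreCode + Category,
                    data = dataset1, FUN = sum)

print(totals)
```

With these nine rows the result should contain one row per distinct RegionCode/StoreCode/Category combination. Note that the formula interface drops rows with missing values by default, so on data with NAs it can behave differently from the `by=` form.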