Working With SQL Server R Services And R Scripts In SQL Server 2016
Introduction
SQL Server 2016 has introduced the integration of the popular data science language R into the database engine. R is a powerful language and provides many packages from the community so as to create models and generate predictions using the SQL Server data.
In SQL Server 2016 and later, we can install the components required for using R, by running the SQL Server setup wizard. The installation media can be obtained from here. SQL Server setup provides us with 2 options for working with R:
*In-Database
*Install the database engine together with SQL Server R Services (In-Database), to enable running R scripts in SQL Server.*Standalone
*Install Microsoft R Server (Standalone) to create a development environment for R solutions.
Setup In-Database R Services
Installing R Server in stand-alone mode is typically done when we want to create an environment that a data scientist can use to connect to SQL Server and deploy R solutions. Ideally, it is not recommended to install both R Server and R Services in a single installation. In this article, we will see how to set up In-Database R services. Spin up the installation media.
Select the Add features option and proceed.
From the feature selection option, select ‘R Services (In-Database)’.
We will be asked for the consent to install R. Accept the agreement and start the installation.
Once we are done with the installation, we will get the below success dialog.
Enable R Services
Installation of R Services is not enough to get started with R. We have to enable R Services in SQL Server as well. We can get the current status of R services by checking the value of ‘external scripts enabled’ property.
Run the below command to see the current status of the property ‘external scripts enabled’. By default, it is turned off and the value will be 0. We will have to change it to 1.
EXEC SP_CONFIGURE 'external scripts enabled';
GO
To enable the property run the below command,
EXEC SP_CONFIGURE 'external scripts enabled', 1;
GO
RECONFIGURE;
GO
Test R Services
Let's run the below command to see if the property value has been changed. We have to ensure that the run_value has the value ‘1’
EXEC SP_CONFIGURE 'external scripts enabled';
GO
But as you can see, it is still at 0. We will head over to the SQL Server Configuration Manager and restart the below services so that the query updates are reflected.
SQL Server Launchpad(MSSQLServer)
SQL Server(MSSQLServer)
We can do this from Services.msc as well.
Now if we check the property status we can see that the run_value has been changed to ‘1’.
EXEC SP_CONFIGURE 'external scripts enabled';
GO
Let’s test R Services by running a sample R Script. Below query adds up two numbers and outputs the sum,
exec sp_execute_external_script @language =N'R',
@script=N'OutputDataSet<-InputDataSet',
@input_data_1 =N'select 1+1 as SUMOutput'
with result sets (([SUM Output] int not null));
go
Understanding R Scripts
We can implement R Scripts by using ‘EXEC sp_execute_external_script’ and we must specify few basic parameters with it which are explained below:
Parameter | Purpose |
@language = N'R' | Indicates the scripting language as R |
@script = N' ' | Contains the script body |
@input_data_1 = N’ T-SQL Statement’ | We can use T-SQL statement that reads data from SQL Table upon which the script body will be applied |
@output_data_1_name = N' Data Frame Name' | Holds the data frame generated inside the R Script. It can be used to update the table back. |
WITH RESULT SETS ((Col1 DataType,Col2 DataType )) | Specifies the Output column and Datatype of the dataframe columns. This is optional. |
Demo Walkthrough
As part of the demo we have a table that holds the Q1-Q4 sales details of various Car Products from Maruthi Suzuki. We will use SQL R Services to query the table and derive meaningful data out of it. We will also see how to update the table back with manipulated data.
Before running R Script we have to ensure that SQL Services Launchpad service is running, else we will get the below error.
We can check for its status from Services.msc.
If it is in stopped state, start it.
Script 1 - Derive the Mean
As part of the first script we will be using R Script to get the mean value of total sales in Q1. We will use the Select statement to get the table data into ‘@input_data_1’. Then the R Script ‘N'OutputDataSet <- data.frame(mean(InputDataSet[,1]))’ will be applied on the input data. Mean function will get the average of the column values which will be displayed as ‘Q1SalesMean’ which is specified in the ‘RESULT SETS’ section.
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'OutputDataSet <- data.frame(mean(InputDataSet[,1]))'
, @input_data_1 = N'SELECT Q1Sales from MarutiSuzukiSalesData as Q1SalesMean '
WITH RESULT SETS ((Q1SalesMean int not null));
Go
The output will look like below,
Script 2 - Get Row Sum
In this script, we will see how to get the sum of Q1, Q2, Q3 and Q4 sales data for each Car Product. After getting the table data into @input_data_1 we will apply the script body mentioned in @script parameter to the input data. We will sum the Q1-Q4 values and display it using RESULT SETS section.
SELECT * from MarutiSuzukiSalesData
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'OutputDataSet <- data.frame(InputDataSet$Product,InputDataSet$Q1Sales+InputDataSet$Q2Sales+
InputDataSet$Q3Sales+InputDataSet$Q4Sales)'
, @input_data_1 = N'SELECT * from MarutiSuzukiSalesData as TotalSales '
WITH RESULT SETS ((Products nvarchar(50),TotalSales int not null));
go
The output will look like below,
Script 3 - Fetch the Maximum
As part of this script, we will get the product that has the maximum sales in Q1. We do this by using the script 'OutputDataSet <- data.frame(head(InputDataSet[order(InputDataSet$Q1Sales, decreasing = T),], n = 1))'
Here, we are sorting the table in descending order by Q1Sales and using the head method to get the first record by mentioning ‘n=1’
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'OutputDataSet <- data.frame(head(InputDataSet[order(InputDataSet$Q1Sales, decreasing = T),], n = 1))'
, @input_data_1 = N'SELECT Product,Q1Sales from MarutiSuzukiSalesData as Q1SalesMax '
WITH RESULT SETS ((Product nvarchar(50),Q1SalesMax int not null));
go
The output of the script will look like below,
Script 4 - Update the Table
In this script, we will add the Q1-Q4 data and update it to a new table. We will be using the Insert and Exec commands together to sum up the values and update it to the new table. The Script body is same as in Script 2. We have an additional parameter ’@output_data_1_name’ to hold the output data that will be used by the Insert command to update the newly created Table.
SELECT * from MarutiSuzukiSalesData
Create Table TotalSalesData(Product nvarchar(50) ,TotalSales int not null);
Insert Into TotalSalesData
EXEC sp_execute_external_script
@language = N'R'
,@script = N' OutputDataSet <- data.frame(InputDataSet$Product,
InputDataSet$Q1Sales+InputDataSet$Q2Sales+
InputDataSet$Q3Sales+InputDataSet$Q4Sales);'
,@input_data_1 = N'SELECT * from MarutiSuzukiSalesData as Product ;'
,@output_data_1_name = N' OutputDataSet'
Select * from TotalSalesData
The output will look like below,
Summary
Thus, we saw how to integrate SQL Server R Services (In-Database) mode and work with R Scripts.