SQL Server Import/Export to Excel using R script
Introduction
In this post, let us see a new approach to import excel into SQL Server and export SQL server data to excel. There are some existing methods to do this using BCP, Bulk Insert, Import & Export wizard from SSMS, SSIS, Azure data factory, Linked server & OPENROWSET query and SQLCMD.
- BULK INSERT statement, the BCP tool, or Azure Data Factory can't read Excel files directly
- BCP - Workaround has to be done to include the header while exporting
- SSIS - Though it supports exporting to excel, with dynamic source & destination, handling mapping between source to target increases the complexity of the package
- SQLCMD - Cannot export output in Excel file format
R & Python language extension was introduced in SQL Server 2016 & 2017 as part of machine learning. With support of R in Azure SQL database and Java language extension support in SQL Server 2019, this new approach can be used extensively as it easy, fast and flexible.
We have used SQL Server 2019 evaluation edition on Windows 10 64 bit and WideWorldImporters SQL Server sample database for this example.
Below scripts will work starting from SQL Server 2016 and above (as execution of R language using T-SQL was introduced in SQL Server 2016).
Though prerequisite steps are bit heavy, it is one time and thereafter enable us to export / import to excel easily with just a few lines of code.
Pre-requisites:
Install R services and then from SSMS enable the external scripting feature. Restart the database engine and then verify the installation as mentioned in MSDN.
You might face “Access is denied” error while executing below Stored procedures, to overcome that right-click on the folder where excel files will be imported / exported and provide folder access (Full Control) to "ALL_APPLICATION_PACKAGES".
Step 1
Check the version of R by running below code from R.exe
sessionInfo()
We have installed SQL Server 2019 so R.exe will be found in below path:
C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin
Step 2
From above step we got to know that it is 64 bit, So we have downloaded Java JDK for 64 bit (.exe file) from here & installed it.
C:\Program Files\Java\jdk-11.0.1
Step 3
Now let us install rJava package from R.exe
install.packages("rJava")
Step 4
Now let us set the JAVA_HOME environment to Java JDK installed in Step 2 and load the rJava package to check if it working from R.exe.
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava)
Step 5
Now it's time to install & load the xlsx package from R.exe.
- xlsx package is one of the powerful R packages to read, write and format Excel files.
- It is a java-based solution and it is available for Windows, Mac and Linux.
- It works for both Excel 2007 and Excel 97/2000/XP/2003 file formats (xls and xlsx file formats).
install.packages("xlsx")
After successful installation, load xlsx package to check everything is fine.
library("xlsx")
Step 6
Copy & paste the packages downloaded into the default R library from above steps to SQL Server R_services library.
We have copied xlsx,xlsxjars & rJava folders from C:\Users\Sathy\Documents\R\win-library\3.4
to C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\library
Now as part of last prerequisite step, from SSMS let us check the installed R packages.
sp_execute_external_script @language = N'R', @script = N'OutputDataSet <- data.frame(installed.packages())'
Exporting SQL Server data (list of tables, views) into Excel file
For this approach, we have created a Stored procedure. Executing the Stored procedure based on input parameters exports SQL Server data to excel files
Stored procedure - Has below five input parameters and writes the output to excel file
i) @ExportPath - path for exporting excel files
ii) @SchemaName - list of objects under this schema to be exported
iii) @ObjectlisttoExport - list of tables, views to be exported
iv) @Objectlistdelim - delimiter for above object list
v) @ExportFileIsSingle - Export to single or multiple files. 1 - Single, 0 - Multiple
USE [WideWorldImporters]
GO
CREATE OR ALTER PROC usp_ExportExcel (@ExportPath NVARCHAR(MAX),
@SchemaName NVARCHAR(50),
@ObjectlisttoExport NVARCHAR(MAX),
@Objectlistdelim CHAR(1),
@ExportFileIsSingle BIT
)
AS
BEGIN
SET NOCOUNT ON;
IF ISNULL(@ExportPath,'') <> ''
BEGIN
IF ISNULL(@SchemaName,'') <> '' OR (ISNULL(@ObjectlisttoExport,'') <> '' AND ISNULL(@Objectlistdelim,'') <> '' )
BEGIN
DROP TABLE IF EXISTS #ExportTablesList, #FinalExportList
CREATE TABLE #ExportTablesList(TableCount INT IDENTITY(1,1),Columns NVARCHAR(MAX),TableName NVARCHAR(200))
INSERT #ExportTablesList (Columns,TableName)
SELECT CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp')
THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
ELSE C.name END Columns -- To cover poor data type conversions b/n R & SQL Server
,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) = @SchemaName
AND TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant') -- Ignore the datatypes that are not required to be exported
INSERT #ExportTablesList (Columns,TableName)
SELECT CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp')
THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
ELSE C.name END Columns -- To cover poor data type conversions b/n R & SQL Server
,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoExport, @Objectlistdelim))
AND TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant') -- Ignore the datatypes that are not required to be exported
--Dedup of object list
;WITH dedup
AS
(
SELECT *,ROW_NUMBER()OVER(PARTITION BY TableName,Columns ORDER BY Columns) Rn FROM #ExportTablesList
)
DELETE FROM dedup
WHERE Rn > 1
SELECT TableName,IDENTITY(INT,1,1) AS TableCount
, STUFF(
(
SELECT ', ' + C.Columns
From #ExportTablesList As C
WHERE C.TableName = T.TableName
FOR XML PATH('')
), 1, 2, '') AS Columns
INTO #FinalExportList
From #ExportTablesList As T
GROUP BY TableName
DECLARE @I INT = 1
,@TableName NVARCHAR(200)
,@SQL NVARCHAR(MAX) = N''
,@RScript NVARCHAR(MAX) = N''
,@ExportFilePath NVARCHAR(MAX) = N''
--Incase of export to single file if the file already exists instead of replacing all the sheets. Excel file is dropped if exists
IF @ExportFileIsSingle = 1
BEGIN
SELECT @ExportFilePath = CONCAT(REPLACE(@ExportPath,'\','/'),'/'
,CASE WHEN @SchemaName IS NULL OR @SchemaName = '' THEN 'SQLExcelExport'
WHEN ((@SchemaName IS NOT NULL AND @SchemaName <> '')
AND (@ObjectlisttoExport IS NOT NULL AND @ObjectlisttoExport <> '') )
THEN 'SQLExcelExport'
ELSE @SchemaName END
,'.xlsx')
SET @RScript = '
if (file.exists(ExcelFilePath))
#Delete file if it exist
file.remove(ExcelFilePath)'
EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL
,@params = N'@ExcelFilePath NVARCHAR(MAX)'
,@ExcelFilePath = @ExportFilePath -- file path where Excel files are placed
END
WHILE @I <= (SELECT COUNT(TableName) FROM #FinalExportList)
BEGIN
-- Just for testing purpose top 10 records are selected
SELECT @SQL = CONCAT('SELECT TOP 10 ',Columns,' FROM ',TableName,';')
,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I
IF @ExportFileIsSingle = 0
BEGIN
SELECT @ExportFilePath = CONCAT(REPLACE(@ExportPath,'\','/'),'/',@TableName,'.xlsx')
END
SET @RScript = CONCAT('
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
write.xlsx2(InputDataSet, file = ExcelFilePath, sheetName= TableName,
col.names=TRUE, append=',CASE WHEN @ExportFileIsSingle = 1 THEN 'TRUE)' ELSE 'FALSE)' END)
EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL
,@params = N'@ExcelFilePath NVARCHAR(MAX), @TableName NVARCHAR(200)'
,@ExcelFilePath = @ExportFilePath -- file path where Excel files are placed
,@TableName = @TableName
IF @ExportFileIsSingle = 0
BEGIN
SET @ExportFilePath = ''
END
SET @I = @I + 1
END
END ELSE PRINT 'Schema name of objects or list of objects (separated by delimiter) to be exported need to be mentioned'
END ELSE PRINT 'Export folder path need to be mentioned'
END
Sample execution code block:
--Execute each statement one by one
--Export folder path is mandatory
EXEC usp_ExportExcel '','Sales','','',1
--Either schema name or object list needs to mentioned
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','','',1
--Objects under Sales schema in WideworldImporters database are exported into single excel file
-- and file name will be schemaname.xlsx and sheet names will be schemaname.tablename
EXEC usp_ExportExcel 'H:\Temp\Blog\test','Sales','','',1
--Objects under Sales schema in WideworldImporters database are exported into multiple excel files
-- and file name will be schemaname.tablename.xlsx and sheet names are same as file name
EXEC usp_ExportExcel 'H:\Temp\Blog\test','Sales','','',0
--One table from Purchasing schema & Warehouse schema in WideworldImporters database are exported into single excel file
--Delimiter is mandatory
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','Purchasing.PurchaseOrders,Warehouse.StockItemStockGroups','',1
--One table from Purchasing schema & Warehouse schema in WideworldImporters database are exported into single excel file
--Objects separated with comma as delimiter and file name will be SQLExcelExport.xlsx
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','Purchasing.PurchaseOrders,Warehouse.StockItemStockGroups',',',1
--One table from Purchasing schema & Warehouse schema in WideworldImporters database are exported into multiple excel files
--Objects separated with comma as delimiter and file name will be schemaname.tablename.xlsx
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','Purchasing.PurchaseOrders,Warehouse.StockItemStockGroups',',',0
--Same object is listed twice, code de duplicates and exports only once
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','Purchasing.PurchaseOrders,Purchasing.PurchaseOrders',',',0
--One table from Purchasing schema & Warehouse schema along with Objects under Sales schema in WideworldImporters database
--are exported into single excel file and file name will be SQLExcelExport.xlsx and sheet names will be schemaname.tablename
EXEC usp_ExportExcel 'H:\Temp\Blog\test','Sales','Purchasing.PurchaseOrders|Warehouse.StockItemStockGroups','|',1
--One table from Purchasing schema & Warehouse schema along with Objects under Sales schema in WideworldImporters database
--are exported into multiple excel files and file name will be schemaname.tablename.xlsx and sheet names are same as file name
EXEC usp_ExportExcel 'H:\Temp\Blog\test','Sales','Purchasing.PurchaseOrders|Warehouse.StockItemStockGroups','|',0
--Note files will be replaced if any of the above statements are re-executed
Importing Excel file into SQL Server
Now let us install the readxl package for reading the excel file, this package doesn't have any external dependency (say Java, unlike xlsx package)
Open R.exe from below path:
C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin
From R.exe, execute
install.packages("readxl")
After successful installation of above package, Copy & paste all the packages downloaded from above step into default R library to SQL Server R_services library.
From C:\Users\Sathy\Documents\R\win-library\3.4
to C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\library
We have created a database called "Staging" to clearly see the excel files data imported into SQL tables that are created dynamically.
For this approach, we have created a Stored procedure in teh above-created database. Executing the Stored procedure based on input parameters imports excel files to SQL Server table
Stored procedure - Has below four input parameters
i) @ImportPath - path where excel files are placed for importing into SQL Server
ii) @ImportAll - If set to 1 then all files in the mentioned path are imported. If set to 0 then only mentioned files are imported
iii) @ExcelFileName - If @ImportAll = 0 then excel file name needs to be passed to this parameter
iv) @ExcelSheetName - If @ImportAll = 0 then corresponding sheet name of the excel file needs to be passed to this parameter
Provide SQL Server credentials to variable sqlConnString
before compiling below Stored procedure.
USE [Staging]
GO
CREATE OR ALTER PROC usp_ImportExcel (@ImportPath NVARCHAR(MAX),
@ImportAll BIT,
@ExcelFileName NVARCHAR(200),
@ExcelSheetName NVARCHAR(50)
)
AS
BEGIN
SET NOCOUNT ON;
IF ISNULL(@ImportPath,'') <> ''
BEGIN
SELECT @ImportPath = REPLACE(@ImportPath,'\','/')
IF (@ImportAll = 0 AND (ISNULL(@ExcelFileName,'') <> '' AND ISNULL(@ExcelSheetName,'') <> ''))
OR (@ImportAll = 1 AND (ISNULL(@ExcelFileName,'') = '' AND ISNULL(@ExcelSheetName,'') = ''))
BEGIN
DECLARE @RScript NVARCHAR(MAX) =N'
library(readxl);
library(foreach);
#pass database server, username & password
#conection string can also be placed in text file for security purpose and read using function readLines(".txt")
sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"
# Setting recursive =TRUE in list_filespath, list_filenames will read the excel files in sub folders as well
list_filespath <- list.files(path = ImportFilePath, pattern = "*.xlsx", all.files = FALSE,
full.names = TRUE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
list_filenames <- list.files(path = ImportFilePath, pattern = "*.xlsx", all.files = FALSE,
full.names = FALSE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
#Import single file
if (ImportAll == 0) {
if (is.element(ExcelFileName, list_filenames) == TRUE)
{
filepath_var <- paste(ImportFilePath, ExcelFileName, sep="/")
tab_names <- excel_sheets(path = filepath_var)
if (is.element(ExcelSheetName, tab_names) == TRUE)
{
Output <- read_excel(path = filepath_var, sheet = ExcelSheetName)
sqlTable <- paste(strsplit(ExcelFileName,split = ".xlsx"),ExcelSheetName,sep = "_")
sqlTable <- gsub("\\s", "", sqlTable)
sqlTable <- gsub("[^[:alnum:]]", "", sqlTable)
#check if the sheet has header
if (dim(Output)[2] != 0) {
sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)}}}}
#Import all files
if (ImportAll == 1) {
#Check file exists in path
if (length(list_filespath) != 0) {
foreach(m = 1:length(list_filespath)) %do%
{
filepath_var <- list_filespath[m]
tab_names <- excel_sheets(path = filepath_var)
foreach(n = 1:length(tab_names)) %do%
{
sqlTable <- paste(strsplit(list_filenames[m],split = ".xlsx"),tab_names[n],sep = "_")
sqlTable <- gsub("\\s", "", sqlTable)
sqlTable <- gsub("[^[:alnum:]]", "", sqlTable)
Output <- read_excel(path = filepath_var, sheet = n)
#check if the sheet has header
if (dim(Output)[2] != 0) {
sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
}}}}}'
EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@params = N'@ImportFilePath NVARCHAR(MAX),@ImportAll CHAR(1),@ExcelFileName NVARCHAR(200),@ExcelSheetName NVARCHAR(50)'
,@ImportFilePath = @ImportPath
,@ImportAll = @ImportAll
,@ExcelFileName = @ExcelFileName
,@ExcelSheetName = @ExcelSheetName
END ELSE PRINT 'Invalid parameters: If ImportAll = 0 then pass Excel file & Sheet Name as input. If ImportAll = 1 then pass Excel file & Sheet Name blank'
END ELSE PRINT 'Import folder path need to be mentioned'
END
Sample execution code block:
--Execute each statement one by one
--Note: We will make use of files generated in above example
--Valid Folder path is mandatory
EXEC usp_ImportExcel '',0,'',''
--If excel file name doesn't exist in that path, no operation is performed
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,'Purchasing.PurchaseOrderszz.xlsx','Purchasing.PurchaseOrders'
--If the sheet name doesn't exist in the mentioned excel file, no operation is performed
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,'Purchasing.PurchaseOrders.xlsx','Purchasing.PurchaseOrdersxx'
--Delete data in excel file that is passed as input
--If the sheet is empty, no operation is performed
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,'Purchasing.PurchaseOrders.xlsx','Purchasing.PurchaseOrders'
--If ImportAll is set to 0 then only file mentioned in the parameter will be imported
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,'Warehouse.StockItemStockGroups.xlsx','Warehouse.StockItemStockGroups'
--If ImportAll is set to 0 then only file mentioned in the parameter will be imported
--But if we have set of files & sheets to be imported then we can do like this
SET NOCOUNT ON;
DECLARE @ExportList TABLE (Id INT IDENTITY(1,1),ExcelFileName NVARCHAR(200),ExcelSheetName NVARCHAR(50))
INSERT @ExportList (ExcelFileName,ExcelSheetName) SELECT 'SQLExcelExport.xlsx','Purchasing.PurchaseOrders'
INSERT @ExportList (ExcelFileName,ExcelSheetName) SELECT 'Sales.Customers.xlsx','Sales.Customers'
DECLARE @I INT = 1,@FileName NVARCHAR(200),@SheetName NVARCHAR(50)
WHILE @I <= (SELECT COUNT(Id) FROM @ExportList)
BEGIN
SELECT @FileName = ExcelFileName,@SheetName = ExcelSheetName
FROM @ExportList
WHERE Id = @I
PRINT @FileName+@SheetName
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,@FileName,@SheetName
SET @I = @I + 1
END
--If ImportAll is set to 1 then all the files mentioned in the path will be imported
--But other parameters has to be set to blank except file path
EXEC usp_ImportExcel 'H:\Temp\Blog\test',1,'SQLExcelExport.xlsx','Purchasing.PurchaseOrders'
--If ImportAll is set to 1 then all the files mentioned in the path will be imported
EXEC usp_ImportExcel 'H:\Temp\Blog\test',1,'',''
--Note tables will be replaced/overwritten if any of the above statements are re-executed
Summary
This post is just to give an overview of this new approach of importing / exporting to excel files in SQL Server using R script. With respect to moving data (Import / Export) between SQL Server & Excel, there are various scenarios based on each requirement. We have covered some of them but tweaking the solutions mentioned above can cover any scenario.