다음을 통해 공유


SQL Server Import/Export to Excel using R script

Introduction

In this post, let us see a new approach to import excel into SQL Server and export SQL server data to excel. There are some existing methods to do this using BCP, Bulk Insert, Import & Export wizard from SSMS, SSIS, Azure data factory, Linked server & OPENROWSET query and SQLCMD.

  • BULK INSERT statement, the BCP tool, or Azure Data Factory can't read Excel files directly 
  • BCP - Workaround has to be done to include the header while exporting
  • SSIS - Though it supports exporting to excel, with dynamic source & destination, handling mapping between source to target increases the complexity of the package
  • SQLCMD - Cannot export output in Excel file format

R & Python language extension was introduced in SQL Server 2016 & 2017 as part of machine learning. With support of R in Azure SQL database and Java language extension support in SQL Server 2019, this new approach can be used extensively as it easy, fast and flexible.

We have used SQL Server 2019 evaluation edition on Windows 10 64 bit and WideWorldImporters SQL Server sample database  for this example.

Below scripts will work starting from SQL Server 2016 and above (as execution of R language using T-SQL was introduced in SQL Server 2016).

Though prerequisite steps are bit heavy, it is one time and thereafter enable us to export / import to excel easily with just a few lines of code.

Pre-requisites:

Install R services  and then from SSMS enable the external scripting feature. Restart the database engine and then verify the installation as mentioned in MSDN.

You might face “Access is denied” error while executing below Stored procedures, to overcome that right-click on the folder where excel files will be imported / exported and provide folder access (Full Control) to "ALL_APPLICATION_PACKAGES".

Step 1

Check the version of R by running below code from R.exe

sessionInfo()

We have installed SQL Server 2019 so R.exe will be found in below path:
C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin

Step 2

From above step we got to know that it is 64 bit, So we have downloaded Java JDK for 64 bit (.exe file) from here & installed it.
C:\Program Files\Java\jdk-11.0.1

Step 3

Now let us install rJava package from R.exe

install.packages("rJava")

Step 4

Now let us set the JAVA_HOME environment to Java JDK installed in Step 2 and load the rJava package to check if it working from R.exe.

  Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava)


Step 5

Now it's time to install & load the xlsx package from R.exe.

  • xlsx package is one of the powerful R packages to read, write and format Excel files.
  • It is a java-based solution and it is available for Windows, Mac and Linux.
  • It works for both Excel 2007 and Excel 97/2000/XP/2003 file formats (xls and xlsx file formats).
install.packages("xlsx")

After successful installation, load xlsx package to check everything is fine.

library("xlsx")

Step 6

Copy & paste the packages downloaded into the default R library from above steps to SQL Server R_services library.

We have copied xlsx,xlsxjars & rJava folders from C:\Users\Sathy\Documents\R\win-library\3.4
to C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\library

Now as part of last prerequisite step, from SSMS let us check the installed R packages.

sp_execute_external_script @language = N'R', @script = N'OutputDataSet <- data.frame(installed.packages())'

Exporting SQL Server data (list of tables, views) into Excel file

For this approach, we have created a Stored procedure. Executing the Stored procedure based on input parameters exports SQL Server data to excel files

Stored procedure  -  Has below five input parameters and writes the output to excel file

i)   @ExportPath            -  path for exporting excel files
ii)  @SchemaName       -   list of objects under this schema to be exported
iii) @ObjectlisttoExport  -   list of tables, views to be exported
iv) @Objectlistdelim       -  delimiter for above object list
v)  @ExportFileIsSingle  - Export to single or multiple files. 1 - Single, 0 - Multiple

USE [WideWorldImporters]
GO
 
 
 
CREATE OR  ALTER PROC usp_ExportExcel (@ExportPath NVARCHAR(MAX),
                                      @SchemaName NVARCHAR(50),
                                      @ObjectlisttoExport NVARCHAR(MAX),
                                      @Objectlistdelim CHAR(1),
                                      @ExportFileIsSingle BIT
                                      )
AS
BEGIN
SET NOCOUNT ON;
 
IF ISNULL(@ExportPath,'') <> ''
 BEGIN
 
 IF ISNULL(@SchemaName,'') <> ''  OR  (ISNULL(@ObjectlisttoExport,'') <> '' AND  ISNULL(@Objectlistdelim,'') <> '' )
 BEGIN
   
DROP TABLE  IF EXISTS #ExportTablesList, #FinalExportList
 
CREATE TABLE  #ExportTablesList(TableCount INT IDENTITY(1,1),Columns NVARCHAR(MAX),TableName NVARCHAR(200))
 
INSERT #ExportTablesList (Columns,TableName)
SELECT  CASE  WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp') 
               THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) 
          ELSE C.name END  Columns  -- To cover poor data type conversions b/n R & SQL Server
  ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY 
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) = @SchemaName
AND TY.name NOT IN  ('geography','varbinary','binary','text', 'ntext',  'image', 'hierarchyid', 'xml',  'sql_variant')   -- Ignore the datatypes that are not required to be exported
 
INSERT #ExportTablesList (Columns,TableName)
SELECT  CASE  WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp') 
               THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) 
          ELSE C.name END  Columns  -- To cover poor data type conversions b/n R & SQL Server
  ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY 
ON C.[user_type_id] = TY.[user_type_id] 
WHERE CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoExport, @Objectlistdelim))
AND TY.name NOT IN  ('geography','varbinary','binary','text', 'ntext',  'image', 'hierarchyid', 'xml',  'sql_variant')   -- Ignore the datatypes that are not required to be exported
 
--Dedup of object list
;WITH dedup
AS
(
SELECT *,ROW_NUMBER()OVER(PARTITION BY TableName,Columns ORDER BY  Columns) Rn FROM  #ExportTablesList
)
DELETE FROM  dedup
WHERE Rn > 1
 
SELECT TableName,IDENTITY(INT,1,1) AS  TableCount
    , STUFF(
        (
        SELECT ', ' + C.Columns
        From #ExportTablesList As C
        WHERE C.TableName = T.TableName
        FOR XML PATH('')
        ), 1, 2, '') AS Columns
INTO #FinalExportList
From #ExportTablesList As T
GROUP BY  TableName
 
 
DECLARE @I INT = 1
       ,@TableName NVARCHAR(200)
       ,@SQL NVARCHAR(MAX) = N''
       ,@RScript NVARCHAR(MAX) = N''
       ,@ExportFilePath NVARCHAR(MAX) = N''
 
--Incase of export to single file if the file already exists instead of replacing all the sheets. Excel file is dropped if exists
 IF @ExportFileIsSingle = 1
BEGIN
 
  SELECT @ExportFilePath = CONCAT(REPLACE(@ExportPath,'\','/'),'/'
                                 ,CASE WHEN @SchemaName IS NULL OR @SchemaName = '' THEN 'SQLExcelExport' 
                                       WHEN ((@SchemaName IS NOT NULL AND @SchemaName <> '')
                                            AND (@ObjectlisttoExport IS NOT NULL AND @ObjectlisttoExport <> '') )   
                                        THEN 'SQLExcelExport'
                                   ELSE @SchemaName END
                                  ,'.xlsx')
 
  SET @RScript = '
    if (file.exists(ExcelFilePath)) 
    #Delete file if it exist
    file.remove(ExcelFilePath)'
 
EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript
     ,@input_data_1 = @SQL
     ,@params = N'@ExcelFilePath NVARCHAR(MAX)'
     ,@ExcelFilePath = @ExportFilePath -- file path where Excel files are placed
 
END
  
WHILE @I <= (SELECT COUNT(TableName) FROM #FinalExportList)
BEGIN
  
 -- Just for testing purpose top 10 records are selected
SELECT @SQL = CONCAT('SELECT TOP  10 ',Columns,'  FROM ',TableName,';')
      ,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I
 
 IF @ExportFileIsSingle = 0
 BEGIN
  SELECT @ExportFilePath = CONCAT(REPLACE(@ExportPath,'\','/'),'/',@TableName,'.xlsx')
 END
  
SET @RScript = CONCAT('
    Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
    library(rJava);
    library("xlsx");
    write.xlsx2(InputDataSet, file = ExcelFilePath, sheetName= TableName,
    col.names=TRUE,  append=',CASE WHEN @ExportFileIsSingle = 1 THEN 'TRUE)' ELSE 'FALSE)' END)
  
EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript
     ,@input_data_1 = @SQL
     ,@params = N'@ExcelFilePath NVARCHAR(MAX), @TableName NVARCHAR(200)'
     ,@ExcelFilePath = @ExportFilePath -- file path where Excel files are placed
     ,@TableName = @TableName
 
 IF @ExportFileIsSingle = 0
 BEGIN
   SET @ExportFilePath = ''
 END
  
SET @I = @I + 1
END
 
 END ELSE PRINT 'Schema name  of objects or list of objects (separated by delimiter) to be exported need to be mentioned'
 
 END ELSE PRINT 'Export folder path need to  be mentioned'
 
END

Sample execution code block:

--Execute each statement one by one
 
--Export folder path is mandatory
EXEC usp_ExportExcel '','Sales','','',1
 
--Either schema name or object list needs to mentioned
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','','',1
 
--Objects under Sales schema in WideworldImporters database are exported into single excel file
-- and file name will be schemaname.xlsx and sheet names will be schemaname.tablename
EXEC usp_ExportExcel 'H:\Temp\Blog\test','Sales','','',1
 
--Objects under Sales schema in WideworldImporters database are exported into multiple excel files
-- and file name will be schemaname.tablename.xlsx and sheet names are same as file name
EXEC usp_ExportExcel 'H:\Temp\Blog\test','Sales','','',0
 
--One table from Purchasing schema & Warehouse schema in WideworldImporters database are exported into single excel file
--Delimiter is mandatory
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','Purchasing.PurchaseOrders,Warehouse.StockItemStockGroups','',1
 
--One table from Purchasing schema & Warehouse schema in WideworldImporters database are exported into single excel file
--Objects separated with comma as delimiter and file name will be SQLExcelExport.xlsx
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','Purchasing.PurchaseOrders,Warehouse.StockItemStockGroups',',',1
 
--One table from Purchasing schema & Warehouse schema in WideworldImporters database are exported into multiple excel files
--Objects separated with comma as delimiter and file name will be schemaname.tablename.xlsx
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','Purchasing.PurchaseOrders,Warehouse.StockItemStockGroups',',',0
 
--Same object is listed twice, code de duplicates and exports only once
EXEC usp_ExportExcel 'H:\Temp\Blog\test','','Purchasing.PurchaseOrders,Purchasing.PurchaseOrders',',',0
 
--One table from Purchasing schema & Warehouse schema along with Objects under Sales schema in WideworldImporters database 
--are exported into single excel file and file name will be SQLExcelExport.xlsx and sheet names will be schemaname.tablename
EXEC usp_ExportExcel 'H:\Temp\Blog\test','Sales','Purchasing.PurchaseOrders|Warehouse.StockItemStockGroups','|',1
 
--One table from Purchasing schema & Warehouse schema along with Objects under Sales schema in WideworldImporters database 
--are exported into multiple excel files and file name will be schemaname.tablename.xlsx and sheet names are same as file name
EXEC usp_ExportExcel 'H:\Temp\Blog\test','Sales','Purchasing.PurchaseOrders|Warehouse.StockItemStockGroups','|',0
 
--Note files will be replaced if any of the above statements are re-executed

Importing Excel file into SQL Server

Now let us install the readxl package for reading the excel file, this package doesn't have any external dependency (say Java, unlike xlsx package)

Open R.exe from below path:
C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin

From R.exe, execute 

install.packages("readxl")

After successful installation of above package, Copy & paste all the packages downloaded from above step into default R library  to SQL Server R_services library.

From C:\Users\Sathy\Documents\R\win-library\3.4
to C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\library

We have created a database called "Staging" to clearly see the excel files data imported into SQL tables that are created dynamically.

For this approach, we have created a Stored procedure in teh above-created database. Executing the Stored procedure based on input parameters imports excel files to SQL Server table

Stored procedure - Has below four input parameters
i)   @ImportPath             -  path where excel files are placed for importing into SQL Server
ii)  @ImportAll                 -  If set to 1 then all files in the mentioned path are imported. If set to 0 then only mentioned files are imported
iii) @ExcelFileName       -  If @ImportAll =  0 then excel file name needs to be passed to this parameter
iv) @ExcelSheetName   -  If @ImportAll =  0 then corresponding sheet name of the excel file needs to be passed to this parameter

Provide SQL Server credentials to variable sqlConnString before compiling below Stored procedure.

USE [Staging]
GO
 
CREATE OR ALTER PROC usp_ImportExcel (@ImportPath NVARCHAR(MAX),
                                      @ImportAll BIT,
                                      @ExcelFileName NVARCHAR(200),
                                      @ExcelSheetName NVARCHAR(50)
                                      )
AS
BEGIN
SET NOCOUNT ON;
 
IF ISNULL(@ImportPath,'') <> '' 
 BEGIN
 
    SELECT @ImportPath = REPLACE(@ImportPath,'\','/')
 
  IF (@ImportAll = 0 AND (ISNULL(@ExcelFileName,'') <> '' AND ISNULL(@ExcelSheetName,'') <> '')) 
  OR (@ImportAll = 1 AND (ISNULL(@ExcelFileName,'') = '' AND ISNULL(@ExcelSheetName,'') = ''))
 
  BEGIN
 
DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
    library(foreach);
 
    #pass database server, username & password
    #conection string can also be placed in text file for security purpose and read using function readLines(".txt")
 
    sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"
 
    # Setting recursive =TRUE in list_filespath, list_filenames will read the excel files in sub folders as well
 
    list_filespath <- list.files(path = ImportFilePath, pattern = "*.xlsx", all.files = FALSE,
           full.names = TRUE, recursive = FALSE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
 
    list_filenames <- list.files(path = ImportFilePath, pattern = "*.xlsx", all.files = FALSE,
                        full.names = FALSE, recursive = FALSE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
    
    #Import single file
    if (ImportAll == 0) {
 
        if (is.element(ExcelFileName, list_filenames) == TRUE)
    {
 
    filepath_var <- paste(ImportFilePath, ExcelFileName, sep="/")
    tab_names <- excel_sheets(path = filepath_var)
 
    if (is.element(ExcelSheetName, tab_names) == TRUE)
    {
    Output <- read_excel(path = filepath_var, sheet = ExcelSheetName)
     
   sqlTable <- paste(strsplit(ExcelFileName,split = ".xlsx"),ExcelSheetName,sep = "_") 
   sqlTable <- gsub("\\s", "", sqlTable)
   sqlTable <- gsub("[^[:alnum:]]", "", sqlTable)
 
      #check  if the sheet has header 
   if (dim(Output)[2] != 0) {
 
    sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
    rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)}}}}
 
    #Import all files
 
    if (ImportAll == 1) {
 
  #Check file exists in path
    if (length(list_filespath) != 0) {
 
 foreach(m = 1:length(list_filespath)) %do%
 {
    filepath_var <- list_filespath[m]
    tab_names <- excel_sheets(path = filepath_var)
 
 foreach(n = 1:length(tab_names)) %do%
 {
  sqlTable <- paste(strsplit(list_filenames[m],split = ".xlsx"),tab_names[n],sep = "_") 
  sqlTable <- gsub("\\s", "", sqlTable)
  sqlTable <- gsub("[^[:alnum:]]", "", sqlTable)
  Output <- read_excel(path = filepath_var, sheet = n)
 
   #check  if the sheet has header 
   if (dim(Output)[2] != 0) {
 
  sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
  rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
 }}}}}'
 
EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript
     ,@params = N'@ImportFilePath NVARCHAR(MAX),@ImportAll CHAR(1),@ExcelFileName NVARCHAR(200),@ExcelSheetName NVARCHAR(50)'
     ,@ImportFilePath = @ImportPath
     ,@ImportAll = @ImportAll
     ,@ExcelFileName = @ExcelFileName
     ,@ExcelSheetName = @ExcelSheetName
 
     END ELSE PRINT 'Invalid parameters: If ImportAll = 0 then pass Excel file & Sheet Name as input. If ImportAll = 1 then pass Excel file & Sheet Name blank'
 
END ELSE PRINT 'Import folder path need to be mentioned'
 
END

Sample execution code block:

--Execute each statement one by one
--Note: We will make use of files generated in above example
 
--Valid Folder path is mandatory
EXEC usp_ImportExcel '',0,'',''
 
--If excel file name doesn't exist in that path, no operation is performed
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,'Purchasing.PurchaseOrderszz.xlsx','Purchasing.PurchaseOrders'
 
--If the sheet name doesn't exist in the mentioned excel file, no  operation is  performed
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,'Purchasing.PurchaseOrders.xlsx','Purchasing.PurchaseOrdersxx'
 
--Delete data in excel file that is passed as input 
--If the sheet is empty, no operation is performed
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,'Purchasing.PurchaseOrders.xlsx','Purchasing.PurchaseOrders'
 
--If ImportAll is set to 0 then only file mentioned in the parameter will be imported
EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,'Warehouse.StockItemStockGroups.xlsx','Warehouse.StockItemStockGroups'
 
--If ImportAll is set to 0 then only file mentioned in the parameter will be imported
--But if we have set of files & sheets to be imported then we can do like this
SET NOCOUNT ON;
DECLARE @ExportList TABLE (Id INT IDENTITY(1,1),ExcelFileName NVARCHAR(200),ExcelSheetName NVARCHAR(50))
INSERT @ExportList (ExcelFileName,ExcelSheetName) SELECT 'SQLExcelExport.xlsx','Purchasing.PurchaseOrders'
INSERT @ExportList (ExcelFileName,ExcelSheetName) SELECT 'Sales.Customers.xlsx','Sales.Customers'
 
DECLARE @I INT = 1,@FileName NVARCHAR(200),@SheetName NVARCHAR(50)
 
WHILE @I <= (SELECT COUNT(Id) FROM @ExportList)
BEGIN
  
 SELECT @FileName = ExcelFileName,@SheetName = ExcelSheetName
 FROM @ExportList
 WHERE Id = @I
 
 PRINT @FileName+@SheetName
 
 EXEC usp_ImportExcel 'H:\Temp\Blog\test',0,@FileName,@SheetName
 SET @I = @I + 1
END
 
--If ImportAll is set to 1 then all the files mentioned in the path will be imported
--But other parameters has to be set to blank except file path
EXEC usp_ImportExcel 'H:\Temp\Blog\test',1,'SQLExcelExport.xlsx','Purchasing.PurchaseOrders'
 
--If ImportAll is set to 1 then all the files mentioned in the path will be imported
EXEC usp_ImportExcel 'H:\Temp\Blog\test',1,'',''
 
--Note tables will be replaced/overwritten if any of the above statements are re-executed

Summary

This post is just to give an overview of this new approach of importing / exporting to excel files in SQL Server using R script. With respect to moving data (Import / Export) between SQL Server & Excel, there are various scenarios based on each requirement. We have covered some of them but tweaking the solutions mentioned above can cover any scenario.

See Also