SQL Server: Dynamic load of flat files with heterogeneous column structures without using C#, Visual Basic, or PowerShell
Introduction
This article discusses how one can dynamically load files that are of heterogeneous structure to a SQL Server database without using languages like C#, Visual Basic, or PowerShell. What this means is that, different files have different columns as in column number and column data types. The discussion will be as simple as possible so that even if one is new to SSIS they will be able to follow the steps. I have used SQL 2008R2 for illustration purposes in this discussion.
Prerequisites
- Stack of files to be loaded – this will be in a landing folder.
- Control table with destination tables definitions – this table will have definitions of table names, column names, and data types.
- SSIS variables – create the following SSIS variables
Variable Name |
Scope |
Data Type |
Archive_Folder_Path |
My_Package |
String |
Database_Name |
My_Package |
String |
File_Name |
My_Package |
String |
Landing_Folder_Path |
My_Package |
String |
Project_Build_Version |
My_Package |
String |
Move_File_To_Archive_Folder |
My_Package |
String |
Parent_Folder_Path |
My_Package |
String |
SQL_Statement_to_Load_Final_Destination_Table |
My_Package |
String |
SQL_Statement_to_Load_Final_Destination_Table_OBJECT |
My_Package |
System.Object |
Table_Name |
My_Package |
String |
Step by step sequence
Step by step sequence follows the steps displayed on the below SSIS package diagram.
FELC Load Files (For Each Loop Container)
- This points to the file source folder. In the case of this example, the following has been used:
- Folder: \my_server\my_drive\my_folder\sub_folder\Landing_Folder\
- Files: *.txt
SQL Logic Placeholder (Execute SQL Task)
- I only included this to make it possible for me to apply filter action on the precedent constraint so that only relevant files are processed.
- Precedent Constraint Expression: SUBSTRING( @[User::File_Name], 1, 3) == "LLD" || SUBSTRING( @[User::File_Name], 1, 3) == "LLC"
SQL Create Table (Execute SQL Task)
- This task executes a stored procedure which dynamically creates target and staging tables. This can be a one off exercise. It’s all up to you depending on the requirements of the business whether there is a need to create tables each time the process is run or not. This procedure is set to create tables anew each time the package is run.
- Please note the order in which the tables are created in the stored procedure. In this example I have set,
- The final destination table to be created first using data from the control table that holds table definitions.
- The staging table is then created using INFROMATION_SCHEMA. All data types in the staging table are of VARCHAR type. Anything other than that might result in “Bulk load conversion error” due to mismatch of data types.
- There are three parameters to be mapped on this task (Execute SQL Task) which will ultimately feed into the stored procedure. These are,
- User::Table_Name
- User::Project_Build_Version
- User::Database_Name
- SQLStatement: exec usp_dynamically_create_target_and_staging_tables ?,?,?
- Find below an example stored procedure that I have used for this article:
/* =============================================
-- Author:
-- Create date:
-- Description:
This procedure dynamically creates final destination tables in line with definitions in the My_Control_Table.
-- Revision History:
-- =============================================*/
CREATE PROCEDURE [dbo].[usp_dynamically_create_target_and_staging_tables] --'My_Test_Name','2.9.0.0'
@Table_Name VARCHAR(255) --= 'My_Test_Name'
,@Project_Build_Version VARCHAR(50) --= '2.9.0.0'
,@Database_Name VARCHAR(100) --='My_Database'
AS
BEGIN
SET NOCOUNT ON;
--DECLARE @Table_Name VARCHAR(255) = 'My_Test_Name'
--DECLARE @Project_Build_Version VARCHAR(50) = '2.9.0.0'
DECLARE @Column_Name VARCHAR(255)
DECLARE @Column_Name_Fully_Qualified VARCHAR(255)
DECLARE @First_Table_Column VARCHAR(255)
DECLARE @First_Table_Column_Fully_Qualified VARCHAR(255)
DECLARE @Column_Default_Name VARCHAR(255)
DECLARE @Table_Create_SQLStatement NVARCHAR(4000)
DECLARE @ColumnSQLStatement NVARCHAR(4000)
DECLARE @AlterTableSQLStatement NVARCHAR(4000)
DECLARE @SQLDropTable NVARCHAR (4000)
/***************************************************************************************************************
Check first if staging & final tables exist
***************************************************************************************************************/
--staging table
IF EXISTS (SELECT 1 FROM My_Database.INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @Table_Name AND TABLE_SCHEMA = 'staging')
BEGIN
SET @SQLDropTable = 'DROP TABLE My_Database.staging.' + @Table_Name
exec sp_executeSQL @SQLDropTable
END
--final table
IF EXISTS (SELECT 1 FROM My_Database.INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @Table_Name AND TABLE_SCHEMA = 'something_schema')
BEGIN
SET @SQLDropTable = 'DROP TABLE My_Database.something_schema.' + @Table_Name
exec sp_executeSQL @SQLDropTable
END
/***************************************************************************************************************
CREATE Final Destination Table BLOCK
***************************************************************************************************************/
SET @Table_Name = (SELECT Distinct Code FROM My_Migration_Project.dbo.My_Control_Table WHERE Code = @Table_Name AND Version = @Project_Build_Version)
--Get a list of columns for the current table
IF OBJECT_ID ('tempdb..#Column_List_Temp') IS NOT NULL
DROP TABLE #Column_List_Temp
SELECT Field,
Data_Item,
CASE
WHEN Data_Type = 'Long' THEN 'Numeric'
WHEN Data_Type in ('Varcharmax','text') THEN 'Varchar'
WHEN Data_Type like '%int%' THEN 'Bigint'
ELSE Data_Type
END AS Data_Type,
LENGTH,
0 AS Flag
INTO #Column_List_Temp
FROM My_Migration_Project.dbo.My_Control_Table
WHERE Code = @Table_Name
AND VERSION = @Project_Build_Version
AND Field != 1
AND Code = @Table_Name
--Get a list of predefined columns
IF OBJECT_ID('tempdb..#First_Column_Details_List') IS NOT NULL
DROP TABLE #First_Column_Details_List
SELECT Field,
Data_Item,
CASE
WHEN Data_Type = 'Long' THEN 'Numeric'
WHEN Data_Type in ('Varcharmax','text') THEN 'Varchar'
WHEN Data_Type like '%int%' THEN 'Bigint'
ELSE Data_Type
END AS Data_Type,
LENGTH,
0 AS Flag
INTO #First_Column_Details_List
FROM My_Migration_Project.dbo.My_Control_Table
WHERE Field = 1
AND VERSION = @Project_Build_Version
AND Code = @Table_Name
--select * from #First_Column_Details_List
--set first column
SET @First_Table_Column =
(SELECT Data_Item
FROM #First_Column_Details_List
WHERE Flag = 0
AND Field =
(SELECT MIN(Field)
FROM #First_Column_Details_List
WHERE Flag = 0))
SET @First_Table_Column_Fully_Qualified =
(SELECT Data_Item + ' ' + Data_Type + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ' ('
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN CONVERT(VARCHAR,LENGTH)
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ')'
ELSE ''
END
FROM #First_Column_Details_List
WHERE Data_Item = @First_Table_Column)
SET @Table_Create_SQLStatement = 'CREATE TABLE ' + @Database_Name + '.something_schema.' + '' + @Table_Name + ' (' + @First_Table_Column_Fully_Qualified + ')'
--PRINT @Table_Create_SQLStatement
EXEC sp_executeSQL @Table_Create_SQLStatement
UPDATE #First_Column_Details_List
SET Flag = 1 WHERE Data_Item = @First_Table_Column
--add predefined columns
WHILE 1=1
BEGIN
SET @First_Table_Column =
(SELECT Data_Item
FROM #First_Column_Details_List
WHERE Flag = 0
AND Field =
(SELECT MIN(Field)
FROM #First_Column_Details_List
WHERE Flag = 0))
SET @First_Table_Column_Fully_Qualified =
(SELECT Data_Item + ' ' + Data_Type + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ' ('
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN CONVERT(VARCHAR,LENGTH)
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ')'
ELSE ''
END
FROM #First_Column_Details_List
WHERE Data_Item = @First_Table_Column)
SET @ColumnSQLStatement = 'ALTER TABLE ' + @Database_Name + '.something_schema.' + '' + @Table_Name + ' ADD ' + @First_Table_Column_Fully_Qualified
--PRINT @ColumnSQLStatement
EXEC sp_executeSQL @ColumnSQLStatement
UPDATE #First_Column_Details_List
SET Flag = 1 WHERE Data_Item = @First_Table_Column
IF (SELECT COUNT(*) FROM #First_Column_Details_List WHERE Flag = 0) < 1 BREAK
END
/*************************Rest of the columns*******************************/
WHILE 1=1
BEGIN
SET @Column_Name = (SELECT Data_Item FROM #Column_List_Temp
WHERE Flag = 0
AND Field =
(SELECT MIN(Field)
FROM #Column_List_Temp
WHERE Flag = 0))
SET @Column_Name_Fully_Qualified =
(SELECT Data_Item + ' ' + Data_Type + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ' ('
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN CONVERT(VARCHAR,LENGTH)
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ')'
ELSE ''
END
FROM #Column_List_Temp
WHERE Data_Item = @Column_Name)
SET @ColumnSQLStatement = 'ALTER TABLE ' + @Database_Name + '.something_schema.' + '' + @Table_Name + ' ADD ' + @Column_Name_Fully_Qualified
--PRINT @ColumnSQLStatement
EXEC sp_executeSQL @ColumnSQLStatement
UPDATE #Column_List_Temp
SET Flag = 1
WHERE Data_Item = @Column_Name
IF (SELECT COUNT(*) FROM #Column_List_Temp WHERE Flag = 0) < 1 BREAK
END
/***************************************************************************************************************
CREATE Staging Table BLOCK
***************************************************************************************************************/
--set all data types to varchar max
UPDATE #Column_List_Temp
SET Data_Type = 'varchar',
LENGTH = 8000,
Flag = 0
UPDATE #First_Column_Details_List
SET Data_Type = 'varchar',
LENGTH = 8000,
Flag = 0
/**************************************************************************************************************/
SET @Table_Name = (SELECT Distinct Code FROM My_Migration_Project.dbo.My_Control_Table WHERE Code = @Table_Name AND Version = @Project_Build_Version)
--set predefined columns
SET @First_Table_Column =
(SELECT Data_Item
FROM #First_Column_Details_List
WHERE Flag = 0
AND Field =
(SELECT MIN(Field)
FROM #First_Column_Details_List
WHERE Flag = 0))
SET @First_Table_Column_Fully_Qualified =
(SELECT Data_Item + ' ' + Data_Type + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ' ('
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN CONVERT(VARCHAR,LENGTH)
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ')'
ELSE ''
END
FROM #First_Column_Details_List
WHERE Data_Item = @First_Table_Column)
SET @Table_Create_SQLStatement = 'CREATE TABLE ' + @Database_Name + '.staging.' + '' + @Table_Name + ' (' + @First_Table_Column_Fully_Qualified + ')'
--PRINT @Table_Create_SQLStatement
EXEC sp_executeSQL @Table_Create_SQLStatement
UPDATE #First_Column_Details_List
SET Flag = 1 WHERE Data_Item = @First_Table_Column
--add predefined columns
WHILE 1=1
BEGIN
SET @First_Table_Column =
(SELECT Data_Item
FROM #First_Column_Details_List
WHERE Flag = 0
AND Field =
(SELECT MIN(Field)
FROM #First_Column_Details_List
WHERE Flag = 0))
SET @First_Table_Column_Fully_Qualified =
(SELECT Data_Item + ' ' + Data_Type + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ' ('
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN CONVERT(VARCHAR,LENGTH)
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ')'
ELSE ''
END
FROM #First_Column_Details_List
WHERE Data_Item = @First_Table_Column)
SET @ColumnSQLStatement = 'ALTER TABLE ' + @Database_Name + '.staging.' + '' + @Table_Name + ' ADD ' + @First_Table_Column_Fully_Qualified
--PRINT @ColumnSQLStatement
EXEC sp_executeSQL @ColumnSQLStatement
UPDATE #First_Column_Details_List
SET Flag = 1 WHERE Data_Item = @First_Table_Column
IF (SELECT COUNT(*) FROM #First_Column_Details_List WHERE Flag = 0) < 1 BREAK
END
/*************************Rest of the columns*******************************/
WHILE 1=1
BEGIN
SET @Column_Name = (SELECT Data_Item FROM #Column_List_Temp
WHERE Flag = 0
AND Field =
(SELECT MIN(Field)
FROM #Column_List_Temp
WHERE Flag = 0))
SET @Column_Name_Fully_Qualified =
(SELECT Data_Item + ' ' + Data_Type + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ' ('
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN CONVERT(VARCHAR,LENGTH)
ELSE ''
END + CASE
WHEN Data_Type IN ('varchar',
'nvarchar',
'Char'
)
AND LENGTH IS NOT NULL THEN ')'
ELSE ''
END
FROM #Column_List_Temp
WHERE Data_Item = @Column_Name)
SET @ColumnSQLStatement = 'ALTER TABLE ' + @Database_Name + '.staging.' + '' + @Table_Name + ' ADD ' + @Column_Name_Fully_Qualified
--PRINT @ColumnSQLStatement
EXEC sp_executeSQL @ColumnSQLStatement
UPDATE #Column_List_Temp
SET Flag = 1
WHERE Data_Item = @Column_Name
IF (SELECT COUNT(*) FROM #Column_List_Temp WHERE Flag = 0) < 1 BREAK
END
END
GO
SQL Load Staging Table BULK INSERT TEXT FILES (Execute SQL Task)
This task executes a stored procedure to bulk insert data to the staging table.
- SQLStatement: usp_load_staging_table ?,?,?,?
- Parameters:
- User::File_Name
- User::Table_Name
- User::Landing_Folder_Path
- User::Database_Name
Below is the stored procedure:
/* =============================================
-- Author:
-- Create date:
-- Description:
This procedure loads staging tables
-- Revision History:
-- =============================================*/
CREATE PROCEDURE [dbo].[usp_load_staging_table]
@File_Name VARCHAR(255), --= 'My_Test_Name_bulk',
@Table_Name VARCHAR(255), --= 'My_Test_Name',
@Landing_Folder_Path VARCHAR(255), --= '\my_server\my_drive\my_folder\sub_folder\Landing_Folder\,
@Database_Name VARCHAR(100) --='My_Database'
AS
BEGIN
SET NOCOUNT ON;
--DECLARE @File_Name VARCHAR(255) = 'My_Test_Name_bulk'
--DECLARE @Table_Name VARCHAR(255) = 'My_Test_Name'
--DECLARE @Landing_Folder_Path VARCHAR(255) = '\my_server\my_drive\my_folder\sub_folder\Landing_Folder\
--DECLARE @Database_Name VARCHAR(100) ='My_Database'
DECLARE @LoadTempTableSQLStatement NVARCHAR(4000)
SET @LoadTempTableSQLStatement =
'BULK INSERT ' + @Database_Name + '.staging.' + @Table_Name + '
FROM ''' + @Landing_Folder_Path + @File_Name + '.txt''
WITH
(
FIRSTROW = 1 --(start on row 1)
,DATAFILETYPE = ''char''
--,FIELDTERMINATOR = ''"\"'' --use t if Tab Delimited and , if comma delimited
,FIELDTERMINATOR = ''|'' --use t if Tab Delimited and , if comma delimited
,ROWTERMINATOR = ''\n''
,KEEPNULLS
)'
--print @LoadTempTableSQLStatement
exec sp_executeSQL @LoadTempTableSQLStatement
END
GO
SQL Generate SQL Statement to Load Final Destination Table (Execute SQL Task)
This task executes a stored procedure that generates a SQL script which pulls data from the staging table to the final destination table. One thing to note about this task is the Result Set. Below are the details of the settings within the task:
- ResultSet: Full result set
- SQLStatement: exec usp_dynamically_create_script_to_load_destination_table ?,?,?
- Result Set Variable: User::SQL_Statement_to_Load_Final_Destination_Table_OBJECT
Below is the stored procedure. Please note that this stored procedure will convert data from VARCHAR to the data types defined in the final destination table.
/* =============================================
-- Author:
-- Create date:
-- Description:
This procedure dynamically creates scripts to load data from staging to destination tables
-- Revision History:
-- =============================================*/
CREATE PROCEDURE [dbo].[usp_dynamically_create_script_to_load_destination_table]
@Table_Name VARCHAR(255) --= 'My_Test_Name'
,@Project_Build_Version VARCHAR(50) --= '2.9.0.0'
,@Database_Name VARCHAR(100) --='My_Database'
AS
BEGIN
SET NOCOUNT ON;
--DECLARE @Project_Build_Version VARCHAR(250) = '2.9.0.0'
--DECLARE @Table_Name VARCHAR(255) = 'My_Test_Name'
--DECLARE @Database_Name VARCHAR(100) ='My_Database'
DECLARE @CreateSQLStatement NVARCHAR(4000)
SET @CreateSQLStatement =
'SELECT SQLStatement = ''INSERT INTO ' + @Database_Name + '.something_schema.' + @Table_Name + ' SELECT '' + STUFF(
(SELECT '', '' + CASE
WHEN CONVERT(VARCHAR,CHARACTER_MAXIMUM_LENGTH) IS NOT NULL THEN + '' CONVERT('' + Data_type + ''('' + CONVERT(VARCHAR,CHARACTER_MAXIMUM_LENGTH) + ''),'' + COLUMN_NAME + '') AS '' + COLUMN_NAME
WHEN CONVERT(VARCHAR,CHARACTER_MAXIMUM_LENGTH) IS NULL
AND DATA_TYPE IN (''datetime'',''date'') THEN + '' CONVERT(datetime, '' + COLUMN_NAME + '',101) AS '' + COLUMN_NAME
WHEN CONVERT(VARCHAR,CHARACTER_MAXIMUM_LENGTH) IS NULL
AND DATA_TYPE NOT IN (''datetime'',''date'',''text'')
AND DATA_TYPE NOT LIKE ''%int%'' THEN + '' CONVERT('' + Data_type + '','' + COLUMN_NAME + '') AS '' + COLUMN_NAME
WHEN CONVERT(VARCHAR,CHARACTER_MAXIMUM_LENGTH) IS NULL
AND DATA_TYPE LIKE ''%int%'' THEN + '' CONVERT(bigint,'' + COLUMN_NAME + '') AS '' + COLUMN_NAME
ELSE COLUMN_NAME
END
FROM ' + @Database_Name + '.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = ''' + @Table_Name + '''
AND TABLE_SCHEMA = ''something_schema''
ORDER BY ORDINAL_POSITION
FOR XML PATH('''')),1,1,'''') + '' FROM ' + @Database_Name + '.staging.' + @Table_Name + ''''
--print @CreateSQLStatement
exec sp_executeSQL @CreateSQLStatement
END
GO
FELC Load Final Destination Table (For Each Loop Container)
The purpose of this container is to 'dispense' the query that has been dynamically created in the preceding task so that data can be loaded to the final destination. At this point that query is help in a variable of Object data type. The following settings should be applied to the For Each Loop Container:
- Enumerator: Foreach ADO Enumerator
- ADO object source variable: User::SQL_Statement_to_Load_Final_Destination_Table_OBJECT
- Enumerator mode: Rows in the first table
- VARIABLE MAPPINGS: User::SQL_Statement_to_Load_Final_Destination_Table (NB: you need to create this SSIS variable beforehand)
SQL Load Final Destination Table
This task loads the final destination table using the query generated in the “SQL Generate SQL Statement to Load Final Destination Table” Task. This query is executed via SSIS variable User::SQL_Statement_to_Load_Final_Destination_Table. Take note of the following settings:
- SQL_Statement_to_Load_Final_Destination_Table variable expression: None. The variable is loaded with a query that is ready to be excuted.
The remaining two tasks can be easily completed by referring to the relevant section on the book online to get information about how to use them.
See my other article entitled "SSIS Hints & Tips: Handling Flat Files" on http://social.technet.microsoft.com/wiki/contents/articles/23883.ssis-hints-tips-handling-flat-files.aspx