Share via


SSIS Tips: A Real Time File Processing Solution

Impetus

One of common scenarios that we come across in a typical data warehouse project is the need of a file processing system which has the ability to process files and get data onto the data warehouse as per a predefined schedule. There was recently a scenario where I came across the requirement for a file processing system which has to work in near real time. The client required and operational reporting system where the data processed from the file have to be available immediately afterwards for presenting in the reports. This is a slightly different scenario where we need to ensure the processing of the files as soon as it arrives. I managed to devise a solution 
for this scenario which I'm sharing through this article.


Illustration

Consider the case of retail environment. Here sales data comes frequently as per a fixed schedule as text files. There was a need of developing a file processing system to get the sales data onto sales tables. Within minimum delay the data has to be made available for operational reports where sales data have to analyzed based on various factors like hourly trend, product based etc

For this example lets the consider a text file of the below structure. For simplicity I'm only including the gist of the columns in the below CREATE TABLE statement

TranDate,ProductID,Qty,TotalAmt
2015-01-20,10020,4,1120.00
2015-03-12,10035,5,980.50
2015-04-23,10012,3,445.00
2015-06-17,10045,3,193.00
2015-10-05,10020,2,560.00

The requirement is to get the file data processed as soon as the file arrives so that data can be made available for a series of operational reports

Solution

For the above requirement the solution implemented consists of creating a FileTable which will point to a folder. This folder will be used as the landing folder for the files.
There are few prerequisites that need to be done for setting it up.
The steps can be given as below

  1. Enable FILESTREAM support at the server instance level
  2. Create filegroup and files for holding the FILESTREAM data for the database
  3. Specify a directory for FILESTREAM data at the database level. Configure it for non Transact SQL access
  4. Create a filetable using CREATE TABLE syntax in the database which points to our landing folder
    The script would look like this
CREATE TABLE  [dbo].[MyFileExtracts] AS FILETABLE ON [PRIMARY] FILESTREAM_ON [FilestreamFileGroup]
WITH
(
FILETABLE_DIRECTORY = N'DirectoryName'
)
  1. Create a SSIS package which will do data processing for the files which are arriving in the landing folder.
    The package will look like the below

As seen from the above the package will include a simple control flow which includes a Data Flow Task and a File System Task. The Data Flow Task will transfer the data from the incoming file to the table. The File System Task is used to archive the processed file.

  1. A SQL Agent job would be setup to execute the package. The SQL Service account should be given access to the file table directory location. Alternatively you can also create a proxy account for the purpose after giving it the required access and then configure the SQL Agent job to use the proxy account.
    The job configuration would look like the below

The job will have a single step which will call and execute the SSIS package that we have created above.You can also add a notification step to the job if you want to make sure the ops team or users get notified once the job execution is complete.

  1. Create a trigger on the above FileTable. This will ensure any new file added to the file table ie the directory will execute the trigger logic written inside.

The trigger for the table will look like this

CREATE TRIGGER  [dbo].[Trg_FileAudit] ON [dbo].[MyFileExtracts]
AFTER INSERT
AS
EXEC msdb.dbo.sp_start_job 'SalesUpload'
  
  
GO

This trigger will call and execute the job created in the previous step whenever there's any INSERT DML event happens on the file table i.e. a new file is being added to the directory

Once the above steps are done we will have a FileTable which will be ready to receive the files. The trigger created on it will ensure whenever a file comes it will automatically execute the code inside which will trigger the SQL Agent job. The job will go ahead and process the files coming into the folder and transfer the data onto the table in the database. Once this is done, the reports which makes use of this table data will be able to display the latest data from the files.

Now lets try putting a file into the directory and see the result

If you compare the above results with the original file details you can confirm that data from file got transferred to database table successfully
We can also check the job run status and package run status by checking the job history from SQL Server Management Studio (SSMS) and also by running the All Executions standard report available from SSMS.

From this we can see that just when the file arrived in the folder, the job got invoked and the package got executed which transferred the data successfully to the tables in the database.


Conclusion

As seen from the above illustration we can create a real time file processing system using a trigger on a filetable which executes a SQL Agent job. The job will call a simply SSIS package with a data flow task which would transfer the data from the file onto the table we want.

References

Enable FILESTREAM
SSIS Tasks
SSIS Sample Package