BizTalk: Sorting and Grouping Flat File Data In SQL Instead of XSL
Introduction
A situation that comes up regularly is receiving multiple individual entities in a tabular (not hierarchical) format, such as CSV or other flat file formats.
In this situation, the entity breaks can’t be inferred by structure, instead, it has to be done by value.
A popular way of addressing this is regrouping in XSL using the Muenchian Method or similar technique. This is perfectly fine and does work, but one can credibly argue that XSL is not the best language for this and in the BizTalk Stack, we definitely have a serious challenger, namely SQL.
In this article, we will demonstrate how to regroup an untagged tabular dataset coming from CSV into individual entities using SQL instead of XSL.
The sample project can be found in the MSDN Code Gallery at: BizTalk: Sorting and Grouping Flat File Data In SQL…Instead of XSL
The Source Table
We receive Purchase Order data from an external source in CSV in the format:
PONumber1,2018-02-02T15:29:49.480,1,ItemID01,2
PONumber1,2018-02-02T15:29:49.480,2,ItemID02,2
PONumber2,2018-02-02T15:29:49.480,1,ItemID01,2
While it’s easily readable that there are two POs, the lack of line tags means the Flat File Disassembler cannot distinguish where PO 1 ends and PO 2 begins.
SQL You Already Know
The lack of line tags is a problem for the Flat File Disassembler, but not for SQL.
To have SQL Server regroup the data, we need nothing more than a Stored Procedure called by the WCF SQL Adapter using a User Defined Table Type. No permanent storage or other SQL artefacts are necessary.
First, we create the User Defined Table Type to represent CSV data:
CREATE TYPE [dbo].[POCSV] AS TABLE(
[PONumber] [varchar](50) NULL,
[PODate] [datetime] NULL,
[POLineNumber] [int] NULL,
[ItemID] [varchar](50) NULL,
[ItemQuantity] [int] NULL
)
GO
The sample is very simple, but it contains all the fields we would need to group by PO and Lines.
Next, we use a FOR XML Stored Procedure taking the User Defined Table Type as a parameter.
CREATE PROCEDURE [dbo].[GroupPOsFromCSV]
-- Add the parameters for the stored procedure here
@POCSV as POCSV READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT DISTINCT PurchaseOrder.PONumber, PurchaseOrder.PODate, POLine.ItemID, POLine.ItemQuantity
FROM (SELECT DISTINCT PONumber, PODate FROM @POCSV) AS PurchaseOrder
LEFT JOIN @POCSV AS POLine ON PurchaseOrder.PONumber = POLine.PONumber
FOR XML AUTO, ELEMENTS
END
GO
In the FOR XML query, the grouping by PO and Lines is a natural product of the LEFT JOIN. Compared to an XSL method, this is substantially less code and in nearly all cases, more readable and maintainable.
The BizTalk App
There is nothing special or unusual in the BizTalk app to use SQL for grouping. The app contains some regular BizTalk artefacts:
- Flat File Schema
- Stored Procedure Schema
- Map from Flat File to Stored Procedure
- Stored Procedure Result Envelope and Body Schemas
User Defined Table Types is fully supported by the WCF sqlBinding and appear in the generated Schemas in a predictable pattern. Here, is the Map from the CSV to the Stored Procedure:
Running the Sample
- Download the sample from the MSDN Code Gallery:
- Unzip and open in Visual Studio. Build and Deploy.
- Run the GroupInSQL.sql script on your SQL Server to create the database with the User Defined Table Type and Stored Procedure.
- Create two folders: C:\Samples\GroupInSQL\IN & C:\Samples\GroupInSQL\OUT. Alternatively, you can modify the ports after import.
- Import GroupInSQLBindings.xml in BT Admin.
- Copy POFlatFile.txt to the IN folder.
- Two .xml file will appear in the OUT folder, one for each PONumber in the Flat File.
<PurchaseOrder>
<PONumber>PONumber1</PONumber>
<PODate>2018-02-02T15:29:49.480</PODate>
<POLine>
<ItemID>ItemID01</ItemID>
<ItemQuantity>2</ItemQuantity>
</POLine>
<POLine>
<ItemID>ItemID02</ItemID>
<ItemQuantity>2</ItemQuantity>
</POLine>
</PurchaseOrder>
Download
The sample project can be found in the MSDN Code Gallery at BizTalk: Sorting and Grouping Flat File Data In SQL…Instead of XSL