BizTalk De-batching Options
Introduction
Most developers working with Microsoft BizTalk Server are aware of the integration patterns documented by Gregor Hohpe and Bobby Woolf as Enterprise Integration Patterns, also referred to as EAI (Enterprise Application Integration) patterns. The patterns range from message routing to messaging channels, and you will find them on their landing page online: http://www.enterpriseintegrationpatterns.com/. Within the Message Routing category you will find one of the most applied patterns, the Splitter pattern. In short, this pattern describes how a message containing multiple elements is split into single messages that can be processed individually. In a BizTalk context, it means that a message with multiple records, i.e. a batch message, is split into single messages, which is also known as de-batching.
Picture 1. Splitter Pattern Diagram.
As a developer, you have a few options when it comes to de-batching in BizTalk. You can process a batch message with a pipeline in a receive port: the incoming message is split into single messages that are published to the MessageBox database. Alternatively, you can de-batch within an orchestration to have more control over the single messages.
The first approach is known as envelope de-batching or receive port pipeline de-batching. It is very fast and suitable for messaging-only scenarios. The drawback of this implementation is that when something fails in the pipeline, the entire message fails, which leaves you less flexibility. To mitigate the risk that a single failure breaks the whole de-batching, you can configure the pipeline with recoverable interchange enabled. With recoverable interchange, the entire message (interchange) is still processed even if one or more single messages in it fail at the disassembly or validation stage of the receive pipeline, or when a map executes on a single message; only the failing messages are suspended.
The second approach uses an orchestration with XPath queries to split a message into single messages. This gives you more control over each individual message and allows sequential, ordered processing. The drawback is that as the message size increases, performance degrades quickly and the process requires a lot of system resources.
An alternative to the second approach is to call a pipeline from within an orchestration to split the message for you. In effect, you combine the first and second approaches. The benefit is the flexibility of handling individual messages as you iterate through the batch message (interchange). However, the pipeline call has to run in an atomic scope (or the transaction of the orchestration itself has to be atomic), and it does not support recoverable interchange, which is only available with envelope de-batching at the messaging level.
An alternative to the first approach is to use a map instead of a (custom) pipeline. Note that the first two approaches described are available in BizTalk Server 2004 and later. Before you start your design and build, learn the requirements first: message size and frequency matter for performance and for the capacity of your platform. Before you settle on a de-batching solution, or on a complete solution that includes de-batching parts, you need to know its behavior and its performance.
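The difference between a plain interchange and a recoverable interchange can be illustrated with a toy model. This is a minimal Python sketch, not BizTalk code: the `published` and `suspended` lists are assumptions standing in for the MessageBox and the suspended instances, and `process_interchange` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class InterchangeResult:
    published: list = field(default_factory=list)  # stands in for the MessageBox
    suspended: list = field(default_factory=list)  # stands in for suspended instances


def process_interchange(messages, is_valid, recoverable):
    """Toy model: split a batch and validate each single message."""
    messages = list(messages)
    result = InterchangeResult()
    if not recoverable:
        # All-or-nothing: one failing message suspends the whole interchange.
        if all(is_valid(m) for m in messages):
            result.published.extend(messages)
        else:
            result.suspended.append(messages)
        return result
    # Recoverable interchange: each message succeeds or fails on its own.
    for m in messages:
        if is_valid(m):
            result.published.append(m)
        else:
            result.suspended.append(m)
    return result
```

With a batch such as `["order-1", "BAD", "order-3"]`, the non-recoverable run publishes nothing and suspends the whole batch, while the recoverable run publishes the two good orders and suspends only `"BAD"`.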
Envelope de-batching
The simple and fast way of implementing the Splitter pattern with BizTalk is envelope de-batching, also called receive port pipeline de-batching. You create two schemas for the batch message (interchange): an envelope schema and a document schema. You start with the document schema of the individual message you require. Next, you create the envelope schema that imports the document schema.
Let's consider a classic example: a message that is a collection of orders, which has to be split into single order messages that can each be processed individually. To create a schema, right-click your BizTalk project, click Add and then New Item, select Schema and specify a name for it. A schema for an individual order could look like the one below.
Picture 2. Document Schema.
For the envelope schema you will need to set the envelope property in the schema to “Yes” and import the document schema through Imports.
Picture 3. Envelope Schema.
In the Orders envelope schema you then only have to set the Body XPath, in this case to:
/*[local-name()='Orders' and namespace-uri()='http://Envelope.Debatching.Schemas.Orders']
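To picture what this Body XPath does at run-time, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not the disassembler itself: the `OrderId` and `Amount` child elements are hypothetical (the real fields are in Picture 2), and because ElementTree has no `local-name()`/`namespace-uri()` functions, the Body XPath is expressed with the equivalent `{namespace}name` syntax.

```python
import xml.etree.ElementTree as ET

ORDERS_NS = "http://Envelope.Debatching.Schemas.Orders"
ORDER_NS = "http://Envelope.Debatching.Schemas.Order"

# Hypothetical batch instance; only the two namespaces come from the schemas.
BATCH = f"""<ns0:Orders xmlns:ns0="{ORDERS_NS}" xmlns:ns1="{ORDER_NS}">
  <ns1:Order><OrderId>1</OrderId><Amount>10.00</Amount></ns1:Order>
  <ns1:Order><OrderId>2</OrderId><Amount>25.50</Amount></ns1:Order>
</ns0:Orders>"""


def debatch(batch_xml):
    """Return each child of the Body XPath node as a separate XML document."""
    root = ET.fromstring(batch_xml)
    # The Body XPath selects the Orders root element itself.
    if root.tag != f"{{{ORDERS_NS}}}Orders":
        raise ValueError(f"unexpected root element {root.tag}")
    # tostring() also writes the element's tail whitespace, so strip it.
    return [ET.tostring(child, encoding="unicode").strip() for child in root]
```

Calling `debatch(BATCH)` yields two standalone Order documents, one per child of the Orders body node.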
Creating a unit test to verify the de-batching behavior can be rather complicated. However, there is a quick way to test the pipeline behavior: in the folder <install drive>:\Program Files (x86)\Microsoft BizTalk Server 2013 R2\SDK\Utilities\PipelineTools you will find a few useful pipeline test tools. XmlDasm.exe can be used to check whether de-batching with your schemas and an instance of the batch (interchange) behaves as desired.
Picture 4. XMLDasm command line tool.
XmlDasm.exe runs the XML disassembler component directly, emulating a receive pipeline, so that you can see how it will parse and disassemble your XML document (envelope) into one or more XML documents.
Picture 5. XMLDasm output.
At run-time the process is as follows. A message is picked up by a receive location (for example, one using the File adapter) that is tied to a receive port. The message enters the custom pipeline and passes through the pipeline stages. At the Disassemble stage, the XML Disassembler retrieves the root node name and target namespace of the document and constructs the message type. It then queries the BizTalk management database (BizTalkMgmtDb) to retrieve the schema, checks the schema for any promoted properties and/or distinguished fields, and checks whether it is an envelope schema. In this case the Orders schema is an envelope schema, so the disassembler strips out each child node of the Body XPath node in the incoming Orders message and constructs a new Order message for each. Each Order message then goes through the same process: its message type is determined, its schema is retrieved from BizTalkMgmtDb, any properties are promoted, and so on. This is basically how envelope de-batching works within a pipeline.
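The resolution loop just described can be modeled roughly as follows. This Python sketch is only an analogy: the `SCHEMAS` dictionary is an assumption standing in for the schemas deployed in BizTalkMgmtDb, property promotion is left out, and it assumes the Body XPath points at the root node, as it does for the Orders envelope. The message type is built as `targetNamespace#rootNodeName`, which is the form BizTalk uses.

```python
import xml.etree.ElementTree as ET

ORDERS_NS = "http://Envelope.Debatching.Schemas.Orders"
ORDER_NS = "http://Envelope.Debatching.Schemas.Order"

# Hypothetical stand-in for deployed schemas: message type -> schema properties.
SCHEMAS = {
    f"{ORDERS_NS}#Orders": {"envelope": True},
    f"{ORDER_NS}#Order": {"envelope": False},
}


def message_type(elem):
    """Build 'namespace#root' from an ElementTree tag such as '{ns}root'."""
    if not elem.tag.startswith("{"):
        raise ValueError("this sketch only handles namespaced root nodes")
    namespace, local_name = elem.tag[1:].split("}", 1)
    return f"{namespace}#{local_name}"


def disassemble(elem):
    """Return (message type, element) pairs for every non-envelope message."""
    mtype = message_type(elem)
    schema = SCHEMAS.get(mtype)  # the "query" against the schema store
    if schema is None:
        raise LookupError(f"no schema deployed for {mtype}")
    if not schema["envelope"]:
        return [(mtype, elem)]
    out = []
    for child in elem:  # each child of the body node becomes a new message
        out.extend(disassemble(child))  # and goes through the same resolution
    return out
```

The recursion mirrors the point that each de-batched Order message is resolved against its own schema just like the original Orders message.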
An even simpler approach is to use the out-of-the-box XMLReceive pipeline and specify the DocumentSpecNames and EnvelopeSpecNames properties.
Picture 6. XML Receive Pipeline configuration
The process and behavior of de-batching with a custom pipeline or with XMLReceive are the same. So why would you create a custom pipeline for de-batching at all? You probably will not need one, yet there may be requirements that lead you to it. For example, after the message is split, another pipeline component may have to do further pre-processing. Control over your messages at the pipeline level can be a driver to do the splitting with a custom pipeline.
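The reason a custom pipeline helps here is that, in a receive pipeline, the stages after Disassemble run once for every message the disassembler produces. The following Python sketch models only that ordering; `run_receive_pipeline` and its parameters are hypothetical, and real pipeline components are .NET classes implementing the BizTalk pipeline component interfaces.

```python
from typing import Callable, Iterable, List


def run_receive_pipeline(
    message: str,
    disassemble: Callable[[str], Iterable[str]],
    later_components: List[Callable[[str], str]],
) -> List[str]:
    """Toy model: one batch in, one processed output per de-batched message."""
    results = []
    for single in disassemble(message):  # Disassemble stage: 1 batch -> N messages
        for component in later_components:  # later stages see each single message
            single = component(single)
        results.append(single)
    return results
```

For example, with a splitter that breaks `"a,b"` on commas and two later components (upper-casing and appending `"!"`), the result is `["A!", "B!"]`: the pre-processing is applied to each split message, not to the batch.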
Orchestration de-batching
You can also de-batch a message inside an orchestration, as mentioned in the introduction. This approach has some drawbacks you have to consider:
- Performance: each message can result in an additional persistence point when it is sent to the MessageBox.
- If one message fails, the subsequent messages are not sent.
- Additional complexity with XPath queries.
Regardless of the drawbacks, there may be scenarios in which you require more control over your messages. As long as you do not need transactional processing and have appropriate error handling and/or retry mechanisms in place, you can go for this approach.
When a batch message enters the orchestration, one XPath query is required to determine the number of records (i.e. single messages) and another to retrieve an individual message. An orchestration that splits messages based on XPath queries can look like the one below.
Picture 7. Orchestration for de-batching messages.
The GetOrderCount expression shape contains an XPath query that determines the number of single orders in an Orders message. In this example the query is:
dOrderCount = System.Convert.ToDouble(xpath(msgOrders.MessagePart, "count(/*[local-name()='Orders' and namespace-uri()='http://Envelope.Debatching.Schemas.Orders']/*[local-name()='Order' and namespace-uri()='http://Envelope.Debatching.Schemas.Order'])"));
In this example, the query inside the Assign Message shape that extracts a single order is:
sXPathQuery = System.String.Format("/*[local-name()='Orders' and namespace-uri()='http://Envelope.Debatching.Schemas.Orders']/*[local-name()='Order' and namespace-uri()='http://Envelope.Debatching.Schemas.Order'][{0}]",dLoopCount);
msgOrder = xpath(msgOrders.MessagePart, sXPathQuery);
Splitting inside the orchestration is thus based on two XPath queries: one that determines the number of messages and one that extracts each message from the batch. The loop inside the orchestration iterates as many times as the determined number, i.e. the number of orders. At the start of each iteration the loop condition (dLoopCount <= dOrderCount) is evaluated; dLoopCount starts at 1 because XPath positions are 1-based. If the condition is met, the query retrieves the record at position dLoopCount from the message and assigns it to a newly constructed message, which can be sent off for processing or processed within the orchestration; dLoopCount is then incremented. To mitigate the risk of a single message breaking the orchestration, you can fall back to the first approach, where you can leverage the recoverable interchange feature.
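The count-then-index loop can be mirrored outside BizTalk to check its logic. This Python sketch is an analogy under assumptions: it uses ElementTree's limited XPath, writing `{namespace}Order` instead of the `local-name()`/`namespace-uri()` predicates, plus the 1-based positional predicate `[n]` that ElementTree supports; `split_by_index` is a hypothetical name and the `OrderId` element is made up.

```python
import xml.etree.ElementTree as ET

ORDERS_NS = "http://Envelope.Debatching.Schemas.Orders"
ORDER_NS = "http://Envelope.Debatching.Schemas.Order"
ORDER_PATH = f"{{{ORDER_NS}}}Order"


def split_by_index(batch_xml):
    """Mimic the GetOrderCount shape and the Loop / Assign Message shapes."""
    msg_orders = ET.fromstring(batch_xml)
    # Equivalent of count(/Orders/Order) in GetOrderCount.
    order_count = len(msg_orders.findall(ORDER_PATH))
    orders = []
    loop_count = 1  # XPath positions start at 1
    while loop_count <= order_count:  # loop shape condition
        # Equivalent of the sXPathQuery ending in Order[{0}].
        msg_order = msg_orders.find(f"{ORDER_PATH}[{loop_count}]")
        orders.append(msg_order)
        loop_count += 1  # without this the loop never terminates
    return orders
```

In this sketch every positional lookup re-scans the batch, so the total work grows faster than linearly with the number of orders; that gives some intuition for why XPath-based splitting slows down as batch messages grow.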
Calling a pipeline inside an orchestration
Splitting a message by envelope de-batching, with the XMLReceive pipeline, or inside an orchestration has been possible since BizTalk Server 2004. BizTalk Server 2006 introduced the ability to call a pipeline from within an orchestration. Calling a pipeline can make your orchestration more efficient by minimizing the interactions between the orchestration and the MessageBox database. Through a programmatic interface you can call receive and send pipelines directly from an orchestration, which means you can use a pipeline within an orchestration to split messages. You can do more than splitting, though: calling a pipeline from an orchestration gives you access to all the capabilities of BizTalk receive and send pipelines (out-of-the-box or custom). The approach with a pipeline is as follows:
- Set a reference to the Microsoft.XLANGs.Pipeline assembly in the orchestration project
- Create a variable with the type Microsoft.XLANGs.Pipeline.ReceivePipelineOutputMessages
- In an expression shape following the Receive shape, place code like:
receivePipelineOrders = Microsoft.XLANGs.Pipeline.XLANGPipelineManager.ExecuteReceivePipeline(typeof(Envelope.Debatching.Pipelines.ReceivePipelineOrders), msgOrders);
- Subsequently place a loop under the expression shape with the following condition:
receivePipelineOrders.MoveNext()
- Within the loop, place an Assign Message shape containing code like:
msgOrder = new System.Xml.XmlDocument();
receivePipelineOrders.GetCurrent(msgOrder);
- After the assign shape you can place logic to process the message in the desired way or send it to the MessageBox.
The resulting orchestration is similar to the one in the second approach, yet less complex because it needs no XPath queries.
Picture 8. Orchestration calling pipeline to de-batch.
When the message is received by BizTalk, it is routed to the subscribing orchestration, where the pipeline is executed within an expression shape. To be able to call pipelines, a reference must be set to the Microsoft.XLANGs.Pipeline assembly. This assembly contains the Microsoft.XLANGs.Pipeline.XLANGPipelineManager class, which has two methods that can be called:
- ExecuteReceivePipeline(System.Type, Microsoft.XLANGs.BaseTypes.XLANGMessage)
- ExecuteSendPipeline(System.Type, Microsoft.XLANGs.Pipeline.SendPipelineInputMessages, Microsoft.XLANGs.BaseTypes.XLANGMessage)
In the described scenario, ExecuteReceivePipeline is called in the expression shape. You pass the method the type of the receive pipeline to execute and the input message, and it returns the de-batched messages, which you store in a variable of type Microsoft.XLANGs.Pipeline.ReceivePipelineOutputMessages. In the loop, the MoveNext() method is called on that variable, which lets you iterate through the messages produced from the incoming message.
There are some limitations to calling a pipeline in an orchestration. Per-instance pipeline configuration is not supported, the call must run in an atomic scope, and recoverable interchange is not supported. On the other hand, calling a pipeline can make your orchestration more efficient, since it minimizes the interactions between the orchestration and the MessageBox database. Finally, the orchestration can handle multiple messages returned from the pipeline, enumerating them one by one with the MoveNext method.
An alternative to de-batching with a pipeline is to do the splitting with a map or XSLT. This can again give you more control, and you could opt for this approach if you have specific requirements for splitting the message: for instance, splitting not into individual messages but first into groups, which you would subsequently de-batch.
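The MoveNext/GetCurrent contract used in the loop can be illustrated with a small Python stand-in. This is not the real .NET class: `OutputMessages`, `move_next`, `get_current` and `run_loop` are hypothetical names chosen to mirror the shape of the XLANG/s code, under the assumption that the enumerator starts positioned before the first message.

```python
class OutputMessages:
    """Toy stand-in for the object returned by ExecuteReceivePipeline."""

    def __init__(self, messages):
        self._messages = list(messages)
        self._index = -1  # before the first message, so MoveNext must come first

    def move_next(self):
        """Advance to the next message; return False when none are left."""
        self._index += 1
        return self._index < len(self._messages)

    def get_current(self):
        if not 0 <= self._index < len(self._messages):
            raise RuntimeError("call move_next() before get_current()")
        return self._messages[self._index]


def run_loop(pipeline_output, handle):
    # Loop shape condition: receivePipelineOrders.MoveNext()
    while pipeline_output.move_next():
        # Assign Message shape: receivePipelineOrders.GetCurrent(msgOrder)
        handle(pipeline_output.get_current())
```

Because the condition is evaluated before the first read, an empty pipeline output simply skips the loop body.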
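Grouping before de-batching can be sketched as follows. In BizTalk this would be a map or custom XSLT; the Python below only illustrates the idea using the standard library. The grouping key `CustomerId` and the helper `group_orders` are hypothetical, and orders without the key end up in a group keyed by `None`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ORDERS_NS = "http://Envelope.Debatching.Schemas.Orders"
ORDER_NS = "http://Envelope.Debatching.Schemas.Order"


def group_orders(batch_xml, key="CustomerId"):
    """Return {key value: new Orders envelope holding only that group's orders}."""
    root = ET.fromstring(batch_xml)
    groups = defaultdict(list)
    for order in root.findall(f"{{{ORDER_NS}}}Order"):
        groups[order.findtext(key)].append(order)  # None if the key is missing
    batches = {}
    for value, orders in groups.items():
        envelope = ET.Element(f"{{{ORDERS_NS}}}Orders")
        envelope.extend(orders)  # each group becomes a smaller interchange
        batches[value] = envelope
    return batches
```

Each grouped envelope has the same Orders structure as the original batch, so it can then be de-batched in the usual way.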
Summary
This article explored several approaches for implementing the EAI Splitter pattern with BizTalk Server. The pattern is one of the most widely known integration patterns, and within the BizTalk world it is useful for de-batching a message into single messages. Based on your requirements you can opt for any of the approaches, as long as you take their pros and cons, described in this article, into consideration. De-batching (splitting) a message using the out-of-the-box XMLReceive pipeline is the simplest way; however, if you need more control, then a custom pipeline or an orchestration can be the preferred approach. The emphasis here is on "can". You could also store the single messages in a file or database first and then continue processing each individual message. The options for splitting are diverse, and you should decide what is the best fit for your purpose.
Resources
This is not the first article on the Splitter pattern in BizTalk; quite a few are available online. The best resources, in my view, are:
- Flat file: Debatching Large Messages and Extending Flatfile Pipeline Disassembler Component in Biztalk 2006
- Custom pipeline approach: BizTalk Custom Receive Pipeline Component to Batch Messages
- WCF-SQL: Debatching Inbound Messages from BizTalk WCF SQL Adapter
- Calling Pipeline in an Orchestration: How to Use Expressions to Execute Pipelines
- Performance: Debatching Options and Performance Considerations in BizTalk 2004
- Group & De-batch: Sorting, Grouping, and Debatching Using BizTalk Messaging
See Also
Another important place to find an extensive amount of BizTalk related articles is the TechNet Wiki itself. The best entry point is BizTalk Server Resources on the TechNet Wiki.