SSIS Merge Join Transformation
[[articles:SQL Server Integration Services (SSIS)|Integration Services]] uses transformations to manipulate data during an [[articles:Extract Transform Load (ETL)|ETL]] dataflow. Transformations can be used in [[articles:SSIS Dataflow Task|Data Flow Tasks]], between data source and destination components, or other transformations.
Combining data from separate sources is quite often required by the logic of the [[articles:Extract Transform Load (ETL)|ETL process]]. This can be achieved if both data have a common set of columns which can be used as a key to combine (or join) these data flows. The Merge Join transformation can combine data from different sources. It is different from the [[articles:SSIS Lookup Transformation|Lookup transformation]], since it might produce more, or even less records as its inputs have: it will put all matching combinations to its output.
The Merge Join transformation supports two inputs and a standard output. It does not support an error output - since it transforms only the data flow, not the data values. If no match found for some of the input rows, it's not treated as an error.
It's important that passing sorted data as input to a Merge Join transformation is mandatory, and the metadata (column data types, lenghts and/or precision) must match on both side as well, in the same order. Otherwise the Merge Join will raise an error.
The [[articles:SSIS Sort Transformation|Sort transformation]] can help in this, or it's possible to give an explicit hint with changing the data flow metadata if it's a known fact that both inputs are sorted correctly.
The Merge Join transformation can execute three kind of joins: Full outer, Left outer or Inner join. Right join is not supported, simply because it's easy to swap inputs with just a button click in the Merge Join Transformation Editor:
The Swap Inputs button in the Merge Join Transformation Editor
Full outer join creates an output which includes all rows from both input, possibly combined with rows from the other one as well. If no match found from the other input, a single row will be written to the output with NULL values in the corresponding columns.
Left outer join takes all rows from the left input, and tries to find matches from the input on the right side. If no match found, then similarly to the Full outer join, NULL values will be passed to the output in the columns which should get data from the right input. Any matching combinations will be written to the output otherwise. The output row count is always greater than or equal to the left input's row number.
Inner join passes only matching combinations to the output.
However only these three join types are selectable in the editor, it's possible to construct a cross join as well: using a [[articles:SSIS Derived Column transformation|Derived Column transformation]], add a new column for both inputs before the Merge Join transformation with the same data type (a DT_BOOL, DT_UI1 or DT_I1 is small enough), with the same value of True or 1. Sort the data flows by these columns using the [[articles:SSIS Sort Transformation|Sort transformation]]. Then combine the data flows joined by these columns in a Merge Join.
The Merge Join transformation is a partially blocking (asynchronous) one: the output does not reuse existing buffers from the inputs.
For more detailed information on this transformation, please refer to this MSDN article.
For the complete list of SSIS transformations, check [[articles:SSIS: List of Transformations|this Wiki article]].