Share via


SSIS Percentage Sampling Transformation

[[articles:SQL Server Integration Services (SSIS)|Integration Services]] uses transformations to manipulate data during an [[articles:Extract Transform Load (ETL)|ETL]] dataflow. Transformations can be used in [[articles:SSIS Dataflow Task|Data Flow Tasks]], between data source and destination components, or [[articles:SSIS: List of Transformations|other transformations]].

Sometimes the source data has large volumes, and during development, less rows would be more comfortable to keep the speed of the tests high. The Percentage Sampling transformation can be used for random sampling, to create a small but representative dataset. A percentage of input rows can be selected using Percentage Sampling, defined in the SamplingValue property - which can be overridden with [[SSIS Expression]]s.

From the statistics point of view: the population (and the sampling frame) is the whole input, and the sampling method is simple random sampling - which  is a probability sampling: all element of the population has an equal probability of selection. So note that the count of the selected rows will not be exactly the product of the input row count and the specified percentage: for 100 input rows and 20 as SamplingValue it might mean sometimes 18, sometimes 25. Basically every row has a probability of being selected as it is defined in the SamplingValue property.

If a specified number of the input rows should be selected by a transformation instead, check the [[articles:SSIS Row Sampling Transformation|Row Sampling Transformation]].

This transformation is a row (synchronous) transformation, and supports one input and two outputs: the sample output and the unselected output. It does not support an error output since data values on the outputs do not depend on expression evaluation.

For more detailed information on this transformation, please refer to this MSDN article.
For the complete list of SSIS transformations, check [[articles:SSIS: List of Transformations|this Wiki article]].