sample operator

Applies to: ✅ Microsoft FabricAzure Data ExplorerAzure MonitorMicrosoft Sentinel

Returns up to the specified number of random rows from the input table.

Note

  • sample is geared for speed rather than even distribution of values. Specifically, it means that it will not produce 'fair' results if used after operators that union 2 datasets of different sizes (such as a union or join operators). It's recommended to use sample right after the table reference and filters.
  • sample is a non-deterministic operator, and returns a different result set each time it's evaluated during the query. For example, the following query yields two different rows (even if one would expect to return the same row twice).

Syntax

T | sample NumberOfRows

Learn more about syntax conventions.

Parameters

Name Type Required Description
T string ✔️ The input tabular expression.
NumberOfRows int, long, or real ✔️ The number of rows to return. You can specify any numeric expression.

Examples

The example in this section shows how to use the syntax to help you get started.

The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.

The examples in this article use publicly available tables, such as the StormEvents table in the Weather analytics sample data.

Generate a sample

This query creates a range of numbers, samples one value, and then duplicates that sample.

let _data = range x from 1 to 100 step 1;
let _sample = _data | sample 1;
union (_sample), (_sample)

Output

x
74
63

To ensure that in example above _sample is calculated once, one can use materialize() function:

let _data = range x from 1 to 100 step 1;
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)

Output

x
24
24

Generate a sample of a certain percentage of data

To sample a certain percentage of your data (rather than a specified number of rows), you can use

StormEvents | where rand() < 0.1

Output

The table contains the first few rows of the output. Run the query to view the full result.

StartTime EndTime EpisodeId EventId State EventType
2007-01-01T00:00:00Z 2007-01-20T10:24:00Z 2403 11914 INDIANA Flood
2007-01-01T00:00:00Z 2007-01-24T18:47:00Z 2408 11930 INDIANA Flood
2007-01-01T00:00:00Z 2007-01-01T12:00:00Z 1979 12631 DELAWARE Heavy Rain
2007-01-01T00:00:00Z 2007-01-01T00:00:00Z 2592 13208 NORTH CAROLINA Thunderstorm Wind
2007-01-01T00:00:00Z 2007-01-31T23:59:00Z 1492 7069 MINNESOTA Drought
2007-01-01T00:00:00Z 2007-01-31T23:59:00Z 2240 10858 TEXAS Drought
... ... ... ... ... ...

Generate a sample of keys

To sample keys rather than rows (for example - sample 10 Ids and get all rows for these Ids), you can use sample-distinct in combination with the in operator.

let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)

Output

The table contains the first few rows of the output. Run the query to view the full result.

StartTime EndTime EpisodeId EventId State EventType
2007-09-18T20:00:00Z 2007-09-19T18:00:00Z 11074 60904 FLORIDA Heavy Rain
2007-09-20T21:57:00Z 2007-09-20T22:05:00Z 11078 60913 FLORIDA Tornado
2007-09-29T08:11:00Z 2007-09-29T08:11:00Z 11091 61032 ATLANTIC SOUTH Waterspout
2007-12-07T14:00:00Z 2007-12-08T04:00:00Z 13183 73241 AMERICAN SAMOA Flash Flood
2007-12-11T21:45:00Z 2007-12-12T16:45:00Z 12826 70787 KANSAS Flood
2007-12-13T09:02:00Z 2007-12-13T10:30:00Z 11780 64725 KENTUCKY Flood
... ... ... ... ... ...