Επεξεργασία

Κοινή χρήση μέσω


partition operator

Applies to: ✅ Microsoft FabricAzure Data ExplorerAzure MonitorMicrosoft Sentinel

The partition operator partitions the records of its input table into multiple subtables according to values in a key column. The operator runs a subquery on each subtable, and produces a single output table that is the union of the results of all subqueries.

This operator is useful when you need to perform a subquery only on a subset of rows that belongs to the same partition key, and not query the whole dataset. These subqueries could include aggregate functions, window functions, top N and others.

The partition operator supports several strategies of subquery operation:

  • Native - use with an implicit data source with thousands of key partition values.
  • Shuffle - use with an implicit source with millions of key partition values.
  • Legacy - use with an implicit or explicit source for 64 or less key partition values.

Syntax

T | partition [ hint.strategy=Strategy ] [ Hints ] by Column ( TransformationSubQuery )

T | partition [ hint.strategy=legacy ] [ Hints ] by Column { SubQueryWithSource }

Learn more about syntax conventions.

Parameters

Name Type Required Description
T string ✔️ The input tabular source.
Strategy string The value legacy, shuffle, or native. This hint defines the execution strategy of the partition operator.

If no strategy is specified, the legacy strategy is used. For more information, see Strategies.
Column string ✔️ The name of a column in T whose values determine how to partition the input tabular source.
TransformationSubQuery string ✔️ A tabular transformation expression. The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column.

The expression must provide only one tabular result and shouldn't have other types of statements, such as let statements.
SubQueryWithSource string ✔️ A tabular expression that includes its own tabular source, such as a table reference. This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar(Column).

The expression must provide only one tabular result and shouldn't have other types of statements, such as let statements.
Hints string Zero or more space-separated parameters in the form of: HintName = Value that control the behavior of the operator. See the supported hints per strategy type.

Supported hints

Hint name Type Strategy Description
hint.shufflekey string shuffle The partition key used to run the partition operator with the shuffle strategy.
hint.materialized bool legacy If set to true, will materialize the source of the partition operator. The default value is false.
hint.concurrency int legacy Determines how many partitions to run in parallel. The default value is 16.
hint.spread int legacy Determines how to distribute the partitions among cluster nodes. The default value is 1.

For example, if there are N partitions and the spread hint is set to P, then the N partitions will be processed by P different cluster nodes equally in parallel/sequentially depending on the concurrency hint.

Returns

The operator returns a union of the results of the individual subqueries.

Strategies

The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.

Note

The distinction between the native and shuffle strategies allows the caller to indicate the cardinality and execution strategy of the subquery. This choice may affect how long the subquery takes to complete but doesn't change the end result.

Native strategy

This strategy should be applied when the number of distinct values of the partition key isn't large, roughly in the thousands.

The subquery must be a tabular transformation that doesn't specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There's no restriction on the number of partitions.

To use this strategy, specify hint.strategy=native.

Shuffle strategy

This strategy should be applied when the number of distinct values of the partition key is large, in the millions.

The subquery must be a tabular transformation that doesn't specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There's no restriction on the number of partitions.

To use this strategy, specify hint.strategy=shuffle. For more information about shuffle strategy and performance, see shuffle query.

Supported operators for the native and shuffle strategies

The following list of operators can be used in subqueries with the native or shuffle strategies:

Note

  • Operators that reference a table source other than the subtable partitions aren't compatible with the native and shuffle strategies. For example, join, union, externaldata, and evaluate (plugins). For such scenarios, resort to the legacy strategy.
  • The fork operator isn't supported for any strategy type, as the subquery must return a single tabular result.

Legacy strategy

For historical reasons, the legacy strategy is the default strategy. However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.

In some scenarios, the legacy strategy might be necessary due to its support for including a tabular source in the subquery. In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).

If the subquery is a tabular transformation without a tabular source, the source is implicit and is based on the subtable partitions.

To use this strategy, specify hint.strategy=legacy or omit any other strategy indication.

Note

An error will occur if the partition column, Column, contains more than 64 distinct values.

Examples

Find top values

In some cases, it's more performant and easier to write a query using the partition operator than using the top-nested operator. The following query runs a subquery calculating summarize and top for each State starting with W: "WYOMING", "WASHINGTON", "WEST VIRGINIA", and "WISCONSIN".

StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State 
    (
    summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
    | top 3 by Events 
    ) 

Output

EventType State Events Injuries
Hail WYOMING 108 0
High Wind WYOMING 81 5
Winter Storm WYOMING 72 0
Heavy Snow WASHINGTON 82 0
High Wind WASHINGTON 58 13
Wildfire WASHINGTON 29 0
Thunderstorm Wind WEST VIRGINIA 180 1
Hail WEST VIRGINIA 103 0
Winter Weather WEST VIRGINIA 88 0
Thunderstorm Wind WISCONSIN 416 1
Winter Storm WISCONSIN 310 0
Hail WISCONSIN 303 1

Native strategy

The following query returns the top 2 EventType values by TotalInjuries for each State that starts with 'W':

StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
    (
    summarize TotalInjueries = sum(InjuriesDirect) by EventType
    | top 2 by TotalInjueries
    )

Output

EventType TotalInjueries
Tornado 4
Hail 1
Thunderstorm Wind 1
Excessive Heat 0
High Wind 13
Lightning 5
High Wind 5
Avalanche 3

Shuffle strategy

The following query returns the top 3 DamagedProperty values foreach EpisodeId and the columns EpisodeId and State.

StormEvents
| partition hint.strategy=shuffle by EpisodeId
    (
    top 3 by DamageProperty
    | project EpisodeId, State, DamageProperty
    )
| count

Output

Count
22345

Legacy strategy with explicit source

The following query runs two subqueries:

  • When x == 1, the query returns all rows from StormEvents that have InjuriesIndirect == 1.
  • When x == 2, the query returns all rows from StormEvents that have InjuriesIndirect == 2.

The final result is the union of these two subqueries.

range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count 

Output

Count
113

Partition reference

The following example shows how to use the as operator to give a "name" to each data partition and then reuse that name within the subquery. This approach is only relevant to the legacy strategy.

T
| partition by Dim
(
    as Partition
    | extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)