OVER Expression (U-SQL)

Artigo
03/10/2017

Summary

A windowing expression is an expression whose value is computed by applying for every row a window function to multiple row values of a column (the window defined by the OVER operator) instead of just the column’s value of the row.

Note

The OVER expression is sometimes referred to as an OVER Clause to make it search discoverable.

The OVER operator allows to specify a partition or window of column data that the window functions operate on. The windows provide the ability to access data from previous rows without having to use a self-join and provide both cumulative and sliding windows. It thus provides the ability to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

It is similar to aggregation over grouping but instead of grouping the whole rowset with GROUP BY and then returning a single aggregated value for each group, the window defines a set of rows for a particular column in the SELECT clause that will be aggregated for each row. In the GROUP BY case, the result will be a row per group, with windowing expressions, the number of rows returned by the SELECT will not be affected by the windows.

Windowing expressions are only supported inside expressions in a SELECT FROM clause. More than one windowing expression can be used in a single query with a single FROM clause. The OVER operator for each function can differ in partitioning and ordering.

A windowing expression can never be used if a GROUP BY clause is present.

Syntax

Windowing_Expression :=                                                                                  
     Window_Function_Call 'OVER' '('
          [ Over_Partition_By_Clause ]
          [ Order_By_Clause ]
          [ Row_Clause ]
     ')'.

Remarks

Window_Function_Call
The window function that is being applied to the window.

Syntax

  Window_Function_Call :=                                                                             
        Aggregate_Function_Call
  |     Analytic_Function_Call
  |     Ranking_Function_Call.

Window functions can be one of the following:

Aggregate Functions such as SUM or MAX.
Analytic Functions such as FIRST_VALUE or LAST_VALUE.
Ranking Functions such as RANK or ROW_NUMBER.

Aggregation functions applied to a window cannot be used with DISTINCT.

OVER ( … )
The OVER operator that specifies the window with the following components. Note that certain windowing functions have certain requirements with respect to the presence or absence of some if these components. These requirements are explained in the relevant section describing that function.
Over_Partition_By_Clause
The OVER operator’s optional partition clause defines the window by partitioning the rowset. The window function is applied to each partition separately and computation restarts for each partition.

Syntax

  Over_Partition_By_Clause :=                                                                         
       'PARTITION' 'BY' Expression_List.

The data can be partitioned according to a list of scalar expressions. The result type of each of the expression has to return an equality comparable type or an error is raised. Most commonly, the partition expressions refer to the rowset’s columns (as specified by the FROM clause and not the columns from the SELECT clause).

If PARTITION BY is not specified, the function treats all rows of the query result set as a single group.

Order_By_Clause
The OVER operator’s optional ORDER BY clause defines the order of the rows withing each of the windows.

Syntax

  Order_By_Clause :=                                                                                  
       'ORDER' 'BY' Sort_Item_Expression_List.

  Sort_Item_Expression_List :=
       Sort_Item_Expression { ',' Sort_Item_Expression }.

  Sort_Item_Expression :=
       expression [Sort_Direction].

The syntax and semantics follows the normal ORDER BY clause. For window functions that depend on the order such as the ranking functions, this order by clause specifies the logical order in which the window function calculation is performed.

If the ORDER BY clause is not specified, then the window is not ordered.

Row_Clause
The OVER operator’s optional ROWS clause further limits the rows within a partition by specifying fixed number of rows preceding or following the current row. Preceding and following rows are defined based on the ordering in the ORDER BY clause. Therefore, the ROWS clause requires that the ORDER BY clause be specified.

Syntax

  Row_Clause :=                                                                                       
       'ROWS' Window_Frame_Extent.

If the ROWS clause is not specified but ORDER BY is specified, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used as the default for the window frame.

The ROWS clause is incompatible with the DISTINCT aggregate option and an error is raised.

Window_Frame_Extent

The window frame extent specifies the rows specifying a lower bound and an upper bound by either just specifying the preceding rows from the current row or providing a frame between two explicitly specified end points.

Syntax

    Window_Frame_Extent :=                                                                         
         Window_Frame_Preceding | Window_Frame_Between.

For example, ROWS BETWEEN 2 PRECEDING AND CURRENT ROW means that the window of rows that the function operates on is three rows in size, starting with 2 rows preceding until and including the current row.

Window_Frame_Preceding

Defines the window frame relative from the current row. Currently supported are:

Syntax

      Window_Frame_Preceding :=                                                                 
           'UNBOUNDED' 'PRECEDING'
      |    unsigned_integer_literal 'PRECEDING' 
      |    'CURRENT' 'ROW'.

With the following semantics:

UNBOUNDED PRECEDING
Specifies that the window starts at the first row of the partition. It can only be specified as the window starting point.
unsigned_integer_literal PRECEDING
The nonnegative integer literal specifies the number of rows to precede the current row in the partition. If the value is larger than the available number of preceding rows in the partition, then the window starts at the first row of the partition.
CURRENT ROW
Specifies that the window starts or ends at the current row. It can be specified as both a starting and ending point.
Window_Frame_Between
Specifies the lower (starting) and upper (ending) boundary points of the window.

Syntax

      Window_Frame_Between :=                                                                   
           'BETWEEN' Window_Frame_Bound
           'AND' Window_Frame_Bound.

      Window_Frame_Bound :=
           Window_Frame_Preceding
      |    Window_Frame_Following.

The first frame bound specifies the lower, the second frame bound the upper boundary (inclusive). If the upper boundary is smaller than the lower boundary, an error is raised. Note that a frame can be all preceding or all following the current row. The frame bounds are specified as follows:

Window_Frame_Preceding
Specifies that the frame starts or ends before the current row as above
Window_Frame_Following
Specifies that the frame starts or ends following the current row:

Syntax

        Window_Frame_Following :=                                                            
             unsigned_integer_literal 'FOLLOWING'
        |    'CURRENT' 'ROW'.

With the following semantics:

unsigned_integer_literal FOLLOWING The nonnegative integer literal specifies the number of rows to follow the current row in the partition. If the value is larger than the available number of following rows in the partition, then last row of the partition is chosen as the boundary.
CURRENT ROW Specifies that the window boundary is at the current row.

Tip

U-SQL does currently not support UNBOUNDED FOLLOWING. One can achieve the same result by inverting the ordering and use UNBOUNDED PRECEDING instead.

Examples

Sample Data

It contains two views that we will use for the sample data.

QueryLog
To access this data use the code below.

querylog = Demo.QueryLog();

data:

Fruit	Quantity	Source
Banana	300	Image
Cherry	300	Image
Durian	500	Image
Apple	100	Web
Fig	200	Web
Papay	200	Web
Avocado	300	Web
Cherry	400	Web
Durian	500	Web

Employees
To access this data use the code below.

employees = Demo.Employees();

data:

EmpID	EmpName	DeptName	DeptID	Salary
1	Noah	Engineering	100	10000
2	Sophia	Engineering	100	20000
3	Liam	Engineering	100	30000
4	Emma	HR	200	10000
5	Jacob	HR	200	10000
6	Olivia	HR	200	10000
7	Mason	Executive	300	50000
8	Ava	Marketing	400	15000
9	Ethan	Marketing	400	10000

Now, let’s walk through windowing functions and aggregations. As we do this the answer to this question will be clear and we’ll also understand the conceptual difference of Windowing Functions from Grouping.

data3 =         
    SELECT EmpName, SUM(Salary) OVER( ) As SalaryAllDepts        
    FROM employees;

Note these things:

We are still doing a SUM() but it is modified by an OVER clause
The OVER clause is empty – there’s nothing between the parentheses
The OVER clause defines the “window” – in this case since it is empty the window is the entire set of rows
Thus you can read SUM(Salary) OVER () as “The sum of Salary across the window of rows”

EmpName	TotalAllDepts
Noah	165000
Sophia	165000
Liam	165000
Emma	165000
Jacob	165000
Olivia	165000
Mason	165000
Ava	165000
Ethan	165000

When we did the first SUM without any GROUP BY we received the correct total salary 165000. But grouping collapsed that into a single row. Here we see there are as many output rows as input rows. More interestingly the 165000 value is clearly calculated in each row but to calculate it data from EVERY row was used.

Another way of thinking about this is that each output row has knowledge of multiple input rows - in this all of the input rows. The “window” is all the rows. Now let’s try an OVER clause that contains something. In this case by using PARTITION BY DeptName our window is based on DeptName.

data4 =         
    SELECT EmpName, DeptName, SUM(Salary) OVER( PARTITION BY DeptName ) AS SalaryByDept  
    FROM employees;

result:

EmpName	DeptName	SalaryByDept:double
Noah	Engineering	60000
Sophia	Engineering	60000
Liam	Engineering	60000
Mason	Executive	50000
Emma	HR	30000
Jacob	HR	30000
Olivia	HR	30000
Ava	Marketing	25000
Ethan	Marketing	25000

Again, notice that there are the same number of input rows as output rows. However now each row has a Department total - again this is the cause the window for SUM was defined on the DeptName. Key points

grouping collapses input rows
with grouping aggregation happens on the entire set of rows or what is specified by the GROUP BY clause
windowing does not collapse rows
with windowing aggregations happen on the window defined by the OVER clause – either all rows and the rows defined by the window
OVER( ) – defines a window of all rows
OVER( PARTITION BY X ) – defines a window on column X

Now let’s calculate the percentage of salary devoted to each department form the total salary across all departments. We can’t mix Windowing Functions and GROUP BY in the same statement so we’ll build it up like this:

a =         
    SELECT TOP 1 SUM(Salary) OVER( ) AS SalaryAllDepts        
    FROM employees;     

b =           
    SELECT DeptName, SUM(Salary) AS SalaryByDept         
    FROM employees         
    GROUP BY DeptName;     
    
c = 
    SELECT DeptName, SalaryByDept, SalaryAllDepts, (SalaryByDept/SalaryAllDepts) AS Percentage         
    FROM a CROSS JOIN b;

result:

DeptName	SalaryByDept	SalaryAllDepts	Percentage
Engineering	60000	165000	0.363636363636364
Executive	50000	165000	0.303030303030303
HR	30000	165000	0.181818181818182
Marketing	25000	165000	0.151515151515152

Compartilhar via

OVER Expression (U-SQL)

Summary

Syntax

Remarks

Syntax

Syntax

Syntax

Syntax

Syntax

Syntax

Syntax

Syntax

Examples

Sample Data

See Also

Recursos adicionais