共用方式為


GROUP BY子句

適用於:核取記號為「是」Databricks SQL 核取記號為「是」Databricks Runtime

GROUP BY 子句是用來根據一組指定的群組表達式來分組數據列,並根據一或多個指定的聚合函數,計算數據列群組的匯總。 Databricks SQL 也支援進階匯總,透過 GROUPING SETSCUBEROLLUP 子句,針對相同的輸入記錄集執行多個匯總。 群組表達式和進階匯總可以在 子句中 GROUP BY 混合,並巢狀在 子句中 GROUPING SETS

如需詳細資訊,請參閱混合/巢狀群組分析一節。

FILTER當子句附加至聚合函數時,只會將相符的數據列傳遞至該函式。

語法

GROUP BY ALL

GROUP BY group_expression [, ...] [ WITH ROLLUP | WITH CUBE ]

GROUP BY { group_expression | { ROLLUP | CUBE | GROUPING SETS } ( grouping_set [, ...] ) } [, ...]

grouping_set
   { expression |
     ( [ expression [, ...] ] ) }

聚合函數定義為

aggregate_name ( [ DISTINCT ] expression [, ...] ) [ FILTER ( WHERE boolean_expression ) ]

參數

  • ALL

    適用於:核取記號為「是」 Databricks SQL 核取記號為「是」 Databricks Runtime 12.2 LTS 和更新版本

    一種速記法可以將所有不包含聚合函數的 SELECT清單表示式新增為 group_expression。 如果沒有這類表達式存在 GROUP BY ALL ,就相當於省略 GROUP BY 導致全域匯總的 子句。

    GROUP BY ALL 不保證會產生一組可解析的群組表達式。 如果產生的子句格式不正確,Azure Databricks 會 引發UNRESOLVED_ALL_IN_GROUP_BYMISSING_AGGREGATION

  • group_expression

    指定將數據列分組在一起的準則。 數據列的分組是根據群組表達式的結果值來執行。 群組表達式可以是資料行名稱,如 GROUP BY a,資料行位置如 GROUP BY 0,或是像 GROUP BY a + b的運算式。 如果 group_expression 包含匯總函數 Azure Databricks,就會引發GROUP_BY_AGGREGATE錯誤。

  • grouping_set

    群組集是由括弧中的零個或多個逗號分隔表達式所指定。 當群組集只有一個專案時,可以省略括弧。 例如,GROUPING SETS ((a), (b))GROUPING SETS (a, b) 相同。

  • 群組集

    GROUPING SETS之後所指定之每個群組集的數據列分組。 例如:

    GROUP BY GROUPING SETS ((warehouse), (product))在語意上相當於和GROUP BY warehouse的結果GROUP BY product聯集。

    這個子句是 UNION ALL 的簡寫,UNION ALL 運算子的每個運算部分會對 GROUPING SETS 子句中指定的每個群組集進行匯總處理。

    同樣地,GROUP BY GROUPING SETS ((warehouse, product), (product), ())語意上相當於 的結果GROUP BY warehouse, productGROUP BY product聯集,以及全域匯總。

注意

針對 Hive 相容性 Databricks SQL 允許 GROUP BY ... GROUPING SETS (...)。 表達式 GROUP BY 通常會被忽略,但如果它們除了表達式之外 GROUPING SETS 包含額外的表達式,額外的運算式將會包含在群組運算式中,而且值一律為 Null。 例如,在 SELECT a, b, c FROM ... GROUP BY a, b, c GROUPING SETS (a, b)中,c 數據行的輸出一律為 null。

  • ROLLUP

    在單一語句中指定多個匯總層級。 這個子句是用來根據多個群組集計算匯總。 ROLLUP 是的 GROUPING SETS速記。 例如:

    GROUP BY warehouse, product WITH ROLLUP
    

    GROUP BY ROLLUP(warehouse, product)
    

    相當於

    GROUP BY GROUPING SETS((warehouse, product),
                           (warehouse),
                           ())
    

    GROUP BY ROLLUP(warehouse, product, (warehouse, location))
    

    相當於

    GROUP BY GROUPING SETS((warehouse, product, location),
                           (warehouse, product),
                           (warehouse),
                           ())
    

    規格的 ROLLUP N 元素會產生 N+1 GROUPING SETS

  • 立方體

    CUBE 子句是用來根據 GROUP BY 子句中指定的群組數據行組合來執行匯總。 CUBE 是的 GROUPING SETS速記。 例如:

    GROUP BY warehouse, product WITH CUBE
    

    GROUP BY CUBE(warehouse, product)
    

    相當於:

    GROUP BY GROUPING SETS((warehouse, product),
                           (warehouse),
                           (product),
                           ())
    

    GROUP BY CUBE(warehouse, product, location)
    

    相當於以下内容:

    GROUP BY GROUPING SETS((warehouse, product, location),
                           (warehouse, product),
                           (warehouse, location),
                           (product, location),
                           (warehouse),
                           (product),
                           (location),
                           ())
    

    規格的 CUBE N 元素會產生 2^N GROUPING SETS

  • aggregate_name

    聚合函數名稱(MIN、MAX、COUNT、SUM、AVG 等)。

  • DISTINCT

    先移除輸入數據列中的重複專案,再傳遞至聚合函數。

  • FILTER

    篩選子句中 評估為 true 的boolean_expression輸入數據列WHERE會傳遞至聚合函數;其他數據列則會捨棄。

混合/巢狀群組分析

GROUP BY 子句可以包含多個 group_expressions、多個 CUBE,以及多個 ROLLUPGROUPING SETS

GROUPING SETS 也可以有巢狀 CUBEROLLUPGROUPING SETS 子句。 例如:

GROUPING SETS(ROLLUP(warehouse, location),
              CUBE(warehouse, location)),
GROUPING SETS(warehouse,
              GROUPING SETS(location,
                            GROUPING SETS(ROLLUP(warehouse, location),
                                          CUBE(warehouse, location))))

CUBEROLLUP 只是的 GROUPING SETS語法糖。

如需如何轉譯和CUBE轉譯ROLLUPGROUPING SETS,請參閱上述各節。

group_expression 在此內容中可以視為單一群組 GROUPING SETS

針對 子句中的GROUPING SETS多個 GROUP BY ,Databricks SQL 會藉由執行原始 GROUPING SETS的交叉乘積來產生單GROUPING SETS一 。

若為 子句中的GROUPING SETS巢狀GROUPING SETS結構,Databricks SQL 會採用其群組集並加以等量。 例如,下列查詢:

GROUP BY warehouse,
         GROUPING SETS((product), ()),
         GROUPING SETS((location, size),
         (location),
         (size),
         ());

GROUP BY warehouse,
         ROLLUP(product),
         CUBE(location, size);

相當於下列專案:

GROUP BY GROUPING SETS( (warehouse, product, location, size),
                        (warehouse, product, location),
                        (warehouse, product, size),
                        (warehouse, product),
                        (warehouse, location, size),
                        (warehouse, location),
                        (warehouse, size),
                        (warehouse))

GROUP BY GROUPING SETS(GROUPING SETS(warehouse),
                       GROUPING SETS((warehouse, product)))`

相當於

GROUP BY GROUPING SETS((warehouse),
                       (warehouse, product))`.

範例

CREATE TEMP VIEW dealer (id, city, car_model, quantity) AS
VALUES (100, 'Fremont', 'Honda Civic', 10),
       (100, 'Fremont', 'Honda Accord', 15),
       (100, 'Fremont', 'Honda CRV', 7),
       (200, 'Dublin', 'Honda Civic', 20),
       (200, 'Dublin', 'Honda Accord', 10),
       (200, 'Dublin', 'Honda CRV', 3),
       (300, 'San Jose', 'Honda Civic', 5),
       (300, 'San Jose', 'Honda Accord', 8);

-- Sum of quantity per dealership. Group by `id`.
> SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;
  id sum(quantity)
 --- -------------
 100            32
 200            33
 300            13

-- Use column position in GROUP by clause.
> SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;
  id sum(quantity)
 --- -------------
 100            32
 200            33
 300            13

-- Multiple aggregations.
-- 1. Sum of quantity per dealership.
-- 2. Max quantity per dealership.
> SELECT id, sum(quantity) AS sum, max(quantity) AS max
    FROM dealer GROUP BY id ORDER BY id;
  id sum max
 --- --- ---
 100  32  15
 200  33  20
 300  13   8

-- Count the number of distinct dealers in cities per car_model.
> SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
    car_model count
 ------------ -----
  Honda Civic     3
    Honda CRV     2
 Honda Accord     3

-- Count the number of distinct dealers in cities per car_model, using GROUP BY ALL
> SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY ALL;
    car_model count
 ------------ -----
  Honda Civic     3
    Honda CRV     2
 Honda Accord     3

-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
> SELECT id,
         sum(quantity) FILTER (WHERE car_model IN ('Honda Civic', 'Honda CRV')) AS `sum(quantity)`
    FROM dealer
    GROUP BY id ORDER BY id;
  id sum(quantity)
 --- -------------
 100            17
 200            23
 300             5

-- Aggregations using multiple sets of grouping columns in a single statement.
-- Following performs aggregations based on four sets of grouping columns.
-- 1. city, car_model
-- 2. city
-- 3. car_model
-- 4. Empty grouping set. Returns quantities for all city and car models.
> SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
    ORDER BY city;
      city    car_model sum
 --------- ------------ ---
      null         null  78
      null  HondaAccord  33
      null     HondaCRV  10
      null   HondaCivic  35
    Dublin         null  33
    Dublin  HondaAccord  10
    Dublin     HondaCRV   3
    Dublin   HondaCivic  20
   Fremont         null  32
   Fremont  HondaAccord  15
   Fremont     HondaCRV   7
   Fremont   HondaCivic  10
  San Jose         null  13
  San Jose  HondaAccord   8
  San Jose   HondaCivic   5

-- Group by processing with `ROLLUP` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), ())
> SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY city, car_model WITH ROLLUP
    ORDER BY city, car_model;
      city    car_model sum
 --------- ------------ ---
      null         null  78
    Dublin         null  33
    Dublin  HondaAccord  10
    Dublin     HondaCRV   3
    Dublin   HondaCivic  20
   Fremont         null  32
   Fremont  HondaAccord  15
   Fremont     HondaCRV   7
   Fremont   HondaCivic  10
  San Jose         null  13
  San Jose  HondaAccord   8
  San Jose   HondaCivic   5

-- Group by processing with `CUBE` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
> SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY city, car_model WITH CUBE
    ORDER BY city, car_model;
      city    car_model sum
 --------- ------------ ---
      null         null  78
      null  HondaAccord  33
      null     HondaCRV  10
      null   HondaCivic  35
    Dublin         null  33
    Dublin  HondaAccord  10
    Dublin     HondaCRV   3
    Dublin   HondaCivic  20
   Fremont         null  32
   Fremont  HondaAccord  15
   Fremont     HondaCRV   7
   Fremont   HondaCivic  10
  San Jose         null  13
  San Jose  HondaAccord   8
  San Jose   HondaCivic   5

--Prepare data for ignore nulls example
> CREATE TEMP VIEW person (id, name, age) AS
   VALUES (100, 'Mary', NULL),
          (200, 'John', 30),
          (300, 'Mike', 80),
          (400, 'Dan' , 50);

--Select the first row in column age
> SELECT FIRST(age) FROM person;
  first(age, false)
 --------------------
  NULL

--Get the first row in column `age` ignore nulls,last row in column `id` and sum of column `id`.
> SELECT FIRST(age IGNORE NULLS), LAST(id), SUM(id) FROM person;
  first(age, true)    last(id, false)    sum(id)
 ------------------- ------------------ ----------
  30                  400                1000