แชร์ผ่าน


CLUSTER BY clause (TABLE)

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.3 LTS and above check marked yes Delta Lake only

Defines liquid, multi-dimensional clustering for a Delta Lake table.

You can use this clause when you:

Updated rows do not get automatically re-clustered. Run OPTIMIZE to re-cluster updated rows..

For more information on liquid clustering see Use liquid clustering for Delta tables

Syntax

CLUSTER BY { ( column_name [, ...] ) |
             NONE }

Parameters

  • column_name

    Specifies columns of the table by which to cluster the data. The column order does not matter. To benefit from altering clustering you should run OPTIMIZE.

  • NONE

    Turns off clustering for the table being altered. Newly inserted or updated data will not be clustered by OPTIMIZE. To not use clustering when creating a table, omit the CLUSTER BY clause.

Examples

You can find more examples in Use liquid clustering for Delta tables.

-- Create a table with a clustering column
> CREATE TABLE t(a int, b string) CLUSTER BY (a);

-- The clustering of an existing Delta table to add a second dimension
> ALTER TABLE t CLUSTER BY (a, b);

-- Recluster the table
> OPTIMIZE t;

-- Remove the clustering
> ALTER TABLE t CLUSTER BY NONE;