Constraints on Azure Databricks

Azure Databricks supports standard SQL constraint management clauses. Constraints fall into two categories:

  • Enforced contraints ensure that the quality and integrity of data added to a table is automatically verified.
  • Informational primary key and foreign key constraints encode relationships between fields in tables and are not enforced.

All constraints on Azure Databricks require Delta Lake.

Delta Live Tables has a similar concept known as expectations. See Manage data quality with pipeline expectations.

Enforced constraints on Azure Databricks

When a constraint is violated, the transaction fails with an error. Two types of constraints are supported:

  • NOT NULL: indicates that values in specific columns cannot be null.
  • CHECK: indicates that a specified boolean expression must be true for each input row.

Important

Adding a constraint automatically upgrades the table writer protocol version if the previous writer version was less than 3. See How does Azure Databricks manage Delta Lake feature compatibility? to understand table protocol versioning and what it means to upgrade the protocol version.

Set a NOT NULL constraint in Azure Databricks

You specify NOT NULL constraints in the schema when you create a table. You drop or add NOT NULL constraints using the ALTER TABLE ALTER COLUMN command.

CREATE TABLE people10m (
  id INT NOT NULL,
  firstName STRING,
  middleName STRING NOT NULL,
  lastName STRING,
  gender STRING,
  birthDate TIMESTAMP,
  ssn STRING,
  salary INT
);

ALTER TABLE people10m ALTER COLUMN middleName DROP NOT NULL;
ALTER TABLE people10m ALTER COLUMN ssn SET NOT NULL;

Before adding a NOT NULL constraint to a table, Azure Databricks verifies that all existing rows satisfy the constraint.

If you specify a NOT NULL constraint on a column nested within a struct, the parent struct must also be not null. Columns nested within array or map types do not accept NOT NULL constraints.

See CREATE TABLE [USING] and ALTER TABLE ALTER COLUMN.

Set a CHECK constraint in Azure Databricks

You manage CHECK constraints using the ALTER TABLE ADD CONSTRAINT and ALTER TABLE DROP CONSTRAINT commands. ALTER TABLE ADD CONSTRAINT verifies that all existing rows satisfy the constraint before adding it to the table.

CREATE TABLE people10m (
  id INT,
  firstName STRING,
  middleName STRING,
  lastName STRING,
  gender STRING,
  birthDate TIMESTAMP,
  ssn STRING,
  salary INT
);

ALTER TABLE people10m ADD CONSTRAINT dateWithinRange CHECK (birthDate > '1900-01-01');
ALTER TABLE people10m DROP CONSTRAINT dateWithinRange;

See ALTER TABLE ADD CONSTRAINT and ALTER TABLE DROP CONSTRAINT.

CHECK constraints are exposed as table properties in the output of the DESCRIBE DETAIL and SHOW TBLPROPERTIES commands.

ALTER TABLE people10m ADD CONSTRAINT validIds CHECK (id > 1 and id < 99999999);

DESCRIBE DETAIL people10m;

SHOW TBLPROPERTIES people10m;

Disable check constraints

In Databricks Runtime 15.4 LTS and above, you can use the DROP FEATURE command to remove check constraints from a table and downgrade the table protocol.

See Drop Delta table features.

Declare primary key and foreign key relationships

Note

  • Primary key and foreign key constraints are available in Databricks Runtime 11.3 LTS and above, and are fully GA in Databricks Runtime 15.2 and above.
  • Primary key and foreign key constraints require Unity Catalog and Delta Lake.

You can use primary key and foreign key relationships on fields in Unity Catalog tables. Primary and foreign keys are informational only and are not enforced. Foreign keys must reference a primary key in another table.

You can declare primary keys and foreign keys as part of the table specification clause during table creation. This clause is not allowed during CTAS statements. You can also add constraints to existing tables.

CREATE TABLE T(pk1 INTEGER NOT NULL, pk2 INTEGER NOT NULL,
                CONSTRAINT t_pk PRIMARY KEY(pk1, pk2));
CREATE TABLE S(pk INTEGER NOT NULL PRIMARY KEY,
                fk1 INTEGER, fk2 INTEGER,
                CONSTRAINT s_t_fk FOREIGN KEY(fk1, fk2) REFERENCES T);

You can query the information_schema or use DESCRIBE to get details about how constraints are applied across a given catalog.

See: