How does Azure Databricks manage Delta Lake feature compatibility?

Delta Lake is an independent open-source project under the governance of the Linux Foundation. Databricks introduces support for new Delta Lake features and optimizations that build on top of Delta Lake in Databricks Runtime releases.

Azure Databricks optimizations that leverage Delta Lake features respect the protocols used in OSS Delta Lake for compatibility.

Many Azure Databricks optimizations require enabling Delta Lake features on a table. Delta Lake features are always backwards compatible, so tables written by a lower Databricks Runtime version can always be read and written by a higher Databricks Runtime version. Enabling some features breaks forward compatibility with workloads running in a lower Databricks Runtime version. For features that break forward compatibility, you must update all workloads that reference the upgraded tables to use a compliant Databricks Runtime version.

Note

You can drop deletionVectors, v2Checkpoint, columnMapping, typeWidening-preview, and collations-preview on Azure Databricks. See Drop Delta table features.

Important

All protocol change operations conflict with all concurrent writes.

Streaming reads fail when they encounter a commit that changes table metadata. To continue processing, you must restart the stream. For recommended methods, see Production considerations for Structured Streaming.

What Delta Lake features require Databricks Runtime upgrades?

The following Delta Lake features break forward compatibility. Features are enabled on a table-by-table basis. The table lists the lowest Databricks Runtime version that supports each feature, limited to versions that Azure Databricks still supports.

| Feature | Minimum Databricks Runtime version | Documentation |
| --- | --- | --- |
| CHECK constraints | Databricks Runtime 9.1 LTS | Set a CHECK constraint in Azure Databricks |
| Change data feed | Databricks Runtime 9.1 LTS | Use Delta Lake change data feed on Azure Databricks |
| Generated columns | Databricks Runtime 9.1 LTS | Delta Lake generated columns |
| Column mapping | Databricks Runtime 10.4 LTS | Rename and drop columns with Delta Lake column mapping |
| Identity columns | Databricks Runtime 10.4 LTS | Use identity columns in Delta Lake |
| Table features | Databricks Runtime 12.2 LTS | What are table features? |
| Deletion vectors | Databricks Runtime 12.2 LTS | What are deletion vectors? |
| TimestampNTZ | Databricks Runtime 13.3 LTS | TIMESTAMP_NTZ type |
| UniForm | Databricks Runtime 13.3 LTS | Read Delta tables with Iceberg clients |
| Liquid clustering | Databricks Runtime 13.3 LTS | Use liquid clustering for Delta tables |
| Row tracking | Databricks Runtime 14.1 | Use row tracking for Delta tables |
| Type widening | Databricks Runtime 15.2 | Type widening |
| Variant | Databricks Runtime 15.3 | Variant support in Delta Lake |
| Collations | Databricks Runtime 16.1 | Collation support for Delta Lake |

See Databricks Runtime release notes versions and compatibility.
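
When auditing workloads before enabling a feature, it can help to compute the lowest runtime that a given combination of features requires. The following is a minimal sketch in plain Python, not an Azure Databricks API. The version numbers are copied from the table above; the function names `minimum_runtime` and `can_use_table` are hypothetical helpers introduced for illustration.

```python
# Minimum Databricks Runtime per feature, copied from the table above.
# Versions are stored as (major, minor) tuples so they compare numerically.
MIN_RUNTIME = {
    "CHECK constraints": (9, 1),
    "Change data feed": (9, 1),
    "Generated columns": (9, 1),
    "Column mapping": (10, 4),
    "Identity columns": (10, 4),
    "Table features": (12, 2),
    "Deletion vectors": (12, 2),
    "TimestampNTZ": (13, 3),
    "UniForm": (13, 3),
    "Liquid clustering": (13, 3),
    "Row tracking": (14, 1),
    "Type widening": (15, 2),
    "Variant": (15, 3),
    "Collations": (16, 1),
}


def minimum_runtime(enabled_features):
    """Return the lowest (major, minor) runtime that supports every feature."""
    # A table with none of the listed features imposes no minimum here.
    if not enabled_features:
        return None
    return max(MIN_RUNTIME[name] for name in enabled_features)


def can_use_table(runtime, enabled_features):
    """Return True if a workload on `runtime` supports all enabled features."""
    required = minimum_runtime(enabled_features)
    return required is None or runtime >= required


features = {"Generated columns", "Identity columns"}
print(minimum_runtime(features))         # (10, 4)
print(can_use_table((9, 1), features))   # False
print(can_use_table((13, 3), features))  # True
```

The most demanding feature determines the requirement, so enabling identity columns on a table that previously used only generated columns raises the minimum from 9.1 LTS to 10.4 LTS.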

Note

Delta Live Tables and Databricks SQL automatically upgrade runtime environments with regular releases to support new features. See Delta Live Tables release notes and the release upgrade process and Databricks SQL release notes.

What is a table protocol specification?

Every Delta table has a protocol specification that indicates the set of features the table supports. Applications that read or write the table use the protocol specification to determine whether they can handle all of those features. If an application does not know how to handle a feature that is listed as supported in the protocol of a table, that application cannot read or write the table.

The protocol specification is separated into two components: the read protocol and the write protocol.

Warning

Most protocol version upgrades are irreversible, and upgrading the protocol version might break existing Delta Lake table readers, writers, or both. Databricks recommends that you upgrade specific tables only when needed, such as to opt in to new features in Delta Lake. You should also verify that all of your current and future production tools support Delta Lake tables with the new protocol version.

Protocol downgrades are available for some features. See Drop Delta table features.

Read protocol

The read protocol lists all features that a table supports and that an application must understand in order to read the table correctly. Upgrading the read protocol of a table requires that all reader applications support the added features.

Important

All applications that write to a Delta table must be able to construct a snapshot of the table. As such, workloads that write to Delta tables must respect both reader and writer protocol requirements.

If you encounter a protocol that is unsupported by a workload on Azure Databricks, you must upgrade to a higher Databricks Runtime that supports that protocol.

Write protocol

The write protocol lists all features that a table supports and that an application must understand in order to write to the table correctly. Upgrading the write protocol of a table requires that all writer applications support the added features. It does not affect read-only applications, unless the read protocol is also upgraded.
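
The read and write rules above can be sketched as a pair of checks. This is an illustrative model only, assuming the legacy scheme in which a protocol is described by two version numbers; it ignores per-feature lists. The `TableProtocol` and `Client` classes and their field names are hypothetical and are not part of any Delta Lake library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableProtocol:
    """Protocol requirements recorded on a table."""
    min_reader_version: int
    min_writer_version: int


@dataclass(frozen=True)
class Client:
    """Highest protocol versions a hypothetical client implements."""
    max_reader_version: int
    max_writer_version: int


def can_read(client, protocol):
    """A reader only needs to satisfy the read protocol."""
    return client.max_reader_version >= protocol.min_reader_version


def can_write(client, protocol):
    """A writer must build a snapshot first, so it needs both protocols."""
    return (
        can_read(client, protocol)
        and client.max_writer_version >= protocol.min_writer_version
    )


# A table with column mapping enabled (reader version 2, writer version 5).
table = TableProtocol(min_reader_version=2, min_writer_version=5)

read_only_tool = Client(max_reader_version=2, max_writer_version=4)
print(can_read(read_only_tool, table))   # True
print(can_write(read_only_tool, table))  # False

old_client = Client(max_reader_version=1, max_writer_version=5)
print(can_write(old_client, table))      # False
```

The last case shows why writers must respect both protocols: the client understands the write protocol but cannot construct a snapshot of the table, so it cannot write.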

Which protocols must be upgraded?

Some features require upgrading both the read protocol and the write protocol. Other features only require upgrading the write protocol.

As an example, support for CHECK constraints is a write protocol feature: only writing applications need to know about CHECK constraints and enforce them.

In contrast, column mapping requires upgrading both the read and write protocols. Because the data is stored differently in the table, reader applications must understand column mapping so they can read the data correctly.

Minimum reader and writer versions

Note

You must explicitly upgrade the table protocol version when enabling column mapping.

When you enable Delta features on a table, the table protocol is automatically upgraded. Databricks recommends against changing the minReaderVersion and minWriterVersion table properties. Changing these table properties does not prevent protocol upgrade. Setting these values to a lower value does not downgrade the table. See Drop Delta table features.
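
Column mapping is the exception noted above, where the protocol versions are set explicitly together with the feature. The following sketch builds such a statement as a string. It assumes the table properties `delta.minReaderVersion`, `delta.minWriterVersion`, and `delta.columnMapping.mode`, with the version values 2 and 5 taken from the Features by protocol version table. The helper `column_mapping_upgrade_sql` and the table name `main.sales.orders` are hypothetical.

```python
def column_mapping_upgrade_sql(table_name):
    """Build an ALTER TABLE statement that enables column mapping.

    The protocol versions are set in the same statement because column
    mapping requires an explicit protocol upgrade.
    """
    properties = {
        "delta.minReaderVersion": "2",
        "delta.minWriterVersion": "5",
        "delta.columnMapping.mode": "name",
    }
    assignments = ",\n".join(
        f"  '{key}' = '{value}'" for key, value in properties.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES (\n{assignments}\n)"


# Hypothetical table name, used only for illustration.
statement = column_mapping_upgrade_sql("main.sales.orders")
print(statement)
```

In a notebook, the resulting string would typically be passed to `spark.sql(statement)`, assuming a `spark` session is available. Because most protocol upgrades are irreversible, review the statement before running it.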

What are table features?

In Databricks Runtime 12.2 LTS and above, Delta Lake table features introduce granular flags specifying which features are supported by a given table. In Databricks Runtime 11.3 LTS and below, Delta Lake features were enabled in bundles called protocol versions. Table features are the successor to protocol versions and are designed with the goal of improved flexibility for clients that read and write Delta Lake. See What is a protocol version?.

Note

Table features have protocol version requirements. See Features by protocol version.

A Delta table feature is a marker that indicates that the table supports a particular feature. Every feature is either a write protocol feature (meaning it only upgrades the write protocol) or a read/write protocol feature (meaning both read and write protocols are upgraded to enable the feature).

To learn more about supported table features in Delta Lake, see the Delta Lake protocol.
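
To make the distinction between write protocol features and read/write protocol features concrete, the sketch below parses a protocol entry and checks it against the features a client supports. It assumes the layout described in the Delta Lake protocol, where a protocol action carries `readerFeatures` and `writerFeatures` lists and a read/write feature appears in both. The sample JSON is made up for illustration, and the helper names are hypothetical.

```python
import json

# An illustrative protocol action. The feature lists are sample values,
# not taken from a real table.
PROTOCOL_ACTION = """
{"protocol": {"minReaderVersion": 3, "minWriterVersion": 7,
              "readerFeatures": ["deletionVectors"],
              "writerFeatures": ["deletionVectors", "checkConstraints"]}}
"""


def parse_features(action_json):
    """Return (reader_features, writer_features) as sets."""
    protocol = json.loads(action_json)["protocol"]
    reader = set(protocol.get("readerFeatures", []))
    writer = set(protocol.get("writerFeatures", []))
    return reader, writer


def writer_only_features(action_json):
    """Features that upgrade only the write protocol."""
    reader, writer = parse_features(action_json)
    return sorted(writer - reader)


def can_read(client_features, action_json):
    """A reader must understand every feature in the read protocol."""
    reader, _ = parse_features(action_json)
    return reader <= set(client_features)


def can_write(client_features, action_json):
    """A writer must understand the read and the write protocol features."""
    reader, writer = parse_features(action_json)
    return (reader | writer) <= set(client_features)


print(writer_only_features(PROTOCOL_ACTION))            # ['checkConstraints']
print(can_read({"deletionVectors"}, PROTOCOL_ACTION))   # True
print(can_write({"deletionVectors"}, PROTOCOL_ACTION))  # False
```

A client that understands deletion vectors but not CHECK constraints can read this table but cannot write to it, which matches the split between the two protocols.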

Do table features change how Delta Lake features are enabled?

If you only interact with Delta tables through Azure Databricks, you can continue to track support for Delta Lake features using minimum Databricks Runtime requirements. Azure Databricks supports reading Delta tables that have been upgraded to table features in all Databricks Runtime LTS releases, as long as all features used by the table are supported by that release.

If you read from or write to Delta tables using other systems, you might need to consider how table features affect compatibility, because those systems might not understand the upgraded protocol versions.

Important

Table features are introduced to the Delta Lake format for writer version 7 and reader version 3. Azure Databricks has backported code to all supported Databricks Runtime LTS versions to add support for table features, but only for those features already supported in that Databricks Runtime. This means that while you can opt in to using table features to enable generated columns and still work with these tables in Databricks Runtime 9.1 LTS, tables with identity columns enabled (which requires Databricks Runtime 10.4 LTS) are still not supported in that Databricks Runtime.

What is a protocol version?

A protocol version is a protocol number that indicates a particular grouping of table features. In Databricks Runtime 11.3 LTS and below, you cannot enable table features individually. Protocol versions bundle a group of features.

Delta tables specify a separate protocol version for read protocol and write protocol. The transaction log for a Delta table contains protocol versioning information that supports Delta Lake evolution. See Review Delta Lake table details with describe detail.

Each protocol version bundles all features from the previous protocol versions. See Features by protocol version.

Note

Starting with writer version 7 and reader version 3, Delta Lake has introduced the concept of table features. Using table features, you can now choose to only enable those features that are supported by other clients in your data ecosystem. See What are table features?.

Features by protocol version

The following table shows minimum protocol versions required for Delta Lake features.

Note

If you are only concerned with Databricks Runtime compatibility, see What Delta Lake features require Databricks Runtime upgrades?. Delta Sharing only supports reading tables with features that require minReaderVersion = 1.

| Feature | minWriterVersion | minReaderVersion | Documentation |
| --- | --- | --- | --- |
| Basic functionality | 2 | 1 | What is Delta Lake? |
| CHECK constraints | 3 | 1 | Set a CHECK constraint in Azure Databricks |
| Change data feed | 4 | 1 | Use Delta Lake change data feed on Azure Databricks |
| Generated columns | 4 | 1 | Delta Lake generated columns |
| Column mapping | 5 | 2 | Rename and drop columns with Delta Lake column mapping |
| Identity columns | 6 | 2 | Use identity columns in Delta Lake |
| Table features read | 7 | 1 | What are table features? |
| Table features write | 7 | 3 | What are table features? |
| Row tracking | 7 | 1 | Use row tracking for Delta tables |
| Deletion vectors | 7 | 3 | What are deletion vectors? |
| TimestampNTZ | 7 | 3 | TIMESTAMP_NTZ type |
| Liquid clustering | 7 | 3 | Use liquid clustering for Delta tables |
| UniForm | 7 | 2 | Read Delta tables with Iceberg clients |
| Type widening | 7 | 3 | Type widening |
| Variant | 7 | 3 | Variant support in Delta Lake |
| Collations | 7 | 3 | Collation support for Delta Lake |
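
As a rough planning aid, the minimum protocol versions for a combination of features can be derived by taking the highest writer version and the highest reader version among them. The sketch below assumes exactly that rule and copies its values from the table above, omitting the two generic "Table features" rows. The helpers `required_protocol` and `delta_sharing_readable` are hypothetical names. The second one applies the Delta Sharing restriction stated in the note above.

```python
# (minWriterVersion, minReaderVersion) per feature, copied from the table
# above. The generic "Table features" rows are intentionally left out.
MIN_PROTOCOL = {
    "Basic functionality": (2, 1),
    "CHECK constraints": (3, 1),
    "Change data feed": (4, 1),
    "Generated columns": (4, 1),
    "Column mapping": (5, 2),
    "Identity columns": (6, 2),
    "Row tracking": (7, 1),
    "Deletion vectors": (7, 3),
    "TimestampNTZ": (7, 3),
    "Liquid clustering": (7, 3),
    "UniForm": (7, 2),
    "Type widening": (7, 3),
    "Variant": (7, 3),
    "Collations": (7, 3),
}


def required_protocol(features):
    """Return the minimum protocol versions needed for a set of features."""
    # Every table needs at least the basic functionality versions.
    names = set(features) | {"Basic functionality"}
    return {
        "minWriterVersion": max(MIN_PROTOCOL[name][0] for name in names),
        "minReaderVersion": max(MIN_PROTOCOL[name][1] for name in names),
    }


def delta_sharing_readable(features):
    """Delta Sharing only reads tables whose features need reader version 1."""
    return required_protocol(features)["minReaderVersion"] == 1


print(required_protocol({"CHECK constraints", "Column mapping"}))
# {'minWriterVersion': 5, 'minReaderVersion': 2}
print(delta_sharing_readable({"Change data feed"}))  # True
print(delta_sharing_readable({"Deletion vectors"}))  # False
```

Because each protocol version bundles the features of the versions before it, the highest requirement among the enabled features sets the protocol for the whole table.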