Enable Iceberg reads on Delta tables (UniForm)
This article provides details for enabling Iceberg reads on tables stored with Delta Lake in Azure Databricks. This feature requires Databricks Runtime 14.3 LTS or above.
Note
This functionality was previously called Delta Lake Universal Format (UniForm).
You can configure an external connection to have Unity Catalog act as an Iceberg catalog. See Read Databricks tables from Iceberg clients.
Tables with Iceberg reads enabled use Zstandard instead of Snappy as the compression codec for underlying Parquet data files.
Note
UniForm metadata generation runs asynchronously on the compute used to write data to Delta tables, which might increase driver resource usage.
Important
For documentation on the legacy UniForm IcebergCompatV1 table feature, see Legacy UniForm IcebergCompatV1.
How do Iceberg reads (UniForm) work?
Both Delta Lake and Iceberg consist of Parquet data files and a metadata layer. Enabling Iceberg reads configures your tables to automatically generate Iceberg metadata asynchronously, without rewriting data, so that Iceberg clients can read Delta tables written by Azure Databricks. A single copy of the data files serves multiple formats.
Requirements
To enable Iceberg reads, the following requirements must be met:
- The Delta table must be registered to Unity Catalog. Both managed and external tables are supported.
- The table must have column mapping enabled. See Rename and drop columns with Delta Lake column mapping.
- The Delta table must have a minReaderVersion >= 2 and minWriterVersion >= 7. See How does Azure Databricks manage Delta Lake feature compatibility?.
- Writes to the table must use Databricks Runtime 14.3 LTS or above.
Note
You cannot enable deletion vectors on a table with Iceberg reads enabled.
Use REORG to disable and purge deletion vectors while enabling Iceberg reads on an existing table with deletion vectors enabled. See Enable or upgrade Iceberg read support using REORG.
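The requirements above can be checked before you try to enable Iceberg reads on an existing table. The following is a minimal sketch, assuming you have already collected the table's protocol versions and properties (for example, from the minReaderVersion, minWriterVersion, and properties columns of DESCRIBE DETAIL). The function name `uniform_blockers` and its parameters are hypothetical, and the Databricks Runtime requirement for writers is not something it can check.

```python
"""Sketch: list reasons a Delta table does not yet meet the Iceberg-reads requirements."""
from typing import Dict, List


def uniform_blockers(
    min_reader_version: int,
    min_writer_version: int,
    properties: Dict[str, str],
    in_unity_catalog: bool,
) -> List[str]:
    """Return human-readable blockers; an empty list means no blockers were found."""
    problems: List[str] = []
    if not in_unity_catalog:
        problems.append("table is not registered to Unity Catalog")
    # Column mapping counts as enabled when the mode is 'name' or 'id' (not 'none').
    if properties.get("delta.columnMapping.mode", "none") not in ("name", "id"):
        problems.append("column mapping is not enabled")
    if min_reader_version < 2 or min_writer_version < 7:
        problems.append("protocol must be minReaderVersion >= 2 and minWriterVersion >= 7")
    # Only the property is checked here; existing deletion vectors may still need
    # to be purged, for example with REORG.
    if properties.get("delta.enableDeletionVectors", "false").lower() == "true":
        problems.append("deletion vectors are enabled")
    return problems
```

A table created with the Iceberg-reads properties gets column mapping automatically, so a check like this is mainly useful for existing tables.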
Enable Iceberg reads (UniForm)
Important
When you enable Iceberg reads, the write protocol feature IcebergCompatV2 is added to the table. Only clients that support this table feature can write to tables with Iceberg reads enabled. On Azure Databricks, you must use Databricks Runtime 14.3 LTS or above to write to enabled tables.
You can turn off Iceberg reads by unsetting the delta.universalFormat.enabledFormats table property. Upgrades to Delta Lake reader and writer protocol versions cannot be undone.
You must set the following table properties to enable Iceberg reads:
'delta.enableIcebergCompatV2' = 'true'
'delta.universalFormat.enabledFormats' = 'iceberg'
When you first enable Iceberg reads, asynchronous metadata generation begins. This task must complete before external clients can query the table using Iceberg. See Check Iceberg metadata generation status.
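Because the first metadata generation runs asynchronously, a job that hands the table to an Iceberg client may want to wait for it. The following is a minimal polling sketch under stated assumptions: `fetch_converted_version` is a hypothetical caller-supplied function that returns the current converted_delta_version (or None before any metadata exists), and the timeout and interval defaults are arbitrary.

```python
"""Sketch: wait until Iceberg metadata reflects a given Delta version."""
import time
from typing import Callable, Optional


def wait_for_iceberg_metadata(
    fetch_converted_version: Callable[[], Optional[int]],
    target_delta_version: int,
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once converted_delta_version >= target, or False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        converted = fetch_converted_version()
        # Use >= because several Delta commits can be grouped into one Iceberg
        # commit, so the converted version may skip past the target.
        if converted is not None and converted >= target_delta_version:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting.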
For a list of limitations, see Limitations.
Enable Iceberg reads during table creation
Column mapping must be enabled to use Iceberg reads. This happens automatically if you enable Iceberg reads during table creation, as in the following example:
CREATE TABLE T(c1 INT) TBLPROPERTIES(
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg');
Enable Iceberg reads on an existing table
In Databricks Runtime 15.4 LTS and above, you can enable or upgrade Iceberg reads on an existing table using the following syntax:
ALTER TABLE table_name SET TBLPROPERTIES(
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg');
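If you generate these statements from a script or notebook, keeping the two required properties in one place avoids typos across CREATE, ALTER, and disable statements. The following is a sketch with hypothetical helper names; pass the returned strings to whatever runs SQL in your environment (for example, spark.sql in a notebook). It interpolates identifiers directly, so use it only with trusted table names.

```python
"""Sketch: build the enable/disable DDL for Iceberg reads from one property map."""
from typing import Dict

# The two table properties this article requires for Iceberg reads.
UNIFORM_ICEBERG_PROPERTIES: Dict[str, str] = {
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}


def _tblproperties() -> str:
    pairs = ",\n  ".join(f"'{k}' = '{v}'" for k, v in UNIFORM_ICEBERG_PROPERTIES.items())
    return f"TBLPROPERTIES(\n  {pairs})"


def create_table_sql(table: str, columns: str) -> str:
    # Column mapping is enabled automatically when Iceberg reads are set at creation.
    return f"CREATE TABLE {table}({columns}) {_tblproperties()}"


def enable_on_existing_sql(table: str) -> str:
    # Enabling on an existing table requires Databricks Runtime 15.4 LTS or above.
    return f"ALTER TABLE {table} SET {_tblproperties()}"


def disable_sql(table: str) -> str:
    # Turns off Iceberg reads; protocol version upgrades are not reverted.
    return f"ALTER TABLE {table} UNSET TBLPROPERTIES ('delta.universalFormat.enabledFormats')"
```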
Enable or upgrade Iceberg read support using REORG
You can use REORG to enable Iceberg reads and rewrite underlying data files, as in the following example:
REORG TABLE table_name APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));
Use REORG if any of the following are true:
- Your table has deletion vectors enabled.
- You previously enabled the IcebergCompatV1 version of UniForm Iceberg.
- You need to read from Iceberg engines that don’t support Hive-style Parquet files, such as Athena or Redshift.
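The choice between a property change and REORG follows directly from the three conditions above. The following sketch encodes that rule; the function and its boolean flags are hypothetical, and determining each flag (from table properties, table history, or knowledge of your reader engines) is left to the caller.

```python
"""Sketch: pick REORG or ALTER TABLE according to the conditions listed above."""


def upgrade_statement(
    table: str,
    has_deletion_vectors: bool,
    had_iceberg_compat_v1: bool,
    engine_lacks_hive_parquet_support: bool,
) -> str:
    if has_deletion_vectors or had_iceberg_compat_v1 or engine_lacks_hive_parquet_support:
        # REORG enables Iceberg reads and rewrites the underlying data files.
        return f"REORG TABLE {table} APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));"
    # Otherwise setting the properties is enough (Databricks Runtime 15.4 LTS and above).
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES(\n"
        "  'delta.enableIcebergCompatV2' = 'true',\n"
        "  'delta.universalFormat.enabledFormats' = 'iceberg');"
    )
```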
When does Iceberg metadata generation occur?
Azure Databricks triggers metadata generation asynchronously after a Delta Lake write transaction completes. This metadata generation process uses the same compute that completed the Delta transaction.
Note
You can also manually trigger Iceberg metadata generation. See Manually trigger Iceberg metadata conversion.
To avoid write latencies associated with metadata generation, Delta tables with frequent commits might group multiple Delta commits into a single commit to Iceberg metadata.
Delta Lake ensures that only one metadata generation process is in progress at any time on a given compute resource. Commits that would trigger a second concurrent metadata generation process successfully commit to Delta, but don’t trigger asynchronous Iceberg metadata generation. This prevents cascading latency for metadata generation for workloads with frequent commits (seconds to minutes between commits).
See Delta and Iceberg table versions.
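The single-in-flight rule for metadata generation can be illustrated with a small simulation. This is a toy model written for illustration, not Databricks code: it assumes a conversion started by Delta commit N converts version N, and that commits arriving while it runs are only picked up by a later conversion.

```python
"""Toy model (not Databricks code) of one-conversion-at-a-time Iceberg generation."""
from typing import List, Optional


class ConversionScheduler:
    """Tracks Iceberg metadata conversions on a single compute resource."""

    def __init__(self) -> None:
        self.latest_delta_version: Optional[int] = None
        self.converted_delta_version: Optional[int] = None
        self._running_target: Optional[int] = None  # Delta version being converted
        self.iceberg_commits: List[int] = []  # Delta version behind each Iceberg commit

    def on_delta_commit(self, version: int) -> bool:
        """Record a Delta commit; return True if it started a conversion."""
        self.latest_delta_version = version
        if self._running_target is not None:
            # The Delta commit succeeds, but no second concurrent conversion starts.
            return False
        self._running_target = version
        return True

    def finish_conversion(self) -> None:
        """Complete the running conversion and record one Iceberg commit."""
        if self._running_target is None:
            raise RuntimeError("no conversion is running")
        self.converted_delta_version = self._running_target
        self.iceberg_commits.append(self._running_target)
        self._running_target = None
```

In this model, if commits 2 and 3 land while the conversion for commit 1 is running, they reach Iceberg only through the conversion triggered by a later commit (or a manual conversion), which is how several Delta commits end up grouped into one Iceberg commit.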
Delta and Iceberg table versions
Delta Lake and Iceberg allow time travel queries using table versions or timestamps stored in table metadata.
In general, Delta table versions do not align with Iceberg versions by either commit timestamp or version ID. To determine which Delta table version a given Iceberg table version corresponds to, use the corresponding table properties. See Check Iceberg metadata generation status.
Check Iceberg metadata generation status
Enabling Iceberg reads on a table adds the following fields to Unity Catalog and Iceberg table metadata to track metadata generation status:
Metadata field | Description |
---|---|
converted_delta_version | The latest version of the Delta table for which Iceberg metadata was successfully generated. |
converted_delta_timestamp | The timestamp of the latest Delta commit for which Iceberg metadata was successfully generated. |
On Azure Databricks, you can review these metadata fields by doing one of the following:
- Reviewing the Delta Uniform Iceberg section returned by DESCRIBE EXTENDED table_name.
- Reviewing table metadata with Catalog Explorer.
- Using the REST API to get a table.
See the documentation for your Iceberg reader client to learn how to review table properties outside Azure Databricks. For OSS Apache Spark, you can see these properties using the following syntax:
SHOW TBLPROPERTIES <table-name>;
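Once you have the property rows, checking whether Iceberg metadata is current is a small comparison. The following sketch assumes `rows` is a sequence of (key, value) pairs, such as the rows returned by SHOW TBLPROPERTIES. This article does not specify the value formats, so the version is parsed as an integer and the timestamp is returned unchanged; the function names are hypothetical.

```python
"""Sketch: read the conversion-status fields from (key, value) property rows."""
from typing import Iterable, NamedTuple, Optional, Tuple


class ConversionStatus(NamedTuple):
    converted_delta_version: Optional[int]
    converted_delta_timestamp: Optional[str]


def conversion_status(rows: Iterable[Tuple[str, str]]) -> ConversionStatus:
    props = dict(rows)
    version = props.get("converted_delta_version")
    return ConversionStatus(
        int(version) if version is not None else None,
        props.get("converted_delta_timestamp"),
    )


def iceberg_is_current(rows: Iterable[Tuple[str, str]], latest_delta_version: int) -> bool:
    """True if Iceberg metadata reflects the given Delta version or a later one."""
    converted = conversion_status(rows).converted_delta_version
    return converted is not None and converted >= latest_delta_version
```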
Manually trigger Iceberg metadata conversion
You can manually trigger Iceberg metadata generation for the latest version of the Delta table. This operation runs synchronously, meaning that when it completes, the table contents available in Iceberg reflect the latest version of the Delta table available when the conversion process started.
This operation should not be necessary under normal conditions, but can help if you encounter the following:
- A cluster terminates before automatic metadata generation succeeds.
- An error or job failure interrupts metadata generation.
- A client that does not support UniForm Iceberg metadata generation writes to the Delta table.
Use the following syntax to manually trigger Iceberg metadata generation:
MSCK REPAIR TABLE <table-name> SYNC METADATA
See REPAIR TABLE.
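Because the manual conversion runs synchronously, you may prefer to run it only when Iceberg metadata is actually behind. The following sketch assumes the caller supplies the latest Delta version (for example, from DESCRIBE HISTORY) and the current converted_delta_version; `execute` stands for whatever runs SQL in your environment, such as spark.sql in a notebook. The function name is hypothetical.

```python
"""Sketch: trigger MSCK REPAIR ... SYNC METADATA only when Iceberg metadata lags."""
from typing import Callable, Optional


def repair_if_stale(
    table: str,
    latest_delta_version: int,
    converted_delta_version: Optional[int],
    execute: Callable[[str], object],
) -> bool:
    """Run synchronous Iceberg metadata generation if needed; return True if it ran."""
    if converted_delta_version is not None and converted_delta_version >= latest_delta_version:
        return False  # Iceberg metadata already reflects the latest Delta version.
    execute(f"MSCK REPAIR TABLE {table} SYNC METADATA")
    return True
```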
Read Iceberg using a metadata JSON path
Some Iceberg clients require that you provide a path to versioned metadata files to register external Iceberg tables. Each time Azure Databricks converts a new version of the Delta table to Iceberg, it creates a new metadata JSON file.
Clients that use metadata JSON paths for configuring Iceberg include BigQuery. Refer to documentation for the Iceberg reader client for configuration details.
Delta Lake stores Iceberg metadata under the table directory, using the following pattern:
<table-path>/metadata/<version-number>-<uuid>.metadata.json
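If a path-based client needs the newest metadata file and you can list the table's metadata directory, the version number in each file name can be used to pick it. This article shows the file name both as <version-number>-<uuid>.metadata.json and, in the REST API example, as v<version-number>-<uuid>.metadata.json, so the pattern in this sketch accepts an optional leading "v". Listing the files is left to the caller, and the function names are hypothetical.

```python
"""Sketch: find the newest Iceberg metadata JSON file in a directory listing."""
import re
from typing import Iterable, Optional

# Matches <version>-<uuid>.metadata.json, with an optional leading "v".
_METADATA_RE = re.compile(r"(?:^|/)v?(\d+)-([0-9a-fA-F-]+)\.metadata\.json$")


def metadata_version(path: str) -> Optional[int]:
    """Return the version number encoded in a metadata JSON path, if any."""
    m = _METADATA_RE.search(path)
    return int(m.group(1)) if m else None


def latest_metadata_path(paths: Iterable[str]) -> Optional[str]:
    """Return the path with the highest version number, or None if none match."""
    versioned = [(v, p) for p in paths if (v := metadata_version(p)) is not None]
    return max(versioned)[1] if versioned else None
```

Versions are compared as integers rather than strings so that differently padded numbers still sort correctly.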
On Azure Databricks, you can review this metadata location by doing one of the following:
- Reviewing the Delta Uniform Iceberg section returned by DESCRIBE EXTENDED table_name.
- Reviewing table metadata with Catalog Explorer.
- Using the following command with the REST API:
GET api/2.1/unity-catalog/tables/<catalog-name>.<schema-name>.<table-name>
The response includes the following information:
{
...
"delta_uniform_iceberg": {
"metadata_location": "<cloud-storage-uri>/metadata/v<version-number>-<uuid>.metadata.json"
}
}
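The REST API call above can be scripted with the Python standard library. This is a minimal sketch, assuming a workspace URL and a personal access token sent as a bearer token; error handling and retries are omitted, and the function names are hypothetical. Only the delta_uniform_iceberg.metadata_location field shown in the response above is read.

```python
"""Sketch: fetch a table from the Unity Catalog API and read metadata_location."""
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def table_api_url(workspace_url: str, full_name: str) -> str:
    # full_name is <catalog-name>.<schema-name>.<table-name>
    quoted = urllib.parse.quote(full_name, safe=".")
    return f"{workspace_url.rstrip('/')}/api/2.1/unity-catalog/tables/{quoted}"


def metadata_location(table_info: Dict[str, Any]) -> Optional[str]:
    """Return delta_uniform_iceberg.metadata_location, or None if absent."""
    return (table_info.get("delta_uniform_iceberg") or {}).get("metadata_location")


def fetch_metadata_location(workspace_url: str, token: str, full_name: str) -> Optional[str]:
    req = urllib.request.Request(
        table_api_url(workspace_url, full_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return metadata_location(json.load(resp))
```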
Important
Path-based Iceberg reader clients might require manually updating and refreshing metadata JSON paths to read current table versions. Users might encounter errors when querying Iceberg tables using out-of-date metadata versions, because VACUUM removes Parquet data files from the Delta table that older versions might still reference.
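To catch stale registrations before queries fail, a path-based client's configured metadata path can be compared with the latest location reported by Azure Databricks. The following sketch assumes both values are available as strings and parses version numbers from file names following the <version-number>-<uuid>.metadata.json pattern (with an optional leading "v"); the function name is hypothetical.

```python
"""Sketch: decide whether a path-based Iceberg client should be re-pointed."""
import re
from typing import Optional

_VERSION_RE = re.compile(r"(?:^|/)v?(\d+)-[^/]+\.metadata\.json$")


def _version(path: str) -> Optional[int]:
    m = _VERSION_RE.search(path)
    return int(m.group(1)) if m else None


def needs_refresh(registered_path: str, current_path: str) -> bool:
    """True if the client is configured with older metadata than current_path."""
    if registered_path == current_path:
        return False
    old, new = _version(registered_path), _version(current_path)
    # If either file name can't be parsed, treat differing paths as stale.
    return old is None or new is None or new > old
```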
Limitations
The following limitations exist for all tables with Iceberg reads enabled:
- Iceberg reads do not work on tables with deletion vectors enabled. See What are deletion vectors?.
- Delta tables with Iceberg reads enabled do not support VOID types.
- Iceberg client support is read-only. Writes are not supported.
- Iceberg reader clients might have individual limitations, regardless of Azure Databricks support for Iceberg reads. See documentation for your chosen client.
- Delta Sharing recipients can only read the table as Delta, even when Iceberg reads are enabled.
- Some Delta Lake table features used by Iceberg reads are not supported by some Delta Sharing reader clients. See What is Delta Sharing?.
- Change Data Feed works for Delta clients when Iceberg reads are enabled, but is not supported in Iceberg.