

Read Delta tables with Iceberg clients

This article describes how to enable Iceberg reads on Delta Lake tables in Azure Databricks. This feature requires Databricks Runtime 14.3 LTS or above.

Note

This functionality was previously called Delta Lake Universal Format (UniForm).

You can configure an external connection to have Unity Catalog act as an Iceberg catalog. See Read Databricks tables from Iceberg clients.

How do Iceberg reads (UniForm) work?

Both Delta Lake and Iceberg consist of Parquet data files and a metadata layer. Enabling Iceberg reads configures your tables to automatically generate Iceberg metadata asynchronously, without rewriting data, so that Iceberg clients can read Delta tables written by Azure Databricks. A single copy of the data files serves multiple formats.

Important

  • Tables with Iceberg reads enabled use Zstandard instead of Snappy as the compression codec for underlying Parquet data files.
  • Iceberg metadata generation runs asynchronously on the compute used to write data to Delta tables, which might increase driver resource usage.
  • For documentation on the legacy UniForm IcebergCompatV1 table feature, see Legacy UniForm IcebergCompatV1.

Requirements

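To enable Iceberg reads, the following requirements must be met:

  • The table must have column mapping enabled.
  • Writes to the table must use Databricks Runtime 14.3 LTS or above.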

Note

You cannot enable deletion vectors on a table with Iceberg reads enabled.

Use REORG to disable and purge deletion vectors while enabling Iceberg reads on an existing table with deletion vectors enabled. See Enable or upgrade Iceberg read support using REORG.

Enable Iceberg reads (UniForm)

Important

When you enable Iceberg reads, the write protocol feature IcebergCompatV2 is added to the table. Only clients that support this table feature can write to tables with Iceberg reads enabled. On Azure Databricks, you must use Databricks Runtime 14.3 LTS or above to write to enabled tables.

You can turn off Iceberg reads by unsetting the delta.universalFormat.enabledFormats table property. Upgrades to Delta Lake reader and writer protocol versions cannot be undone.

You must set the following table properties to enable Iceberg reads:

'delta.enableIcebergCompatV2' = 'true'
'delta.universalFormat.enabledFormats' = 'iceberg'
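The following Python sketch checks whether a set of table properties matches the configuration above. It assumes you have already collected the properties into a plain dictionary of strings (for example, from the output of SHOW TBLPROPERTIES). The function name iceberg_reads_configured is hypothetical, and treating enabledFormats as a comma-separated list is an assumption. It also checks column mapping, which Iceberg reads require, as described in the next section.

```python
from typing import Mapping


def iceberg_reads_configured(props: Mapping[str, str]) -> bool:
    """Return True if the table properties request Iceberg reads (UniForm).

    Hypothetical helper: `props` maps table property names to string values.
    """
    compat_v2 = props.get("delta.enableIcebergCompatV2", "").strip().lower() == "true"
    # Assumption: enabledFormats may hold a comma-separated list of formats.
    formats = {
        part.strip().lower()
        for part in props.get("delta.universalFormat.enabledFormats", "").split(",")
        if part.strip()
    }
    # Column mapping counts as enabled for any mode other than "none".
    mode = props.get("delta.columnMapping.mode", "none").strip().lower()
    return compat_v2 and "iceberg" in formats and mode not in ("", "none")
```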

When you first enable Iceberg reads, asynchronous metadata generation begins. This task must complete before external clients can query the table using Iceberg. See Check Iceberg metadata generation status.

For a list of limitations, see Limitations.

Enable Iceberg reads during table creation

Column mapping must be enabled to use Iceberg reads. This happens automatically if you enable Iceberg reads during table creation, as in the following example:

CREATE TABLE T(c1 INT) TBLPROPERTIES(
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg');

Enable Iceberg reads on an existing table

In Databricks Runtime 15.4 LTS and above, you can enable or upgrade Iceberg reads on an existing table using the following syntax:

ALTER TABLE table_name SET TBLPROPERTIES(
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg');
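If you enable Iceberg reads on many existing tables from a script, you might generate the statement above rather than write it by hand. This Python sketch does that. enable_iceberg_reads_sql and quote_identifier are hypothetical helpers, and the quoting assumes dot-separated names (such as catalog.schema.table) whose parts contain no literal dots.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote each part of a dot-separated name.

    Backticks inside a part are escaped by doubling them.
    """
    return ".".join("`" + part.replace("`", "``") + "`" for part in name.split("."))


def enable_iceberg_reads_sql(table_name: str) -> str:
    """Build the ALTER TABLE statement shown above for one table (hypothetical helper)."""
    props = {
        "delta.columnMapping.mode": "name",
        "delta.enableIcebergCompatV2": "true",
        "delta.universalFormat.enabledFormats": "iceberg",
    }
    assignments = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {quote_identifier(table_name)} SET TBLPROPERTIES(\n  {assignments});"


print(enable_iceberg_reads_sql("main.sales.orders"))
```

Quoting each name part keeps the generated statement valid for table names that contain characters such as hyphens.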

Enable or upgrade Iceberg read support using REORG

You can use REORG to enable Iceberg reads and rewrite underlying data files, as in the following example:

REORG TABLE table_name APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));

Use REORG if any of the following are true:

  • Your table has deletion vectors enabled.
  • You previously enabled the IcebergCompatV1 version of UniForm Iceberg.
  • You need to read from Iceberg engines that don’t support Hive-style Parquet files, such as Athena or Redshift.
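The three conditions above amount to a simple decision. This Python sketch encodes it; the TableState fields are hypothetical, and you would fill them in from your own knowledge of the table and its readers. If the returned list is non-empty, the conditions above point to REORG; otherwise, the ALTER TABLE syntax may be enough.

```python
from dataclasses import dataclass


@dataclass
class TableState:
    """Hypothetical summary of a table's state; field names are illustrative."""
    deletion_vectors_enabled: bool
    uses_iceberg_compat_v1: bool
    readers_lack_hive_style_support: bool  # for example, Athena or Redshift


def reorg_reasons(state: TableState) -> list[str]:
    """Return the conditions from the list above that call for REORG."""
    reasons = []
    if state.deletion_vectors_enabled:
        reasons.append("deletion vectors are enabled")
    if state.uses_iceberg_compat_v1:
        reasons.append("IcebergCompatV1 was previously enabled")
    if state.readers_lack_hive_style_support:
        reasons.append("readers do not support Hive-style Parquet files")
    return reasons
```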

When does Iceberg metadata generation occur?

Azure Databricks triggers metadata generation asynchronously after a Delta Lake write transaction completes. This metadata generation process uses the same compute that completed the Delta transaction.

Note

You can also manually trigger Iceberg metadata generation. See Manually trigger Iceberg metadata conversion.

To avoid write latencies associated with metadata generation, Delta tables with frequent commits might group multiple Delta commits into a single commit to Iceberg metadata.

Delta Lake ensures that only one metadata generation process is in progress on a given compute resource. Commits that would trigger a second concurrent metadata generation process successfully commit to Delta but don’t trigger asynchronous Iceberg metadata generation. This prevents cascading latency for metadata generation for workloads with frequent commits (seconds to minutes between commits).
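The grouping behavior described above can be illustrated with a toy model. This Python sketch is not how Databricks implements metadata generation. It assumes a single compute resource, a fixed conversion time, and that each conversion converts the Delta version whose commit triggered it (the newest version at that moment).

```python
def simulate_conversions(commit_times: list[float], duration: float) -> list[int]:
    """Toy model of asynchronous Iceberg metadata generation on one compute resource.

    commit_times[i] is the time at which Delta version i commits (sorted ascending).
    A commit starts a conversion only if no conversion is running; otherwise the
    commit still succeeds but triggers nothing. Returns the converted versions.
    """
    converted = []
    busy_until = float("-inf")
    for version, commit_time in enumerate(commit_times):
        if commit_time >= busy_until:
            # No conversion in progress: convert this (latest) version.
            converted.append(version)
            busy_until = commit_time + duration
    return converted


print(simulate_conversions([0, 1, 2, 3, 10, 11], duration=5))
```

With commits at 0, 1, 2, 3, 10, and 11 seconds and a 5-second conversion, only versions 0 and 4 trigger conversions. The conversion for version 4 also covers versions 1 through 3, and version 5 stays unconverted until a later commit or a manual trigger.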

See Delta and Iceberg table versions.

Delta and Iceberg table versions

Delta Lake and Iceberg allow time travel queries using table versions or timestamps stored in table metadata.

In general, Delta table versions do not align with Iceberg versions by either commit timestamp or version ID. To determine which Delta table version corresponds to a given Iceberg table version, use the corresponding table properties. See Check Iceberg metadata generation status.

Check Iceberg metadata generation status

Enabling Iceberg reads on a table adds the following fields to Unity Catalog and Iceberg table metadata to track metadata generation status:

  • converted_delta_version: The latest version of the Delta table for which Iceberg metadata was successfully generated.
  • converted_delta_timestamp: The timestamp of the latest Delta commit for which Iceberg metadata was successfully generated.
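To see how far the Iceberg metadata trails the Delta table, you can compare converted_delta_version with the table's latest version, which the version column of DESCRIBE HISTORY reports. This Python sketch assumes you have already read both values as integers; the function name is hypothetical.

```python
def iceberg_metadata_status(converted_delta_version: int, latest_delta_version: int) -> str:
    """Describe how far Iceberg metadata trails the Delta table (hypothetical helper)."""
    behind = latest_delta_version - converted_delta_version
    if behind < 0:
        # converted_delta_version should never be ahead of the Delta table.
        raise ValueError("converted_delta_version is newer than the latest Delta version")
    if behind == 0:
        return "up to date"
    return f"{behind} Delta version(s) behind"
```

A small lag right after a write is expected because generation is asynchronous. A lag that persists after writes stop may be a reason to trigger conversion manually (see Manually trigger Iceberg metadata conversion).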

On Azure Databricks, you can review these metadata fields by doing one of the following:

  • Reviewing the Delta Uniform Iceberg section returned by DESCRIBE EXTENDED table_name.
  • Reviewing table metadata with Catalog Explorer.
  • Using the REST API to get a table.

See the documentation for your Iceberg reader client to learn how to review table properties outside Azure Databricks. For OSS Apache Spark, you can view these properties using the following syntax:

SHOW TBLPROPERTIES <table-name>;

Manually trigger Iceberg metadata conversion

You can manually trigger Iceberg metadata generation for the latest version of the Delta table. This operation runs synchronously, meaning that when it completes, the table contents available in Iceberg reflect the latest version of the Delta table available when the conversion process started.

This operation should not be necessary under normal conditions, but can help if you encounter the following:

  • A cluster terminates before automatic metadata generation succeeds.
  • An error or job failure interrupts metadata generation.
  • A client that does not support UniForm Iceberg metadata generation writes to the Delta table.

Use the following syntax to trigger Iceberg metadata generation manually:

MSCK REPAIR TABLE <table-name> SYNC METADATA

See REPAIR TABLE.

Read Iceberg using a metadata JSON path

Some Iceberg clients require that you provide a path to versioned metadata files to register external Iceberg tables. Each time Azure Databricks converts a new version of the Delta table to Iceberg, it creates a new metadata JSON file.

BigQuery is one client that uses metadata JSON paths to configure Iceberg tables. Refer to the documentation for your Iceberg reader client for configuration details.

Delta Lake stores Iceberg metadata under the table directory using the following pattern:

<table-path>/metadata/<version-number>-<uuid>.metadata.json
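If you list the metadata directory yourself (how you list cloud storage is outside this sketch), the following Python sketch picks the file with the highest version number, which it assumes is the most recent. It accepts names with and without a leading v, because both forms appear in this article, and it assumes the UUID is in canonical 8-4-4-4-12 hexadecimal form. The function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches "<version>-<uuid>.metadata.json" with an optional leading "v".
_METADATA_NAME = re.compile(
    r"^v?(?P<version>\d+)-"
    r"(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"\.metadata\.json$"
)


def parse_metadata_name(path: str) -> Optional[Tuple[int, str]]:
    """Return (version, uuid) for an Iceberg metadata JSON path, or None if it doesn't match."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = _METADATA_NAME.match(name)
    if match is None:
        return None
    return int(match.group("version")), match.group("uuid")


def latest_metadata_path(paths: Iterable[str]) -> Optional[str]:
    """Return the matching path with the highest version number, or None."""
    best, best_version = None, -1
    for path in paths:
        parsed = parse_metadata_name(path)
        if parsed is not None and parsed[0] > best_version:
            best, best_version = path, parsed[0]
    return best
```

Versions are compared as integers on purpose: compared as strings, v10 would sort before v9.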

On Azure Databricks, you can review this metadata location by doing one of the following:

  • Reviewing the Delta Uniform Iceberg section returned by DESCRIBE EXTENDED table_name.
  • Reviewing table metadata with Catalog Explorer.
  • Using the following command with the REST API:
GET api/2.1/unity-catalog/tables/<catalog-name>.<schema-name>.<table-name>

The response includes the following information:

{
  ...
  "delta_uniform_iceberg": {
    "metadata_location": "<cloud-storage-uri>/metadata/v<version-number>-<uuid>.metadata.json"
  }
}
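The following Python sketch extracts metadata_location from the JSON body of the get-table response. Because the example response above is abbreviated, the sketch searches the parsed JSON recursively rather than assuming exactly where delta_uniform_iceberg is nested. The function name is hypothetical.

```python
import json
from typing import Any, Optional


def find_metadata_location(response_body: str) -> Optional[str]:
    """Return delta_uniform_iceberg.metadata_location from a JSON response, if present."""

    def search(node: Any) -> Optional[str]:
        if isinstance(node, dict):
            uniform = node.get("delta_uniform_iceberg")
            if isinstance(uniform, dict) and "metadata_location" in uniform:
                return uniform["metadata_location"]
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            return None
        # Descend into nested objects and arrays until the field is found.
        for child in children:
            found = search(child)
            if found is not None:
                return found
        return None

    return search(json.loads(response_body))
```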

Important

Path-based Iceberg reader clients might require you to manually update and refresh metadata JSON paths to read current table versions. Queries that use out-of-date metadata versions might fail after VACUUM removes Parquet data files from the Delta table.

Limitations

The following limitations exist for all tables with Iceberg reads enabled:

  • Iceberg reads do not work on tables with deletion vectors enabled. See What are deletion vectors?.
  • Delta tables with Iceberg reads enabled do not support VOID types.
  • Iceberg client support is read-only. Writes are not supported.
  • Iceberg reader clients might have individual limitations, regardless of Azure Databricks support for Iceberg reads. See the documentation for your chosen client.
  • Delta Sharing recipients can read the table only as Delta, even when Iceberg reads are enabled.
  • Some Delta Lake table features used by Iceberg reads are not supported by some Delta Sharing reader clients. See What is Delta Sharing?.

Change Data Feed works for Delta clients when Iceberg reads are enabled, but it is not supported in Iceberg.
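Two of the limitations above can be checked from table metadata before you enable Iceberg reads. This Python sketch assumes you have the table properties and top-level column types as dictionaries of strings (for example, from SHOW TBLPROPERTIES and DESCRIBE TABLE); the function name is hypothetical. It does not inspect nested types, and a deletion vector property of false does not prove that no deletion vectors remain in existing data files until they are purged, for example with REORG.

```python
from typing import Mapping


def iceberg_read_blockers(props: Mapping[str, str], column_types: Mapping[str, str]) -> list[str]:
    """List limitations from above that this input shows would block Iceberg reads."""
    blockers = []
    # Only the table property is checked, not existing deletion vector files.
    if props.get("delta.enableDeletionVectors", "false").strip().lower() == "true":
        blockers.append("deletion vectors are enabled")
    # Only top-level column types are checked; VOID inside structs is missed.
    void_columns = sorted(
        name for name, type_name in column_types.items() if type_name.strip().lower() == "void"
    )
    if void_columns:
        blockers.append("VOID columns: " + ", ".join(void_columns))
    return blockers
```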