Read Databricks tables from Iceberg clients

Use the Iceberg REST catalog to read Unity Catalog-registered tables on Azure Databricks from supported Iceberg clients, including Apache Spark, Apache Flink, Trino, and Snowflake.

Read using the Unity Catalog Iceberg catalog endpoint

Unity Catalog provides a read-only implementation of the Iceberg REST catalog API for tables with Iceberg reads enabled.

Configure access using the endpoint /api/2.1/unity-catalog/iceberg. See the Iceberg REST API spec for details on using this REST API.

Note

Azure Databricks has introduced credential vending for some Iceberg reader clients. Databricks recommends using credential vending to control access to cloud storage locations for supported systems. See Unity Catalog credential vending for external system access.

If credential vending is unsupported for your client, you must configure access from the client to the cloud storage location that contains the files and metadata for the Delta table with Iceberg reads (UniForm) enabled. Refer to the documentation for your Iceberg reader client for configuration details.

Requirements

Azure Databricks supports Iceberg REST catalog access to tables as part of Unity Catalog. You must have Unity Catalog enabled in your workspace to use these endpoints. The following table types are eligible for Iceberg REST catalog reads:

  • Unity Catalog managed tables with Iceberg reads (UniForm) enabled.
  • Unity Catalog external tables stored with Delta Lake with Iceberg reads (UniForm) enabled.

See Read Delta tables with Iceberg clients.

You must complete the following configuration steps to configure access to read Databricks tables from Iceberg clients using the Iceberg REST catalog:

Read Iceberg tables with Apache Spark

The following is an example of the settings to configure Apache Spark to read Azure Databricks tables as Iceberg:

"spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",

# Configuration for accessing UniForm tables in Unity Catalog
"spark.sql.catalog.<spark-catalog-name>": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.<spark-catalog-name>.type": "rest",
"spark.sql.catalog.<spark-catalog-name>.uri": "<workspace-url>/api/2.1/unity-catalog/iceberg",
"spark.sql.catalog.<spark-catalog-name>.token": "<token>",
"spark.sql.catalog.<spark-catalog-name>.warehouse": "<uc-catalog-name>"

Substitute the following variables:

  • <uc-catalog-name>: The name of the catalog in Unity Catalog that contains your tables.
  • <spark-catalog-name>: The name you want to assign the catalog in your Spark session.
  • <workspace-url>: URL of the Azure Databricks workspace.
  • <token>: Personal access token (PAT) for the principal configuring the integration.

With these configurations, you can query Azure Databricks tables as Iceberg in Apache Spark using the identifier <spark-catalog-name>.<schema-name>.<table-name>. To access tables across multiple catalogs, you must configure each catalog separately.
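Because each Unity Catalog catalog needs its own set of Spark settings, it can help to generate them programmatically. The following is a minimal sketch, not an official helper: the function names `iceberg_rest_spark_conf` and `multi_catalog_conf` are hypothetical, and the sketch assumes the same personal access token is valid for every catalog.

```python
def iceberg_rest_spark_conf(spark_catalog_name, workspace_url, token, uc_catalog_name):
    """Return the Spark settings for one Unity Catalog catalog read over Iceberg REST."""
    prefix = f"spark.sql.catalog.{spark_catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Strip a trailing slash so the endpoint path is not doubled up.
        f"{prefix}.uri": f"{workspace_url.rstrip('/')}/api/2.1/unity-catalog/iceberg",
        f"{prefix}.token": token,
        f"{prefix}.warehouse": uc_catalog_name,
    }


def multi_catalog_conf(workspace_url, token, catalogs):
    """Merge settings for several catalogs.

    `catalogs` maps the Spark catalog name to the Unity Catalog catalog name.
    """
    conf = {}
    for spark_name, uc_name in catalogs.items():
        conf.update(iceberg_rest_spark_conf(spark_name, workspace_url, token, uc_name))
    return conf
```

In PySpark, each key and value in the resulting dictionary could then be passed to `SparkSession.builder.config(key, value)` before the session is created.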

When you query tables in Unity Catalog using Spark configurations, keep the following in mind:

  • You need "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" only if you are running Iceberg-specific stored procedures.

  • Azure Databricks uses cloud object storage for all tables. You must add the cloud-specific Iceberg bundle JAR as a Spark package:

    • AWS: org.apache.iceberg:iceberg-aws-bundle:<iceberg-version>
    • Azure: org.apache.iceberg:iceberg-azure-bundle:<iceberg-version>
    • GCP: org.apache.iceberg:iceberg-gcp-bundle:<iceberg-version>

    For details, see the documentation for the Iceberg AWS integration for Spark.
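The bundle coordinates listed above differ only in the cloud name, so they can be derived from a small lookup. This is an illustrative sketch only: `iceberg_bundle_coordinate` is a hypothetical helper, and the Iceberg version must match the one your Spark environment actually uses.

```python
# Maps a cloud provider to the Iceberg bundle artifact named in the list above.
ICEBERG_BUNDLES = {
    "aws": "iceberg-aws-bundle",
    "azure": "iceberg-azure-bundle",
    "gcp": "iceberg-gcp-bundle",
}


def iceberg_bundle_coordinate(cloud, iceberg_version):
    """Return the Maven coordinate of the cloud-specific Iceberg bundle."""
    artifact = ICEBERG_BUNDLES[cloud.lower()]  # raises KeyError for an unknown cloud
    return f"org.apache.iceberg:{artifact}:{iceberg_version}"
```

The returned string is in the format that Spark accepts in `spark.jars.packages` or the `--packages` option of `spark-submit`.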

Read Databricks tables with Snowflake

The following is an example of the recommended configuration settings to allow Snowflake to read Azure Databricks tables as Iceberg:

CREATE OR REPLACE CATALOG INTEGRATION <catalog-integration-name>
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<uc-schema-name>'
  REST_CONFIG = (
    CATALOG_URI = '<workspace-url>/api/2.1/unity-catalog/iceberg',
    WAREHOUSE = '<uc-catalog-name>'
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<token>'
  )
  ENABLED = TRUE;

Replace the following variables:

  • <catalog-integration-name>: The name you want to assign the catalog registered to Snowflake.
  • <uc-schema-name>: The name of the schema in Unity Catalog you need to access.
  • <uc-catalog-name>: The name of the catalog in Unity Catalog you need to access.
  • <workspace-url>: URL of the Azure Databricks workspace.
  • <token>: Personal access token (PAT) for the principal configuring the integration.

REST API curl example

You can also use a REST API call like the one in this curl example to load a table:

curl -X GET -H "Authorization: Bearer $OAUTH_TOKEN" -H "Accept: application/json" \
https://<workspace-instance>/api/2.1/unity-catalog/iceberg/v1/catalogs/<uc_catalog_name>/namespaces/<uc_schema_name>/tables/<uc_table_name>

You should then receive a response like this:

{
  "metadata-location": "abfss://my-container@my-storage-account.dfs.core.windows.net/path/to/iceberg/table/metadata/file",
  "metadata": <iceberg-table-metadata-json>,
  "config": {
    "expires-at-ms": "<epoch-ts-in-millis>",
    "adls.sas-token.<storage-account-name>.dfs.core.windows.net": "<temporary-sas-token>"
  }
}
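The same load-table call can be made from a script. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the token is sent in the standard HTTP `Authorization` header, and the sketch assumes the response body is the JSON document shown above.

```python
import json
import urllib.parse
import urllib.request


def load_table_request(workspace_url, token, catalog, schema, table):
    """Build a GET request for the load-table endpoint without sending it."""
    # Percent-encode each path segment in case a name contains special characters.
    cat, ns, tbl = (urllib.parse.quote(p, safe="") for p in (catalog, schema, table))
    url = (
        f"{workspace_url.rstrip('/')}/api/2.1/unity-catalog/iceberg/v1"
        f"/catalogs/{cat}/namespaces/{ns}/tables/{tbl}"
    )
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def load_table(request):
    """Send the request and return the parsed JSON response as a dict."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `load_table(load_table_request(...))` with real workspace values performs the network call. Building the request separately makes it possible to inspect the URL and headers first.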

Note

The expires-at-ms field in the response indicates when the vended credentials expire. By default, credentials expire after one hour. For better performance, have the client cache the credentials and request new ones only when the cached credentials reach their expiration time.
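One way a client might implement this caching is sketched below. It is an illustration under assumptions, not part of any Databricks or Iceberg library: `VendedCredentialCache` and `fetch_config` are hypothetical names, `fetch_config` is assumed to return the `config` object from the load-table response, and the safety margin that refreshes slightly before expiry is a design choice of this sketch.

```python
import time


class VendedCredentialCache:
    """Cache the `config` object and refresh it only when it is about to expire."""

    def __init__(self, fetch_config, safety_margin_ms=60_000):
        self._fetch_config = fetch_config          # callable returning the config dict
        self._safety_margin_ms = safety_margin_ms  # assumed margin, not from the docs
        self._config = None

    def get(self, now_ms=None):
        """Return cached credentials, fetching new ones if missing or expired."""
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        if self._config is None or self._is_expired(now_ms):
            self._config = self._fetch_config()
        return self._config

    def _is_expired(self, now_ms):
        # expires-at-ms is returned as a string holding epoch milliseconds.
        expires_at_ms = int(self._config["expires-at-ms"])
        return now_ms >= expires_at_ms - self._safety_margin_ms
```

Passing `now_ms` explicitly is optional. It exists mainly so that the expiry logic can be exercised without waiting for real time to pass.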