Control external access to data in Unity Catalog

Important

This feature is in Public Preview.

Tip

For information about how to read Azure Databricks data using Microsoft Fabric, see Use Microsoft Fabric to read data that is registered in Unity Catalog.

This article describes how to control access to data in Azure Databricks when external processing engines use the Unity Catalog open APIs or Iceberg REST APIs to access that data, specifically when those engines use Unity Catalog credential vending to gain access.

The access controls described in this article cover scenarios in which temporary credentials are used to access data in your Unity Catalog catalogs from external engines and interfaces such as:

  • Iceberg REST APIs
  • Microsoft Fabric
  • DuckDB
  • Apache Spark and Trino

Note

Unity Catalog implements Iceberg REST APIs via Delta Lake UniForm, an alternative way to provide Iceberg clients with read-only access to Delta tables in Unity Catalog. See Use UniForm to read Delta tables with Iceberg clients.

Overview of credential vending and granting external engine access

When you want to use an external engine to access data that is registered in your Unity Catalog metastore, you must request a short-lived credential using the Unity Catalog REST API. The process by which Unity Catalog grants that credential is called credential vending.

To be granted a temporary credential, the Azure Databricks principal (user, group, or service principal) that makes the request must have the EXTERNAL USE SCHEMA privilege on the schema that contains the table they want to access from the external engine. The Unity Catalog metastore that contains the schema must also be explicitly enabled for external access.

When the temporary credential is granted, the requesting principal receives a short-lived access token and a cloud storage URL that the external engine can use to access the table in the cloud storage location. How the external engine uses the credential and URL is specific to that engine and is not covered here.
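As a rough illustration of what a vended credential contains and how an external engine would consume it, the following Python sketch parses a hypothetical response for a table in Azure storage. The field names (url, azure_user_delegation_sas, sas_token, expiration_time) are assumptions for illustration; confirm the exact schema in the temporary-table-credentials API reference.

```python
# Hypothetical (abridged) response from the temporary-table-credentials API
# for a table backed by Azure storage. Field names are assumptions.
vended = {
    "url": "abfss://container@account.dfs.core.windows.net/tables/orders",
    "azure_user_delegation_sas": {"sas_token": "sv=...&sig=..."},
    "expiration_time": 1735689600000,  # epoch millis; the credential is short-lived
}

def storage_access(credential: dict) -> tuple[str, str]:
    """Return the (storage URL, token) pair an external engine would use
    to read the table directly from cloud storage."""
    url = credential["url"]
    token = credential["azure_user_delegation_sas"]["sas_token"]
    return url, token

url, token = storage_access(vended)
```

How the engine then passes the URL and token to the storage service is engine-specific, as noted above.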

The external engine and the Azure Databricks Unity Catalog configuration must also meet specific networking requirements that are enumerated in the sections that follow.

Requirements

This section lists networking configurations, Unity Catalog metastore options, table types, and permissions required for secure access to Unity Catalog data objects from external engines using the Unity Catalog open APIs or Iceberg REST APIs.

Networking requirements

  • To access the Azure Databricks workspace using Unity Catalog Open APIs or Iceberg REST APIs, the workspace URL must be accessible to the engine performing the request. This includes workspaces that use IP access lists or Azure Private Link.
  • To access the underlying cloud storage location for Unity Catalog-registered data objects, the storage URLs generated by the Unity Catalog temporary credentials API must be accessible to the engine performing the request. This means that the engine must be allowed on the firewall and network access control lists for the underlying cloud storage accounts.

Unity Catalog metastore and data object requirements

  • The metastore must be enabled for External Data Access.
  • Only tables are supported during the public preview.
    • External tables support read and write.
    • Managed tables can only be read.
  • The following table types are not supported:
    • Tables with row filters or column masks
    • Tables shared using Delta Sharing
    • Lakehouse federated tables (foreign tables)
    • Views
    • Materialized views
    • Delta Live Tables streaming tables
    • Online tables
    • Vector Search indexes

Permission requirements

The principal who requests the temporary credential must have:

  • The EXTERNAL USE SCHEMA privilege on the containing schema or its parent catalog.

    This privilege must always be granted explicitly. Only the parent catalog owner can grant it. To avoid accidental data exfiltration, ALL PRIVILEGES does not include the EXTERNAL USE SCHEMA privilege, and schema owners do not have this privilege by default.

  • SELECT permission on the table, USE CATALOG on its parent catalog, and USE SCHEMA on its parent schema.
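The grants themselves are Databricks SQL statements run by a sufficiently privileged principal (only the catalog owner can grant EXTERNAL USE SCHEMA). As a sketch, the statements could also be issued programmatically through the SQL Statement Execution API (POST /api/2.0/sql/statements); the catalog, schema, table, principal, and warehouse id below are placeholders.

```python
import json

# Placeholder grant statements covering the requirements above.
# Replace names with your own catalog, schema, table, and principal.
GRANTS = [
    "GRANT USE CATALOG ON CATALOG my_catalog TO `engine-user@example.com`",
    "GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `engine-user@example.com`",
    "GRANT SELECT ON TABLE my_catalog.my_schema.my_table TO `engine-user@example.com`",
    # Must be granted explicitly by the catalog owner:
    "GRANT EXTERNAL USE SCHEMA ON SCHEMA my_catalog.my_schema TO `engine-user@example.com`",
]

def statement_payload(sql: str, warehouse_id: str) -> str:
    """JSON request body for POST /api/2.0/sql/statements."""
    return json.dumps({"statement": sql, "warehouse_id": warehouse_id})

payloads = [statement_payload(sql, "<warehouse-id>") for sql in GRANTS]
```

Running the same GRANT statements in a notebook or the SQL editor is equivalent; the API route is only one option.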

Enable external data access on the metastore

To allow external engines to access data in a metastore, a metastore admin must enable external data access for the metastore. This option is disabled by default to prevent unauthorized external access.

  1. In an Azure Databricks workspace attached to the metastore, click Catalog in the sidebar.
  2. Click the gear icon at the top of the Catalog pane and select Metastore.
  3. On the Details tab, enable External data access.

Request a temporary credential for external data access

To request a temporary credential for external data access, a workspace user who meets the requirements listed above must use the /api/2.1/unity-catalog/temporary-table-credentials API.

Note

You can retrieve a list of tables that support credential vending by invoking the ListTables API with the include_manifest_capabilities option enabled. Only tables marked HAS_DIRECT_EXTERNAL_ENGINE_READ_SUPPORT or HAS_DIRECT_EXTERNAL_ENGINE_WRITE_SUPPORT are eligible for reference in the temporary-table-credentials API. See GET /api/2.1/unity-catalog/tables.
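To make the note above concrete, the following Python sketch filters a ListTables response down to the tables that support credential vending. The response shape assumed here (a "tables" list whose entries carry a "securable_kind_manifest" with a "capabilities" list) is an assumption; confirm it against GET /api/2.1/unity-catalog/tables.

```python
# Capability markers that make a table eligible for credential vending.
VENDABLE = {
    "HAS_DIRECT_EXTERNAL_ENGINE_READ_SUPPORT",
    "HAS_DIRECT_EXTERNAL_ENGINE_WRITE_SUPPORT",
}

def vendable_tables(list_tables_response: dict) -> list[str]:
    """Return full names of tables whose manifest advertises
    direct external-engine read or write support."""
    names = []
    for t in list_tables_response.get("tables", []):
        caps = set(t.get("securable_kind_manifest", {}).get("capabilities", []))
        if caps & VENDABLE:
            names.append(t["full_name"])
    return names

# Hypothetical abridged response for illustration.
sample = {
    "tables": [
        {"full_name": "main.sales.orders",
         "securable_kind_manifest": {
             "capabilities": ["HAS_DIRECT_EXTERNAL_ENGINE_READ_SUPPORT"]}},
        {"full_name": "main.sales.orders_view",
         "securable_kind_manifest": {"capabilities": []}},
    ]
}
```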

For example:

curl -X POST -H "Authorization: Bearer $OAUTH_TOKEN" \
-H "Content-Type: application/json" \
https://<workspace-instance>/api/2.1/unity-catalog/temporary-table-credentials \
-d '{"table_id": "<string>", "operation_name": "<READ|READ_WRITE>"}'

For details, see POST /api/2.1/unity-catalog/temporary-table-credentials in the Azure Databricks REST API reference.
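If you are calling the API from code rather than curl, the same request can be sketched in Python. This example only builds the request object without sending it; the workspace URL, OAuth token, and table id are placeholders matching those in the curl example above.

```python
import json
import urllib.request

WORKSPACE = "https://<workspace-instance>"  # placeholder workspace URL
TOKEN = "<oauth-token>"                     # OAuth token for the requesting principal

def build_credential_request(table_id: str, operation: str) -> urllib.request.Request:
    """Build (but do not send) the temporary-table-credentials request.
    operation is "READ" or "READ_WRITE", mirroring the curl example."""
    body = json.dumps({"table_id": table_id, "operation_name": operation}).encode()
    return urllib.request.Request(
        url=f"{WORKSPACE}/api/2.1/unity-catalog/temporary-table-credentials",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )

req = build_credential_request("<table-uuid>", "READ")
```

Sending the request (for example with urllib.request.urlopen) returns the short-lived credential described earlier in this article.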