Access Databricks data using external systems

This article provides an overview of functionality and recommendations for making data managed and governed by Azure Databricks available to other systems.

These patterns focus on scenarios where your organization needs to integrate trusted tools or systems with Azure Databricks data. If you are looking for guidance on sharing data outside of your organization, see Share data and AI assets securely with users in other organizations.

What external access does Azure Databricks support?

Azure Databricks recommends using Unity Catalog to govern all your data assets.

The following table provides an overview of supported formats and access patterns for Unity Catalog objects.

Unity Catalog object | Formats supported | Access patterns
Managed tables | Delta Lake, Iceberg | Credential vending, Iceberg REST catalog, Delta Sharing
External tables | Delta Lake | Credential vending, Iceberg REST catalog, Delta Sharing, cloud URIs
External tables | CSV, JSON, Avro, Parquet, ORC, text | Cloud URIs
External volumes | All data types | Cloud URIs

Note

Iceberg support describes tables written by Azure Databricks using Delta Lake but with Iceberg reads (UniForm) enabled.

For more details on these Unity Catalog objects, see the following:

Unity Catalog credential vending

Unity Catalog credential vending allows users to configure external clients to inherit privileges on data governed by Azure Databricks. See Unity Catalog credential vending for external system access.
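
To make the flow more concrete, the following sketch builds (but does not send) the kind of request an external client might issue to obtain short-lived credentials for a table. It assumes the Unity Catalog REST path /api/2.1/unity-catalog/temporary-table-credentials and a JSON body with table_id and operation fields; the workspace URL, token, and table ID are placeholders. Treat it as an illustration and confirm the details in the linked article.

```python
import json
import urllib.request

# Assumed endpoint path; confirm it against the credential vending documentation.
CREDENTIALS_PATH = "/api/2.1/unity-catalog/temporary-table-credentials"


def build_credentials_request(workspace_url: str, token: str, table_id: str,
                              operation: str = "READ") -> urllib.request.Request:
    """Build, without sending, a request for temporary table credentials."""
    # Assumed operation values; anything else is rejected early.
    if operation not in ("READ", "READ_WRITE"):
        raise ValueError(f"unsupported operation: {operation}")
    body = json.dumps({"table_id": table_id, "operation": operation}).encode("utf-8")
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + CREDENTIALS_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values only; nothing is sent over the network here.
    req = build_credentials_request(
        "https://example.azuredatabricks.net", "<token>", "<table-id>"
    )
    print(req.get_method(), req.full_url)
```

The response to such a request would carry cloud-specific temporary credentials that the client then uses against object storage; sending the request and parsing the response are left out because they depend on the client and cloud.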

Read tables with Iceberg clients

Azure Databricks provides Iceberg clients with read-only support for tables registered to Unity Catalog. Supported clients include Apache Spark, Apache Flink, Trino, and Snowflake. See Read Databricks tables from Iceberg clients.
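
As an illustration of client configuration, the sketch below assembles the Spark properties an Apache Spark client might use to register Unity Catalog as an Iceberg REST catalog. The property keys follow Apache Iceberg's Spark catalog conventions. The endpoint path /api/2.1/unity-catalog/iceberg, the token-based authentication, and the use of the Unity Catalog catalog name as the warehouse are assumptions to verify against the linked article. The helper name iceberg_rest_spark_conf is hypothetical.

```python
def iceberg_rest_spark_conf(spark_catalog: str, workspace_url: str,
                            token: str, uc_catalog: str) -> dict:
    """Return Spark properties for an Iceberg REST catalog (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{spark_catalog}"
    return {
        # Standard Iceberg Spark catalog implementation class.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Assumed Unity Catalog Iceberg REST endpoint path.
        f"{prefix}.uri": workspace_url.rstrip("/") + "/api/2.1/unity-catalog/iceberg",
        # Assumed: a personal access token passed as the REST catalog token.
        f"{prefix}.token": token,
        # Assumed: the Unity Catalog catalog name serves as the warehouse.
        f"{prefix}.warehouse": uc_catalog,
    }


if __name__ == "__main__":
    conf = iceberg_rest_spark_conf(
        "uc", "https://example.azuredatabricks.net", "<token>", "main"
    )
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

Because access through this path is read-only, a client configured this way should expect writes to be rejected.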

Share read-only tables across domains

You can use Delta Sharing to grant read-only access to managed or external Delta tables across domains and supported systems. Software systems that support zero-copy reads of Delta Sharing tables include SAP, Amperity, and Oracle. See Share data and AI assets securely with users in other organizations.

Note

You can also use Delta Sharing to grant read-only access to customers or partners. Delta Sharing also backs data shared using the Databricks Marketplace.
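
For recipients that read shares from Python, a minimal sketch using the open source delta-sharing package might look like the following. It assumes the package is installed and that the provider has supplied a profile file; the file name config.share and the share, schema, and table names are placeholders.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address used by delta-sharing."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame (read-only access)."""
    # Imported lazily so the URL helper works without the package installed.
    import delta_sharing

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


if __name__ == "__main__":
    # Placeholder names; prints the address without contacting a sharing server.
    print(table_url("config.share", "my_share", "my_schema", "my_table"))
```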

Read and write external Delta tables

You can access Unity Catalog external tables backed by Delta Lake from external Delta Lake reader and writer clients using cloud object storage URIs and credentials.

Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Azure Databricks.

Note

The Azure Databricks documentation lists limitations and compatibility considerations based on Databricks Runtime versions and platform features. You must confirm what reader and writer protocols and table features your client supports. See delta.io.
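
The compatibility check described in the note can be sketched as a comparison between what a table requires and what a client supports. The model below is deliberately simplified: the class and function names are hypothetical, and the Delta Lake protocol specification on delta.io remains the authority on how versions and table features interact.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableProtocol:
    """Protocol requirements recorded in a Delta table's transaction log."""
    min_reader_version: int
    min_writer_version: int
    reader_features: frozenset = field(default_factory=frozenset)
    writer_features: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class ClientSupport:
    """What an external client claims to support (hypothetical model)."""
    max_reader_version: int
    max_writer_version: int
    features: frozenset = field(default_factory=frozenset)


def can_read(table: TableProtocol, client: ClientSupport) -> bool:
    """True if the client meets the reader version and all reader features."""
    return (table.min_reader_version <= client.max_reader_version
            and table.reader_features <= client.features)


def can_write(table: TableProtocol, client: ClientSupport) -> bool:
    """True if the client meets the writer version and all writer features."""
    return (table.min_writer_version <= client.max_writer_version
            and table.writer_features <= client.features)


if __name__ == "__main__":
    # Example: a table that requires the deletionVectors table feature.
    table = TableProtocol(3, 7, frozenset({"deletionVectors"}),
                          frozenset({"deletionVectors"}))
    reader_only = ClientSupport(3, 2, frozenset({"deletionVectors"}))
    print(can_read(table, reader_only), can_write(table, reader_only))
```

A client that fails either check should not access the table in that mode, because ignoring an unsupported feature can produce incorrect reads or a corrupted table.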

Access non-Delta Lake tabular data with external tables

Unity Catalog external tables support many formats other than Delta Lake, including Parquet, ORC, CSV, and JSON. External tables store all data files in directories in a cloud object storage location specified by a cloud URI provided during table creation. Other systems access these data files directly from cloud object storage.

Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Azure Databricks.

Reading and writing to external tables from multiple systems can lead to consistency issues and data corruption because no transactional guarantees are provided for formats other than Delta Lake.

Unity Catalog might not pick up new partitions written to external tables backed by formats other than Delta Lake. Databricks recommends regularly running MSCK REPAIR TABLE table_name to ensure Unity Catalog has registered all data files written by external systems.
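
One way to run the repair on a schedule is to generate the statement for each affected table and submit it from an Azure Databricks job. The helper below is a hypothetical sketch that only builds the SQL text, with backtick-quoted identifiers; the catalog, schema, and table names are placeholders.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def repair_statement(catalog: str, schema: str, table: str) -> str:
    """Build an MSCK REPAIR TABLE statement for a three-level table name."""
    full_name = ".".join(quote_identifier(p) for p in (catalog, schema, table))
    return f"MSCK REPAIR TABLE {full_name}"


if __name__ == "__main__":
    stmt = repair_statement("main", "sales", "events")
    print(stmt)
    # In a Databricks notebook or job, where `spark` is predefined:
    # spark.sql(stmt)
```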

Access non-tabular data with external volumes

Databricks recommends using external volumes to store non-tabular data files that are read or written by external systems in addition to Azure Databricks. See What are Unity Catalog volumes?.

Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Azure Databricks.

Azure Databricks provides APIs, SDKs, and other tools for getting files from and putting files into volumes. See Manage files in volumes.
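
As one illustration of API-based access, the sketch below builds (but does not send) an upload request for a file in a volume. It assumes the Files REST API path /api/2.0/fs/files followed by a /Volumes/<catalog>/<schema>/<volume>/<path> file path, and an overwrite query parameter; the workspace URL, token, and names are placeholders, and the helper names are hypothetical. Confirm the details in Manage files in volumes.

```python
import urllib.parse
import urllib.request

# Assumed Files API prefix; confirm it against the volumes documentation.
FILES_API_PREFIX = "/api/2.0/fs/files"


def volume_path(catalog: str, schema: str, volume: str, relative_path: str) -> str:
    """Build the /Volumes/... path for a file inside a Unity Catalog volume."""
    return "/Volumes/" + "/".join(
        [catalog, schema, volume, relative_path.lstrip("/")]
    )


def build_upload_request(workspace_url: str, token: str, path: str,
                         contents: bytes, overwrite: bool = False) -> urllib.request.Request:
    """Build, without sending, a PUT request that uploads bytes to a volume path."""
    # Percent-encode the path but keep the slashes between segments.
    url = workspace_url.rstrip("/") + FILES_API_PREFIX + urllib.parse.quote(path)
    if overwrite:
        url += "?overwrite=true"
    return urllib.request.Request(
        url=url,
        data=contents,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    # Placeholder values only; nothing is sent over the network here.
    path = volume_path("main", "raw", "landing", "2024/report.csv")
    req = build_upload_request(
        "https://example.azuredatabricks.net", "<token>", path, b"a,b\n1,2\n"
    )
    print(req.get_method(), req.full_url)
```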

Note

Delta Sharing allows you to share volumes with other Azure Databricks accounts, but volume sharing does not integrate with external systems.