Access Databricks data using external systems
This article provides an overview of functionality and recommendations for making data managed and governed by Azure Databricks available to other systems.
These patterns focus on scenarios where your organization needs to integrate trusted tools or systems with data in Azure Databricks. If you are looking for guidance on sharing data outside your organization, see Share data and AI assets securely with users in other organizations.
What external access does Azure Databricks support?
Azure Databricks recommends using Unity Catalog to govern all your data assets.
The following table provides an overview of supported formats and access patterns for Unity Catalog objects.
| Unity Catalog object | Formats supported | Access patterns |
|---|---|---|
| Managed tables | Delta Lake, Iceberg | Credential vending, Iceberg REST catalog, Delta Sharing |
| External tables | Delta Lake | Credential vending, Iceberg REST catalog, Delta Sharing, cloud URIs |
| External tables | CSV, JSON, Avro, Parquet, ORC, text | Cloud URIs |
| External volumes | All data types | Cloud URIs |
Note
Iceberg support refers to tables written by Azure Databricks using Delta Lake with Iceberg reads (UniForm) enabled.
For more details on these Unity Catalog objects and access patterns, see the following sections.
Unity Catalog credential vending
Unity Catalog credential vending allows users to configure external clients to inherit privileges on data governed by Azure Databricks. See Unity Catalog credential vending for external system access.
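For illustration, the following sketch requests short-lived credentials for a table through the Unity Catalog REST API using Python's `requests` library. The workspace URL, token, and table ID are placeholders; check the Unity Catalog REST API reference for the exact request and response shapes.

```python
# Hypothetical sketch: exchange a table ID for short-lived storage
# credentials via Unity Catalog credential vending. Placeholder values
# (<workspace-url>, <personal-access-token>, <table-id>) must be replaced.
import requests

WORKSPACE_URL = "https://<workspace-url>"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/unity-catalog/temporary-table-credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"table_id": "<table-id>", "operation": "READ"},
)
resp.raise_for_status()
creds = resp.json()

# The response carries the table's storage location plus short-lived cloud
# credentials (on Azure, a user delegation SAS) scoped to the caller's
# privileges on the table.
print(creds["url"])
```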
Read tables with Iceberg clients
Azure Databricks provides Iceberg clients with read-only support for tables registered to Unity Catalog. Supported clients include Apache Spark, Apache Flink, Trino, and Snowflake. See Read Databricks tables from Iceberg clients.
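As an illustration, a self-managed Apache Spark session might be configured against the Unity Catalog Iceberg REST catalog as follows. The catalog alias `uc`, workspace URL, and token are placeholders, and the endpoint path should be confirmed against the current documentation; the Iceberg Spark runtime JAR must be on the classpath.

```python
# Hypothetical sketch: point an external Spark session at the Unity Catalog
# Iceberg REST catalog. Requires the iceberg-spark-runtime package on the
# classpath; all angle-bracket values are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("uc-iceberg-read")
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    .config(
        "spark.sql.catalog.uc.uri",
        "https://<workspace-url>/api/2.1/unity-catalog/iceberg",
    )
    .config("spark.sql.catalog.uc.token", "<personal-access-token>")
    .config("spark.sql.catalog.uc.warehouse", "<catalog-name>")
    .getOrCreate()
)

# Reads are served from the table's Iceberg (UniForm) metadata; writes are
# not supported through this endpoint.
spark.table("uc.<schema-name>.<table-name>").show()
```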
Share read-only tables across domains
You can use Delta Sharing to grant read-only access to managed or external Delta tables across domains and supported systems. Software systems that support zero-copy reads of Delta Sharing tables include SAP, Amperity, and Oracle. See Share data and AI assets securely with users in other organizations.
Note
You can also use Delta Sharing to grant read-only access to customers or partners. Delta Sharing also backs data shared using the Databricks Marketplace.
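For example, a recipient can read a shared table with the open source `delta-sharing` Python client. The profile file and the share, schema, and table names below are placeholders supplied by the data provider.

```python
# Sketch: read a Delta Sharing table as a pandas DataFrame. The
# config.share profile file is issued by the data provider.
import delta_sharing

profile = "/path/to/config.share"
table_url = f"{profile}#<share-name>.<schema-name>.<table-name>"

# Data files are read directly from the provider's cloud storage;
# no copy is materialized on the provider side.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```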
Read and write external Delta tables
You can access Unity Catalog external tables backed by Delta Lake from external Delta Lake reader and writer clients using cloud object storage URIs and credentials.
Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Azure Databricks.
Note
The Azure Databricks documentation lists limitations and compatibility considerations based on Databricks Runtime versions and platform features. You must confirm what reader and writer protocols and table features your client supports. See delta.io.
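As one hedged example, the open source `deltalake` (delta-rs) Python package can read an external Delta table straight from its Azure storage URI. The URI and storage credentials below are placeholders, and this access path is governed only by your cloud-side policies, not Unity Catalog.

```python
# Sketch: read a Unity Catalog external Delta table directly from Azure
# storage with the deltalake (delta-rs) package. Placeholders must be
# replaced; Unity Catalog does not govern this access path.
from deltalake import DeltaTable

table = DeltaTable(
    "abfss://<container>@<storage-account>.dfs.core.windows.net/<path>",
    storage_options={
        "azure_storage_account_name": "<storage-account>",
        "azure_storage_account_key": "<account-key>",
    },
)

# Check the table's reader/writer protocol before relying on this client.
print(table.protocol())

df = table.to_pandas()
```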
Access non-Delta Lake tabular data with external tables
Unity Catalog external tables support many formats other than Delta Lake, including Parquet, ORC, CSV, and JSON. External tables store all data files in directories in a cloud object storage location specified by a cloud URI provided during table creation. Other systems access these data files directly from cloud object storage.
Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Azure Databricks.
Reading and writing to external tables from multiple systems can lead to consistency issues and data corruption because no transactional guarantees are provided for formats other than Delta Lake.
Unity Catalog might not pick up new partitions written to external tables backed by formats other than Delta Lake. Databricks recommends regularly running `MSCK REPAIR TABLE table_name` to ensure Unity Catalog has registered all data files written by external systems.
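For example, after an external system writes new partition directories, you might run the repair from a Databricks notebook or job. The table name below is a placeholder, and an active `spark` session is assumed.

```python
# Sketch: register partitions added by an external writer. Assumes an
# active Databricks spark session; main.sales.events is a placeholder.
spark.sql("MSCK REPAIR TABLE main.sales.events")
spark.sql("SHOW PARTITIONS main.sales.events").show()
```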
Access non-tabular data with external volumes
Databricks recommends using external volumes to store non-tabular data files that are read or written by external systems in addition to Azure Databricks. See What are Unity Catalog volumes?.
Unity Catalog does not govern reads and writes performed directly against cloud object storage from external systems, so you must configure additional policies and credentials in your cloud account to ensure that data governance policies are respected outside Azure Databricks.
Azure Databricks provides APIs, SDKs, and other tools for getting files from and putting files into volumes. See Manage files in volumes.
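As an illustration, the Databricks SDK for Python exposes a files API for volumes. The volume path below is a placeholder, and authentication is assumed to come from environment variables or a configuration profile.

```python
# Sketch: upload a file to a Unity Catalog volume and read it back with
# the Databricks SDK for Python. The /Volumes path is a placeholder.
import io
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # host and token resolved from env vars or a profile
volume_path = "/Volumes/<catalog>/<schema>/<volume>/example.csv"

# Put a file into the volume.
w.files.upload(volume_path, io.BytesIO(b"id,value\n1,foo\n"), overwrite=True)

# Get it back.
contents = w.files.download(volume_path).contents.read()
print(contents.decode())
```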
Note
Delta Sharing allows you to share volumes with other Azure Databricks accounts, but does not integrate with external systems.