Share data using the Delta Sharing Databricks-to-Databricks protocol (for providers)
This article gives an overview of how to use Databricks-to-Databricks Delta Sharing to share data securely with any Databricks user, regardless of account or cloud host, as long as that user has access to a workspace enabled for Unity Catalog.
Note
If you are a data recipient (a user or group of users with whom Databricks data is being shared), see Access data shared with you using Delta Sharing (for recipients).
Who should use Databricks-to-Databricks Delta Sharing?
There are three ways to share data using Delta Sharing.
The Databricks-to-Databricks sharing protocol, covered in this article, lets you share data from your Unity Catalog-enabled workspace with users who also have access to a Unity Catalog-enabled Databricks workspace.
This approach uses the Delta Sharing server that is built into Azure Databricks and provides support for notebook sharing, Unity Catalog data governance, auditing, and usage tracking for both providers and recipients. The integration with Unity Catalog simplifies setup and governance for both providers and recipients and improves performance.
The Databricks open sharing protocol lets you share data that you manage in a Unity Catalog-enabled Databricks workspace with users on any computing platform.
See Share data using the Delta Sharing open sharing protocol (for providers).
A customer-managed implementation of the open-source Delta Sharing server lets you share from any platform to any platform, whether Databricks or not.
For an introduction to Delta Sharing and more information about these three approaches, see What is Delta Sharing?.
Databricks-to-Databricks Delta Sharing workflow
This section provides a high-level overview of the Databricks-to-Databricks sharing workflow, with links to detailed documentation for each step.
In the Databricks-to-Databricks Delta Sharing model:
A data recipient gives the data provider the unique sharing identifier of the Unity Catalog metastore attached to the Databricks workspace that the recipient (a user or group of users) will use to access the shared data.
For details, see Step 1: Request the recipient’s sharing identifier.
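As a sketch of this step, a recipient can look up their metastore's sharing identifier by running the following in a notebook or the SQL query editor in their Unity Catalog-enabled workspace:

```sql
-- Run in the recipient's workspace.
-- Returns the sharing identifier of the current metastore,
-- in the form <cloud>:<region>:<metastore-uuid>.
SELECT CURRENT_METASTORE();
```

The recipient then sends this identifier to the provider out of band (for example, by email).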
The data provider creates a share in the provider’s Unity Catalog metastore. This named object contains a collection of tables, views, volumes, and notebooks registered in the metastore.
For details, see Create and manage shares for Delta Sharing.
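For illustration, the share can be created and populated with SQL. The share, catalog, schema, and table names below are placeholders:

```sql
-- Create a named share in the provider's Unity Catalog metastore.
CREATE SHARE IF NOT EXISTS sales_share
  COMMENT 'Daily sales tables shared with a partner organization';

-- Add a table registered in the metastore to the share.
ALTER SHARE sales_share
  ADD TABLE sales_catalog.sales_schema.daily_sales;
```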
The data provider creates a recipient object in the provider's Unity Catalog metastore. This named object represents the user or group of users who will access the data included in the share, and it records the sharing identifier of the Unity Catalog metastore attached to the workspace they will use to access the share. The sharing identifier is the key that enables the secure connection.
For details, see Step 2: Create the recipient.
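As a sketch, the recipient object is created with the sharing identifier supplied by the recipient in step 1. The recipient name and identifier below are placeholders:

```sql
-- Create a recipient using the sharing identifier the recipient provided.
CREATE RECIPIENT IF NOT EXISTS partner_recipient
  USING ID 'azure:westus2:12a3bc4d-5678-90ef-a1b2-c3d4e5f6a7b8'
  COMMENT 'Analytics team at the partner organization';
```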
The data provider grants the recipient access to the share.
For details, see Manage access to Delta Sharing data shares (for providers).
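Continuing the placeholder names from the earlier examples, granting access is a single statement:

```sql
-- Give the recipient read access to everything in the share.
GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_recipient;
```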
The share becomes available in the recipient’s Databricks workspace, and users can access it using Catalog Explorer, the Databricks CLI, or SQL commands in an Azure Databricks notebook or the Databricks SQL query editor.
To access the tables, views, volumes, and notebooks in a share, a metastore admin or privileged user must create a catalog from the share. Then that user or another user who is granted the appropriate privilege can give other users access to the catalog and objects in the catalog. Granting permissions on shared catalogs and data assets works just like it does with any other assets registered in Unity Catalog, with the important distinction being that users can be granted only read access on objects in catalogs that are created from Delta Sharing shares.
Shared notebooks live at the catalog level, and any user with the USE CATALOG privilege on the catalog can access them. For details, see Read data shared using Databricks-to-Databricks Delta Sharing (for recipients).
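As a sketch of the recipient-side steps above (provider, share, catalog, and group names are placeholders):

```sql
-- Run in the recipient's workspace by a metastore admin or privileged user.
-- Create a catalog from the share.
CREATE CATALOG IF NOT EXISTS sales_shared
  USING SHARE partner_provider.sales_share;

-- Grant read-only access to other users. Only read privileges can be
-- granted on objects in a catalog created from a Delta Sharing share.
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG sales_shared TO `analysts`;
```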
Improve table read performance with history sharing
Important
This feature is in Public Preview.
Databricks-to-Databricks table shares can improve read performance when history sharing is enabled. History sharing uses temporary security credentials, scoped down to the root directory of the provider's shared Delta table in cloud storage, so read performance is comparable to direct access to the source tables.
- For new table shares, specify WITH HISTORY when creating the table share. See Add tables to a share.
- For existing table shares, you must alter the share to share table history. See Update shares.
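Both cases can be sketched with the placeholder share and table names used earlier:

```sql
-- New table share: include history when adding the table.
ALTER SHARE sales_share
  ADD TABLE sales_catalog.sales_schema.daily_sales WITH HISTORY;

-- Existing table share: alter the shared table to start sharing history.
ALTER SHARE sales_share
  ALTER TABLE sales_catalog.sales_schema.daily_sales WITH HISTORY;
```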
Note
Tables with partitioning enabled do not receive the performance benefits of history sharing. See Specify table partitions to share.
History sharing data privacy
Providers should be aware that Databricks-to-Databricks history sharing grants Delta Sharing recipients temporary read access to both the data files and the Delta log. The Delta log contains the commit history for each table version, information about the committer (similar to GitHub commit history), and deleted data that has not been vacuumed.