Create clean rooms
Important
This feature is in Public Preview.
This article describes how to create a clean room, a secure and privacy-protecting environment where multiple parties can work together on sensitive enterprise data without direct access to each other’s data.
Before you begin
The privileges needed to use clean rooms vary depending on the task:
To create a clean room, you must have the CREATE CLEAN ROOM privilege or be a metastore admin. The creator is automatically assigned as the owner of the clean room in their Unity Catalog metastore.
To initiate participation in a clean room that is shared with you, you must be a metastore admin.
When a clean room is shared, the collaborator organization’s metastore admin is automatically assigned ownership of the clean room. The metastore admin can reassign ownership to a non-metastore admin. As a best practice for data governance, Databricks recommends that ownership be assigned to a group.
If your workspace does not have a metastore admin assigned, you must assign the role. See Assign a metastore admin and Manage Unity Catalog object ownership.
To add and remove data assets and notebooks in a clean room, you must be the owner of the clean room or have the MODIFY CLEAN ROOM privilege on the clean room. Additionally, you and the owner of the clean room (if you are not the owner) must have SELECT on tables and views that you add and READ VOLUME on volumes that you add.
To learn about permission requirements for updating clean rooms and running tasks (notebooks) in clean rooms, see Manage clean rooms and Run notebooks in clean rooms.
You can create up to five clean rooms per metastore.
Step 1. Request the collaborator’s sharing identifier
Before you can create a clean room, you must have the Clean Room sharing identifier of the organization that you will be collaborating with. The sharing identifier is a string that consists of the organization’s global metastore ID + workspace ID + the contact’s username (email address). The collaborator can be in any cloud or region.
Reach out to the collaborator to request their sharing identifier.
The collaborator can get the sharing identifier using the instructions in Find your sharing identifier.
Step 2. Create a clean room
To create a clean room, you must use Catalog Explorer.
In your Azure Databricks workspace, click Catalog.
On the Quick access page, click the Clean Rooms > button.
Alternatively, click the gear icon at the top of the Catalog pane and select Clean Rooms.
Click Create Clean Room.
On the Create Clean Room page, enter a user-friendly name for the clean room.
The name cannot use spaces, periods, or forward slashes (/).
You cannot change the clean room name once it’s saved. Use a name that the collaborator will find useful and descriptive.
Select the cloud provider and region where the central clean room will be created.
The cloud provider must be the same as that of your current workspace, but the region does not have to match. Consider your organization’s data residency or other policies when you make your selection.
(Optional) Add a comment.
Enter the collaborator’s Clean Room sharing identifier.
See Step 1. Request the collaborator’s sharing identifier.
You can test your clean room before full deployment by using either your sharing identifier or the identifier of another user in your current metastore. Doing so creates two clean rooms in your current metastore. For example, if you create a clean room titled test_clean_room, a second clean room named test_clean_room_collaborator also appears. Running notebooks with a collaborator in the same metastore functions the same as with an external collaborator. See Run notebooks in clean rooms.
Make note of the catalog names assigned to you (the creator) and the collaborator.
All data assets added to the clean room will appear under that catalog in the central clean room, and can be referenced using that catalog in the Unity Catalog three-level namespace (<catalog>.<schema>.<table-etc>).
Select the network access policy type. This cannot be changed after the clean room is created.
- Full Access: Unrestricted outbound internet access.
- Restricted Access: Limits outbound access to internet destinations that you specify. See Network policy overview and Managing network policies for serverless egress control.
Note
Restricted access can delay asset availability for up to ten minutes and does not support Google Cloud collaborators.
After you create the clean room, you can view the network access policy in the Security tab.
Click Create Clean Room.
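Because the clean room name cannot be changed after it is saved, it can help to check a proposed name against the naming rule from this step before you enter it. The following is a minimal sketch, not part of any Databricks API: the function name find_name_problems and the sample names are hypothetical, and it checks only the rule stated in this article (no spaces, periods, or forward slashes). The service may enforce other rules that are not covered here.

```python
# Characters that this article says a clean room name cannot contain.
FORBIDDEN_CHARACTERS = (" ", ".", "/")


def find_name_problems(name: str) -> list[str]:
    """Return a list of problems with a proposed clean room name.

    Hypothetical helper: only the rule stated in the article is checked.
    An empty list means no problem was found against that rule.
    """
    problems = []
    for char in FORBIDDEN_CHARACTERS:
        if char in name:
            problems.append(f"name contains forbidden character {char!r}")
    return problems


if __name__ == "__main__":
    # Sample names are illustrative only.
    for candidate in ("q3_sales_overlap", "q3 sales.overlap"):
        print(candidate, "->", find_name_problems(candidate) or "ok")
```

A name that passes this check can still be rejected by the UI for other reasons, so treat the result as a first filter only.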
Step 3. Add data assets and notebooks to the clean room
Either party in the clean room (the creator or the collaborator) can add tables, volumes, views, and notebooks to the clean room.
Permissions required:
You must be the owner or have the MODIFY CLEAN ROOM privilege on the clean room.
You and the clean room owner (if you are not the owner) must have SELECT on any table or view and READ VOLUME on any volume that you add, along with USE CATALOG and USE SCHEMA on the parent catalog and schema.
The clean room owner must keep these privileges throughout the life of the clean room.
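The permission requirements above can be turned into a per-asset checklist. The sketch below is illustrative only and does not call any Databricks API: the function name required_privileges and the example asset name are hypothetical, and the privilege names are taken from the list above. Both you and the clean room owner (if you are not the owner) need every privilege that it returns.

```python
def required_privileges(asset_type: str, full_name: str) -> list[tuple[str, str]]:
    """List (privilege, securable) pairs needed to add an asset to a clean room.

    Hypothetical helper. asset_type is "table", "view", or "volume".
    full_name is a three-level name: <catalog>.<schema>.<asset>.
    """
    asset_privilege = {
        "table": "SELECT",
        "view": "SELECT",
        "volume": "READ VOLUME",
    }
    if asset_type not in asset_privilege:
        raise ValueError(f"unsupported asset type: {asset_type!r}")

    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <catalog>.<schema>.<asset>, got {full_name!r}")
    catalog, schema, _asset = parts

    return [
        ("USE CATALOG", catalog),                    # parent catalog
        ("USE SCHEMA", f"{catalog}.{schema}"),       # parent schema
        (asset_privilege[asset_type], full_name),    # the asset itself
    ]


if __name__ == "__main__":
    # Example asset name is illustrative only.
    for privilege, securable in required_privileges("table", "main.sales.california"):
        print(f"{privilege} on {securable}")
```

The MODIFY CLEAN ROOM privilege (or ownership) on the clean room itself is a separate requirement and is not included in this per-asset list.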
Note
The following instructions assume you are returning to an already-created clean room to add assets. If you just created a clean room for the first time, a wizard walks you through adding data assets and notebooks. The actual UI for adding these assets is the same, regardless of whether you are guided by the wizard or not.
To add assets:
In your Azure Databricks workspace, click Catalog.
On the Quick access page, click the Clean Rooms > button.
Alternatively, click the gear icon at the top of the Catalog pane and select Clean Rooms.
Find and click the name of the clean room you want to update.
Click + Add data assets to add tables, volumes, or views.
Select the data assets you want to share and click Add data assets.
When you share a table, volume, or view, you can optionally add an alias. The alias name will be the only name visible in the clean room.
When you share a table, you can optionally add partition clauses that enable you to share only part of the table. For details about how to use partitions to limit what you share, see Specify table partitions to share.
To add notebooks, click the + Add notebooks button and browse for the notebook you want to add.
You can optionally give the notebook an alternative Notebook name.
Notebooks that you share in clean rooms query data and run data analysis workloads on the tables, views, and volumes that you and the other collaborator have added to the clean room.
Notebooks operate on the principle of implicit approval: you cannot run notebooks you create. You create the notebooks that your collaborator uses, and your collaborator creates the notebooks that you use.
If you share a notebook that includes results, those results will be shared with your collaborator.
You can use a notebook to create output tables that are temporarily shared to your collaborator’s metastore when they run the notebook. See Create and work with output tables in Databricks Clean Rooms.
To use a test dataset, download our sample notebook.
Important
Any notebook references to tables, views, or volumes that were added to the clean room must use the catalog name assigned when the clean room was created (“creator” for data assets added by the clean room creator, and “collaborator” for data assets added by the invited collaborator). For example, a table added by the creator could be named creator.sales.california.
Likewise, verify that the notebook uses any aliases that were assigned to data assets in the clean room.
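The catalog rule above can be sketched as two small helpers for reviewing a notebook before you share it. This is an illustration, not a Databricks API: the function names clean_room_reference and uses_clean_room_catalog are hypothetical, and the sketch assumes the catalog names are exactly “creator” and “collaborator”, as described above.

```python
# Catalog names that this article says are assigned when the clean room is created.
CLEAN_ROOM_CATALOGS = ("creator", "collaborator")


def clean_room_reference(added_by: str, schema: str, asset: str) -> str:
    """Build the three-level name that a clean room notebook should use.

    Hypothetical helper. added_by is "creator" or "collaborator", depending
    on which party added the asset. asset is the table, view, or volume
    name, or its alias if one was assigned when the asset was shared.
    """
    if added_by not in CLEAN_ROOM_CATALOGS:
        raise ValueError(f"added_by must be one of {CLEAN_ROOM_CATALOGS}")
    return f"{added_by}.{schema}.{asset}"


def uses_clean_room_catalog(reference: str) -> bool:
    """Return True if a three-level reference starts with a clean room catalog."""
    parts = reference.split(".")
    return len(parts) == 3 and parts[0] in CLEAN_ROOM_CATALOGS


if __name__ == "__main__":
    print(clean_room_reference("creator", "sales", "california"))
    # creator.sales.california
    print(uses_clean_room_catalog("main.sales.california"))
    # False
```

A reference that still points at the original catalog in your own metastore, such as main.sales.california in this example, fails the check and would need to be rewritten before the notebook is added.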