Create a storage credential for connecting to Azure Data Lake Storage Gen2
This article describes how to create a storage credential in Unity Catalog to connect to Azure Data Lake Storage Gen2.
To manage access to the underlying cloud storage that holds tables and volumes, Unity Catalog uses the following object types:
- Storage credentials encapsulate a long-term cloud credential that provides access to cloud storage.
- External locations contain a reference to a storage credential and a cloud storage path.
For more information, see Manage access to cloud storage using Unity Catalog.
Note
If you want to use Unity Catalog to govern access to an external service rather than cloud storage, see Manage access to external cloud services using service credentials.
Unity Catalog supports three cloud storage options for Azure Databricks: Azure Data Lake Storage Gen2 containers, Cloudflare R2 buckets, and DBFS Root. Cloudflare R2 is intended primarily for Delta Sharing use cases in which you want to avoid data egress fees. Azure Data Lake Storage Gen2 is appropriate for most other use cases. This article focuses on creating storage credentials for Azure Data Lake Storage Gen2 containers. For Cloudflare R2, see Create a storage credential for connecting to Cloudflare R2.
DBFS Root is used to govern access to your DBFS root. Although Databricks recommends against storing data in DBFS root storage, your workspace might do so because of legacy practices. For DBFS Root, see Create an external location for data in DBFS root.
To create a storage credential for access to an Azure Data Lake Storage Gen2 container, you create an Azure Databricks access connector that references an Azure managed identity, assigning it permissions on the storage container. You then reference that access connector in the storage credential definition.
Requirements
In Azure Databricks:
- Azure Databricks workspace enabled for Unity Catalog.
- CREATE STORAGE CREDENTIAL privilege on the Unity Catalog metastore attached to the workspace. Account admins and metastore admins have this privilege by default.
In your Azure tenant:
- An Azure Data Lake Storage Gen2 storage container. To avoid egress charges, this should be in the same region as the workspace you want to access the data from.
- The Azure Data Lake Storage Gen2 storage account must have a hierarchical namespace.
- Contributor or Owner of an Azure resource group.
- Owner or a user with the User Access Administrator Azure RBAC role on the storage account.
Create a storage credential using a managed identity
You can use either an Azure managed identity or a service principal as the identity that authorizes access to your storage container. Managed identities are strongly recommended. They have the benefit of allowing Unity Catalog to access storage accounts protected by network rules, which isn’t possible using service principals, and they remove the need to manage and rotate secrets. If you want to use a service principal, see Create Unity Catalog managed storage using a service principal (legacy).
In the Azure portal, create an Azure Databricks access connector and assign it permissions on the storage container that you would like to access, using the instructions in Configure a managed identity for Unity Catalog.
An Azure Databricks access connector is a first-party Azure resource that lets you connect managed identities to an Azure Databricks account. You must have the Contributor role or higher on the access connector resource in Azure to add the storage credential.
Make a note of the access connector’s resource ID.
Log in to your Unity Catalog-enabled Azure Databricks workspace as a user who has the CREATE STORAGE CREDENTIAL privilege. The metastore admin and account admin roles both include this privilege.
Click Catalog.
On the Quick access page, click the External data > button, go to the Credentials tab, and select Create credential.
Select Storage credential.
Select a Credential Type of Azure Managed Identity.
Enter a name for the credential, and enter the access connector’s resource ID in the format:
/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
(Optional) If you created the access connector using a user-assigned managed identity, enter the resource ID of the managed identity in the User-assigned managed identity ID field, in the format:
/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<managed-identity-name>
(Optional) If you want users to have read-only access to the external locations that use this storage credential, select Read only. For more information, see Mark a storage credential as read-only.
Click Create.
(Optional) Bind the storage credential to specific workspaces.
By default, any privileged user can use the storage credential on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign a storage credential to specific workspaces.
Create an external location that references this storage credential.
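The two resource ID formats used in the steps above can be assembled from their parts. The following is a minimal sketch: the function names and the sample values are hypothetical, and only the path templates are taken from this article.

```shell
# Hypothetical helpers; only the path templates come from this article.

# Build the resource ID of an Azure Databricks access connector.
# $1 = subscription ID, $2 = resource group, $3 = connector name
access_connector_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Databricks/accessConnectors/%s\n' \
    "$1" "$2" "$3"
}

# Build the resource ID of a user-assigned managed identity.
# $1 = subscription ID, $2 = resource group, $3 = managed identity name
managed_identity_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ManagedIdentity/userAssignedIdentities/%s\n' \
    "$1" "$2" "$3"
}

# Placeholder values for illustration only.
access_connector_id "00000000-0000-0000-0000-000000000000" "my-resource-group" "my-connector"
```

Building the string from named parts makes it easier to spot a missing or swapped segment before you paste the ID into the Create credential dialog.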
(Optional) Assign a storage credential to specific workspaces
Important
This feature is in Public Preview.
By default, a storage credential is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege (such as CREATE EXTERNAL LOCATION) on that storage credential, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you may want to allow access to a storage credential only from specific workspaces. This feature is known as workspace binding or storage credential isolation.
A typical use case for binding a storage credential to specific workspaces is the scenario in which a cloud admin configures a storage credential using a production cloud account credential, and you want to ensure that Azure Databricks users use this credential to create external locations only in the production workspace.
For more information about workspace binding, see (Optional) Assign an external location to specific workspaces and Limit catalog access to specific workspaces.
Note
Workspace bindings are referenced when privileges against storage credentials are exercised. For example, if a user creates an external location using a storage credential, the workspace binding on the storage credential is checked only when the external location is created. After the external location is created, it will function independently of the workspace bindings configured on the storage credential.
Bind a storage credential to one or more workspaces
To assign a storage credential to specific workspaces, you can use Catalog Explorer or the Databricks CLI.
Permissions required: Metastore admin, storage credential owner, or MANAGE on the storage credential.
Note
Metastore admins can see all storage credentials in a metastore using Catalog Explorer—and storage credential owners can see all storage credentials that they own in a metastore—regardless of whether the storage credential is assigned to the current workspace. Storage credentials that are not assigned to the workspace appear grayed out.
Catalog Explorer
Log in to a workspace that is linked to the metastore.
In the sidebar, click Catalog.
On the Quick access page, click the External data > button and go to the Credentials tab.
Select the storage credential and go to the Workspaces tab.
On the Workspaces tab, clear the All workspaces have access checkbox.
If your storage credential is already bound to one or more workspaces, this checkbox is already cleared.
Click Assign to workspaces and enter or find the workspaces you want to assign.
To revoke access, go to the Workspaces tab, select the workspace, and click Revoke. To allow access from all workspaces, select the All workspaces have access checkbox.
CLI
There are two Databricks CLI command groups and two steps required to assign a storage credential to a workspace.
In the following examples, replace <profile-name> with the name of your Azure Databricks authentication configuration profile. It should include the value of a personal access token, in addition to the workspace instance name and workspace ID of the workspace where you generated the personal access token. See Azure Databricks personal access token authentication.
Use the storage-credentials command group’s update command to set the storage credential’s isolation mode to ISOLATED:
databricks storage-credentials update <my-storage-credential> \
--isolation-mode ISOLATED \
--profile <profile-name>
The default isolation-mode is OPEN to all workspaces attached to the metastore.
Use the workspace-bindings command group’s update-bindings command to assign the workspaces to the storage credential:
databricks workspace-bindings update-bindings storage-credential <my-storage-credential> \
--json '{
  "add": [{"workspace_id": <workspace-id>}...],
  "remove": [{"workspace_id": <workspace-id>}...]
}' --profile <profile-name>
Use the "add" and "remove" properties to add or remove workspace bindings.
Note
Read-only binding (BINDING_TYPE_READ_ONLY) is not available for storage credentials. Therefore there is no reason to set binding_type for the storage credentials binding.
To list all workspace assignments for a storage credential, use the workspace-bindings command group’s get-bindings command:
databricks workspace-bindings get-bindings storage-credential <my-storage-credential> \
--profile <profile-name>
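The JSON passed to update-bindings is easy to mistype by hand when several workspaces are involved. The sketch below assembles it from lists of workspace IDs. The helper names and the sample IDs are hypothetical, and it assumes that an empty "add" or "remove" list is accepted; if that does not hold in your environment, omit the unused property instead.

```shell
# Hypothetical helper: turn workspace IDs into the array form used by "add" and "remove".
binding_list() {
  list=""
  for id in "$@"; do
    # Append a comma separator only when the list already has an entry.
    list="${list:+$list, }{\"workspace_id\": $id}"
  done
  printf '[%s]' "$list"
}

# Assemble the full payload.
# $1 = workspace IDs to add, $2 = workspace IDs to remove
# (each a space-separated string, possibly empty)
bindings_payload() {
  # Word splitting of $1 and $2 is intended here.
  printf '{"add": %s, "remove": %s}\n' "$(binding_list $1)" "$(binding_list $2)"
}

# Placeholder workspace IDs for illustration only.
payload="$(bindings_payload "1234567890 2345678901" "3456789012")"
echo "$payload"

# The result can then be passed to the command described in this section:
# databricks workspace-bindings update-bindings storage-credential <my-storage-credential> \
#   --json "$payload" --profile <profile-name>
```

Printing the payload before running the command gives you a chance to review exactly which workspaces will gain or lose access.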
Unbind a storage credential from a workspace
Instructions for revoking workspace access to a storage credential using Catalog Explorer or the workspace-bindings CLI command group are included in Bind a storage credential to one or more workspaces.
Next steps
You can view, update, delete, and grant other users permission to use storage credentials. See Manage storage credentials.
You can define external locations using storage credentials. See Create an external location to connect cloud storage to Azure Databricks.