Allowlist libraries and init scripts on shared compute
In Databricks Runtime 13.3 LTS and above, you can add libraries and init scripts to the allowlist in Unity Catalog. This lets users run these artifacts on compute configured with shared access mode.
You can allowlist a directory or filepath before that directory or file exists. See Upload files to a Unity Catalog volume.
Note
You must be a metastore admin or have the MANAGE ALLOWLIST privilege to modify the allowlist. See MANAGE ALLOWLIST.
Important
Libraries used as JDBC drivers or custom Spark data sources on Unity Catalog-enabled shared compute require ANY FILE permissions.
Some installed libraries store data of all users in one common temp directory. These libraries might compromise user isolation.
How to add items to the allowlist
You can add items to the allowlist with Catalog Explorer or the REST API.
To open the dialog for adding items to the allowlist in Catalog Explorer, do the following:
- In your Azure Databricks workspace, click Catalog.
- Click to open the metastore details and permissions UI.
- Select Allowed JARs/Init Scripts.
- Click Add.
Important
This option only displays for sufficiently privileged users. If you cannot access the allowlist UI, contact your metastore admin for assistance in allowlisting libraries and init scripts.
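The REST API route mentioned above can be sketched as follows. This is a minimal sketch, not a definitive client: the endpoint path, the artifact type names, and the artifact_matchers and match_type fields are assumptions based on the Unity Catalog artifact allowlists API and should be checked against the REST API reference. The workspace URL, token, and volume path in the usage lines are hypothetical placeholders. The function only builds the request; sending it is left as a commented-out step.

```python
import json
import urllib.request

# Assumed artifact type names for the allowlist endpoint (verify in the API reference).
ARTIFACT_TYPES = {"INIT_SCRIPT", "LIBRARY_JAR", "LIBRARY_MAVEN"}


def build_allowlist_request(host, token, artifact_type, artifacts):
    """Build, but do not send, a PUT request that sets the allowlist for one artifact type."""
    if artifact_type not in ARTIFACT_TYPES:
        raise ValueError(f"unknown artifact type: {artifact_type}")
    # Each entry is submitted as a prefix matcher (assumed field names).
    body = {
        "artifact_matchers": [
            {"artifact": artifact, "match_type": "PREFIX_MATCH"}
            for artifact in artifacts
        ]
    }
    # Assumed endpoint path for the artifact allowlists API.
    url = f"{host.rstrip('/')}/api/2.1/unity-catalog/artifact-allowlists/{artifact_type}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical workspace URL, token, and volume path.
request = build_allowlist_request(
    "https://adb-1234567890123456.7.azuredatabricks.net",
    "<personal-access-token>",
    "INIT_SCRIPT",
    ["/Volumes/main/default/init-scripts/"],
)
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```

This sketch assumes that a PUT sets the complete allowlist for the given artifact type, so include any existing entries you want to keep. You need the same privileges as in the UI (metastore admin or MANAGE ALLOWLIST) for the call to succeed.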
Add an init script to the allowlist
Complete the following steps in the allowlist dialog to add an init script to the allowlist:
- For Type, select Init Script.
- For Source Type, select Volume or the object storage protocol.
- Specify the source path to add to the allowlist. See How are permissions on paths enforced in the allowlist?.
Add a JAR to the allowlist
Complete the following steps in the allowlist dialog to add a JAR to the allowlist:
- For Type, select JAR.
- For Source Type, select Volume or the object storage protocol.
- Specify the source path to add to the allowlist. See How are permissions on paths enforced in the allowlist?.
Add Maven coordinates to the allowlist
Complete the following steps in the allowlist dialog to add Maven coordinates to the allowlist:
- For Type, select Maven.
- For Source Type, select Coordinates.
- Enter coordinates in the following format: groupId:artifactId:version.
  - You can include all versions of a library by allowlisting the following format: groupId:artifactId.
  - You can include all artifacts in a group by allowlisting the following format: groupId.
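The three coordinate formats differ only in how many segments they pin down. The following is an illustrative sketch of that idea, assuming a segment-by-segment comparison; it is not the matching logic the service actually runs. The function name maven_entry_covers and the com.example coordinates are hypothetical.

```python
def maven_entry_covers(entry, coordinate):
    """Return True if an allowlist entry covers a full groupId:artifactId:version coordinate.

    Illustrative only: an entry with fewer segments covers every coordinate
    whose leading segments are identical to the entry's segments.
    """
    entry_parts = entry.split(":")
    coordinate_parts = coordinate.split(":")
    if len(coordinate_parts) != 3:
        raise ValueError("expected groupId:artifactId:version")
    return coordinate_parts[: len(entry_parts)] == entry_parts


library = "com.example:my-lib:1.2.0"
print(maven_entry_covers("com.example:my-lib:1.2.0", library))  # True: exact version
print(maven_entry_covers("com.example:my-lib", library))        # True: all versions
print(maven_entry_covers("com.example", library))               # True: whole group
print(maven_entry_covers("com.example:other-lib", library))     # False: different artifact
```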
How are permissions on paths enforced in the allowlist?
You can use the allowlist to grant access to JARs or init scripts stored in Unity Catalog volumes and object storage. If you add a path for a directory rather than a file, allowlist permissions propagate to contained files and directories.
Prefix matching is used for all artifacts stored in Unity Catalog volumes or object storage. To prevent prefix matching at a given directory level, include a trailing slash (/). For example, /Volumes/prod-libraries/ will not perform prefix matching for files prefixed with prod-libraries. Instead, all files and directories within /Volumes/prod-libraries/ are added to the allowlist.
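The effect of the trailing slash can be illustrated with plain string prefix matching. This is a sketch that assumes the check behaves like a simple starts-with comparison; the function name path_is_allowlisted and the prod-libraries-staging and team-a paths are hypothetical.

```python
def path_is_allowlisted(allowlist_entry, artifact_path):
    """Illustrative prefix check: the artifact path must start with the allowlist entry."""
    return artifact_path.startswith(allowlist_entry)


# Without a trailing slash, sibling directories that share the prefix also match.
print(path_is_allowlisted("/Volumes/prod-libraries", "/Volumes/prod-libraries-staging/lib.jar"))  # True

# With a trailing slash, only contents of that directory match.
print(path_is_allowlisted("/Volumes/prod-libraries/", "/Volumes/prod-libraries-staging/lib.jar"))  # False
print(path_is_allowlisted("/Volumes/prod-libraries/", "/Volumes/prod-libraries/team-a/lib.jar"))   # True
```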
You can define permissions at the following levels:
- The base path for the volume or storage container.
- A directory nested at any depth from the base path.
- A single file.
Adding a path to the allowlist only means that the path can be used for either init scripts or JAR installation. Azure Databricks still checks for permissions to access data in the specified location.
The principal used must have READ VOLUME permissions on the specified volume. See SELECT.
In single user access mode, the identity of the assigned principal (a user or service principal) is used.
In shared access mode:
- Libraries use the identity of the library installer.
- Init scripts use the identity of the cluster owner.
Note
No-isolation shared access mode does not support volumes, but uses the same identity assignment as shared access mode.
Databricks recommends configuring all object storage privileges related to init scripts and libraries with read-only permissions. Users with write permissions on these locations can potentially modify code in library files or init scripts.
Databricks recommends using Microsoft Entra ID service principals to manage access to JARs or init scripts stored in Azure Data Lake Storage Gen2. Use the following linked documentation to complete this setup:
Create a service principal with read and list permissions on your desired blobs. See Access storage using a service principal & Microsoft Entra ID (Azure Active Directory).
Save your credentials using secrets. See Manage secrets.
Set the properties in the Spark configuration and environmental variables while creating a cluster, as in the following example:
Spark config:
spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope>/<service-credential-key>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
Environmental variables:
SERVICE_CREDENTIAL={{secrets/<secret-scope>/<service-credential-key>}}
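If you create clusters programmatically, the same settings can be generated rather than typed by hand. The following is a minimal sketch that assumes the settings are supplied through the spark_conf and spark_env_vars fields of a cluster specification. The function name adls_oauth_cluster_settings and the argument values in the usage lines are hypothetical; the property names and the secret reference syntax are taken from the example above.

```python
def adls_oauth_cluster_settings(storage_account, application_id, tenant_id,
                                secret_scope, secret_key):
    """Build the Spark config and environment variables from the example above."""
    prefix = "spark.hadoop.fs.azure.account"
    suffix = f"{storage_account}.dfs.core.windows.net"
    # Secret reference that is resolved when the cluster starts; no secret value appears here.
    secret_ref = "{{secrets/" + secret_scope + "/" + secret_key + "}}"
    return {
        "spark_conf": {
            f"{prefix}.auth.type.{suffix}": "OAuth",
            f"{prefix}.oauth.provider.type.{suffix}":
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
            f"{prefix}.oauth2.client.id.{suffix}": application_id,
            f"{prefix}.oauth2.client.secret.{suffix}": secret_ref,
            f"{prefix}.oauth2.client.endpoint.{suffix}":
                f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        },
        "spark_env_vars": {"SERVICE_CREDENTIAL": secret_ref},
    }


# Hypothetical values for illustration only.
settings = adls_oauth_cluster_settings(
    "mystorageaccount", "<application-id>", "<tenant-id>", "my-scope", "sp-credential"
)
for key, value in settings["spark_conf"].items():
    print(key, value)
```

Only the secret reference is written into the cluster specification, so the credential itself stays in the secret scope.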
(Optional) Refactor init scripts using azcopy or the Azure CLI.
You can reference environmental variables set during cluster configuration within your init scripts to pass credentials stored as secrets for validation.
Note
Allowlist permissions for JARs and init scripts are managed separately. If you use the same location to store both types of objects, you must add the location to the allowlist for each.