Use a secret in a Spark configuration property or environment variable
This article provides details about how to reference a secret in a Spark configuration property or environment variable. Retrieved secrets are redacted from notebook output and Spark driver and executor logs.
Important
This feature is in Public Preview.
Security considerations
Databricks does not recommend storing secrets in cluster environment variables if they must not be available to all users on the cluster. Keep the following security implications in mind when referencing secrets in a Spark configuration property or environment variable:
Any user with CAN ATTACH TO permissions on a cluster or Run permissions on a notebook can read cluster environment variables from within the notebook.
If table access control is not enabled on a cluster, any user with CAN ATTACH TO permissions on a cluster or Run permissions on a notebook can read Spark configuration properties from within the notebook. This includes users who do not have direct permission to read a secret.
Secrets are not redacted from the Spark driver log stdout and stderr streams. To protect sensitive data, by default, Spark driver logs are viewable only by users with CAN MANAGE permission on job, single user access mode, and shared access mode clusters. On No Isolation Shared access mode clusters, the Spark driver logs can be viewed by users with CAN ATTACH TO or CAN MANAGE permission. To limit who can read the logs to only users with the CAN MANAGE permission, set spark.databricks.acl.needAdminPermissionToViewLogs to true.
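For example, to restrict driver log access to users with CAN MANAGE permission, you could add the following entry to the cluster's Spark config (a sketch using the property described above):
spark.databricks.acl.needAdminPermissionToViewLogs true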
Requirements
The following requirements apply to referencing secrets in Spark configuration properties and environment variables:
- Cluster owners must have CAN READ permission on the secret scope.
- You must be a cluster owner to add or edit a secret reference in a Spark configuration property or environment variable.
- If a secret is updated, you must restart your cluster to fetch the secret again.
- You must have CAN MANAGE permission on the cluster to delete a Spark configuration property or environment variable that references a secret.
Reference a secret with a Spark configuration property
You specify a reference to a secret in a Spark configuration property in the following format:
spark.<property-name> {{secrets/<scope-name>/<secret-name>}}
Replace:
- <scope-name> with the name of the secret scope.
- <secret-name> with the unique name of the secret in the scope.
- <property-name> with the Spark configuration property.
Each Spark configuration property can only reference one secret, but you can configure multiple Spark properties to reference secrets.
For example:
spark.password {{secrets/scope1/key1}}
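Because each property holds a single reference, additional secrets go in additional properties. A hypothetical sketch, assuming the same scope also contains a key named user1:
spark.username {{secrets/scope1/user1}}
spark.password {{secrets/scope1/key1}}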
To fetch the secret in the notebook and use it:
Python
spark.conf.get("spark.password")
SQL
SELECT ${spark.password};
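The fetched value can then be passed to any API that accepts a credential. A minimal Python sketch, assuming the spark.password property configured above and a hypothetical JDBC URL, table, and user:
Python
# Resolve the secret-backed property at runtime.
password = spark.conf.get("spark.password")

# The URL, table, and user below are hypothetical placeholders.
df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://example-host:5432/mydb")
    .option("dbtable", "my_table")
    .option("user", "my_user")
    .option("password", password)
    .load())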
Reference a secret in an environment variable
You specify a secret path in an environment variable in the following format:
<variable-name>={{secrets/<scope-name>/<secret-name>}}
You can use any valid variable name when you reference a secret. Access to secrets referenced in environment variables is determined by the permissions of the user who configured the cluster. Although secrets stored in environment variables are accessible to all cluster users, they are redacted from plaintext display, similar to other secret references.
Environment variables that reference secrets are accessible from a cluster-scoped init script. See Set and use environment variables with init scripts.
For example:
You set an environment variable to reference a secret:
SPARKPASSWORD={{secrets/scope1/key1}}
To fetch the secret in an init script, access $SPARKPASSWORD using the following pattern:
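# Proceed only when the secret-backed variable is set to a non-empty value.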
if [ -n "$SPARKPASSWORD" ]; then
# code to use ${SPARKPASSWORD}
fi