Set and use environment variables with init scripts

Init scripts have access to all environment variables present on a cluster. Azure Databricks sets many default variables that can be useful in init script logic.

Environment variables set in the Spark config are available to init scripts. See Environment variables.
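As a minimal sketch of reading such a variable from an init script, the snippet below uses a hypothetical variable named MY_APP_ENV, assumed to have been set in the cluster configuration. The variable name and the fallback value "dev" are illustrative and do not come from Azure Databricks.

```shell
#!/bin/bash
# MY_APP_ENV is a hypothetical variable name; "dev" is an assumed default
# used when the variable has not been configured on the cluster.
resolve_app_env() {
  echo "${MY_APP_ENV:-dev}"
}

app_env="$(resolve_app_env)"
echo "Init script running with MY_APP_ENV=${app_env}"
```

Supplying a default keeps the script from failing on clusters where the variable was never configured.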

What environment variables are exposed to the init script by default?

Cluster-scoped and global init scripts support the following environment variables:

  • DB_CLUSTER_ID: the ID of the cluster on which the script is running. See the Clusters API.
  • DB_CONTAINER_IP: the private IP address of the container in which Spark runs. The init script is run inside this container. See the Clusters API.
  • DB_IS_DRIVER: whether the script is running on a driver node. The value is "TRUE" on the driver, as the example below checks.
  • DB_DRIVER_IP: the IP address of the driver node.
  • DB_INSTANCE_TYPE: the instance type of the host VM.
  • DB_CLUSTER_NAME: the name of the cluster the script is executing on.
  • DB_IS_JOB_CLUSTER: whether the cluster was created to run a job. See Configure compute for jobs.
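
When debugging an init script, it can help to log these variables at the top of the script. The following sketch prints each variable listed above. The helper name describe_var and the "<unset>" placeholder are illustrative choices, so the script also runs where a variable is missing, such as on a local machine.

```shell
#!/bin/bash
# describe_var is a hypothetical helper, not part of Azure Databricks.
describe_var() {
  local value
  # printenv fails when the variable is not in the environment,
  # in which case a placeholder is printed instead.
  value="$(printenv "$1")" || value="<unset>"
  echo "$1=${value}"
}

for name in DB_CLUSTER_ID DB_CONTAINER_IP DB_IS_DRIVER DB_DRIVER_IP \
            DB_INSTANCE_TYPE DB_CLUSTER_NAME DB_IS_JOB_CLUSTER; do
  describe_var "$name"
done
```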

For example, if you want to run part of a script only on a driver node, you could write a script like:

echo "DB_IS_DRIVER=$DB_IS_DRIVER"
if [[ "$DB_IS_DRIVER" = "TRUE" ]]; then
  # <run this part only on driver>
  :
else
  # <run this part only on workers>
  :
fi
# <run this part on both driver and workers>

Use secrets in init scripts

You can use any valid variable name when you reference a secret. Access to secrets referenced in environment variables is determined by the permissions of the user who configured the cluster. Secrets stored in environment variables are accessible to all users of the cluster, but are redacted from plaintext display.

See Reference a secret in an environment variable.
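
The sketch below shows one way an init script might consume such a variable. It assumes a hypothetical variable named SERVICE_TOKEN that was configured on the cluster with a secret reference. The helper require_var is also illustrative. It reports only the variable name, never the value, so the secret does not end up in init script logs.

```shell
#!/bin/bash
# SERVICE_TOKEN is a hypothetical variable name, assumed to be populated
# from a secret reference in the cluster's environment variables.
require_var() {
  # Succeeds only if the named variable is in the environment and non-empty.
  # On failure, the message names the variable but never prints a value.
  if [ -z "$(printenv "$1")" ]; then
    echo "Required variable $1 is not set" >&2
    return 1
  fi
}

if require_var SERVICE_TOKEN; then
  echo "SERVICE_TOKEN is available (value not shown)"
else
  echo "Skipping steps that need SERVICE_TOKEN"
fi
```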