Substitutions and variables in Databricks Asset Bundles
Databricks Asset Bundles supports substitutions and custom variables, which make your bundle configuration files more modular and reusable. Both substitutions and custom variables enable dynamic retrieval of values so that settings can be determined at the time a bundle is deployed and run.
Tip
You can also use dynamic value references for job parameter values to pass context about a job run to job tasks. See What is a dynamic value reference? and Parameterize jobs.
Substitutions
You can use substitutions to retrieve values of settings that change based on the context of the bundle deployment and run.
For example, when you run the bundle validate --output json command, you might see a graph like this:
{
  "bundle": {
    "name": "hello-bundle",
    "target": "dev",
    "...": "..."
  },
  "workspace": {
    "...": "...",
    "current_user": {
      "...": "...",
      "userName": "someone@example.com",
      "...": "..."
    },
    "...": "..."
  },
  "...": {
    "...": "..."
  }
}
Substitutions can refer to the values of the bundle name, bundle target, and workspace userName fields to construct the workspace root_path in the bundle configuration file:
bundle:
  name: hello-bundle
workspace:
  root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}
# ...
targets:
  dev:
    default: true
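To make the mechanics concrete, here is a minimal Python sketch of how a ${...} reference could be resolved against the configuration graph. It is an illustration only, not the Databricks CLI's implementation; the helper names lookup and interpolate and the trimmed-down config dictionary are assumptions made for this example.

```python
import re

# Matches ${dotted.path} references such as ${bundle.name}.
REFERENCE = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def lookup(tree, dotted_path):
    """Walk a nested dict along a dotted path such as 'bundle.name'."""
    node = tree
    for key in dotted_path.split("."):
        node = node[key]  # raises KeyError if the path does not exist
    return node


def interpolate(template, tree):
    """Replace every ${path} in `template` with the matching value from `tree`."""
    return REFERENCE.sub(lambda m: str(lookup(tree, m.group(1))), template)


# Hypothetical, trimmed-down version of the validate output shown earlier.
config = {
    "bundle": {"name": "hello-bundle", "target": "dev"},
    "workspace": {"current_user": {"userName": "someone@example.com"}},
}

root_path = interpolate(
    "/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}",
    config,
)
print(root_path)
```

With the dev target selected, the sketch yields /Users/someone@example.com/.bundle/hello-bundle/my-envs/dev, which is the path the configuration above describes.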
You can also create substitutions for named resources. For example, for the pipeline configured with the name my_pipeline, ${resources.pipelines.my_pipeline.target} is the substitution for the value of the target of my_pipeline.
To determine valid substitutions, you can use the schema hierarchy documented in the REST API reference or the output of the bundle schema command.
Here are some commonly used substitutions:
${bundle.name}
${bundle.target} # Use this substitution instead of ${bundle.environment}
${workspace.host}
${workspace.current_user.short_name}
${workspace.current_user.userName}
${workspace.file_path}
${workspace.root_path}
${resources.jobs.<job-name>.id}
${resources.models.<model-name>.name}
${resources.pipelines.<pipeline-name>.name}
Custom variables
You can define both simple and complex custom variables in your bundle to enable dynamic retrieval of values needed for many scenarios. Custom variables are declared in your bundle configuration files within the variables mapping. See variables.
The following example configuration defines the variables my_cluster_id and my_notebook_path:
variables:
  my_cluster_id:
    description: The ID of an existing cluster.
    default: 1234-567890-abcde123
  my_notebook_path:
    description: The path to an existing notebook.
    default: ./hello.py
If you do not provide a default value for a variable as part of this declaration, you must set it when executing bundle commands, through an environment variable, or elsewhere within your bundle configuration files as described in Set a variable’s value.
To reference a custom variable within your bundle configuration, use the variable substitution ${var.<variable_name>}. For example, to reference the variables my_cluster_id and my_notebook_path:
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: ${var.my_cluster_id}
          notebook_task:
            notebook_path: ${var.my_notebook_path}
Set a variable’s value
If you have not provided a default value for a variable, or if you want to temporarily override the default value, provide the variable’s value using one of the following approaches:
- Provide the variable’s value as part of a bundle command such as validate, deploy, or run. To do this, use the option --var="<key>=<value>", where <key> is the variable’s name, and <value> is the variable’s value. For example, as part of the bundle validate command, to provide the value of 1234-567890-abcde123 to the variable named my_cluster_id, and to provide the value of ./hello.py to the variable named my_notebook_path, run:
  databricks bundle validate --var="my_cluster_id=1234-567890-abcde123,my_notebook_path=./hello.py"
  # Or:
  databricks bundle validate --var="my_cluster_id=1234-567890-abcde123" --var="my_notebook_path=./hello.py"
- Provide the variable’s value by setting an environment variable. The environment variable’s name must start with BUNDLE_VAR_. To set environment variables, see your operating system’s documentation. For example, to provide the value of 1234-567890-abcde123 to the variable named my_cluster_id, and to provide the value of ./hello.py to the variable named my_notebook_path, run the following command before you call a bundle command such as validate, deploy, or run:
  For Linux and macOS:
  export BUNDLE_VAR_my_cluster_id=1234-567890-abcde123 && export BUNDLE_VAR_my_notebook_path=./hello.py
  For Windows (Command Prompt):
  set "BUNDLE_VAR_my_cluster_id=1234-567890-abcde123" && set "BUNDLE_VAR_my_notebook_path=./hello.py"
  Or, provide the variable’s value as part of a bundle command such as validate, deploy, or run, for example for Linux and macOS:
  BUNDLE_VAR_my_cluster_id=1234-567890-abcde123 BUNDLE_VAR_my_notebook_path=./hello.py databricks bundle validate
  Or for Windows (Command Prompt):
  set "BUNDLE_VAR_my_cluster_id=1234-567890-abcde123" && set "BUNDLE_VAR_my_notebook_path=./hello.py" && databricks bundle validate
- Provide the variable’s value within your bundle configuration files. To do this, use a variables mapping within the targets mapping, following this format:
  variables:
    <variable-name>: <value>
  For example, to provide values for the variables named my_cluster_id and my_notebook_path for two separate targets:
  targets:
    dev:
      variables:
        my_cluster_id: 1234-567890-abcde123
        my_notebook_path: ./hello.py
    prod:
      variables:
        my_cluster_id: 2345-678901-bcdef234
        my_notebook_path: ./hello.py
Note
Whichever approach you choose to provide variable values, use the same approach during both the deployment and run stages. Otherwise, you might get unexpected results between the time of a deployment and a job or pipeline run that is based on that existing deployment.
In the preceding examples, the Databricks CLI looks for values for the variables my_cluster_id and my_notebook_path in the following order, stopping when it finds a value for each matching variable, skipping any other locations for that variable:
- Within any --var options specified as part of the bundle command.
- Within any environment variables set that begin with BUNDLE_VAR_.
- Within any variables mappings, among the targets mappings within your bundle configuration files.
- Any default value for that variable’s definition, among the top-level variables mappings within your bundle configuration files.
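The lookup order above can be sketched in a few lines of Python. This is a simplified model of the documented precedence, not the CLI's own code; the function name resolve_variable and all sample values other than those used earlier in this article are hypothetical.

```python
ENV_PREFIX = "BUNDLE_VAR_"


def resolve_variable(name, cli_vars, environ, target_vars, defaults):
    """Return the first value found for `name`, checking sources in precedence order."""
    # Strip the BUNDLE_VAR_ prefix so environment entries are keyed by variable name.
    env_vars = {
        key[len(ENV_PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(ENV_PREFIX)
    }
    # Highest precedence first, mirroring the list above.
    for source in (cli_vars, env_vars, target_vars, defaults):
        if name in source:
            return source[name]
    raise KeyError(f"no value set for variable {name!r}")


# Hypothetical inputs: each source offers a different value so the winner is visible.
cli_vars = {"my_cluster_id": "1234-567890-abcde123"}      # from --var
environ = {
    "BUNDLE_VAR_my_cluster_id": "2345-678901-bcdef234",   # from the environment
    "BUNDLE_VAR_my_notebook_path": "./from_env.py",
}
target_vars = {"my_notebook_path": "./from_target.py"}    # variables under a target
defaults = {
    "my_cluster_id": "0000-000000-default0",              # top-level defaults
    "my_notebook_path": "./hello.py",
}

cluster_id = resolve_variable("my_cluster_id", cli_vars, environ, target_vars, defaults)
notebook_path = resolve_variable("my_notebook_path", cli_vars, environ, target_vars, defaults)
print(cluster_id, notebook_path)
```

Here my_cluster_id takes the --var value even though the environment and the defaults also define it, while my_notebook_path falls through to the environment variable because no --var option supplies it.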
Define a complex variable
A custom variable is assumed to be of type string unless you define it as a complex variable. To define a custom variable with a complex type for your bundle, set type to complex in your bundle configuration.
Note
The only valid value for the type setting is complex. In addition, bundle validation fails if type is set to complex and the default defined for the variable is a single value.
In the following example, cluster settings are defined within a custom complex variable named my_cluster:
variables:
  my_cluster:
    description: "My cluster definition"
    type: complex
    default:
      spark_version: "13.2.x-scala2.12"
      node_type_id: "Standard_DS3_v2"
      num_workers: 2
      spark_conf:
        spark.speculation: true
        spark.databricks.delta.retentionDurationCheck.enabled: false
resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: my_cluster_key
          new_cluster: ${var.my_cluster}
      tasks:
        - task_key: hello_task
          job_cluster_key: my_cluster_key
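The difference from a string variable is that a complex variable replaces the whole field with a mapping rather than being spliced into text. The following Python sketch models that behavior under two stated assumptions: a "single value" in the note above means a scalar (not a mapping or a list), and only whole-field references such as new_cluster: ${var.my_cluster} are handled. The function names are hypothetical and this is not how the CLI is implemented.

```python
import copy

VAR_PREFIX = "${var."


def check_complex_default(name, definition):
    """Reject a complex variable whose default is a scalar (assumed meaning of 'single value')."""
    default = definition.get("default")
    is_scalar = default is not None and not isinstance(default, (dict, list))
    if definition.get("type") == "complex" and is_scalar:
        raise ValueError(f"variable {name!r} is complex but its default is a single value")


def expand(node, variables):
    """Recursively replace whole-field ${var.<name>} references with the variable's default."""
    if isinstance(node, dict):
        return {key: expand(value, variables) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(item, variables) for item in node]
    if isinstance(node, str) and node.startswith(VAR_PREFIX) and node.endswith("}"):
        name = node[len(VAR_PREFIX):-1]
        # Copy so that later edits to the job do not mutate the variable definition.
        return copy.deepcopy(variables[name]["default"])
    return node


# Shortened version of the my_cluster example above.
variables = {
    "my_cluster": {
        "type": "complex",
        "default": {"node_type_id": "Standard_DS3_v2", "num_workers": 2},
    }
}
job = {
    "job_clusters": [
        {"job_cluster_key": "my_cluster_key", "new_cluster": "${var.my_cluster}"}
    ]
}

for var_name, definition in variables.items():
    check_complex_default(var_name, definition)

expanded = expand(job, variables)
print(expanded["job_clusters"][0]["new_cluster"])
```

After expansion, new_cluster holds the full mapping from the variable's default, which is what lets one cluster definition be shared across several jobs.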
Retrieve an object’s ID value
For the alert, cluster_policy, cluster, dashboard, instance_pool, job, metastore, notification_destination, pipeline, query, service_principal, and warehouse object types, you can define a lookup for your custom variable to retrieve a named object’s ID using this format:
variables:
  <variable-name>:
    lookup:
      <object-type>: "<object-name>"
If a lookup is defined for a variable, the ID of the object with the specified name is used as the value of the variable. This ensures the correct resolved ID of the object is always used for the variable.
Note
An error occurs if an object with the specified name does not exist, or if there is more than one object with the specified name.
For example, in the following configuration, ${var.my_cluster_id} will be replaced by the ID of the 12.2 shared cluster.
variables:
  my_cluster_id:
    description: An existing cluster
    lookup:
      cluster: "12.2 shared"
resources:
  jobs:
    my_job:
      name: "My Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: ${var.my_cluster_id}
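The name-to-ID rule, including the two error cases from the note above, can be modeled as follows. This is a sketch over an in-memory list and makes no workspace API calls; the function lookup_id, the record shape, and the sample cluster entries are assumptions for illustration.

```python
def lookup_id(objects, name):
    """Return the ID of the single object called `name`; fail if none or several match."""
    matches = [obj["id"] for obj in objects if obj["name"] == name]
    if not matches:
        raise LookupError(f"no object named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} objects named {name!r}; the name must be unique")
    return matches[0]


# Hypothetical clusters visible in the workspace.
clusters = [
    {"id": "1234-567890-abcde123", "name": "12.2 shared"},
    {"id": "2345-678901-bcdef234", "name": "13.2 shared"},
]

my_cluster_id = lookup_id(clusters, "12.2 shared")
print(my_cluster_id)
```

Failing on an ambiguous name rather than picking the first match is what guarantees that the variable always resolves to the intended object.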