Use task values to pass information between tasks
Task values refer to the Databricks Utilities taskValues subutility, which lets you pass arbitrary values between tasks in a Databricks job. See taskValues subutility (dbutils.jobs.taskValues).
You specify a key-value pair using dbutils.jobs.taskValues.set() in one task, and then use the task name and key to reference the value in subsequent tasks.
Note
Because dbutils.jobs.taskValues.set() and dbutils.jobs.taskValues.get() are Python functions, they can be used only in notebooks with Python selected as the language. However, you can reference task values using dynamic value references for all tasks that support parameters. See Reference task values.
Set task values
Set task values in Python notebooks using dbutils.jobs.taskValues.set().
Task value keys must be strings. Each key must be unique if you have multiple task values defined in a notebook.
You can manually or programmatically assign task values to keys. Only values that can be expressed as valid JSON are permitted. The size of the JSON representation of the value cannot exceed 48 KiB.
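The JSON and size constraints above can be checked before calling dbutils.jobs.taskValues.set(). The following is a minimal sketch, not part of the Databricks API: check_task_value and MAX_TASK_VALUE_BYTES are hypothetical names, and the exact way Databricks measures the size (encoding, whitespace) is an assumption, so treat the result as an approximation.

```python
import json

# 48 KiB limit on the JSON representation, as stated above.
MAX_TASK_VALUE_BYTES = 48 * 1024


def check_task_value(value):
    """Return the approximate JSON size of value in bytes.

    Raises ValueError if the value cannot be expressed as valid JSON
    or if its JSON representation exceeds the limit.
    Assumption: size is measured on the UTF-8 bytes of json.dumps() output.
    """
    try:
        # allow_nan=False rejects NaN and Infinity, which are not valid JSON.
        encoded = json.dumps(value, allow_nan=False)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"value cannot be expressed as valid JSON: {exc}") from exc

    size = len(encoded.encode("utf-8"))
    if size > MAX_TASK_VALUE_BYTES:
        raise ValueError(
            f"JSON representation is {size} bytes; limit is {MAX_TASK_VALUE_BYTES}"
        )
    return size
```

In a notebook, you might call check_task_value(value) immediately before dbutils.jobs.taskValues.set(key = ..., value = value) so that an oversized or non-serializable value fails with a clear message.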
The following example sets a static string for the key fave_food:
dbutils.jobs.taskValues.set(key = "fave_food", value = "beans")
The following example uses a notebook task parameter to query all records for a particular order number and return the current order status and total count of records:
from pyspark.sql.functions import col
order_num = dbutils.widgets.get("order_num")
query = (spark.read.table("orders")
  .orderBy(col("updated"), ascending=False)
  .select(col("order_status"))
  .where(col("order_num") == order_num)
)
dbutils.jobs.taskValues.set(key = "record_count", value = query.count())
dbutils.jobs.taskValues.set(key = "order_status", value = query.take(1)[0][0])
You can pass lists of values using this pattern and then use them to coordinate downstream logic, such as for each tasks. See Run a parameterized Azure Databricks job task in a loop.
The following example extracts the distinct values for product ID to a Python list and sets this as a task value:
prod_list = list(spark.read.table("products").select("prod_id").distinct().toPandas()["prod_id"])
dbutils.jobs.taskValues.set(key = "prod_list", value = prod_list)
Reference task values
Databricks recommends referencing task values as task parameters configured using the dynamic value reference pattern {{tasks.<task_name>.values.<value_name>}}.
For example, to reference the task value with the key prod_list from a task named product_inventory, use the syntax {{tasks.product_inventory.values.prod_list}}.
See Configure task parameters and What is a dynamic value reference?
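As an illustration of the reference pattern, the following sketch builds the reference string and decodes a list value on the receiving side. Both helper names are hypothetical, and the decoding step assumes the downstream task receives the list as a JSON-encoded string parameter; verify that assumption against your own job before relying on it.

```python
import json


def task_value_reference(task_name, value_name):
    """Build a dynamic value reference for a task value (hypothetical helper)."""
    return "{{tasks." + task_name + ".values." + value_name + "}}"


def parse_list_parameter(raw):
    """Decode a list passed as a string parameter.

    Assumption: the parameter arrives as a JSON array encoded in a string.
    """
    values = json.loads(raw)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array")
    return values
```

For example, task_value_reference("product_inventory", "prod_list") produces the string to enter as the parameter value. In a downstream Python notebook, the parameter could then be read with something like parse_list_parameter(dbutils.widgets.get("prod_list")), where the parameter name prod_list is an assumption.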
Use dbutils.jobs.taskValues.get
The syntax dbutils.jobs.taskValues.get() requires specifying the upstream task name. This syntax is not recommended: because a task value can be used in multiple downstream tasks, a change to the upstream task name requires updating every task that references it.
Using this syntax, you can optionally specify a default value and a debugValue. The default value is used if the key cannot be found. The debugValue lets you set a static value to use during manual code development and testing in notebooks before you schedule the notebook as a task.
The following example gets the value for the key order_status set in a task named order_lookup. The value Delivered is returned only when running the notebook interactively.
order_status = dbutils.jobs.taskValues.get(taskKey = "order_lookup", key = "order_status", debugValue = "Delivered")
Note
Databricks does not recommend setting default values, as they can be challenging to troubleshoot and prevent expected error messages due to missing keys or incorrectly named tasks.
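If you do use dbutils.jobs.taskValues.get(), one way to limit the cost of a task rename is to keep the upstream task name in a single constant. The following is a sketch under that assumption: UPSTREAM_TASK and read_order_values are hypothetical names, and the function takes the taskValues object as an argument so it can be exercised outside a notebook. It passes a debugValue for each key and deliberately sets no default, in line with the note above.

```python
# Upstream task name, kept in one place so a rename needs a single update.
UPSTREAM_TASK = "order_lookup"


def read_order_values(task_values, debug_values):
    """Read several task values from the upstream task.

    task_values: an object exposing get(taskKey=..., key=..., debugValue=...),
        such as dbutils.jobs.taskValues in a Python notebook.
    debug_values: maps each key to the static value to use when the
        notebook runs interactively.
    No default is passed, so a missing key is not silently masked.
    """
    return {
        key: task_values.get(taskKey=UPSTREAM_TASK, key=key, debugValue=debug)
        for key, debug in debug_values.items()
    }
```

In a notebook, this could be called as read_order_values(dbutils.jobs.taskValues, {"order_status": "Delivered", "record_count": 0}).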
View task values
The returned value of a task value for each run is displayed in the Output panel of the Task run details. See View task run history.