Use task values to pass information between tasks

Task values refer to the Databricks Utilities taskValues subutility, which lets you pass arbitrary values between tasks in a Databricks job. See taskValues subutility (dbutils.jobs.taskValues).

You set a key-value pair using dbutils.jobs.taskValues.set() in one task, and then use the task name and key to reference the value in subsequent tasks.

Note

Because dbutils.jobs.taskValues.set() and dbutils.jobs.taskValues.get() are Python functions, you can call them only in notebooks with Python selected as the language. However, you can reference task values using dynamic value references in all tasks that support parameters. See Reference task values.

Set task values

Set task values in Python notebooks using the subutility dbutils.jobs.taskValues.set().

Task value keys must be strings. Each key must be unique if you have multiple task values defined in a notebook.

You can manually or programmatically assign task values to keys. Only values that can be expressed as valid JSON are permitted. The size of the JSON representation of the value cannot exceed 48 KiB.
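The constraints above (string keys, JSON-expressible values, a 48 KiB size limit) can be checked before calling dbutils.jobs.taskValues.set(). The following is a minimal sketch, not part of dbutils: check_task_value is a hypothetical helper, and it assumes the limit applies to the UTF-8 encoded JSON text of the value.

```python
import json

MAX_TASK_VALUE_BYTES = 48 * 1024  # 48 KiB limit on the JSON representation


def check_task_value(key, value):
    """Return the JSON size in bytes, or raise if the pair breaks a constraint.

    Hypothetical helper. Assumes the size limit is measured on the
    UTF-8 encoded JSON text of the value.
    """
    if not isinstance(key, str):
        raise TypeError(f"task value key must be a string, got {type(key).__name__}")
    try:
        # allow_nan=False rejects NaN and Infinity, which are not valid JSON.
        encoded = json.dumps(value, allow_nan=False).encode("utf-8")
    except (TypeError, ValueError) as exc:
        raise ValueError(f"value for {key!r} cannot be expressed as JSON: {exc}") from exc
    if len(encoded) > MAX_TASK_VALUE_BYTES:
        raise ValueError(
            f"value for {key!r} is {len(encoded)} bytes, over {MAX_TASK_VALUE_BYTES}"
        )
    return len(encoded)
```

In a notebook, you could call check_task_value(key, value) immediately before dbutils.jobs.taskValues.set(key = key, value = value) to fail early with a descriptive message.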

The following example sets a static string for the key fave_food:

dbutils.jobs.taskValues.set(key = "fave_food", value = "beans")

The following example uses a notebook task parameter to query all records for a particular order number, and then sets the total record count and the most recent order status as task values:

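from pyspark.sql.functions import col

# Read the order number from the notebook task parameter.
order_num = dbutils.widgets.get("order_num")

# Filter to the requested order, sort newest first, and keep only the status column.
query = (spark.read.table("orders")
  .where(col("order_num") == order_num)
  .orderBy(col("updated").desc())
  .select(col("order_status"))
)

# Note: take(1)[0] raises an IndexError if no records match the order number.
dbutils.jobs.taskValues.set(key = "record_count", value = query.count())
dbutils.jobs.taskValues.set(key = "order_status", value = query.take(1)[0][0])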

You can pass lists of values using this pattern and then use them to coordinate downstream logic, such as For each tasks. See Run a parameterized Azure Databricks job task in a loop.

The following example extracts the distinct values for product ID to a Python list and sets this as a task value:

prod_list = list(spark.read.table("products").select("prod_id").distinct().toPandas()["prod_id"])

dbutils.jobs.taskValues.set(key = "prod_list", value = prod_list)

Reference task values

Databricks recommends referencing task values as task parameters configured using the dynamic value reference pattern {{tasks.<task_name>.values.<value_name>}}.

For example, to reference the task value with the key prod_list from a task named product_inventory, use the syntax {{tasks.product_inventory.values.prod_list}}.
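If you generate job definitions programmatically, the reference pattern can be assembled from the task name and key. The following is a small sketch; task_value_reference is a hypothetical helper and not part of any Databricks API.

```python
def task_value_reference(task_name, value_name):
    """Build a dynamic value reference string for a task value.

    Hypothetical helper: it only formats the pattern
    {{tasks.<task_name>.values.<value_name>}} and does not validate
    that the task or key exists.
    """
    return "{{tasks." + task_name + ".values." + value_name + "}}"


# Reference the prod_list value set by the product_inventory task.
reference = task_value_reference("product_inventory", "prod_list")
print(reference)
```

The resulting string is what you would enter as the value of a task parameter.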

See Configure task parameters and What is a dynamic value reference?

Use dbutils.jobs.taskValues.get

The syntax dbutils.jobs.taskValues.get() requires specifying the upstream task name. This syntax is not recommended: a task value can be used in multiple downstream tasks, so renaming the upstream task means updating every notebook that references it.

Using this syntax, you can optionally specify a default value and a debugValue. The default value is returned if the key cannot be found. The debugValue is a static value that is used during manual code development and testing in notebooks, before you schedule the notebook as a task.

The following example gets the value for the key order_status set in a task named order_lookup. The value Delivered is returned only when running the notebook interactively.

order_status = dbutils.jobs.taskValues.get(taskKey = "order_lookup", key = "order_status", debugValue = "Delivered")

Note

Databricks does not recommend setting default values. They can be challenging to troubleshoot, and they suppress the error messages you would otherwise see for missing keys or incorrectly named tasks.

View task values

For each run, the value of each task value is displayed in the Output panel of the Task run details. See View task run history.