Use a For each task to run another task in a loop

This article discusses using the For each task with your Azure Databricks jobs, including details on adding and configuring the task in the Jobs UI. Use the For each task to run a nested task in a loop, passing a different set of parameters to each iteration of the task.

Adding the For each task to a job requires defining two tasks: the For each task and a nested task. The nested task is the task to run for each iteration of the For each task and is one of the standard Azure Databricks Jobs task types. You cannot add another For each task as the nested task.

For example, you could use the For each task to perform a common set of transformations on multiple tables, passing a table name from a list of table names to each iteration of the task.

Iterations of the nested task that do not depend on each other can run concurrently. The Concurrency setting on the For each task controls how many iterations run at once.
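As a rough mental model of the loop described above, the following sketch runs a stand-in nested task once per table name, with a cap on how many iterations run at the same time. It is a local analogy only, not how Azure Databricks schedules iterations. The names `run_for_each` and `transform_table` and the table names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_for_each(inputs, nested_task, concurrency=1):
    """Call nested_task once per input, with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(nested_task, inputs))


def transform_table(table_name):
    # Hypothetical stand-in for the real nested task (for example, a notebook)
    return f"transformed {table_name}"


results = run_for_each(
    ["orders", "customers", "products"], transform_table, concurrency=2
)
print(results)
```

With `concurrency=1` the iterations run one after another, which mirrors the default described later in this article.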

Add the For each task to a job

You can add a For each task when you create a job or edit a task in an existing job. To configure a For each task:

  1. In the Type drop-down menu, select For each.

  2. Enter a name for the task in the Task name field.

  3. In the Inputs text box, define the values for the For each task to iterate on as a JSON-formatted array of values. To learn more about passing parameters to the nested task, see What parameter types can I use with the For each task?.

  4. To optionally set the number of iterations that can run in parallel, enter a Concurrency value for the task. The default value is 1.

  5. To optionally receive notifications for task start, success, or failure, click + Add. See Add notifications on a job.

  6. To complete the configuration of the For each task and add a nested task to run for each iteration, click Add a task to loop over.

  7. Select a task type and configuration options for the nested task. Nested tasks are standard task types and have the same configuration options. See Configure and edit Databricks tasks.

  8. To reference parameters passed from the For each task, click Parameters. Use the {{input}} reference to pass the whole array element for the current iteration, or {{input.<key>}} to reference an individual field when you iterate over a list of objects.

    Add a nested task to a For each task

  9. Click Create task.
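The same configuration can also be expressed as a task definition outside the UI. The sketch below builds such a definition as a Python dictionary. It assumes the Jobs API field names `for_each_task`, `inputs`, `concurrency`, `task`, `notebook_task`, and `base_parameters`; check them against the Jobs API reference before relying on them. The task keys, notebook path, and parameter name are hypothetical.

```python
import json

# Values to iterate over; the For each task expects them as a JSON array string
table_names = ["orders", "customers", "products"]

for_each_task = {
    "task_key": "process_tables",  # hypothetical name for the For each task
    "for_each_task": {
        "inputs": json.dumps(table_names),
        "concurrency": 2,  # iterations allowed to run in parallel
        "task": {
            "task_key": "process_tables_iteration",  # hypothetical nested task
            "notebook_task": {
                "notebook_path": "/Workspace/Shared/transform_table",  # hypothetical
                # {{input}} is replaced with the array element for each iteration
                "base_parameters": {"table_name": "{{input}}"},
            },
        },
    },
}

print(json.dumps(for_each_task, indent=2))
```

The nested task is an ordinary task definition, so any standard task type could take the place of the notebook task here.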

Switch between the For each task and the nested task

The For each task appears in the Jobs UI as a node with the nested task node inside the For each node. To switch between the For each task and the nested task, click the respective nodes.

Jobs UI DAG view switch to For each task

Jobs UI DAG view switch to nested task

What parameter types can I use with the For each task?

The For each task passes parameters to each iteration of the nested task. The input is an array of values, and each element is passed to one iteration of the nested task. There are multiple ways to create the inputs that the task uses: JSON-formatted arrays, task values, or job parameters.

Note

Parameters are limited to 10,000 characters, or 48 KB if you use task value references. If your parameters require more than 48 KB, you can pass a lookup to a larger config file. See Use a lookup table for large parameter arrays.
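One way to picture the lookup approach mentioned in the note: the For each task iterates over short keys only, and each iteration resolves its key against a larger configuration that is stored elsewhere. The sketch below keeps the configuration in memory for illustration. The `LOOKUP` contents and the `nested_task` function are hypothetical, and in practice the configuration might live in a file or a table.

```python
import json

# Hypothetical configuration; too large to pass inline in a real scenario
LOOKUP = {
    "us": {"table": "sales_us", "currency": "USD"},
    "no": {"table": "sales_no", "currency": "NOK"},
}

# Only the short keys are passed as the For each inputs
inputs = json.dumps(sorted(LOOKUP))


def nested_task(key, lookup=LOOKUP):
    """Resolve the full configuration for one iteration from its key."""
    config = lookup[key]
    return f"{config['table']} ({config['currency']})"


outputs = [nested_task(key) for key in json.loads(inputs)]
print(outputs)
```

Because only the keys travel through the parameter, the size of each configuration entry no longer counts against the parameter limit.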

A JSON-formatted array of values

When you create or edit a task, you can define the array of values for the nested task directly in the Inputs text box. The array can contain any of the following data types:

  • Key-value pairs
  • Strings, numbers, or Boolean types
  • Arbitrarily complex JSON objects

The Inputs text box is limited to 10,000 characters.
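The sketch below shows one way to produce the Inputs text for the data types listed above and to check it against the 10,000 character limit before pasting it into the UI. The `resolve` helper only mimics how the {{input}} and {{input.<key>}} references select values from one array element; it is an illustration, not the Azure Databricks implementation. All function names and sample values are hypothetical.

```python
import json

MAX_INPUT_CHARS = 10_000  # limit stated above for the Inputs text box


def build_inputs(values):
    """Serialize values as a JSON array and enforce the character limit."""
    text = json.dumps(values)
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"Inputs are {len(text)} characters; limit is {MAX_INPUT_CHARS}")
    return text


def resolve(reference, item):
    """Mimic {{input}} and {{input.<key>}} for a single array element."""
    if reference == "{{input}}":
        return item
    prefix, suffix = "{{input.", "}}"
    if reference.startswith(prefix) and reference.endswith(suffix):
        return item[reference[len(prefix):-len(suffix)]]
    raise ValueError(f"Unsupported reference: {reference}")


simple_inputs = build_inputs(["orders", "customers"])
object_inputs = build_inputs([
    {"table": "orders", "full_refresh": True},
    {"table": "customers", "full_refresh": False},
])

first = json.loads(object_inputs)[0]
print(resolve("{{input}}", first))
print(resolve("{{input.table}}", first))
```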

Task value references

You can pass task values from a preceding task. To reference passed task values, use the {{tasks.<task_name>.values.<task_value_name>}} syntax to set the value in the Inputs text box. For example, if a task named generate_countries_list that precedes the For each task sets the following task value:

dbutils.jobs.taskValues.set(key = "countries", value = countries_array)

Then the For each task references the task value in the Inputs text box using the following syntax:

{{tasks.generate_countries_list.values.countries}}

Task values are limited to 48 KB. To learn more about task values, see Use task values to pass information between tasks.
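The preceding task can check the size of the array before setting the task value. The sketch below assumes that 48 KB means 48 * 1024 bytes and that the size is measured on the UTF-8 encoded JSON form of the value; both are assumptions, not documented behavior. The helper name `fits_in_task_value` and the sample countries are hypothetical. The `dbutils` object exists only inside Azure Databricks, so the call is guarded to keep the example runnable elsewhere.

```python
import json

MAX_TASK_VALUE_BYTES = 48 * 1024  # assumption: 48 KB interpreted as 48 * 1024 bytes


def fits_in_task_value(value):
    """Return True if the JSON-serialized value is within the assumed size limit."""
    return len(json.dumps(value).encode("utf-8")) <= MAX_TASK_VALUE_BYTES


countries_array = ["US", "NO", "JP"]

if not fits_in_task_value(countries_array):
    raise ValueError("countries_array is too large to pass as a task value")

try:
    # Available in Azure Databricks notebooks, where dbutils is predefined
    dbutils.jobs.taskValues.set(key="countries", value=countries_array)
except NameError:
    print("dbutils is not defined outside Azure Databricks; skipping taskValues.set")
```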

Job parameters

You can also use job parameters as input. To reference a job parameter, use the following syntax in the Inputs text box: {{job.parameters.<name>}}. For example, {{job.parameters.countries}}.

Job parameters are limited to 10,000 characters. To learn more about job parameters, see Configure job parameters.
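As a sketch of how a job parameter could feed the For each task, the settings below define a job parameter named countries whose default is a JSON array string, and reference it from the For each inputs. It assumes the Jobs API field names `parameters`, `name`, `default`, `for_each_task`, and `inputs`; confirm them against the Jobs API reference. The job name, task keys, and notebook path are hypothetical.

```python
import json

MAX_JOB_PARAMETER_CHARS = 10_000  # limit stated above for job parameters

countries_default = json.dumps(["US", "NO", "JP"])
if len(countries_default) > MAX_JOB_PARAMETER_CHARS:
    raise ValueError("Default value exceeds the job parameter limit")

job_settings = {
    "name": "process_countries",  # hypothetical job name
    "parameters": [{"name": "countries", "default": countries_default}],
    "tasks": [
        {
            "task_key": "per_country",  # hypothetical For each task
            "for_each_task": {
                # Resolved at run time to the value of the job parameter
                "inputs": "{{job.parameters.countries}}",
                "task": {
                    "task_key": "per_country_iteration",  # hypothetical nested task
                    "notebook_task": {
                        "notebook_path": "/Workspace/Shared/process_country",  # hypothetical
                        "base_parameters": {"country": "{{input}}"},
                    },
                },
            },
        }
    ],
}

print(json.dumps(job_settings, indent=2))
```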

Reference a For each task in downstream tasks

The For each task is the top-level task, and downstream tasks can specify it as a dependency. Downstream tasks cannot depend on or reference the nested task.
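The dependency rule above can be checked mechanically on a list of task definitions. The sketch below assumes the Jobs API field names `task_key`, `depends_on`, and `for_each_task`, and uses hypothetical task keys. The helper functions are illustrative and not part of any Azure Databricks library.

```python
def nested_task_keys(tasks):
    """Collect the task keys of tasks nested inside For each tasks."""
    return {
        t["for_each_task"]["task"]["task_key"]
        for t in tasks
        if "for_each_task" in t
    }


def invalid_dependencies(tasks):
    """Return (task, dependency) pairs where a task depends on a nested task."""
    nested = nested_task_keys(tasks)
    return [
        (t["task_key"], dep["task_key"])
        for t in tasks
        for dep in t.get("depends_on", [])
        if dep["task_key"] in nested
    ]


tasks = [
    {
        "task_key": "process_tables",
        "for_each_task": {
            "inputs": '["orders", "customers"]',
            "task": {"task_key": "process_tables_iteration"},
        },
    },
    # The downstream task depends on the For each task, not on the nested task
    {"task_key": "publish_report", "depends_on": [{"task_key": "process_tables"}]},
]

print(invalid_dependencies(tasks))
```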

Run and monitor a job with a For each task

Running a job with a For each task is identical to running any other job.

Viewing and managing job runs is also the same as for any other job, with one exception: the task run history for a For each task is presented as a table of task iterations. See View task run history for a For each task.