Add tasks to jobs in Databricks Asset Bundles
This article provides examples of various types of tasks that you can add to Azure Databricks jobs in Databricks Asset Bundles. See What are Databricks Asset Bundles?.
Most job task types have task-specific parameters among their supported settings, but you can also define job parameters that get passed to tasks. Dynamic value references are supported for job parameters, which enable passing values specific to the job run between tasks. See What is a dynamic value reference?.
Note
You can override job task settings. See Override job tasks settings in Databricks Asset Bundles.
Tip
To quickly generate resource configuration for an existing job using the Databricks CLI, you can use the bundle generate job command. See bundle commands.
Notebook task
You use this task to run a notebook.
The following example adds a notebook task to a job and sets a job parameter named my_job_run_id. The path for the notebook to deploy is relative to the configuration file in which this task is declared. The task gets the notebook from its deployed location in the Azure Databricks workspace.
resources:
  jobs:
    my-notebook-job:
      name: my-notebook-job
      tasks:
        - task_key: my-notebook-task
          notebook_task:
            notebook_path: ./my-notebook.ipynb
      parameters:
        - name: my_job_run_id
          default: "{{job.run_id}}"
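If you want to pass values to only this notebook rather than define a job parameter, the notebook_task mapping also accepts task-level base parameters. The following sketch shows only the task mapping from the example above, with a hypothetical parameter named my_param that the notebook can read as a widget:
        - task_key: my-notebook-task
          notebook_task:
            notebook_path: ./my-notebook.ipynb
            # Task-level parameters; my_param is a hypothetical name used for illustration.
            # The notebook reads this value as a widget, for example dbutils.widgets.get("my_param").
            base_parameters:
              my_param: "some-value"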
For additional mappings that you can set for this task, see tasks > notebook_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See Notebook task for jobs.
If/else condition task
The condition_task enables you to add a task with if/else conditional logic to your job. The task evaluates a condition that can be used to control the execution of other tasks. The condition task does not require a cluster to execute and does not support retries or notifications. For more information about the if/else task, see Add branching logic to a job with the If/else task.
The following example contains a condition task and a notebook task, where the notebook task only executes if the number of job repairs is less than 5.
resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: condition_task
          condition_task:
            op: LESS_THAN
            left: "{{job.repair_count}}"
            right: "5"
        - task_key: notebook_task
          depends_on:
            - task_key: condition_task
              outcome: "true"
          notebook_task:
            notebook_path: ../src/notebook.ipynb
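To branch in the other direction as well, you can add a task that depends on the false outcome of the condition task. The following sketch extends the example above with a hypothetical cleanup notebook that runs only when the condition evaluates to false:
        - task_key: cleanup_task
          depends_on:
            - task_key: condition_task
              # This task runs only when the condition evaluates to false.
              outcome: "false"
          notebook_task:
            # Hypothetical notebook path, used here for illustration.
            notebook_path: ../src/cleanup.ipynb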
For additional mappings that you can set for this task, see tasks > condition_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format.
For each task
The for_each_task enables you to add a task with a for each loop to your job. The task executes a nested task for every input provided. For more information about the for_each_task, see Run a parameterized Azure Databricks job task in a loop.
The following example adds a for_each_task to a job, where it loops over the values of another task and processes them.
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: generate_countries_list
          notebook_task:
            notebook_path: ../src/generate_countries_list.ipynb
        - task_key: process_countries
          depends_on:
            - task_key: generate_countries_list
          for_each_task:
            inputs: "{{tasks.generate_countries_list.values.countries}}"
            task:
              task_key: process_countries_iteration
              notebook_task:
                notebook_path: ../src/process_countries_notebook.ipynb
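The for_each_task also supports a concurrency setting that caps how many iterations of the nested task run at the same time. The following sketch shows only the for_each_task mapping from the example above, limited to two concurrent iterations:
          for_each_task:
            inputs: "{{tasks.generate_countries_list.values.countries}}"
            # Run at most two iterations of the nested task concurrently.
            concurrency: 2
            task:
              task_key: process_countries_iteration
              notebook_task:
                notebook_path: ../src/process_countries_notebook.ipynb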
For additional mappings that you can set for this task, see tasks > for_each_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format.
Python script task
You use this task to run a Python file.
The following example adds a Python script task to a job. The path for the Python file to deploy is relative to the configuration file in which this task is declared. The task gets the Python file from its deployed location in the Azure Databricks workspace.
resources:
  jobs:
    my-python-script-job:
      name: my-python-script-job
      tasks:
        - task_key: my-python-script-task
          spark_python_task:
            python_file: ./my-script.py
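If the script expects command-line arguments, you can pass them through the parameters list on the task. The following sketch shows only the task mapping from the example above; the --env and prod values are placeholders for illustration:
        - task_key: my-python-script-task
          spark_python_task:
            python_file: ./my-script.py
            # Command-line arguments passed to the script (available through sys.argv).
            parameters:
              - "--env"
              - "prod"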
For additional mappings that you can set for this task, see tasks > spark_python_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See also Python script task for jobs.
Python wheel task
You use this task to run a Python wheel file.
The following example adds a Python wheel task to a job. The path for the Python wheel file to deploy is relative to the configuration file in which this task is declared. See Databricks Asset Bundles library dependencies.
resources:
  jobs:
    my-python-wheel-job:
      name: my-python-wheel-job
      tasks:
        - task_key: my-python-wheel-task
          python_wheel_task:
            entry_point: run
            package_name: my_package
          libraries:
            - whl: ./my_package/dist/my_package-*.whl
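To pass arguments to the wheel’s entry point, you can add a parameters list (or a named_parameters mapping) to the python_wheel_task. The following sketch shows only the python_wheel_task mapping from the example above, with placeholder values:
          python_wheel_task:
            entry_point: run
            package_name: my_package
            # Positional arguments passed to the entry point; placeholder values for illustration.
            parameters:
              - "--env"
              - "prod"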
For additional mappings that you can set for this task, see tasks > python_wheel_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See also Develop a Python wheel file using Databricks Asset Bundles and Python Wheel task for jobs.
JAR task
You use this task to run a JAR. You can reference local JAR libraries or those in a workspace, a Unity Catalog volume, or an external cloud storage location. See Databricks Asset Bundles library dependencies.
The following example adds a JAR task to a job. The JAR is referenced from the specified Unity Catalog volume location.
resources:
  jobs:
    my-jar-job:
      name: my-jar-job
      tasks:
        - task_key: my-jar-task
          spark_jar_task:
            main_class_name: org.example.com.Main
          libraries:
            - jar: /Volumes/main/default/my-volume/my-project-0.1.0-SNAPSHOT.jar
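To deploy a locally built JAR instead of referencing one in a volume, you can point the library at a path relative to the bundle configuration; the JAR is then uploaded as part of deployment. The following sketch shows only the libraries mapping, assuming a hypothetical target/ build output directory:
          libraries:
            # Relative path to a locally built JAR, uploaded when the bundle is deployed.
            - jar: ./target/my-project-0.1.0-SNAPSHOT.jar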
For additional mappings that you can set for this task, see tasks > spark_jar_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See JAR task for jobs.
SQL file task
You use this task to run a SQL file located in a workspace or a remote Git repository.
The following example adds a SQL file task to a job. This SQL file task uses the specified SQL warehouse to run the specified SQL file.
resources:
  jobs:
    my-sql-file-job:
      name: my-sql-file-job
      tasks:
        - task_key: my-sql-file-task
          sql_task:
            file:
              path: /Users/someone@example.com/hello-world.sql
              source: WORKSPACE
            warehouse_id: 1a111111a1111aa1
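To run a SQL file from a remote Git repository instead of the workspace, set the file’s source to GIT and define a git_source for the job. The following sketch uses a placeholder repository URL, provider, and branch, and a hypothetical queries/hello-world.sql path relative to the repository root:
resources:
  jobs:
    my-sql-file-job:
      name: my-sql-file-job
      # Placeholder repository details; replace with your own Git URL, provider, and branch.
      git_source:
        git_url: https://github.com/example/my-repo
        git_provider: gitHub
        git_branch: main
      tasks:
        - task_key: my-sql-file-task
          sql_task:
            file:
              # The path is relative to the repository root when the source is GIT.
              path: queries/hello-world.sql
              source: GIT
            warehouse_id: 1a111111a1111aa1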
To get a SQL warehouse’s ID, open the SQL warehouse’s settings page, then copy the ID found in parentheses after the name of the warehouse in the Name field on the Overview tab.
For additional mappings that you can set for this task, see tasks > sql_task > file in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See SQL task for jobs.
Delta Live Tables pipeline task
You use this task to run a Delta Live Tables pipeline. See What is Delta Live Tables?.
The following example adds a Delta Live Tables pipeline task to a job. This Delta Live Tables pipeline task runs the specified pipeline.
resources:
  jobs:
    my-pipeline-job:
      name: my-pipeline-job
      tasks:
        - task_key: my-pipeline-task
          pipeline_task:
            pipeline_id: 11111111-1111-1111-1111-111111111111
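To run a full refresh of the pipeline instead of an incremental update, you can set full_refresh on the pipeline task. The following sketch shows only the task mapping from the example above:
        - task_key: my-pipeline-task
          pipeline_task:
            pipeline_id: 11111111-1111-1111-1111-111111111111
            # Trigger a full refresh of the pipeline on each run.
            full_refresh: true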
You can get a pipeline’s ID by opening the pipeline in the workspace and copying the Pipeline ID value on the Pipeline details tab of the pipeline’s settings page.
For additional mappings that you can set for this task, see tasks > pipeline_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See Delta Live Tables pipeline task for jobs.
dbt task
You use this task to run one or more dbt commands. See Connect to dbt Cloud.
The following example adds a dbt task to a job. This dbt task uses the specified SQL warehouse to run the specified dbt commands.
resources:
  jobs:
    my-dbt-job:
      name: my-dbt-job
      tasks:
        - task_key: my-dbt-task
          dbt_task:
            commands:
              - "dbt deps"
              - "dbt seed"
              - "dbt run"
            project_directory: /Users/someone@example.com/Testing
            warehouse_id: 1a111111a1111aa1
          libraries:
            - pypi:
                package: "dbt-databricks>=1.0.0,<2.0.0"
To get a SQL warehouse’s ID, open the SQL warehouse’s settings page, then copy the ID found in parentheses after the name of the warehouse in the Name field on the Overview tab.
For additional mappings that you can set for this task, see tasks > dbt_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See dbt task for jobs.
Databricks Asset Bundles also includes a dbt-sql project template that defines a job with a dbt task, as well as dbt profiles for deployed dbt jobs. For information about Databricks Asset Bundles templates, see Use a default bundle template.
Run job task
You use this task to run another job.
The following example contains a run job task in the second job that runs the first job.
resources:
  jobs:
    my-first-job:
      name: my-first-job
      tasks:
        - task_key: my-first-job-task
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            node_type_id: "Standard_DS3_v2"
            num_workers: 2
          notebook_task:
            notebook_path: ./src/test.py
    my-second-job:
      name: my-second-job
      tasks:
        - task_key: my-second-job-task
          run_job_task:
            job_id: ${resources.jobs.my-first-job.id}
This example uses a substitution to retrieve the ID of the job to run. To get a job’s ID from the UI, open the job in the workspace and copy the ID from the Job ID value in the Job details tab of the job’s settings page.
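To pass job parameters to the job being run, you can add a job_parameters mapping to the run_job_task. The following sketch shows only the task mapping from the example above; the parameter name and value are placeholders for illustration:
        - task_key: my-second-job-task
          run_job_task:
            job_id: ${resources.jobs.my-first-job.id}
            # Job parameters passed to the target job; placeholder name and value.
            job_parameters:
              my_param: "some-value"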
For additional mappings that you can set for this task, see tasks > run_job_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format.