dbt task for jobs

Use the dbt task to configure and run dbt projects on Azure Databricks.

Important

When dbt tasks run, Databricks injects the DBT_ACCESS_TOKEN for the principal configured in the Run As field.

Configure a dbt task

Add a dbt task from the Tasks tab in the Jobs UI by doing the following:

  1. In the Type drop-down menu, select dbt.

  2. In the Source drop-down menu, select Workspace to use a dbt project located in an Azure Databricks workspace folder, or select Git provider to use a project located in a remote Git repository.

    • If you select Workspace, use the provided file navigator to select the Project directory.

    • If you select Git provider, click Edit to enter Git information for the project repository. See Use Git with jobs.

      If your project is not in the repo’s root directory, use the Project directory field to specify the path to it.

  3. The dbt commands text boxes default to the commands dbt deps, dbt seed, and dbt run. The provided commands run in sequential order. Add, remove, or edit these fields as necessary for your workflow. See What are dbt commands?.

  4. In SQL warehouse, select a SQL warehouse to run the SQL generated by dbt. The SQL warehouse drop-down menu shows only serverless and pro SQL warehouses.

  5. Specify a Warehouse catalog. If unset, the workspace default is used.

  6. Specify a Warehouse schema. If unset, the schema named default is used.

  7. Choose dbt CLI compute to run dbt Core. Databricks recommends using serverless compute for jobs or classic jobs compute configured with a single-node cluster.

  8. Specify a dbt-databricks version for the task.

    If you use Serverless compute, use the Environment and Libraries field to select, edit, or add a new environment. See Install notebook dependencies.

    For all other compute configurations, the Dependent libraries field defaults to dbt-databricks>=1.0.0,<2.0.0. To pin a version, delete this setting and use + Add to add a PyPI library with the version you want.

    Note

    Databricks recommends pinning your dbt tasks to a specific version of the dbt-databricks package to ensure the same version is used for development and production runs. Databricks recommends version 1.6.0 or greater of the dbt-databricks package.

  9. Click Create task.
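
If you define jobs programmatically instead of through the UI, the settings in the steps above correspond roughly to a dbt_task block in a Jobs API task definition. The following is a minimal sketch that only builds such a payload as a Python dictionary. The task key, warehouse ID, project path, catalog name, and pinned package version are hypothetical placeholders, compute settings are omitted, and the field names should be checked against the Jobs API reference for your workspace.

```python
import json

# Hypothetical placeholder values; replace them with your own.
WAREHOUSE_ID = "1234567890abcdef"
PROJECT_DIR = "/Workspace/Users/someone@example.com/my_dbt_project"

dbt_task_definition = {
    "task_key": "run_dbt_project",  # hypothetical task name
    "dbt_task": {
        "source": "WORKSPACE",             # step 2: project stored in a workspace folder
        "project_directory": PROJECT_DIR,  # step 2: Project directory
        "commands": ["dbt deps", "dbt seed", "dbt run"],  # step 3: run in order
        "warehouse_id": WAREHOUSE_ID,      # step 4: SQL warehouse
        "catalog": "main",                 # step 5: assumed catalog name
        "schema": "default",               # step 6: Warehouse schema
    },
    # Step 8: pin dbt-databricks to one version (the version shown is an assumption).
    "libraries": [{"pypi": {"package": "dbt-databricks==1.8.0"}}],
}

print(json.dumps(dbt_task_definition, indent=2))
```

Keeping the commands as a list mirrors the separate text boxes in the UI, where each entry runs after the previous one finishes.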

What are dbt commands?

The dbt commands field allows you to specify commands to run using the dbt command line interface (CLI). For full details on the dbt CLI, see the dbt documentation.

Check the dbt documentation for commands supported by the specified version of dbt.

Pass options to dbt commands

The dbt node selection syntax lets you specify resources to include or exclude in a particular run. Commands such as run and build accept flags including --select and --exclude. See the dbt syntax overview docs for a complete description.

Additional configuration flags control how dbt runs your project. See the Command line options column in the official dbt docs for a list of available flags.

Some flags take arguments, and some of those arguments are strings that might need quoting. Refer to the dbt documentation for examples and explanations.
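
As an illustration of node selection and configuration flags, the following sketch assembles the strings you might enter in the dbt commands text boxes. The helper dbt_command and the model names staging_orders and orders are hypothetical; --select, --exclude, and --full-refresh are standard dbt flags, but confirm their behavior for your dbt version.

```python
import shlex


def dbt_command(subcommand, select=None, exclude=None, extra_flags=()):
    """Assemble one dbt command string (hypothetical helper, not part of dbt)."""
    parts = ["dbt", subcommand]
    if select:
        parts += ["--select", *select]    # resources to include
    if exclude:
        parts += ["--exclude", *exclude]  # resources to leave out
    parts += list(extra_flags)
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(parts)


commands = [
    dbt_command("deps"),
    # Run everything tagged "nightly" except one model.
    dbt_command("run", select=["tag:nightly"], exclude=["staging_orders"]),
    # Build a model and its upstream dependencies, rebuilding incremental models.
    dbt_command("build", select=["+orders"], extra_flags=["--full-refresh"]),
]

for command in commands:
    print(command)
```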

Pass variables to dbt commands

Use the --vars flag to pass static or dynamic values to commands in dbt commands fields.

Pass a single-quote-delimited JSON object to --vars. All keys and values in the JSON must be double-quote delimited, as in the following example:

dbt run --vars '{"volume_path": "/Volumes/path/to/data", "date": "2024/08/16"}'
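
If you generate commands from code, the quoting rule is easy to get wrong by hand. The following sketch shows one way to produce the same command with the Python standard library: json.dumps emits double-quoted keys and values, and shlex.quote wraps the result in single quotes. The helper name vars_flag is hypothetical.

```python
import json
import shlex


def vars_flag(values):
    """Return a --vars flag with a single-quoted JSON payload (hypothetical helper)."""
    payload = json.dumps(values)  # keys and string values become double-quoted
    return "--vars " + shlex.quote(payload)  # wrap the JSON in single quotes


command = "dbt run " + vars_flag(
    {"volume_path": "/Volumes/path/to/data", "date": "2024/08/16"}
)
print(command)
```

This approach assumes the values contain no single quotes; shlex.quote still produces a valid shell word in that case, but the result is harder to read.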

Examples of parameterized dbt commands

You can reference task values, job parameters, and dynamic job parameters when working with dbt. Values are substituted as plain text into the dbt commands field before the command runs. For information about passing values between tasks or referencing jobs metadata, see Parameterize jobs.

These examples assume the following job parameters have been configured:

Parameter name    Parameter value
volume_path       /Volumes/path/to/data
table_name        my_table
select_clause     --select "tag:nightly"
dbt_refresh       --full-refresh

The following examples show valid ways to reference these parameters:

dbt run --vars '{"volume_path": "{{job.parameters.volume_path}}"}'
dbt run --select "{{job.parameters.table_name}}"
dbt run {{job.parameters.select_clause}}
dbt run {{job.parameters.dbt_refresh}}
dbt run --vars '{"volume_path": "{{job.parameters.volume_path}}"}' {{job.parameters.dbt_refresh}}
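
To see what dbt receives after substitution, the following sketch imitates the plain-text replacement with a regular expression. It is an illustration only, not how Databricks implements substitution; the render function and the pattern are assumptions, and the parameter values come from the table above.

```python
import re

# Job parameters from the table above.
job_parameters = {
    "volume_path": "/Volumes/path/to/data",
    "table_name": "my_table",
    "select_clause": '--select "tag:nightly"',
    "dbt_refresh": "--full-refresh",
}

# Matches references of the form {{job.parameters.<name>}}.
PATTERN = re.compile(r"\{\{job\.parameters\.(\w+)\}\}")


def render(command):
    """Replace each parameter reference with its value as plain text."""
    return PATTERN.sub(lambda match: job_parameters[match.group(1)], command)


templates = [
    'dbt run --select "{{job.parameters.table_name}}"',
    "dbt run {{job.parameters.select_clause}}",
    """dbt run --vars '{"volume_path": "{{job.parameters.volume_path}}"}' {{job.parameters.dbt_refresh}}""",
]

rendered = [render(template) for template in templates]
for line in rendered:
    print(line)
```

Because the replacement is textual, a parameter value can carry a whole flag (as select_clause does) or only the argument to a flag (as table_name does).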

You can also reference dynamic parameters and task values, as in the following examples:

dbt run --vars '{"date": "{{job.start_time.iso_date}}"}'
dbt run --vars '{"sales_count": "{{tasks.sales_task.values.sales_count}}"}'
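
One consequence of plain-text substitution is that a task value placed inside double quotes arrives as a string, even if the upstream task set a number. The following sketch assumes a hypothetical task value of 42, imitates the substitution with str.replace, and uses json.loads as a stand-in for the parsing dbt applies to --vars.

```python
import json
import shlex

template = """dbt run --vars '{"sales_count": "{{tasks.sales_task.values.sales_count}}"}'"""

# Assume the upstream task sales_task set sales_count to 42 (hypothetical value).
rendered = template.replace("{{tasks.sales_task.values.sales_count}}", "42")

# Split the command the way a shell would, then parse the --vars payload.
args = shlex.split(rendered)
variables = json.loads(args[3])

print(rendered)
print(variables)
print(type(variables["sales_count"]).__name__)
```

If a model needs a numeric value, cast the variable inside the dbt project rather than relying on the substituted text.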