Databricks Asset Bundle project templates

This article describes the syntax for Databricks Asset Bundle templates. Bundles enable programmatic management of Azure Databricks workflows. See What are Databricks Asset Bundles?

Bundle templates let users create bundles in a consistent, repeatable way by establishing folder structures, build steps and tasks, tests, and other DevOps infrastructure-as-code (IaC) attributes that are common across the deployment pipeline of a development environment.

For example, if you routinely run Databricks jobs that require custom packages with a time-consuming compilation step upon installation, you can speed up your development loop by creating a bundle template that supports custom container environments.

Bundle templates define the directory structure of the bundle that will be created, and they include a databricks.yml.tmpl configuration file template as well as a databricks_template_schema.json file containing user-prompt variables.

Use a default bundle template

To use an Azure Databricks default bundle template to create your bundle, use the Databricks CLI bundle init command, specifying the name of the default template to use. For example, the following command creates a bundle using the default Python bundle template:

databricks bundle init default-python

If you do not specify a default template, the bundle init command presents the set of available templates from which you can choose.

Azure Databricks provides the following default bundle templates:

Template Description
default-python A template for using Python with Databricks. This template creates a bundle with a job and Delta Live Tables pipeline. See default-python.
default-sql A template for using SQL with Databricks. This template contains a configuration file that defines a job that runs SQL queries on a SQL warehouse. See default-sql.
dbt-sql A template that uses dbt-core for local development and bundles for deployment. This template contains the configuration that defines a job with a dbt task, as well as a configuration file that defines dbt profiles for deployed dbt jobs. See dbt-sql.
mlops-stacks An advanced full stack template for starting new MLOps Stacks projects. See mlops-stacks and Databricks Asset Bundles for MLOps Stacks.

Use a custom bundle template

To use a bundle template other than the Azure Databricks default bundle templates, pass the local path or remote URL of the template to the Databricks CLI bundle init command.

For example, the following command uses the dab-container-template template created in the Custom Bundle Template Tutorial:

databricks bundle init /projects/my-custom-bundle-templates/dab-container-template

Create a custom bundle template

Bundle templates use Go package templating syntax. See the Go package template documentation.

At a minimum, a bundle template project must have:

  • A databricks_template_schema.json file at the project root that defines one user-prompt variable for the bundle project name.
  • A databricks.yml.tmpl file located in a template folder that defines configuration for any bundles created with the template. If your databricks.yml.tmpl file references any additional *.yml.tmpl configuration templates, specify the location of these in the include mapping.

You can optionally add sub-folders and files to the template folder that you want mirrored in bundles created by the template.
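
The mirroring rule can be sketched in Python. This is an illustration only, not the CLI's implementation. The helper name expected_bundle_paths is hypothetical, and the sketch assumes, based on the databricks.yml.tmpl file becoming databricks.yml in this article, that rendered files lose their .tmpl suffix and that every other file keeps its relative path.

```python
import tempfile
from pathlib import Path

TEMPLATE_SUFFIX = ".tmpl"


def expected_bundle_paths(template_dir):
    """Predict the relative file paths of a bundle created from template_dir.

    Assumption: files ending in .tmpl are rendered and lose the suffix;
    all other files are copied to the same relative location.
    """
    paths = []
    for path in Path(template_dir).rglob("*"):
        if path.is_file():
            rel = path.relative_to(template_dir).as_posix()
            if rel.endswith(TEMPLATE_SUFFIX):
                rel = rel[: -len(TEMPLATE_SUFFIX)]
            paths.append(rel)
    return sorted(paths)


# Build a throwaway example layout and inspect the predicted bundle contents.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "template"
    (template / "src").mkdir(parents=True)
    (template / "databricks.yml.tmpl").touch()
    (template / "src" / "simple_notebook.ipynb").touch()
    demo_paths = expected_bundle_paths(template)

print(demo_paths)
```

A helper like this can be handy in a test that compares the predicted paths with the files that bundle init actually produced.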

Define user prompt variables

The first step in building a basic bundle template is to create a template project folder and a file named databricks_template_schema.json in the project root. This file contains the variables that users provide input values for when they use the template to create a bundle using bundle init. This file’s format follows the JSON Schema Specification.

mkdir basic-bundle-template
touch basic-bundle-template/databricks_template_schema.json

Add the following to the databricks_template_schema.json file, and then save the file:

{
  "properties": {
    "project_name": {
      "type": "string",
      "default": "basic_bundle",
      "description": "What is the name of the bundle you want to create?",
      "order": 1
    }
  },
  "success_message": "\nYour bundle '{{.project_name}}' has been created."
}

In this file:

  • project_name is the only input variable name.
  • default is an optional default value if a value is not provided by the user with --config-file as part of the bundle init command, or overridden by the user at the command prompt.
  • description is the user prompt associated with the input variable, if a value is not provided by the user with --config-file as part of the bundle init command.
  • order is an optional order in which each user prompt appears if a value is not provided by the user with --config-file as part of the bundle init command. If order is not provided, then user prompts display in the order in which they are listed in the schema.
  • success_message is an optional message that is displayed upon successful project creation.
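
To make the roles of default, description, and order concrete, the following Python sketch reads the schema and works out which prompts would be shown and which values would be used. It is a simplified model under stated assumptions, not the Databricks CLI's own logic: the function names prompts_in_order and resolve_values are hypothetical, and the placement of variables without an order after those with one is an assumption.

```python
import json

# The schema from this article, embedded so the sketch is self-contained.
SCHEMA_TEXT = r"""
{
  "properties": {
    "project_name": {
      "type": "string",
      "default": "basic_bundle",
      "description": "What is the name of the bundle you want to create?",
      "order": 1
    }
  },
  "success_message": "\nYour bundle '{{.project_name}}' has been created."
}
"""


def prompts_in_order(schema):
    """Return (variable name, prompt text, default) tuples in display order."""
    properties = schema.get("properties", {})
    # sorted() is stable, so variables without "order" keep their listing order.
    # Assumption: variables without "order" come after those that have one.
    ordered = sorted(
        properties.items(),
        key=lambda item: item[1].get("order", float("inf")),
    )
    return [
        (name, spec.get("description", ""), spec.get("default"))
        for name, spec in ordered
    ]


def resolve_values(schema, config_values):
    """Prefer values from a --config-file style mapping, else use the default."""
    resolved = {}
    for name, _prompt, default in prompts_in_order(schema):
        resolved[name] = config_values.get(name, default)
    return resolved


schema = json.loads(SCHEMA_TEXT)
for name, prompt, default in prompts_in_order(schema):
    print(f"{name}: {prompt} (default: {default})")
print(resolve_values(schema, {"project_name": "my_test_bundle"}))
```

The sketch does not model a user overriding the default at the interactive prompt; it covers only the config-file and default cases.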

Build the folder structure

Next, create the required template folder and build the folder structure within it. This structure will be mirrored by bundles created with this template. Also, put any files that you want included into those folders. This basic bundle template stores files in a src folder and includes one simple notebook.

mkdir -p basic-bundle-template/template/src
touch basic-bundle-template/template/src/simple_notebook.ipynb

Add the following to the simple_notebook.ipynb file:

print("Hello World!")

Populate configuration template files

Now create the required databricks.yml.tmpl file in the template folder:

touch basic-bundle-template/template/databricks.yml.tmpl

Populate this file with the basic configuration template YAML. This configuration template establishes the bundle name, one job using the specified notebook file, and two target environments for bundles created using this template. It also takes advantage of bundle substitutions, which is highly recommended. See bundle substitutions.

# This is the configuration for the Databricks Asset Bundle {{.project_name}}.

bundle:
  name: {{.project_name}}

# The main job for {{.project_name}}
resources:
    jobs:
        {{.project_name}}_job:
            name: {{.project_name}}_job
            tasks:
                - task_key: notebook_task
                  job_cluster_key: job_cluster
                  notebook_task:
                      notebook_path: ../src/simple_notebook.ipynb
            job_clusters:
                - job_cluster_key: job_cluster
                  new_cluster:
                      node_type_id: i3.xlarge
                      spark_version: 13.3.x-scala2.12

targets:
  # The deployment targets. See https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html
  dev:
    mode: development
    default: true
    workspace:
      host: {{workspace_host}}

  prod:
    mode: production
    workspace:
      host: {{workspace_host}}
      root_path: /Shared/.bundle/prod/${bundle.name}
    {{- if not is_service_principal}}
    run_as:
      # This runs as {{user_name}} in production. Alternatively,
      # a service principal could be used here using service_principal_name
      user_name: {{user_name}}
    {{end -}}

Test the bundle template

Finally, test your template. Create a new bundle project folder, then use the Databricks CLI to initialize a new bundle using the template:

mkdir my-test-bundle
cd my-test-bundle
databricks bundle init ../basic-bundle-template

For the prompt, What is the name of the bundle you want to create?, type my_test_bundle.
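
For automated tests, the prompt can be skipped by supplying the value with the --config-file option mentioned earlier. The Python sketch below prepares such a file and the matching command. It assumes the config file is a JSON object that maps variable names to values. The file name init-config.json and the helper names are hypothetical, and run_init is defined but not called here because it needs the Databricks CLI on the PATH.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_init_config(path, values):
    """Write the input values as a JSON object (assumed --config-file format)."""
    path.write_text(json.dumps(values, indent=2) + "\n", encoding="utf-8")
    return path


def build_init_command(template, config_file):
    """Build the bundle init command line as an argument list."""
    return [
        "databricks", "bundle", "init", template,
        "--config-file", str(config_file),
    ]


def run_init(command, cwd):
    """Run the command in cwd, failing early if the CLI is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("Databricks CLI not found on PATH")
    subprocess.run(command, cwd=cwd, check=True)


# Prepare the config file and command without invoking the CLI.
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_init_config(
        Path(tmp) / "init-config.json",
        {"project_name": "my_test_bundle"},
    )
    demo_config = json.loads(config_path.read_text(encoding="utf-8"))
    demo_command = build_init_command("../basic-bundle-template", config_path)

print(" ".join(demo_command))
```

Calling run_init(demo_command, cwd="my-test-bundle") with a config file that still exists would then create the bundle without any prompt.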

Once the test bundle is created, the success message from the schema file is output. If you examine the contents of the my-test-bundle folder, you should see the following:

my-test-bundle
   ├── databricks.yml
   └── src
      └── simple_notebook.ipynb

And the databricks.yml file is now customized:

# This is the configuration for the Databricks Asset Bundle my_test_bundle.

bundle:
  name: my_test_bundle

# The main job for my_test_bundle
resources:
    jobs:
        my_test_bundle_job:
            name: my_test_bundle_job
            tasks:
                - task_key: notebook_task
                  job_cluster_key: job_cluster
                  notebook_task:
                      notebook_path: ../src/simple_notebook.ipynb
            job_clusters:
                - job_cluster_key: job_cluster
                  new_cluster:
                      node_type_id: i3.xlarge
                      spark_version: 13.3.x-scala2.12

targets:
  # The deployment targets. See https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html
  dev:
    mode: development
    default: true
    workspace:
      host: https://my-host.cloud.databricks.com

  prod:
    mode: production
    workspace:
      host: https://my-host.cloud.databricks.com
      root_path: /Shared/.bundle/prod/${bundle.name}
    run_as:
      # This runs as someone@example.com in production. Alternatively,
      # a service principal could be used here using service_principal_name
      user_name: someone@example.com

Share the template

If you want to share this bundle template with others, you can store it in version control with any provider that Git supports and that your users have access to. To run the bundle init command with a Git URL, make sure that the databricks_template_schema.json file is in the root location relative to that Git URL.

Tip

You can put the databricks_template_schema.json file in a different folder, relative to the bundle’s root. You can then use the bundle init command’s --template-dir option to reference that folder, which contains the databricks_template_schema.json file.
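
Before publishing a repository, it can help to confirm where the schema file sits and therefore whether --template-dir is needed. The following Python sketch checks a local checkout. The function name template_dir_option is hypothetical, and the sketch assumes the repository holds a single template.

```python
import tempfile
from pathlib import Path

SCHEMA_FILE = "databricks_template_schema.json"


def template_dir_option(repo_root):
    """Return the folder to pass to --template-dir, or None if it is not needed.

    Assumption: the repository contains exactly one template schema file.
    """
    repo_root = Path(repo_root)
    if (repo_root / SCHEMA_FILE).is_file():
        # The schema is at the root, so the Git URL alone is enough.
        return None
    candidates = sorted(p.parent for p in repo_root.rglob(SCHEMA_FILE))
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one {SCHEMA_FILE}, found {len(candidates)}"
        )
    return candidates[0].relative_to(repo_root).as_posix()


# Example: a checkout that keeps its template in a nested folder.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    nested = repo / "templates" / "basic"
    nested.mkdir(parents=True)
    (nested / SCHEMA_FILE).write_text("{}", encoding="utf-8")
    demo_option = template_dir_option(repo)

print(demo_option)
```

A return value of None means that users can run bundle init with the Git URL alone; any other value is the folder to pass with --template-dir.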

Next steps