Databricks Asset Bundle project templates
Databricks Asset Bundles describe Databricks resources such as jobs, pipelines, and notebooks as source files, let you include metadata alongside those source files to provision infrastructure and other resources, and provide an end-to-end definition of a project, all packaged as a single deployable unit. See What are Databricks Asset Bundles?.
Bundle templates enable users to create bundles in a consistent, repeatable way, by establishing folder structures, build steps and tasks, tests, and other DevOps infrastructure-as-code (IaC) attributes common across a deployment pipeline.
For example, if you routinely run Databricks jobs that require custom packages with a time-consuming compilation step upon installation, you can speed up your development loop by creating a bundle template that specifies a custom container environment.
Databricks provides a set of default bundle templates, but you can also create custom bundle templates. Users can then initialize bundles using the `bundle init` command, specifying a default template or your custom template.
Create a bundle using a template
To use an Azure Databricks bundle template to create your bundle, use the Databricks CLI `bundle init` command, specifying the name of the template to use. For example, the following command creates a bundle using the default Python bundle template:

```bash
databricks bundle init default-python
```
To use a custom bundle template, pass the local path or remote URL of the template to the Databricks CLI `bundle init` command. For example, the following command uses the `dab-container-template` template created in the Custom Bundle Template Tutorial:

```bash
databricks bundle init /projects/my-custom-bundle-templates/dab-container-template
```

If you do not specify a template, the `bundle init` command prompts you to choose from the set of available default templates.
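You can also supply answers to a template's prompts from a file instead of typing them interactively, using the `--config-file` option (described in the schema fields table later on this page). A minimal sketch, where the contents of `config.json` are hypothetical and must match the input variables defined in the template's schema:

```bash
# config.json might contain, for example: {"project_name": "my_project"}
# Any input variables not covered by the file are still prompted for.
databricks bundle init default-python --config-file ./config.json
```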
Default bundle templates
Azure Databricks provides the following default bundle templates:
| Template | Description |
| --- | --- |
| `default-python` | A template for using Python with Databricks. This template creates a bundle with a job and a DLT pipeline. See default-python. |
| `default-sql` | A template for using SQL with Databricks. This template contains a configuration file that defines a job that runs SQL queries on a SQL warehouse. See default-sql. |
| `dbt-sql` | A template that leverages dbt-core for local development and bundles for deployment. This template contains the configuration that defines a job with a dbt task, as well as a configuration file that defines dbt profiles for deployed dbt jobs. See dbt-sql. |
| `mlops-stacks` | An advanced full stack template for starting new MLOps Stacks projects. See mlops-stacks and Databricks Asset Bundles for MLOps Stacks. |
Custom bundle templates
Bundle templates use Go package templating syntax, which provides a great deal of flexibility in the custom bundle templates you can create. See the Go package template documentation.
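For example, a template file can interpolate schema variables and branch on their values using standard Go template actions. The following sketch is illustrative only (the `cloud` input variable is hypothetical, not part of any default template):

```
# bundle_name is replaced with the value the user enters at the project_name prompt.
bundle_name: {{.project_name}}
# Emit a line conditionally, based on the hypothetical cloud input variable.
{{- if eq .cloud "azure" }}
host_hint: use your Azure Databricks workspace URL
{{- end }}
```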
Template project structure
At a minimum, a bundle template project must have:

- A `databricks_template_schema.json` file at the project root that defines one user-prompt property for the bundle project name. See Template schema.
- A `databricks.yml.tmpl` file located in a `template` folder that defines configuration for any bundles created with the template. If your `databricks.yml.tmpl` file references any additional `*.yml.tmpl` configuration templates, specify the location of these in the `include` mapping. See YML configuration templates.

In addition, the folder structure and included files of the bundle template project's `template` folder are mirrored by bundles created with the template. For example, if you want the template to generate a bundle with a simple notebook in the `src` folder and a job definition that runs the notebook in the `resources` folder, you would organize your template project like this:
```
basic-bundle-template
├── databricks_template_schema.json
└── template
    ├── databricks.yml.tmpl
    ├── resources
    │   └── {{.project_name}}_job.yml.tmpl
    └── src
        └── simple_notebook.ipynb
```
Tip
The name of the job definition file in this bundle template uses a template variable. For information about template helpers and variables, see Template helpers and variables.
Template schema
A custom bundle template project must contain a `databricks_template_schema.json` [JSON file](https://json-schema.org/specification.html) at the project root. This file defines fields used by the Databricks CLI when the `bundle init` command is run, such as prompt text.
The following basic `databricks_template_schema.json` file defines an input variable `project_name` for the bundle project, which includes the prompt message and a default value. It then defines a success message for bundle project initialization that uses the input variable value within the message.
```json
{
  "properties": {
    "project_name": {
      "type": "string",
      "default": "basic_bundle",
      "description": "What is the name of the bundle you want to create?",
      "order": 1
    }
  },
  "success_message": "\nYour bundle '{{.project_name}}' has been created."
}
```
Template schema fields
The `databricks_template_schema.json` file supports defining input variables for collecting information from your user during bundle initialization within the `properties` field, as well as additional fields to customize the initialization.

Input variables are defined in the `properties` field of the template schema. Each input variable defines the metadata needed to present a prompt to the user during bundle initialization. The value of the variable is then accessible using template variable syntax, such as `{{.project_name}}`.
You can also set the values of some fields to customize the bundle initialization process.
Supported schema fields are listed in the following table.
| Schema field | Description |
| --- | --- |
| `properties` | The bundle template input variable definitions. Databricks recommends defining at least one input variable that is the name of the bundle project. |
| `properties.<variable_name>` | The name of the input variable. |
| `properties.<variable_name>.default` | A default value to use if a value is not provided by the user with `--config-file` as part of the `bundle init` command, or at the command line when they are prompted for it. |
| `properties.<variable_name>.description` | The user prompt message associated with the input variable. |
| `properties.<variable_name>.enum` | A list of possible values for the property, such as `"enum": ["azure", "aws", "gcp"]`. If this field is defined, the Databricks CLI presents the values in a list at the command line to prompt the user to select a value. |
| `properties.<variable_name>.order` | An integer that defines the relative order of the input properties. This controls the order in which the prompts for these input variables are shown at the command line. |
| `properties.<variable_name>.pattern` | The regexp pattern to use to validate the user input, for example `"pattern": "^[^ .\\\\/]{3,}$"`. For supported regexp syntax, see https://github.com/google/re2/wiki/Syntax. |
| `properties.<variable_name>.pattern_match_failure_message` | The message that is displayed to the user if the value they enter does not match the specified pattern, for example `Project name must be at least 3 characters long and cannot contain the following characters: "\", "/", " " and "."`. |
| `properties.<variable_name>.skip_prompt_if` | Skip prompting for the input variable if this schema is satisfied by the configuration already present. In that case the default value of the property is used instead. For an example, see the mlops-stacks template. Only `const` comparisons are supported. |
| `properties.<variable_name>.skip_prompt_if.properties.<previous_variable_name>.const` | If the value for `<previous_variable_name>` matches the constant configured in `skip_prompt_if`, then the prompt for `<variable_name>` is skipped. |
| `welcome_message` | The first message to output before prompting the user for input. |
| `success_message` | The message to print after the template is successfully initialized. |
| `min_databricks_cli_version` | The minimum semver version of the Databricks CLI that the template requires. `databricks bundle init` fails if the CLI version is less than this version. |
| `version` | Reserved for future use. The version of the schema, used to determine whether the schema is compatible with the current CLI version. |
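For example, the following illustrative schema sketch combines several of these fields (the `cloud` variable here is hypothetical, for demonstration only), prompting for a project name validated against a pattern and a cloud chosen from a fixed list:

```json
{
  "welcome_message": "Welcome! Let's set up your new bundle.",
  "properties": {
    "project_name": {
      "type": "string",
      "default": "my_bundle",
      "description": "What is the name of the bundle you want to create?",
      "order": 1,
      "pattern": "^[^ .\\\\/]{3,}$",
      "pattern_match_failure_message": "Project name must be at least 3 characters long and cannot contain \"\\\", \"/\", \" \" or \".\""
    },
    "cloud": {
      "type": "string",
      "default": "azure",
      "enum": ["azure", "aws", "gcp"],
      "description": "Which cloud do you deploy to?",
      "order": 2
    }
  },
  "success_message": "\nBundle '{{.project_name}}' created for {{.cloud}}."
}
```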
YML configuration templates
A custom bundle template should contain a `databricks.yml.tmpl` file in a `template` folder in the bundle template project, which is used to create the bundle project's `databricks.yml`. Populate this template file with the basic configuration template YAML.
The following example configuration templates for `databricks.yml` and the associated `*_job.yml` establish the bundle name and two target environments, and define a job that runs the notebook in the bundle, for bundles created using this template. These configuration templates take advantage of bundle substitutions and bundle template helpers. See bundle substitutions and bundle template helpers.
```yaml
# databricks.yml
# This is the configuration for the Databricks Asset Bundle {{.project_name}}.

bundle:
  name: {{.project_name}}

include:
  - resources/*.yml

targets:
  # The deployment targets. See https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html
  dev:
    mode: development
    default: true
    workspace:
      host: {{workspace_host}}

  prod:
    mode: production
    workspace:
      host: {{workspace_host}}
      root_path: /Shared/.bundle/prod/${bundle.name}
    {{- if not is_service_principal}}
    run_as:
      # This runs as {{user_name}} in production. Alternatively,
      # a service principal could be used here using service_principal_name
      user_name: {{user_name}}
    {{end -}}
```
```yaml
# {{.project_name}}_job.yml
# The main job for {{.project_name}}

resources:
  jobs:
    {{.project_name}}_job:
      name: {{.project_name}}_job

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/simple_notebook.ipynb

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            node_type_id: i3.xlarge
            spark_version: 13.3.x-scala2.12
```
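Note the dashes in actions such as `{{- if not is_service_principal}}` and `{{end -}}` in the first template above. In Go templating, a `-` placed just inside the delimiters trims the surrounding whitespace, which keeps the rendered YAML free of stray blank lines when a conditional block is omitted. A minimal illustration:

```
before
{{- if false }}
this line is omitted entirely
{{- end }}
after
```

When the condition is false, this renders as just `before` and `after` on consecutive lines, because `{{-` swallows the preceding newline.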
Template helpers and variables
Template helpers are functions provided by Databricks that you can use within your template files to get user-specific information at runtime or interact with the template engine. You can also define your own template variables.
The following template helpers are available for Databricks bundle template projects. For information about using Go templates and variables, see Go templates.
| Helper | Description |
| --- | --- |
| `{{url}}` | An alias for https://pkg.go.dev/net/url#Parse. This allows usage of all methods of `url.URL`. |
| `{{regexp}}` | An alias for https://pkg.go.dev/regexp#Compile. This allows usage of all methods of `regexp.Regexp`. |
| `{{random_int}}` | Returns, as an int, a non-negative pseudo-random number in the half-open interval [0,n). |
| `{{uuid}}` | Returns, as a string, a UUID, which is a 128-bit (16-byte) universally unique identifier as defined in RFC 4122. This ID is stable for the duration of the template execution, and can be used by template authors to populate the `bundle.uuid` field in `databricks.yml`. |
| `{{bundle_uuid}}` | A unique ID for the bundle. Multiple invocations of this function return the same UUID. |
| `{{pair}}` | A key-value pair. This is used with the `map` helper to generate maps to use inside a template. |
| `{{map}}` | Converts a list of pairs to a map object. This is useful for passing multiple objects to templates defined in the `library` directory. Because Go text template syntax for invoking a template only allows specifying a single argument, this function can be used to work around that limitation. For example, in the line `{{template "my_template" (map (pair "foo" $arg1) (pair "bar" $arg2))}}`, `$arg1` and `$arg2` can be referred to from inside `my_template` as `.foo` and `.bar`. |
| `{{smallest_node_type}}` | Returns the smallest node type. |
| `{{path_separator}}` | The path separator character for the operating system. This is `/` for Unix-based systems and `\` for Windows. |
| `{{workspace_host}}` | The workspace host URL the user is currently authenticated to. |
| `{{user_name}}` | The full name of the user initializing the template. |
| `{{short_name}}` | The short name of the user initializing the template. |
| `{{default_catalog}}` | Returns the default workspace catalog. If there is no default, or if Unity Catalog is not enabled, this returns an empty string. |
| `{{is_service_principal}}` | Whether or not the current user is a service principal. |
| `{{ skip <glob-pattern-relative-to-current-directory> }}` | Causes the template engine to skip generating all files and directories that match the input glob pattern. For an example, see the mlops-stacks template. |
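A configuration template can combine several of these helpers. The following hypothetical snippet (the `experiment_name` mapping is for illustration, not from a default template) namespaces a value per user and appends a stable unique suffix:

```
# user_name and bundle_uuid are template helpers from the table above.
experiment_name: /Users/{{user_name}}/{{.project_name}}-{{bundle_uuid}}
{{- if is_service_principal}}
# Emitted only when the initializing identity is a service principal.
initialized_by_service_principal: true
{{- end}}
```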
To define your own variables, create a template file in the `library` folder of the template project and use Go templating syntax to define variables. For example, the following contents of a `library/variables.tmpl` file define the variables `cli_version` and `model_name`. When this template is used to initialize a bundle, the value of the `model_name` variable is constructed using the `input_project_name` field defined in the template schema file. The value of this field is the user's input after a prompt.
```
{{ define `cli_version` -}}
v0.240.0
{{- end }}

{{ define `model_name` -}}
{{ .input_project_name }}-model
{{- end }}
```
To see a complete example, see the mlops-stacks template variables file.
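To use a variable defined in the `library` folder, invoke it from a `*.tmpl` file with Go's `template` action. A minimal sketch, assuming the `model_name` definition above:

```
# In template/databricks.yml.tmpl, this renders as, for example, my_project-model:
model_name: {{ template `model_name` . }}
```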
Test the bundle template
Finally, make sure to test your template. Create a new bundle project folder, then use the Databricks CLI to initialize a new bundle using the template:

```bash
databricks bundle init basic-bundle-template
```

For the prompt `What is the name of the bundle you want to create?`, type `my_test_bundle`.

Once the test bundle is created, the success message from the schema file is output. If you examine the contents of the `my_test_bundle` folder, you should see the following:
```
my_test_bundle
├── databricks.yml
├── resources
│   └── my_test_bundle_job.yml
└── src
    └── simple_notebook.ipynb
```
The `databricks.yml` file and job are now customized:
```yaml
# databricks.yml
# This is the configuration for the Databricks Asset Bundle my_test_bundle.

bundle:
  name: my_test_bundle

include:
  - resources/*.yml

targets:
  # The 'dev' target, used for development purposes. See https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html#development-mode
  dev:
    mode: development
    default: true
    workspace:
      host: https://my-host.cloud.databricks.com

  # The 'prod' target, used for production deployment. See https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html#production-mode
  prod:
    mode: production
    workspace:
      host: https://my-host.cloud.databricks.com
      root_path: /Shared/.bundle/prod/${bundle.name}
    run_as:
      # This runs as someone@example.com in production. Alternatively,
      # a service principal could be used here using service_principal_name
      user_name: someone@example.com
```
```yaml
# my_test_bundle_job.yml
# The main job for my_test_bundle

resources:
  jobs:
    my_test_bundle_job:
      name: my_test_bundle_job

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/simple_notebook.ipynb

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            node_type_id: i3.xlarge
            spark_version: 13.3.x-scala2.12
```
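Before sharing the template, you can also check that the generated bundle's configuration is valid:

```bash
# Run from inside the generated bundle folder.
cd my_test_bundle
databricks bundle validate
```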
Share the template
If you want to share this bundle template with others, you can store it in version control with any provider that Git supports and that your users have access to. To run the `bundle init` command with a Git URL, make sure that the `databricks_template_schema.json` file is in the root location relative to that Git URL.
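For example, a user could then initialize a bundle directly from your repository (the URL below is a placeholder):

```bash
databricks bundle init https://github.com/<your-org>/bundle-templates
```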
Tip

You can put the `databricks_template_schema.json` file in a different folder, relative to the bundle's root. You can then use the `bundle init` command's `--template-dir` option to reference that folder, which contains the `databricks_template_schema.json` file.
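For example, if the template project lived in a `basic-bundle-template` subfolder of the repository above, a user might run (both the URL and the folder name are placeholders):

```bash
databricks bundle init https://github.com/<your-org>/bundle-templates --template-dir basic-bundle-template
```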
Next steps
- Browse additional templates that are created and maintained by Databricks. See the bundle samples repository in GitHub.
- To use MLOps Stacks with Databricks Asset Bundle templates, see Databricks Asset Bundles for MLOps Stacks.
- Learn more about Go package templating. See the Go package template documentation.