Jaa


Override cluster settings in Databricks Asset Bundles

This article describes how to override the settings for Azure Databricks clusters in Databricks Asset Bundles. See What are Databricks Asset Bundles?

In Azure Databricks bundle configuration files, you can join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping, as follows.

For jobs, use the job_cluster_key mapping within a job definition to join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping, for example (ellipses indicate omitted content, for brevity):

# ...
resources:
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # ...
      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      jobs:
        <the-matching-programmatic-identifier-for-this-job>:
          # ...
          job_clusters:
            - job_cluster_key: <the-matching-programmatic-identifier-for-this-key>
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level job_cluster_key.
          # ...

If any cluster setting is defined both in the top-level resources mapping and the targets mapping for the same job_cluster_key, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.

For Delta Live Tables pipelines, use the label mapping within the cluster of a pipeline definition to join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping, for example (ellipses indicate omitted content, for brevity):

# ...
resources:
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # ...
      clusters:
        - label: default | maintenance
          # Cluster settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      pipelines:
        <the-matching-programmatic-identifier-for-this-pipeline>:
          # ...
          clusters:
            - label: default | maintenance
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level label.
          # ...

If any cluster setting is defined both in the top-level resources mapping and the targets mapping for the same label, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.

Example 1: New job cluster settings defined in multiple resource mappings and with no settings conflicts

In this example, spark_version in the top-level resources mapping is combined with node_type_id and num_workers in the resources mapping in targets to define the settings for the job_cluster_key named my-cluster (ellipses indicate omitted content, for brevity):

# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                node_type_id: Standard_DS3_v2
                num_workers: 1
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):

{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 2: Conflicting new job cluster settings defined in multiple resource mappings

In this example, spark_version, and num_workers are defined both in the top-level resources mapping and in the resources mapping in targets. In this example, spark_version and num_workers in the resources mapping in targets take precedence over spark_version and num_workers in the top-level resources mapping, to define the settings for the job_cluster_key named my-cluster (ellipses indicate omitted content, for brevity):

# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                spark_version: 12.2.x-scala2.12
                num_workers: 2
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):

{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 2,
              "spark_version": "12.2.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 3: Pipeline cluster settings defined in multiple resource mappings and with no settings conflicts

In this example, node_type_id in the top-level resources mapping is combined with num_workers in the resources mapping in targets to define the settings for the label named default (ellipses indicate omitted content, for brevity):

# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: Standard_DS3_v2

targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 1
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):

{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 1
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 4: Conflicting pipeline cluster settings defined in multiple resource mappings

In this example, num_workers is defined both in the top-level resources mapping and in the resources mapping in targets. num_workers in the resources mapping in targets take precedence over num_workers in the top-level resources mapping, to define the settings for the label named default (ellipses indicate omitted content, for brevity):

# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: Standard_DS3_v2
          num_workers: 1

targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 2
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):

{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2
          }
        ],
        "...": "..."
      }
    }
  }
}