Override cluster settings in Databricks Asset Bundles

Article
01/21/2025

This article describes how to override the settings for Azure Databricks clusters in Databricks Asset Bundles. For background, see "What are Databricks Asset Bundles?".

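In Azure Databricks bundle configuration files, you can join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping, as follows.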

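For jobs, use the job_cluster_key mapping within a job definition to join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping. For example (ellipses indicate content omitted for brevity):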

# ...
resources:
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # ...
      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      jobs:
        <the-matching-programmatic-identifier-for-this-job>:
          # ...
          job_clusters:
            - job_cluster_key: <the-matching-programmatic-identifier-for-this-key>
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level job_cluster_key.
          # ...

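If a cluster setting is defined both in the top-level resources mapping and in the targets mapping for the same job_cluster_key, the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.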
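The join and precedence rule for job clusters can be sketched in Python. This is only an illustration of the behavior described above, not the Databricks CLI's actual implementation. The helper names deep_merge and merge_job_clusters are hypothetical, the sample values are borrowed from Example 2 in this article, and appending a targets entry that has no matching top-level key is an assumption of this sketch.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings (for example new_cluster): merge key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalar or new key: the override value replaces or extends the base.
            merged[key] = deepcopy(value)
    return merged


def merge_job_clusters(top_level: list, target: list) -> list:
    """Join two job_clusters lists on job_cluster_key (hypothetical helper)."""
    merged = {c["job_cluster_key"]: deepcopy(c) for c in top_level}
    for cluster in target:
        key = cluster["job_cluster_key"]
        # Assumption: a key that only exists in targets is simply added.
        merged[key] = deep_merge(merged.get(key, {}), cluster)
    return list(merged.values())


# Settings from the top-level resources mapping (values from Example 2).
top_level = [
    {
        "job_cluster_key": "my-cluster",
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 1,
        },
    }
]

# Settings from the targets mapping for the same job_cluster_key.
target = [
    {
        "job_cluster_key": "my-cluster",
        "new_cluster": {"spark_version": "12.2.x-scala2.12", "num_workers": 2},
    }
]

result = merge_job_clusters(top_level, target)
print(result)
```

In this sketch, node_type_id survives from the top-level mapping because the targets mapping does not define it, while spark_version and num_workers take the targets values.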

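For Delta Live Tables pipelines, use the label mapping within the cluster of a pipeline definition to join the cluster settings in a top-level resources mapping with the cluster settings in a targets mapping. For example (ellipses indicate content omitted for brevity):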

# ...
resources:
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # ...
      clusters:
        - label: default | maintenance
          # Cluster settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      pipelines:
        <the-matching-programmatic-identifier-for-this-pipeline>:
          # ...
          clusters:
            - label: default | maintenance
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level label.
          # ...

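If a cluster setting is defined both in the top-level resources mapping and in the targets mapping for the same label, the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.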
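For pipeline clusters, the same idea applies with label as the join key. The following Python sketch is a simplified model of that rule, not the CLI's real merge logic. The function name merge_pipeline_clusters is hypothetical, the values come from Example 4 in this article, and the merge here is shallow, so it does not model how nested settings would be combined.

```python
def merge_pipeline_clusters(top_level: list, target: list) -> list:
    """Join two pipeline clusters lists on label; targets settings win."""
    # Copy each top-level entry so the inputs are not modified.
    by_label = {cluster["label"]: dict(cluster) for cluster in top_level}
    for cluster in target:
        # Assumption: a label that only exists in targets is simply added.
        by_label.setdefault(cluster["label"], {}).update(cluster)
    return list(by_label.values())


# Settings from the top-level resources mapping (values from Example 4).
top_level_clusters = [
    {"label": "default", "node_type_id": "Standard_DS3_v2", "num_workers": 1}
]

# Settings from the targets mapping for the same label.
target_clusters = [{"label": "default", "num_workers": 2}]

merged_clusters = merge_pipeline_clusters(top_level_clusters, target_clusters)
print(merged_clusters)
```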

Example 1: New job cluster settings defined in multiple resource mappings and with no settings conflicts

In this example, spark_version in the top-level resources mapping is combined with node_type_id and num_workers in the resources mapping in targets to define the settings for the job_cluster_key named my-cluster (ellipses indicate content omitted for brevity):

# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                node_type_id: Standard_DS3_v2
                num_workers: 1
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate content omitted for brevity):

{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 2: Conflicting new job cluster settings defined in multiple resource mappings

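In this example, spark_version and num_workers are defined both in the top-level resources mapping and in the resources mapping in targets. The values of spark_version and num_workers in the resources mapping in targets take precedence over those in the top-level resources mapping, while node_type_id, which is defined only at the top level, is carried over unchanged. Together they define the settings for the job_cluster_key named my-cluster (ellipses indicate content omitted for brevity):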

# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                spark_version: 12.2.x-scala2.12
                num_workers: 2
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate content omitted for brevity):

{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 2,
              "spark_version": "12.2.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 3: Pipeline cluster settings defined in multiple resource mappings and with no settings conflicts

In this example, node_type_id in the top-level resources mapping is combined with num_workers in the resources mapping in targets to define the settings for the label named default (ellipses indicate content omitted for brevity):

# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: Standard_DS3_v2

targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 1
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate content omitted for brevity):

{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 1
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 4: Conflicting pipeline cluster settings defined in multiple resource mappings

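In this example, num_workers is defined both in the top-level resources mapping and in the resources mapping in targets. The value of num_workers in the resources mapping in targets takes precedence over the value in the top-level resources mapping, to define the settings for the label named default (ellipses indicate content omitted for brevity):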

# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: Standard_DS3_v2
          num_workers: 1

targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 2
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate content omitted for brevity):

{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2
          }
        ],
        "...": "..."
      }
    }
  }
}
