Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions

This article describes how to run a CI/CD (continuous integration/continuous deployment) workflow in GitHub with GitHub Actions and a Databricks Asset Bundle. See What are Databricks Asset Bundles?

You can use GitHub Actions together with Databricks CLI bundle commands to automate, customize, and run your CI/CD workflows from within your GitHub repositories.

You can add GitHub Actions YAML files such as the following to your repo's .github/workflows directory. The following example GitHub Actions YAML file validates, deploys, and runs the specified job in the bundle within a pre-production target named "qa", as defined in the bundle's settings file. This example GitHub Actions YAML file relies on the following:

  • A bundle settings file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file's setting working-directory: . (This setting can be omitted if the bundle settings file is already at the root of the repository.) This bundle settings file defines an Azure Databricks workflow named my-job and a target named qa; a minimal sketch of such a file appears after this list. See Databricks Asset Bundle configuration.
  • A GitHub secret named SP_TOKEN, representing the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is being deployed and run. See Encrypted secrets.
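For reference, a minimal bundle settings file (databricks.yml) that satisfies these assumptions might look roughly like the following sketch. Apart from the my-job resource name and the qa target name, everything here (the bundle name, notebook path, and workspace URL) is a hypothetical placeholder rather than part of the documented example.

# databricks.yml (sketch only; bundle name, notebook path, and host are placeholders)
bundle:
  name: my-bundle

resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: my-task
          notebook_task:
            notebook_path: ./src/my_notebook.ipynb

targets:
  qa:
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net

The GitHub Actions workflow file itself follows.
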
# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "qa".
name: "QA deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  # Used by the "pipeline_update" job to deploy the bundle.
  # Bundle validation is automatically performed as part of this deployment.
  # If validation fails, this workflow fails.
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "qa" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

  # Validate, deploy, and then run the bundle.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

The following GitHub Actions YAML file can exist in the same repo as the preceding file. This file validates, deploys, and runs the specified bundle within a production target named "prod", as defined in the bundle's settings file. This example GitHub Actions YAML file relies on the following:

  • A bundle settings file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file's setting working-directory: . (This setting can be omitted if the bundle settings file is already at the root of the repository.) This bundle settings file defines an Azure Databricks workflow named my-job and a target named prod; a sketch of the additional prod target appears after this list. See Databricks Asset Bundle configuration.
  • A GitHub secret named SP_TOKEN, representing the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is being deployed and run. See Encrypted secrets.
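Assuming the same bundle settings file sketched earlier, the prod target could sit alongside qa in its targets mapping roughly as follows; the workspace URL is again a placeholder.

# Additional target in databricks.yml (sketch only; host is a placeholder)
targets:
  prod:
    workspace:
      host: https://adb-2222222222222222.2.azuredatabricks.net

The production workflow file itself follows.
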
# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: "Production deployment"

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is pushed to the repo's
# main branch.
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

  # Validate, deploy, and then run the bundle.
  pipeline_update:
    name: "Run pipeline update"
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Use the downloaded Databricks CLI.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

See also